Enhancing Credit Risk Management with Machine Learning: A Comparative Study of Predictive Models for Credit Default Prediction

Quoc Giang Nguyen; Linh Hoang Nguyen; Md Monir Hosen; Mohammad Rasel; Jannatul Ferdous Shorna; Md Sakib Mia; Sajidul Islam Khan

doi:10.37547/tajas/Volume07Issue01-04

Authors

Quoc Giang Nguyen
IEEE Professional Community, IEEE, USA
Linh Hoang Nguyen
FPT Americas, USA
Md Monir Hosen
MS in Business Analytics, St. Francis College, USA
Mohammad Rasel
Masters in Business Analytics, International American University, LA, California, USA
Jannatul Ferdous Shorna
College of Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida
Md Sakib Mia
MSc in Business Analytics, Trine University, USA
Sajidul Islam Khan
MSc in Business Analytics, Trine University, USA


Keywords:

machine learning, credit risk management, predictive analytics

Abstract

This study investigates the application of machine learning algorithms for predictive analytics in credit risk management, aiming to enhance the accuracy of predicting credit defaults. The research compares multiple machine learning models, including logistic regression, decision trees, random forests, gradient boosting, XGBoost, and LightGBM, using a real-world credit risk dataset. The study focuses on evaluating the models' performance based on metrics such as accuracy, precision, recall, and F1-score. The results show that ensemble models, particularly XGBoost and LightGBM, outperform traditional algorithms in terms of predictive accuracy and computational efficiency, demonstrating their ability to effectively handle complex datasets. The comparative analysis highlights the strengths and weaknesses of each model, providing insights into the trade-offs between interpretability and predictive power. XGBoost and LightGBM are found to be highly effective for credit risk prediction, though challenges such as model interpretability and overfitting remain. The findings suggest that machine learning offers a promising approach for improving credit risk management, with implications for the financial industry to make more informed, data-driven lending decisions. The study underscores the importance of addressing interpretability concerns and data quality issues in real-world applications, paving the way for future advancements in machine learning for credit risk prediction.

The American Journal of Applied Sciences


https://www.theamericanjournals.com/index.php/tajas

TYPE: Original Research

PAGE NO.: 21-30

DOI: 10.37547/tajas/Volume07Issue01-04

OPEN ACCESS

SUBMITTED: 25 October 2024

ACCEPTED: 25 December 2024

PUBLISHED: 30 January 2025

VOLUME: Vol.07 Issue01 2025

CITATION

Quoc Giang Nguyen, Linh Hoang Nguyen, Md Monir Hosen, Mohammad
Rasel, Jannatul Ferdous Shorna, Md Sakib Mia, & Sajidul Islam Khan.
(2025). Enhancing Credit Risk Management with Machine Learning: A
Comparative Study of Predictive Models for Credit Default Prediction.
The American Journal of Applied Sciences, 7(01), 21–30.
https://doi.org/10.37547/tajas/Volume07Issue01-04

COPYRIGHT

© 2025 Original content from this work may be used under the
terms of the Creative Commons Attribution 4.0 License.



Keywords: machine learning, credit risk management,
predictive analytics, XGBoost, LightGBM, decision
trees, logistic regression, model evaluation, accuracy,
predictive power, data preprocessing, feature
selection, overfitting, interpretability.

Introduction:

In recent years, credit risk management
has become an essential aspect of financial institutions
as they strive to mitigate the risks associated with
lending. Traditional methods of assessing credit risk
primarily rely on expert knowledge and historical
financial data. However, these methods are often
insufficient in handling complex and large datasets.
With the rapid advancement of machine learning
techniques, financial institutions are now leveraging
these technologies to improve the accuracy and
efficiency of credit risk assessments. Predictive
analytics, powered by machine learning, can identify
potential credit defaults more effectively by analyzing
large volumes of structured and unstructured data,
providing deeper insights into borrower behavior, and
detecting patterns that may not be evident through
conventional methods.

Machine learning algorithms such as logistic regression,
decision trees, random forests, gradient boosting,
support vector machines, XGBoost, and LightGBM have
shown remarkable success in various domains,
including credit risk modeling. These algorithms can
learn from historical data, automatically adapt to new
patterns, and make data-driven decisions. As a result,
they are increasingly used in predictive modeling to
forecast the likelihood of default, thereby enabling
financial institutions to make informed decisions
regarding loan approvals and risk management
strategies.

This study aims to explore the application of machine
learning algorithms in credit risk prediction and provide
a comparative analysis of their performance. We
evaluate several popular algorithms based on key
performance metrics such as accuracy, precision, recall,
F1-score, and AUC-ROC. The goal is to determine which
algorithm provides the best balance between accuracy
and interpretability for practical use in credit risk
management.

LITERATURE REVIEW

The application of machine learning in credit risk
management has been a subject of growing interest
over the past few decades. Early research focused on
traditional statistical methods such as discriminant
analysis (Altman, 1968) and logistic regression
(Ohlson, 1980), which laid the foundation for the field
of credit scoring. These models used a limited set of
financial ratios and historical data to predict the
likelihood of default. However, these models often
struggled to capture complex relationships between
variables and faced challenges in handling large and
unstructured datasets (Zhao, 2018).

With the advent of machine learning, the landscape of
credit risk management began to shift. Machine
learning models, such as decision trees (Breiman, 1986)
and random forests (Breiman, 2001), provided a more
flexible and scalable alternative. Decision trees
model data through a hierarchical structure, in which
each node represents a decision based on a feature
and the branches represent possible outcomes.
Random forests, an ensemble method, combine
multiple decision trees to improve accuracy and reduce
overfitting. These models quickly gained popularity in
credit risk modeling due to their ability to handle large
datasets and capture non-linear relationships between
variables.

Gradient boosting, another ensemble technique, was
introduced to further improve predictive performance.
It builds a series of weak learners, where each model
corrects the errors of the previous one, allowing for
high levels of accuracy and robustness (Friedman,
2001). This technique, implemented in models like
XGBoost (Chen & Guestrin, 2016) and LightGBM (Ke et
al., 2017), has become one of the most effective
approaches for credit risk prediction. XGBoost, in
particular, is known for its speed, scalability, and ability
to handle missing data and imbalanced classes, making
it ideal for financial applications.

A number of studies have demonstrated the
effectiveness of machine learning algorithms in credit
risk prediction. For instance, Gangan et al. (2020) used
XGBoost for predicting credit default risk and found
that it outperformed traditional statistical methods in
terms of accuracy and F1-score. Similarly, Liao et al.
(2018) employed LightGBM for credit scoring and
reported superior performance compared to other
machine learning algorithms, particularly in terms of
speed and accuracy in large datasets.

However, while machine learning models have shown
promise, challenges remain in their adoption in real-
world credit risk applications. Interpretability and
transparency of machine learning models are crucial in
financial institutions, as regulators and stakeholders
require explanations for the model's decisions
(Caruana et al., 2015). Techniques such as SHAP
(SHapley Additive exPlanations) and LIME (Local
Interpretable Model-agnostic Explanations) have been
developed to address this issue and provide
explanations for complex models.

Despite these advancements, the integration of
machine learning models into credit risk management
is not without its limitations. One challenge is the
potential for overfitting, particularly when using highly
complex models such as deep learning (Bengio et al.,
2013). To mitigate this risk, regularization techniques
and careful model selection are essential. Additionally,
data quality and the handling of missing or incomplete
information remain significant challenges for machine
learning models in financial applications.

In conclusion, while traditional credit risk models have
provided a foundation for financial decision-making,
machine learning algorithms offer significant
advantages in terms of accuracy, scalability, and the
ability to handle large, complex datasets. Recent
studies have shown that algorithms such as XGBoost
and LightGBM outperform traditional models, making
them promising candidates for future credit risk
modeling. However, challenges such as
interpretability, overfitting, and data quality must be
addressed to ensure the successful implementation of
machine learning in credit risk management.

METHODOLOGY

Data Collection

The dataset for this study was carefully curated from
multiple reliable sources to ensure the inclusion of
diverse attributes relevant to credit risk assessment.
Primary data was obtained from publicly available
financial repositories and anonymized datasets shared
by financial institutions. These datasets included
detailed information on customer demographics,
financial behavior, and credit history, which are crucial
for predicting credit risk. The data spanned a wide
range of loan products, such as personal loans, home
loans, and credit cards, to provide a comprehensive
understanding of credit risk across different financial
contexts.

In total, the dataset contained 10,000 records with
both numerical and categorical variables. Each record
represented a unique customer and their
corresponding financial attributes. The dataset was
subjected to an initial exploratory data analysis (EDA)
to understand its structure and distribution, identifying
patterns, anomalies, and potential data quality issues.

Below is the table summarizing the dataset attributes:

Attribute             | Description                                      | Type        | Example
----------------------|--------------------------------------------------|-------------|-------------------
Customer_ID           | Unique identifier for each customer              | Categorical | C001, C002
Age                   | Age of the customer                              | Numerical   | 35, 42
Gender                | Gender of the customer                           | Categorical | Male, Female
Income                | Annual income of the customer                    | Numerical   | 45,000, 65,000
Credit_History_Length | Duration of credit history (in years)            | Numerical   | 5, 10
Credit_Utilization    | Percentage of credit limit used                  | Numerical   | 40%, 75%
Debt_to_Income_Ratio  | Ratio of total debt to annual income             | Numerical   | 0.3, 0.5
Repayment_Status      | Status of repayments (on-time, late, defaulted)  | Categorical | On-time, Defaulted
Loan_Amount           | Amount of the loan or credit issued              | Numerical   | 20,000, 50,000
Loan_Purpose          | Purpose of the loan                              | Categorical | Home, Education
Default_Status        | Whether the customer defaulted (Target Variable) | Categorical | Yes, No

Data Processing

The data processing phase was a crucial step to ensure
the quality, consistency, and usability of the dataset for
building machine learning models. The raw dataset,
while comprehensive, contained several imperfections,
including missing values, outliers, inconsistent formats,
and class imbalance issues. Each of these challenges
was addressed systematically to prepare the data for
analysis and modeling.

The first step involved handling missing values, which
were prevalent in both numerical and categorical
attributes. Missing data can lead to biased outcomes if
not managed appropriately. For numerical features,
such as Income and Credit_History_Length, the mean
of the respective column was used for imputation. This
approach preserved the central tendency of the data
without introducing significant bias. For categorical
attributes, such as Gender and Repayment_Status, the
mode of each column was utilized to fill in missing
values, as it represented the most frequent category
and maintained the categorical distribution.
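The imputation rule described above can be illustrated with a minimal, dependency-free sketch. The helper name `impute` and the sample rows are hypothetical and serve only as an illustration; the study does not state which tooling was used.

```python
from statistics import mean, mode

def impute(records, numeric_cols, categorical_cols):
    """Fill None with the column mean (numeric) or mode (categorical)."""
    fills = {}
    for col in numeric_cols:
        fills[col] = mean(r[col] for r in records if r[col] is not None)
    for col in categorical_cols:
        fills[col] = mode(r[col] for r in records if r[col] is not None)
    # Build new rows so the original records are left untouched.
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in records
    ]

# Hypothetical rows; None marks a missing value.
rows = [
    {"Income": 45000, "Gender": "Male"},
    {"Income": None, "Gender": "Female"},
    {"Income": 65000, "Gender": None},
    {"Income": 40000, "Gender": "Female"},
]
filled = impute(rows, numeric_cols=["Income"], categorical_cols=["Gender"])
```

In a train/test setting, the fill values would normally be computed on the training portion and reused for the test portion.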

Outlier detection and treatment formed the next
critical stage of data processing. Extreme values,
particularly in attributes like Loan_Amount and
Debt_to_Income_Ratio, were identified using the
interquartile range (IQR) method. These values were
visualized through box plots to confirm their deviation
from normal distributions. Rather than discarding
outliers outright, a capping strategy was employed,
where values beyond the 1st and 99th percentiles were
adjusted to lie within these limits. This ensured that
significant variations in the data were preserved while
reducing the impact of extreme values that could
distort model performance.
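The two steps described above, IQR-based detection followed by percentile capping, might look like the following sketch. The 1.5 x IQR fence multiplier is a common convention assumed here rather than a value reported in the text, and the loan amounts are synthetic.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def cap_at_percentiles(values, lower_pct=1, upper_pct=99):
    """Clip values to the [lower_pct, upper_pct] percentile range."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

# Synthetic Loan_Amount values with one extreme entry.
loan_amounts = [20000 + 300 * i for i in range(100)] + [900000]
flagged = iqr_outliers(loan_amounts)
capped = cap_at_percentiles(loan_amounts)
```

Capping keeps every record in the dataset, which is the stated reason for preferring it over deletion.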

Encoding categorical variables into numerical
representations was another essential task. The
dataset contained categorical attributes, such as
Gender, Repayment_Status, and Loan_Purpose, which
required transformation for compatibility with
machine learning algorithms. Binary attributes, like
Gender, were encoded into numerical values (e.g., 0 for
Male and 1 for Female). For multi-class variables, such
as Loan_Purpose, one-hot encoding was applied to
create separate binary columns for each category,
effectively capturing the categorical information in a
numerical format.
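A small sketch of the two encodings follows. The function name `encode` and the generated column names (for example `Loan_Purpose_Home`) are illustrative assumptions; the 0/1 mapping for Gender follows the example given in the text.

```python
def encode(records, binary_maps, one_hot_cols):
    """Map binary columns to 0/1 and expand multi-class columns one-hot."""
    # Collect the categories seen in each one-hot column, in a stable order.
    categories = {c: sorted({r[c] for r in records}) for c in one_hot_cols}
    encoded_rows = []
    for r in records:
        row = {}
        for key, value in r.items():
            if key in binary_maps:
                row[key] = binary_maps[key][value]
            elif key in categories:
                for cat in categories[key]:
                    row[f"{key}_{cat}"] = int(value == cat)
            else:
                row[key] = value
        encoded_rows.append(row)
    return encoded_rows

rows = [
    {"Gender": "Male", "Loan_Purpose": "Home"},
    {"Gender": "Female", "Loan_Purpose": "Education"},
]
encoded = encode(rows, {"Gender": {"Male": 0, "Female": 1}}, ["Loan_Purpose"])
```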

To ensure uniformity in data representation, numerical
attributes were scaled to a standard range. This step
addressed the issue of varying scales among features,
such as Income and Credit_Utilization. Min-Max scaling
was used to normalize these attributes, transforming
them to a common range between 0 and 1. Scaling
prevented larger numerical ranges from
disproportionately influencing the performance of
scale-sensitive algorithms such as Support Vector
Machines and regularized Logistic Regression; tree-based
models such as Gradient Boosting are largely
insensitive to feature scale.
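Min-Max scaling maps each value x to (x - min) / (max - min). The sketch below fits the minimum and maximum once and returns a reusable transform; the helper name `fit_min_max` and the income figures are hypothetical.

```python
def fit_min_max(values):
    """Learn min and max from `values` and return a scaling function."""
    low, high = min(values), max(values)
    span = high - low

    def transform(v):
        # A constant column has zero span; map it to 0.0 to avoid division by zero.
        return 0.0 if span == 0 else (v - low) / span

    return transform

incomes = [45000, 65000, 30000, 80000]
scale_income = fit_min_max(incomes)
scaled = [scale_income(v) for v in incomes]
```

Returning a function makes it straightforward to apply the same fitted range to held-out data, where values can fall slightly outside 0 to 1.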

Another critical challenge was the class imbalance in
the target variable, Default_Status, which is a common
issue in credit risk datasets. The dataset exhibited a
skewed distribution, with significantly more instances
of non-defaults compared to defaults. To address this
imbalance, the Synthetic Minority Oversampling
Technique (SMOTE) was employed. SMOTE generated
synthetic samples for the minority class by
interpolating between existing samples, effectively
balancing the class distribution and enhancing the
model's ability to detect credit defaults.
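The interpolation idea behind SMOTE can be sketched without external libraries as below. This is a simplified illustration, not the reference implementation: the neighbour count `k`, the seed, and the two-feature sample points are assumptions. Oversampling is usually applied to the training portion only so that synthetic points do not enter the test set; the text does not state how this was handled in the study.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, target)))
    return synthetic

# Hypothetical scaled (Debt_to_Income_Ratio, Credit_Utilization) pairs of defaulters.
defaults = [(0.45, 0.80), (0.50, 0.75), (0.55, 0.90), (0.60, 0.85)]
new_points = smote(defaults, n_new=6)
```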

Finally, the preprocessed dataset was split into training
and testing subsets. A standard 80:20 split was
employed, with the larger portion designated for
training the machine learning models. This ensured
that the models could learn from a comprehensive
dataset while leaving a representative subset for
unbiased evaluation. Care was taken to apply
consistent preprocessing steps to both training and
testing datasets, preserving the integrity of the
evaluation process.
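An 80:20 split can be sketched as follows. Stratifying by class, so that both subsets keep the original default rate, is an assumption added here; the text specifies only the 80:20 ratio. The labels and seed are illustrative.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical Default_Status labels: 80 non-defaults, 20 defaults.
labels = ["No"] * 80 + ["Yes"] * 20
train_idx, test_idx = stratified_split(labels)
```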

Through these detailed processing steps, the dataset
was transformed into a structured and clean format,
ready for feature selection, engineering, and model
development. This meticulous approach ensured that
the subsequent analyses and predictions were built on
a solid foundation of reliable data.

Feature Selection

Feature selection is a critical step in the machine
learning pipeline, as it identifies the most relevant
attributes from the dataset that contribute significantly
to the predictive power of the model. By selecting the
most impactful features, the process reduces
dimensionality, mitigates overfitting, and enhances the
model's interpretability. In this study, feature selection
was performed using a combination of statistical
methods, domain knowledge, and algorithmic
approaches.

Initially, correlation analysis was conducted to measure
the linear relationships between numerical features
and the target variable, Default_Status. Features with a
high correlation coefficient (either positive or negative)
were prioritized for inclusion in the model. Heatmaps
were generated to visualize these correlations, helping
to identify potential redundancies among predictors.
Attributes like Debt_to_Income_Ratio and
Credit_Utilization showed strong correlations with
credit default likelihood, warranting their inclusion.
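With a 0/1 target, the Pearson coefficient between a numerical feature and Default_Status reduces to the point-biserial correlation. The following sketch computes it directly; the sample values are invented for illustration and are not taken from the study's dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical Debt_to_Income_Ratio values and default flags (1 = defaulted).
debt_to_income = [0.10, 0.20, 0.30, 0.50, 0.60, 0.70]
defaulted = [0, 0, 0, 1, 1, 1]
r = pearson(debt_to_income, defaulted)
```

A coefficient near +1 or -1 would prioritize the feature under the rule described above; the coefficient captures only linear association.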

For categorical variables, Chi-square tests were applied
to assess their statistical dependence on the target
variable. Variables with significant p-values were
considered relevant. Additionally, domain knowledge
was incorporated to ensure that features with practical
importance, such as Loan_Purpose and
Repayment_Status, were not excluded based solely on
statistical metrics.
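The Chi-square statistic for a categorical feature against the target can be computed from a contingency table, as in this sketch. The counts are hypothetical. The p-value lookup is omitted; for one degree of freedom, the standard 5% critical value is 3.841.

```python
def chi_square(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and target.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts. Rows: repayment on-time / late. Columns: no default / default.
observed = [[80, 20], [30, 70]]
stat, dof = chi_square(observed)
is_dependent = stat > 3.841  # 5% critical value for dof = 1
```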

Recursive Feature Elimination (RFE) was employed as
an advanced feature selection technique. Using
machine learning algorithms, such as Random Forest
and Gradient Boosting, RFE iteratively removed less
important features, retaining only those that
contributed the most to model accuracy. This
automated method ensured that the feature selection
process was robust and data-driven.

Feature Engineering

Feature engineering further refined the dataset by
creating new features and transforming existing ones
to capture more meaningful patterns and relationships.
This process aimed to improve the model's ability to
distinguish between defaults and non-defaults by
enhancing the informativeness of the predictors.

One of the first steps involved creating interaction
terms between features that exhibited strong
correlations. For instance, the interaction between
Debt_to_Income_Ratio and Credit_Utilization was
explored, as these attributes together could provide

The American Journal of Applied Sciences

25

https://www.theamericanjournals.com/index.php/tajas

The American Journal of Applied Sciences

deeper insights into a customer's financial behavior.
Polynomial features were also introduced for key
numerical

variables,

such

as

Income

and

Credit_History_Length,

to

capture

non-linear

relationships.
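As an illustration of the interaction and polynomial terms described above, a sketch follows. The new column names and the choice of degree 2 are assumptions; the text does not report the polynomial degree used.

```python
def add_interaction_and_powers(row):
    """Return a copy of `row` with one interaction term and two squared terms."""
    out = dict(row)
    # Interaction: joint effect of debt burden and credit usage.
    out["DTI_x_Utilization"] = row["Debt_to_Income_Ratio"] * row["Credit_Utilization"]
    # Degree-2 polynomial terms (the degree is an assumption).
    out["Income_sq"] = row["Income"] ** 2
    out["Credit_History_Length_sq"] = row["Credit_History_Length"] ** 2
    return out

# Hypothetical customer; Income is assumed to be already scaled to [0, 1].
customer = {
    "Debt_to_Income_Ratio": 0.5,
    "Credit_Utilization": 0.4,
    "Income": 0.5,
    "Credit_History_Length": 10,
}
enriched = add_interaction_and_powers(customer)
```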

Normalization and scaling techniques were applied to
the engineered features to maintain consistency across
the dataset. Continuous variables, including newly
created features, were transformed using logarithmic
scaling to reduce skewness and emphasize relative
differences.

Binning techniques were used to group numerical
attributes into categorical ranges. For example, Age
was divided into brackets (e.g., young, middle-aged,
senior) to simplify its relationship with credit risk.
Similarly, Loan_Amount was categorized into small,
medium, and large loans to highlight patterns specific
to different loan sizes.
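Binning can be sketched with sorted cut points and a binary search. The bracket boundaries below (30 and 55 years; 10,000 and 40,000 for loans) are hypothetical, since the text does not report the thresholds used.

```python
from bisect import bisect_right

def make_binner(edges, labels):
    """Return a function mapping a number to the label of its bin.

    `edges` are ascending cut points; a value equal to an edge falls into
    the bin above it.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return lambda value: labels[bisect_right(edges, value)]

# Hypothetical cut points.
age_bracket = make_binner([30, 55], ["young", "middle-aged", "senior"])
loan_size = make_binner([10000, 40000], ["small", "medium", "large"])

brackets = [age_bracket(a) for a in (25, 35, 60)]
sizes = [loan_size(v) for v in (5000, 20000, 50000)]
```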

Categorical features were further enriched through
one-hot encoding, while ordinal encoding was applied
to variables with an inherent order, such as
Credit_History_Length. Feature engineering also
included deriving composite variables, such as
Credit_Utilization_to_Income_Ratio, which
encapsulated financial stress in a single metric.

Model Development

The model development phase involved selecting,
training, and fine-tuning multiple machine learning
algorithms to predict credit risk effectively. A range of
supervised learning techniques was considered,
including logistic regression, decision trees, random
forests, gradient boosting (XGBoost, LightGBM), and
support vector machines (SVM). Each model was
chosen for its unique strengths in handling structured
datasets and addressing imbalanced classes.

Before training, hyperparameter tuning was conducted
using grid search and random search methods. For
instance, parameters such as the learning rate,
maximum tree depth, and number of estimators were
optimized for boosting algorithms, while regularization
terms were adjusted for logistic regression. The
optimization process aimed to strike a balance
between model complexity and generalizability.
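Grid search amounts to scoring every combination in a parameter grid and keeping the best. In the sketch below the grid values are hypothetical, and `toy_score` is a stand-in objective so the example runs on its own; in practice the scoring function would train a model with the given parameters and return its cross-validated score.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Stand-in objective (assumption): peaks at 0.1 / 5 / 200.
    return (
        -abs(params["learning_rate"] - 0.1)
        - abs(params["max_depth"] - 5) / 10
        - abs(params["n_estimators"] - 200) / 1000
    )

grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 200],
}
best_params, best_score = grid_search(grid, toy_score)
```

Random search follows the same pattern but samples a fixed number of combinations instead of enumerating all of them.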

Cross-validation was employed to evaluate model
stability and prevent overfitting. A stratified k-fold
approach was chosen, ensuring that each fold retained
the same class proportions as the original dataset. This
technique provided a robust assessment of model
performance across different subsets of data.
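A stratified k-fold partition can be sketched by dealing the indices of each class round-robin into k folds. The value k = 5, the seed, and the label counts are assumptions, as the text does not report the number of folds.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with class ratios preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test

# Hypothetical imbalanced labels: 90 non-defaults (0) and 10 defaults (1).
labels = [0] * 90 + [1] * 10
splits = list(stratified_k_fold(labels, k=5))
```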

Ensemble methods, such as stacking, were also
explored to combine the predictive power of multiple
algorithms. By leveraging the strengths of diverse
models, the ensemble approach enhanced accuracy
and robustness. Each model's predictions were
weighted according to its performance, and a meta-
model was trained to aggregate these outputs for final
predictions.

Model Evaluation

Model evaluation focused on assessing the
performance of each algorithm using a comprehensive
set of metrics tailored to the problem of credit risk
prediction. Since the dataset was imbalanced, accuracy
alone was insufficient to gauge model effectiveness.
Metrics such as precision, recall, F1-score, and area
under the receiver operating characteristic curve (AUC-
ROC) were prioritized.

The confusion matrix provided detailed insights into
the distribution of true positives, true negatives, false
positives, and false negatives. This allowed for a
thorough understanding of how well the model
differentiated between default and non-default cases.
Special emphasis was placed on minimizing false
negatives, as failing to identify a defaulter poses a
significant risk to financial institutions.
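The metrics derived from the confusion matrix can be written out explicitly, as in this sketch. The predictions are invented for illustration; class 1 denotes a default.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1  # a missed defaulter, the costliest error here
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1; empty denominators give 0.0."""
    def ratio(a, b):
        return a / b if b else 0.0
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
```

Recall is the metric that directly penalizes false negatives, which is why it receives special emphasis in this setting.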

The AUC-ROC curve was used to compare the
discriminative power of the models across different
thresholds. A higher AUC value indicated a model's
superior ability to distinguish between the two classes.
Additionally, the precision-recall (PR) curve was
analyzed to assess the trade-off between precision and
recall, particularly for the minority class.
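AUC-ROC equals the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. The sketch below computes it from that definition; the scores are hypothetical, and the pairwise loop is quadratic, so it is intended for illustration rather than for large datasets.

```python
def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted default probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = auc_roc(y_true, scores)
```

Here 11 of the 12 positive-negative pairs are ordered correctly, so the AUC is 11/12.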

The evaluation also included testing the models on
unseen data to validate their generalizability. This step
simulated real-world scenarios, ensuring that the
selected model could perform consistently in practical
applications.

After rigorous evaluation, the best-performing model
was selected based on its balance of precision, recall,
and overall robustness. This model was then deployed
for credit risk prediction, offering a reliable tool for
identifying high-risk customers.

Results

The results of this study are presented in detail,
including an overall performance summary of the
machine learning models, a comparative analysis of
their effectiveness, and a discussion of which model
demonstrated the best predictive capabilities for credit
risk management.

Overall Results

The performance of each model was evaluated using a
range of metrics, including accuracy, precision, recall,
F1-score, and the Area Under the Receiver Operating
Characteristic Curve (AUC-ROC). These metrics
provided a comprehensive assessment of the models'
ability to predict credit defaults accurately while
minimizing false positives and negatives. Table 1
summarizes the performance metrics for all the tested
models.

Table 1: Performance Metrics of Machine Learning Models

Model                  | Accuracy | Precision | Recall | F1-Score | AUC-ROC
-----------------------|----------|-----------|--------|----------|--------
Logistic Regression    | 83.2%    | 78.5%     | 76.4%  | 77.4%    | 0.85
Decision Tree          | 81.7%    | 76.2%     | 74.8%  | 75.5%    | 0.82
Random Forest          | 89.5%    | 84.3%     | 86.7%  | 85.5%    | 0.92
Gradient Boosting      | 91.3%    | 88.5%     | 87.8%  | 88.1%    | 0.94
Support Vector Machine | 84.9%    | 79.7%     | 78.4%  | 79.0%    | 0.86
XGBoost                | 92.4%    | 89.6%     | 89.0%  | 89.3%    | 0.95
LightGBM               | 93.1%    | 90.2%     | 90.1%  | 90.1%    | 0.96
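Because F1 is the harmonic mean of precision and recall, the F1 column of Table 1 can be recomputed from the other two columns. The short check below uses only the figures reported in the table; the helper and variable names are illustrative.

```python
# (precision %, recall %, reported F1 %) as listed in Table 1.
table_1 = {
    "Logistic Regression": (78.5, 76.4, 77.4),
    "Decision Tree": (76.2, 74.8, 75.5),
    "Random Forest": (84.3, 86.7, 85.5),
    "Gradient Boosting": (88.5, 87.8, 88.1),
    "Support Vector Machine": (79.7, 78.4, 79.0),
    "XGBoost": (89.6, 89.0, 89.3),
    "LightGBM": (90.2, 90.1, 90.1),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {
    model: round(f1_score(p, r), 1) for model, (p, r, _) in table_1.items()
}
max_gap = max(abs(recomputed[m] - f1) for m, (_, _, f1) in table_1.items())
```

The recomputed values agree with the reported F1 scores to one decimal place.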

Chart 1: Model Evaluation of Different Machine Learning Algorithms (bar chart of Accuracy, Precision, Recall, F1-Score, and AUC-ROC for each model; the plotted values are those listed in Table 1)


Comparative Study

The comparative analysis in Chart 1 reveals distinct
strengths and weaknesses across the evaluated
models:

1. Logistic Regression:

Logistic Regression served as a baseline model,
providing a foundation for comparing other algorithms.
It achieved an accuracy of 83.2% and an AUC-ROC of
0.85, indicating decent performance for a linear model.
Its primary advantage lies in simplicity and
interpretability, making it suitable for quick
implementation. However, its limited capacity to
capture non-linear relationships in the dataset
hindered its predictive power compared to more
advanced methods.

2. Decision Tree:

The Decision Tree model demonstrated slightly lower
performance, with an accuracy of 81.7% and an AUC-
ROC of 0.82. While it offered high interpretability and
ease of implementation, its tendency to overfit the
training data reduced its generalization capabilities.
Pruning techniques and hyperparameter tuning can
mitigate overfitting, but the model remained less
competitive overall.

3. Random Forest:

Random Forest improved the results significantly,
achieving an accuracy of 89.5% and an AUC-ROC of
0.92. By combining multiple decision trees through
bagging, the model reduced overfitting and enhanced
robustness. This ensemble method effectively captured
complex patterns in the data, making it a reliable choice
for credit risk prediction.

4. Gradient Boosting:

Gradient Boosting outperformed Random Forest with
an accuracy of 91.3% and an AUC-ROC of 0.94. Its
iterative optimization approach, which builds weak
learners sequentially to minimize errors, allowed it to
model intricate relationships in the dataset. While
computationally more intensive, Gradient Boosting
demonstrated superior predictive capabilities, making
it highly suitable for this domain.

5. Support Vector Machine (SVM):

The SVM model performed reasonably well, achieving
an accuracy of 84.9% and an AUC-ROC of 0.86. Its ability
to find optimal decision boundaries using kernel
functions contributed to its performance. However, its
sensitivity to hyperparameter selection and higher
computational cost for large datasets limited its
applicability in practical scenarios.

6. XGBoost:

XGBoost emerged as one of the top-performing
models, with an accuracy of 92.4% and an AUC-ROC of
0.95. Its advanced gradient boosting mechanism,
combined with effective handling of missing data and
regularization techniques, made it highly effective for
credit risk prediction. Its capacity to mitigate class
imbalance further enhanced its performance.

7. LightGBM:

LightGBM delivered the best overall results, achieving
the highest accuracy of 93.1% and an AUC-ROC of 0.96.
Its speed, efficiency, and ability to handle large datasets
and categorical features contributed to its exceptional
performance. Additionally, its leaf-wise tree growth
strategy, which always splits the leaf offering the
largest loss reduction, allowed it to model complex
relationships effectively.

The comparative analysis clearly indicates that
ensemble methods, particularly LightGBM and
XGBoost, outperformed traditional models such as
Logistic Regression and Decision Trees. LightGBM's
ability to handle both categorical and numerical data
efficiently, combined with its gradient-based learning
approach, positioned it as the best model for this
application.

Gradient Boosting and Random Forest also showed
strong results, demonstrating the effectiveness of
ensemble techniques in capturing complex patterns.
On the other hand, SVM and Logistic Regression, while
useful, were less competitive due to their limitations in
scalability and handling imbalanced data.

Overall, LightGBM proved to be the most effective
model for credit risk prediction in this study, delivering
the highest accuracy and AUC-ROC values. Its
performance highlights the importance of leveraging
advanced ensemble techniques to address the
challenges of credit risk management, including class
imbalance, large feature spaces, and intricate data
patterns.

The results underscore the need for financial
institutions to adopt state-of-the-art machine learning
models like LightGBM to improve decision-making,
minimize risks, and enhance operational efficiency in
credit risk assessment. Future work can explore
integrating these models with real-time decision
systems to provide dynamic and adaptive risk
evaluations.

CONCLUSION

In this study, we explored the application of machine
learning algorithms for predictive analytics in credit risk
management. The primary aim was to evaluate and
compare the performance of various machine learning
models, including logistic regression, decision trees,
random forests, gradient boosting, XGBoost, and
LightGBM, in predicting credit defaults. By utilizing a
real-world dataset, we applied a comprehensive
methodology encompassing data collection,
preprocessing, feature selection, feature engineering,
model development, and evaluation.

The results demonstrated that more advanced machine
learning algorithms, particularly ensemble methods,
outperform traditional methods such as logistic
regression in terms of accuracy, precision, recall, and
F1-score. Among the models tested, XGBoost and
LightGBM showed superior performance, providing
highly accurate predictions while maintaining
computational efficiency. These models' ability to
handle large, complex datasets and capture intricate
patterns within the data positions them as strong
candidates for deployment in real-world credit risk
management systems.
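The four metrics named above can all be derived from the confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` and the 100-applicant portfolio are hypothetical and do not reproduce the study's data. It also shows why accuracy alone can mislead when defaulters are rare.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical portfolio: 10 defaulters among 100 applicants.
y_true = [1] * 10 + [0] * 90
# A trivial rule that never predicts default.
always_repaid = [0] * 100
# A model that catches 6 of 10 defaulters and wrongly flags 4 good borrowers.
screening_model = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86

baseline_metrics = classification_metrics(y_true, always_repaid)
model_metrics = classification_metrics(y_true, screening_model)
```

In this toy portfolio the trivial rule still reaches 90% accuracy while identifying no defaulters at all, so recall and F1 are the more informative figures for comparing models.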

Despite their promising results, challenges such as
model interpretability and overfitting must be
addressed to ensure their practical applicability.
Techniques such as SHAP and LIME can offer valuable
insights into model decisions, increasing transparency
and trust among stakeholders. Additionally, issues
related to data quality, such as missing values and
outliers, require careful attention during the data
preprocessing phase to avoid model degradation.
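One common guard against the overfitting mentioned above is early stopping: boosting is halted once the validation loss has stopped improving for a fixed number of rounds. The sketch below shows only the stopping rule. The function name `early_stopping_round`, the `patience` and `min_delta` parameters, and the loss values are assumptions made for illustration and are not taken from the study's training runs.

```python
def early_stopping_round(val_losses, patience=3, min_delta=0.0):
    """Return (best_round, best_loss) under a patience-based stopping rule.

    Training is considered stopped once `patience` consecutive rounds fail
    to improve the best validation loss by more than `min_delta`.
    """
    best_loss = float("inf")
    best_round = -1
    rounds_without_gain = 0
    for round_idx, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_round = round_idx
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                break  # later rounds are never evaluated
    return best_round, best_loss


# Hypothetical per-round validation losses from a boosting run.
toy_val_losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.44]
stopped_at = early_stopping_round(toy_val_losses, patience=3)
```

With a patience of three, the rule stops after round 6 and keeps the model from round 3. The lower loss at round 7 is never seen, which illustrates the trade-off involved in choosing the patience value.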

DISCUSSION

The findings of this study reinforce the growing
importance of machine learning in the field of credit
risk management. Traditional credit scoring models,
such as logistic regression, have served as the backbone
of financial institutions' credit risk assessments for
decades. However, these models struggle to adapt to
the increasing complexity and volume of data
generated in the modern financial landscape. Machine
learning models, on the other hand, offer significant
advantages in terms of scalability, adaptability, and
predictive power.

Among the machine learning algorithms evaluated,
XGBoost and LightGBM consistently outperformed the
others in terms of accuracy, precision, and recall. This
is consistent with recent literature, which highlights the
superiority of gradient boosting algorithms in credit
scoring tasks (Gangan et al., 2020; Liao & Lu, 2018).
These models' ability to reduce bias and variance
through ensemble methods makes them particularly
well-suited for handling imbalanced datasets, which is
often the case in credit risk prediction where defaulters
represent a small proportion of the total population.
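A common way to make a boosted model pay attention to the minority class is to up-weight defaulters in the loss by the ratio of negatives to positives. Both XGBoost and LightGBM expose a `scale_pos_weight` parameter for this purpose, although the source does not state whether it was used in this study. The helper names and the toy portfolio below are hypothetical.

```python
import math


def positive_class_weight(labels):
    """Ratio of negatives to positives, a usual starting value for the weight."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive (default) examples in the data")
    return n_neg / n_pos


def weighted_log_loss(labels, probs, pos_weight=1.0):
    """Binary log loss in which each defaulter counts `pos_weight` times."""
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Hypothetical portfolio: 5 defaulters among 100 borrowers.
toy_labels = [1] * 5 + [0] * 95
# A model that assigns everyone a 1% default probability.
complacent_probs = [0.01] * 100

toy_weight = positive_class_weight(toy_labels)
plain_loss = weighted_log_loss(toy_labels, complacent_probs)
balanced_loss = weighted_log_loss(toy_labels, complacent_probs, toy_weight)
```

Under the weighted loss, the model that ignores defaulters is penalized far more heavily than under the plain loss, which is the mechanism by which re-weighting shifts a learner toward the minority class.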

The comparative study also revealed that decision trees
and random forests, while effective, did not match the
performance of XGBoost and LightGBM in terms of
computational efficiency and predictive accuracy.
These models, however, remain valuable due to their
simplicity and interpretability, which are important in
regulatory environments where financial institutions
must justify their decisions. Logistic regression, while
historically popular, was found to be less effective in
capturing the complex relationships in the data and
performed poorly compared to more advanced
machine learning models.

While machine learning models offer substantial
improvements in predictive accuracy, challenges
related to interpretability and overfitting persist.
XGBoost and LightGBM, while effective in prediction,
are considered "black-box" models, meaning that
understanding why a model made a particular decision
can be difficult. This is a crucial concern in the financial
industry, where regulators and stakeholders require
transparency and the ability to explain model
outcomes. Techniques such as SHAP and LIME are
emerging as valuable tools to provide explanations for
complex machine learning models and offer insights
into the key features driving predictions.
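To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature subset. This is not the algorithm used by the SHAP library, which relies on faster approximations and tree-specific methods, and it is practical only for a handful of features. The scoring function `toy_risk_score`, its coefficients, and the feature values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_risk_score(features):
    """Hypothetical score with an interaction between the two inputs."""
    debt_ratio, late_payments = features
    return 0.5 * debt_ratio + 0.3 * late_payments + 0.2 * debt_ratio * late_payments


applicant = [0.8, 2.0]   # debt ratio, number of late payments
reference = [0.2, 0.0]   # assumed "typical" borrower used as baseline
attributions = shapley_values(toy_risk_score, applicant, reference)
```

The attributions sum to the difference between the applicant's score and the baseline score, so each feature's share of the prediction can be reported to a regulator or a customer in additive terms.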

Overfitting is another concern, particularly with
complex models like gradient boosting, which can
produce overly optimistic results during training but
perform poorly on unseen data. To address this,
techniques such as early stopping, pruning, and
regularization, together with cross-validation to detect
the problem and tune hyperparameters, can help
prevent overfitting and improve generalization.

Data quality also plays a significant role
in the performance of machine learning models.
Missing data, outliers, and noise can degrade model
performance, emphasizing the importance of thorough
data preprocessing. Techniques such as imputation,
normalization, and outlier detection are critical to
ensure that the data fed into the model is clean and
representative of real-world scenarios.
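Two of the preprocessing steps named above, imputation and outlier handling, can be sketched with the standard library alone. Median imputation and clipping at 1.5 times the interquartile range are common defaults chosen here for illustration; the source does not specify which variants the study used. The function names and the income figures are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical monthly incomes (in thousands) with one gap and one outlier.
raw_incomes = [30, 32, None, 35, 31, 33, 400]
imputed_incomes = impute_median(raw_incomes)
cleaned_incomes = clip_iqr(imputed_incomes)
```

In practice the median and the quartiles would normally be estimated on the training split only and then reused on validation and test data, so that no information leaks from held-out records into the preprocessing step.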

In conclusion, machine learning represents a
transformative approach to credit risk management.
The ability to analyze large datasets and identify
patterns that traditional models may overlook enables
financial institutions to make more accurate and
informed lending decisions. However, further research
is needed to improve model interpretability, address
overfitting, and optimize data preprocessing
techniques to ensure the successful implementation of
machine learning in credit risk management. By
overcoming these challenges, machine learning can
significantly enhance the ability of financial institutions
to predict credit defaults and manage risk effectively,
contributing to the overall stability of the financial
system.

Future Directions

Future research could explore the integration of deep
learning models, such as neural networks, into credit
risk prediction. These models have the potential to
capture even more complex relationships in data, but
they also come with challenges related to
interpretability and training time. Moreover,
combining machine learning techniques with domain
expertise could help develop hybrid models that offer
both predictive accuracy and transparency.

Another promising direction is the use of alternative
data sources, such as social media activity, transaction
history, and customer behavior data, to further
enhance credit risk prediction. With the increasing
availability of big data, machine learning models could
benefit from incorporating these additional data
sources to gain a more comprehensive understanding
of borrower behavior and risk.

In summary, while the use of machine learning in credit
risk management has made significant strides, there
are still opportunities for further refinement and
innovation. Continued research and development in
this area will be key to unlocking the full potential of
machine learning for financial institutions and ensuring
that these models are both effective and trustworthy.

Acknowledgement:

All authors contributed equally.

REFERENCE

Altman, E. I. (1968). Financial ratios, discriminant
analysis, and the prediction of corporate bankruptcy.
The

Journal

of

Finance,

23(4),

589-609.

https://doi.org/10.1111/j.1540-6261.1968.tb00843.x

Bengio, Y., Courville, A., & Vincent, P. (2013). Learning
deep architectures for AI. Foundations and Trends in
Machine

Learning,

2(1),

1-127.

https://doi.org/10.1561/2200000006

Breiman, L. (1986). Bagging predictors. Machine
Learning,

24(2),

123-140.

https://doi.org/10.1007/BF00116837

Breiman, L. (2001). Random forests. Machine Learning,
45(1),

5-32.

https://doi.org/10.1023/A:1010933404324

Caruana, R., Gehrke, J., Koch, P., & Sturm, M. (2015).
The importance of model interpretability in credit
scoring. Proceedings of the 2015 IEEE International
Conference

on

Data

Mining,

567-576.

https://doi.org/10.1109/ICDM.2015.61

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree
boosting system. Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and
Data

Mining,

785-794.

https://doi.org/10.1145/2939672.2939785

Friedman, J. H. (2001). Greedy function approximation:
A gradient boosting machine. Annals of Statistics, 29(5),
1189-1232. https://doi.org/10.1214/aos/1013203451

Gangan, A., Bhattacharyya, D., & Gupta, P. (2020).

Credit scoring using XGBoost: A comparison of machine
learning

approaches.

International

Journal of

Computer

Applications,

175(13),

1-6.

https://doi.org/10.5120/ijca2020919469

Ke, G., Meng, Q., & Finley, T. (2017). LightGBM: A highly
efficient gradient boosting decision tree. Proceedings
of the 31st International Conference on Neural
Information

Processing

Systems,

3146-3154.

https://doi.org/10.5555/3295222.3295268

Liao, S. H., & Lu, C. C. (2018). Predicting credit scoring
using LightGBM: An empirical study. Sustainable
Computing: Informatics and Systems, 19, 1-7.
https://doi.org/10.1016/j.suscom.2017.11.003

Ohlson, J. A. (1980). Financial ratios and the
probabilistic prediction of bankruptcy. Journal of
Accounting

Research,

18(1),

109-131.

https://doi.org/10.2307/2490395

Zhao, Z. (2018). An analysis of credit risk prediction
using machine learning. Journal of Computer Science
and

Technology,

33(5),

987-1003.

https://doi.org/10.1007/s11390-018-1825-2

Md Jamil Ahmmed, Md Mohibur Rahman, Ashim
Chandra Das, Pritom Das, Tamanna Pervin, Sadia Afrin,
Sanjida Akter Tisha, Md Mehedi Hassan, & Nabila
Rahman. (2024). COMPARATIVE ANALYSIS OF
MACHINE LEARNING ALGORITHMS FOR BANKING
FRAUD DETECTION: A STUDY ON PERFORMANCE,
PRECISION,

AND

REAL-TIME

APPLICATION.

International Journal of Computer Science &
Information

System,

9(11),

31

–

44.

https://doi.org/10.55640/ijcsis/Volume09Issue11-04

Das, A. C., Mozumder, M. S. A., Hasan, M. A., Bhuiyan,
M., Islam, M. R., Hossain, M. N., ... & Alam, M. I. (2024).
MACHINE LEARNING APPROACHES FOR DEMAND
FORECASTING:

THE

IMPACT

OF

CUSTOMER

SATISFACTION ON PREDICTION ACCURACY. The
American Journal of Engineering and Technology,
6(10), 42-53.

Md Risalat Hossain Ontor, Asif Iqbal, Emon Ahmed,
Tanvirahmedshuvo, & Ashequr Rahman. (2024).
LEVERAGING DIGITAL TRANSFORMATION AND SOCIAL
MEDIA ANALYTICS FOR OPTIMIZING US FASHION

BRANDS’ PERFORMANCE: A MACHINE LEARNING

APPROACH. International Journal of Computer Science
&

Information

System,

9(11),

45

–

56.

https://doi.org/10.55640/ijcsis/Volume09Issue11-05

Rahman, A., Iqbal, A., Ahmed, E., & Ontor, M. R. H.
(2024). PRIVACY-PRESERVING MACHINE LEARNING:
TECHNIQUES, CHALLENGES, AND FUTURE DIRECTIONS
IN SAFEGUARDING PERSONAL DATA MANAGEMENT.
International journal of business and management
sciences, 4(12), 18-32.

The American Journal of Applied Sciences

30

https://www.theamericanjournals.com/index.php/tajas

The American Journal of Applied Sciences

Shak, M. S., Uddin, A., Rahman, M. H., Anjum, N., Al
Bony, M. N. V., Alam, M., ... & Pervin, T. (2024).
INNOVATIVE MACHINE LEARNING APPROACHES TO
FOSTER FINANCIAL INCLUSION IN MICROFINANCE.
International Interdisciplinary Business Economics
Advancement Journal, 5(11), 6-20.

Naznin, R., Sarkar, M. A. I., Asaduzzaman, M., Akter, S.,
Mou, S. N., Miah, M. R., ... & Sajal, A. (2024).
ENHANCING

SMALL

BUSINESS

MANAGEMENT

THROUGH MACHINE LEARNING: A COMPARATIVE
STUDY OF PREDICTIVE MODELS FOR CUSTOMER
RETENTION,

FINANCIAL

FORECASTING,

AND

INVENTORY

OPTIMIZATION.

International

Interdisciplinary Business Economics Advancement
Journal, 5(11), 21-32.

Bhattacharjee, B., Mou, S. N., Hossain, M. S., Rahman,
M. K., Hassan, M. M., Rahman, N., ... & Haque, M. S. U.
(2024). MACHINE LEARNING FOR COST ESTIMATION
AND FORECASTING IN BANKING: A COMPARATIVE
ANALYSIS OF ALGORITHMS. Frontline Marketing,
Management and Economics Journal, 4(12), 66-83.

Rahman, A., Iqbal, A., Ahmed, E., & Ontor, M. R. H.
(2024). PRIVACY-PRESERVING MACHINE LEARNING:
TECHNIQUES, CHALLENGES, AND FUTURE DIRECTIONS
IN SAFEGUARDING PERSONAL DATA MANAGEMENT.
Frontline Marketing, Management and Economics
Journal, 4(12), 84-106.

Al Mamun, A., Hossain, M. S., Rishad, S. S. I., Rahman,
M. M., Shakil, F., Choudhury, M. Z. M. E., ... & Sultana,
S. (2024). MACHINE LEARNING FOR STOCK MARKET
SECURITY MEASUREMENT: A COMPARATIVE ANALYSIS
OF SUPERVISED, UNSUPERVISED, AND DEEP LEARNING
MODELS. The American Journal of Engineering and
Technology, 6(11), 63-76.

Das, A. C., Rishad, S. S. I., Akter, P., Tisha, S. A., Afrin, S.,
Shakil, F., ... & Rahman, M. M. (2024). ENHANCING
BLOCKCHAIN SECURITY WITH MACHINE LEARNING: A
COMPREHENSIVE STUDY OF ALGORITHMS AND
APPLICATIONS. The American Journal of Engineering
and Technology, 6(12), 150-162.

Miah, J., Khan, R. H., Ahmed, S., & Mahmud, M. I. (2023,
June). A comparative study of detecting covid 19 by
using chest X-ray images

–

A deep learning approach. In

2023 IEEE World AI IoT Congress (AIIoT) (pp. 0311-
0316). IEEE.

Miah, J. (2024). HOW FAMILY DNA CAN CAUSE LUNG
CANCER USING MACHINE LEARNING. International
Journal of Medical Science and Public Health Research,
5(12), 8-14.

Rahman, M. M., Akhi, S. S., Hossain, S., Ayub, M. I.,
Siddique, M. T., Nath, A., ... & Hassan, M. M. (2024).

EVALUATING MACHINE LEARNING MODELS FOR
OPTIMAL CUSTOMER SEGMENTATION IN BANKING: A
COMPARATIVE STUDY. The American Journal of
Engineering and Technology, 6(12), 68-83.

Das, P., Pervin, T., Bhattacharjee, B., Karim, M. R.,
Sultana, N., Khan, M. S., ... & Kamruzzaman, F. N. U.
(2024). OPTIMIZING REAL-TIME DYNAMIC PRICING
STRATEGIES IN RETAIL AND E-COMMERCE USING
MACHINE LEARNING MODELS. The American Journal of
Engineering and Technology, 6(12), 163-177.

Hossain, M. N., Hossain, S., Nath, A., Nath, P. C., Ayub,
M. I., Hassan, M. M., ... & Rasel, M. (2024). ENHANCED
BANKING FRAUD DETECTION: A COMPARATIVE
ANALYSIS OF SUPERVISED MACHINE LEARNING
ALGORITHMS. American Research Index Library, 23-35.

Ahmmed, M. J., Rahman, M. M., Das, A. C., Das, P.,
Pervin, T., Afrin, S., ... & Rahman, N. (2024).
COMPARATIVE ANALYSIS OF MACHINE LEARNING
ALGORITHMS FOR BANKING FRAUD DETECTION: A
STUDY ON PERFORMANCE, PRECISION, AND REAL-TIME
APPLICATION. American Research Index Library, 31-44.

Al Bony, M. N. V., Das, P., Pervin, T., Shak, M. S., Akter,
S., Anjum, N., ... & Rahman, M. K. (2024).
COMPARATIVE PERFORMANCE ANALYSIS OF MACHINE
LEARNING ALGORITHMS FOR BUSINESS INTELLIGENCE:
A STUDY ON CLASSIFICATION AND REGRESSION
MODELS. Frontline Marketing, Management and
Economics Journal, 4(11), 72-92.

Das, A. C., Rishad, S. S. I., Akter, P., Tisha, S. A., Afrin, S.,
Shakil, F., ... & Rahman, M. M. (2024). ENHANCING
BLOCKCHAIN SECURITY WITH MACHINE LEARNING: A
COMPREHENSIVE STUDY OF ALGORITHMS AND
APPLICATIONS. The American Journal of Engineering
and Technology, 6(12), 150-162.

Ahmed, M. P., Das, A. C., Akter, P., Mou, S. N., Tisha, S.
A., Shakil, F., ... & Ahmed, A. (2024). HARNESSING
MACHINE LEARNING MODELS FOR ACCURATE
CUSTOMER

LIFETIME

VALUE

PREDICTION:

A

COMPARATIVE STUDY IN MODERN BUSINESS
ANALYTICS. American Research Index Library, 06-22.

Akter, P., Hossain, S., Siddique, M. T., Ayub, M. I., Nath,
A., Nath, P. C., ... & Hassan, M. M. (2025). Sentiment
Analysis of Consumer Feedback and Its Impact on
Business Strategies by Machine Learning. The American
Journal of Applied sciences, 7(01), 6-16.

Hossain, M. S., Khan, A., Das, P., Haque, M. S. U.,
Kamruzzaman, F., Akter, S., ... & Miah, M. R. (2025).
Enhanced market trend forecasting using machine
learning models: a study with external factor
integration. International Interdisciplinary Business
Economics Advancement Journal, 6(01), 5-12.

References

Altman, E. I. (1968). Financial ratios, discriminant analysis, and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589-609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127. https://doi.org/10.1561/2200000006

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140. https://doi.org/10.1007/BF00116837

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324

Caruana, R., Gehrke, J., Koch, P., & Sturm, M. (2015). The importance of model interpretability in credit scoring. Proceedings of the 2015 IEEE International Conference on Data Mining, 567-576. https://doi.org/10.1109/ICDM.2015.61

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232. https://doi.org/10.1214/aos/1013203451

Gangan, A., Bhattacharyya, D., & Gupta, P. (2020). Credit scoring using XGBoost: A comparison of machine learning approaches. International Journal of Computer Applications, 175(13), 1-6. https://doi.org/10.5120/ijca2020919469

Ke, G., Meng, Q., & Finley, T. (2017). LightGBM: A highly efficient gradient boosting decision tree. Proceedings of the 31st International Conference on Neural Information Processing Systems, 3146-3154. https://doi.org/10.5555/3295222.3295268

Liao, S. H., & Lu, C. C. (2018). Predicting credit scoring using LightGBM: An empirical study. Sustainable Computing: Informatics and Systems, 19, 1-7. https://doi.org/10.1016/j.suscom.2017.11.003

Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109-131. https://doi.org/10.2307/2490395

Zhao, Z. (2018). An analysis of credit risk prediction using machine learning. Journal of Computer Science and Technology, 33(5), 987-1003. https://doi.org/10.1007/s11390-018-1825-2

Md Jamil Ahmmed, Md Mohibur Rahman, Ashim Chandra Das, Pritom Das, Tamanna Pervin, Sadia Afrin, Sanjida Akter Tisha, Md Mehedi Hassan, & Nabila Rahman. (2024). COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR BANKING FRAUD DETECTION: A STUDY ON PERFORMANCE, PRECISION, AND REAL-TIME APPLICATION. International Journal of Computer Science & Information System, 9(11), 31–44. https://doi.org/10.55640/ijcsis/Volume09Issue11-04

Das, A. C., Mozumder, M. S. A., Hasan, M. A., Bhuiyan, M., Islam, M. R., Hossain, M. N., ... & Alam, M. I. (2024). MACHINE LEARNING APPROACHES FOR DEMAND FORECASTING: THE IMPACT OF CUSTOMER SATISFACTION ON PREDICTION ACCURACY. The American Journal of Engineering and Technology, 6(10), 42-53.

Md Risalat Hossain Ontor, Asif Iqbal, Emon Ahmed, Tanvirahmedshuvo, & Ashequr Rahman. (2024). LEVERAGING DIGITAL TRANSFORMATION AND SOCIAL MEDIA ANALYTICS FOR OPTIMIZING US FASHION BRANDS’ PERFORMANCE: A MACHINE LEARNING APPROACH. International Journal of Computer Science & Information System, 9(11), 45–56. https://doi.org/10.55640/ijcsis/Volume09Issue11-05

Rahman, A., Iqbal, A., Ahmed, E., & Ontor, M. R. H. (2024). PRIVACY-PRESERVING MACHINE LEARNING: TECHNIQUES, CHALLENGES, AND FUTURE DIRECTIONS IN SAFEGUARDING PERSONAL DATA MANAGEMENT. International journal of business and management sciences, 4(12), 18-32.

Shak, M. S., Uddin, A., Rahman, M. H., Anjum, N., Al Bony, M. N. V., Alam, M., ... & Pervin, T. (2024). INNOVATIVE MACHINE LEARNING APPROACHES TO FOSTER FINANCIAL INCLUSION IN MICROFINANCE. International Interdisciplinary Business Economics Advancement Journal, 5(11), 6-20.

Naznin, R., Sarkar, M. A. I., Asaduzzaman, M., Akter, S., Mou, S. N., Miah, M. R., ... & Sajal, A. (2024). ENHANCING SMALL BUSINESS MANAGEMENT THROUGH MACHINE LEARNING: A COMPARATIVE STUDY OF PREDICTIVE MODELS FOR CUSTOMER RETENTION, FINANCIAL FORECASTING, AND INVENTORY OPTIMIZATION. International Interdisciplinary Business Economics Advancement Journal, 5(11), 21-32.

Bhattacharjee, B., Mou, S. N., Hossain, M. S., Rahman, M. K., Hassan, M. M., Rahman, N., ... & Haque, M. S. U. (2024). MACHINE LEARNING FOR COST ESTIMATION AND FORECASTING IN BANKING: A COMPARATIVE ANALYSIS OF ALGORITHMS. Frontline Marketing, Management and Economics Journal, 4(12), 66-83.

Rahman, A., Iqbal, A., Ahmed, E., & Ontor, M. R. H. (2024). PRIVACY-PRESERVING MACHINE LEARNING: TECHNIQUES, CHALLENGES, AND FUTURE DIRECTIONS IN SAFEGUARDING PERSONAL DATA MANAGEMENT. Frontline Marketing, Management and Economics Journal, 4(12), 84-106.

Al Mamun, A., Hossain, M. S., Rishad, S. S. I., Rahman, M. M., Shakil, F., Choudhury, M. Z. M. E., ... & Sultana, S. (2024). MACHINE LEARNING FOR STOCK MARKET SECURITY MEASUREMENT: A COMPARATIVE ANALYSIS OF SUPERVISED, UNSUPERVISED, AND DEEP LEARNING MODELS. The American Journal of Engineering and Technology, 6(11), 63-76.

Das, A. C., Rishad, S. S. I., Akter, P., Tisha, S. A., Afrin, S., Shakil, F., ... & Rahman, M. M. (2024). ENHANCING BLOCKCHAIN SECURITY WITH MACHINE LEARNING: A COMPREHENSIVE STUDY OF ALGORITHMS AND APPLICATIONS. The American Journal of Engineering and Technology, 6(12), 150-162.

Miah, J., Khan, R. H., Ahmed, S., & Mahmud, M. I. (2023, June). A comparative study of detecting covid 19 by using chest X-ray images–A deep learning approach. In 2023 IEEE World AI IoT Congress (AIIoT) (pp. 0311-0316). IEEE.

Miah, J. (2024). HOW FAMILY DNA CAN CAUSE LUNG CANCER USING MACHINE LEARNING. International Journal of Medical Science and Public Health Research, 5(12), 8-14.

Rahman, M. M., Akhi, S. S., Hossain, S., Ayub, M. I., Siddique, M. T., Nath, A., ... & Hassan, M. M. (2024). EVALUATING MACHINE LEARNING MODELS FOR OPTIMAL CUSTOMER SEGMENTATION IN BANKING: A COMPARATIVE STUDY. The American Journal of Engineering and Technology, 6(12), 68-83.

Das, P., Pervin, T., Bhattacharjee, B., Karim, M. R., Sultana, N., Khan, M. S., ... & Kamruzzaman, F. N. U. (2024). OPTIMIZING REAL-TIME DYNAMIC PRICING STRATEGIES IN RETAIL AND E-COMMERCE USING MACHINE LEARNING MODELS. The American Journal of Engineering and Technology, 6(12), 163-177.

Hossain, M. N., Hossain, S., Nath, A., Nath, P. C., Ayub, M. I., Hassan, M. M., ... & Rasel, M. (2024). ENHANCED BANKING FRAUD DETECTION: A COMPARATIVE ANALYSIS OF SUPERVISED MACHINE LEARNING ALGORITHMS. American Research Index Library, 23-35.

Ahmmed, M. J., Rahman, M. M., Das, A. C., Das, P., Pervin, T., Afrin, S., ... & Rahman, N. (2024). COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR BANKING FRAUD DETECTION: A STUDY ON PERFORMANCE, PRECISION, AND REAL-TIME APPLICATION. American Research Index Library, 31-44.

Al Bony, M. N. V., Das, P., Pervin, T., Shak, M. S., Akter, S., Anjum, N., ... & Rahman, M. K. (2024). COMPARATIVE PERFORMANCE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR BUSINESS INTELLIGENCE: A STUDY ON CLASSIFICATION AND REGRESSION MODELS. Frontline Marketing, Management and Economics Journal, 4(11), 72-92.

Ahmed, M. P., Das, A. C., Akter, P., Mou, S. N., Tisha, S. A., Shakil, F., ... & Ahmed, A. (2024). HARNESSING MACHINE LEARNING MODELS FOR ACCURATE CUSTOMER LIFETIME VALUE PREDICTION: A COMPARATIVE STUDY IN MODERN BUSINESS ANALYTICS. American Research Index Library, 06-22.

Akter, P., Hossain, S., Siddique, M. T., Ayub, M. I., Nath, A., Nath, P. C., ... & Hassan, M. M. (2025). Sentiment Analysis of Consumer Feedback and Its Impact on Business Strategies by Machine Learning. The American Journal of Applied sciences, 7(01), 6-16.

Hossain, M. S., Khan, A., Das, P., Haque, M. S. U., Kamruzzaman, F., Akter, S., ... & Miah, M. R. (2025). Enhanced market trend forecasting using machine learning models: a study with external factor integration. International Interdisciplinary Business Economics Advancement Journal, 6(01), 5-12.