A COMPREHENSIVE STUDY OF MACHINE LEARNING APPROACHES FOR CUSTOMER SENTIMENT ANALYSIS IN BANKING SECTOR

Salma Akter; Fuad Mahmud; Tauhedur Rahman; Md Jamil Ahmmed; Md Kafil Uddin; Md Imdadul Alam; Biswanath Bhattacharjee; Sharmin Akter; Md Shakhaowat Hossain; Afrin Hoque Jui

doi:10.37547/tajet/Volume06Issue10-11

Authors

Salma Akter
Department of Public Administration, Gannon University, Erie, PA, USA
Fuad Mahmud
Department of Information Assurance and Cybersecurity, Gannon University, USA
Tauhedur Rahman
Dahlkemper School of Business, Gannon University, USA
Md Jamil Ahmmed
Department of Information Technology Project Management, Business Analytics, St. Francis College, USA
Md Kafil Uddin
Dahlkemper School of Business, Gannon University, USA
Md Imdadul Alam
Master of Science in Financial Analysis, Fox School of Business, Temple University, USA
Biswanath Bhattacharjee
Department of Management Science and Quantitative Methods, Gannon University, USA
Sharmin Akter
Department of Information Technology Project Management, St. Francis College, USA
Md Shakhaowat Hossain
Department of Management Science and Quantitative Methods, Gannon University, USA
Afrin Hoque Jui
Department of Management Science and Quantitative Methods, Gannon University, USA

Keywords:

Sentiment Analysis; Customer Feedback; Banking Services

Abstract

This study explores the application of sentiment analysis in the banking sector, focusing on customer feedback to enhance service quality and customer experiences. We collected a comprehensive dataset of approximately 100,000 entries from diverse sources, including customer satisfaction surveys, social media platforms, and direct feedback. A robust preprocessing pipeline was employed to address challenges associated with unstructured data, informal language, and mixed sentiments. We evaluated several machine learning and natural language processing models, including Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest, Long Short-Term Memory (LSTM), and BERT (Bidirectional Encoder Representations from Transformers), using metrics such as accuracy, precision, recall, F1 score, AUC-ROC, and training time. The results revealed that advanced models, particularly BERT, achieved superior performance with an accuracy of 88% and an F1 score of 0.86, demonstrating an exceptional ability to capture nuanced sentiments. This study underscores the importance of employing sophisticated sentiment analysis techniques in banking to derive actionable insights from customer feedback. The findings suggest that leveraging advanced models can significantly improve service quality and customer satisfaction, while also presenting avenues for future research into real-time sentiment analysis and its integration with customer relationship management systems.

ZENODO DOI: https://doi.org/10.5281/zenodo.13981553

THE USA JOURNALS
THE AMERICAN JOURNAL OF ENGINEERING AND TECHNOLOGY (ISSN 2689-0984)
VOLUME 06 ISSUE 10
https://www.theamericanjournals.com/index.php/tajet
PUBLISHED DATE: 21-10-2024
DOI: https://doi.org/10.37547/tajet/Volume06Issue10-11
PAGE NO.: 100-111

RESEARCH ARTICLE

Open Access

INTRODUCTION

Sentiment analysis has gained significant traction
in the realm of Natural Language Processing (NLP)
as businesses seek to derive actionable insights
from customer feedback. In the banking sector,
understanding customer sentiment is critical for
enhancing service delivery, maintaining customer
loyalty, and staying competitive in an increasingly
digital marketplace. The explosion of digital
interactions, ranging from social media
commentary to formal feedback mechanisms, has
created a vast repository of customer opinions
that, when analyzed, can yield deep insights into
consumer behavior and preferences.

In recent years, the banking industry has
witnessed a transformation characterized by the
adoption of various technological advancements,
which have changed the landscape of customer
interactions (Sinha & Kaur, 2020). As financial
institutions strive to provide personalized services
and real-time customer support, sentiment
analysis plays a pivotal role in understanding
customer needs and improving overall satisfaction
(Akhtar et al., 2022). This study aims to explore the
intricacies of sentiment analysis in the banking
sector by leveraging machine learning techniques
to classify sentiments from customer feedback,
thereby providing a comprehensive understanding
of customer experiences across various banking
services.

The research is anchored in the premise that
effectively analyzing sentiment can facilitate not
only improved customer service but also informed
decision-making regarding product offerings and
service enhancements (Bahl et al., 2021). With this
objective, our study employs various machine
learning models, including Logistic Regression,
Naive Bayes, Support Vector Machine (SVM),
Random Forest, Long Short-Term Memory (LSTM)
networks, and Bidirectional Encoder
Representations from Transformers (BERT), to
classify sentiments and evaluate their
performance based on multiple metrics.

LITERATURE REVIEW

Overview of Sentiment Analysis

Sentiment analysis is a subfield of NLP that focuses
on identifying and categorizing opinions expressed
in textual data. It aims to classify sentiments as
positive, negative, or neutral and has become
increasingly important due to the proliferation of
online reviews and feedback across various
industries (Pang & Lee, 2008). The significance of
sentiment analysis lies in its ability to provide
businesses with insights into customer
perceptions, allowing them to respond proactively
to emerging trends and sentiments (Liu, 2012).

Machine Learning Techniques for Sentiment
Analysis

The application of machine learning techniques in
sentiment analysis has proven effective, with
various algorithms demonstrating differing
strengths and limitations. Logistic Regression and
Naive Bayes are commonly utilized as baseline
models due to their simplicity and efficiency (Yin
et al., 2016). Logistic Regression offers
interpretability but may struggle with complex
sentiment patterns, while Naive Bayes performs
well with high-dimensional data, albeit with
limitations in understanding word context (Rish,
2001).

Support Vector Machine (SVM) has emerged as a
robust classifier for sentiment analysis,
particularly due to its ability to handle
high-dimensional spaces and its effectiveness in dealing
with noisy data (Joachims, 1999). Random Forest,
an ensemble learning method, provides improved
accuracy and robustness against overfitting by
aggregating the predictions of multiple decision
trees (Breiman, 2001).

Recent advances in deep learning have introduced
more sophisticated models such as Recurrent
Neural Networks (RNN) with Long Short-Term
Memory (LSTM) units. LSTMs excel at capturing
sequential dependencies in data, making them
well-suited for sentiment analysis in lengthy and
complex reviews (Hochreiter & Schmidhuber,
1997). On the cutting edge of sentiment analysis
are transformer-based models like BERT, which
have set new benchmarks by considering the
context of each word from both directions, thereby
achieving superior performance in sentiment
classification tasks (Devlin et al., 2018).

Challenges in Sentiment Analysis

Despite the advancements in machine learning
techniques, sentiment analysis continues to face
several challenges. One significant issue is the
presence of mixed sentiments within single
comments, where customers express both positive
and negative opinions, complicating the
classification process (Cambria et al., 2017).
Furthermore, unstructured feedback often
includes informal language, slang, and emoticons,
which can hinder accurate sentiment classification
(Pang & Lee, 2008).

Data imbalance is another challenge encountered
in sentiment analysis, especially in domains like
banking, where certain sentiments may be
underrepresented in the dataset (He & Garcia,
2009). This imbalance can bias machine learning
models, making them less effective at accurately
predicting minority classes. Techniques such as
Synthetic Minority Over-sampling Technique
(SMOTE) and under-sampling are often employed
to address these imbalances and enhance model
performance (Chawla et al., 2002).

Importance of Feature Engineering

Feature engineering is a crucial aspect of
sentiment analysis that directly impacts the
performance of machine learning models.
Techniques such as Term Frequency-Inverse
Document Frequency (TF-IDF), n-grams analysis,
and Part-of-Speech (POS) tagging are commonly
used to extract meaningful features from textual
data (Manning et al., 2008). These techniques help
identify key phrases, sentiment-bearing words,
and contextual relationships that are critical for
effective sentiment classification.

METHODOLOGY

1. Data Collection and Preprocessing

In conducting our sentiment analysis, the first
critical step was collecting an extensive and
representative dataset of customer feedback from
various banking services. To ensure our analysis
covered a broad spectrum of customer
experiences, we pulled feedback from multiple
sources. These included customer satisfaction
surveys, online banking reviews, social media
platforms like Twitter and Facebook, as well as
direct customer emails and feedback submitted
through the bank's official mobile app.

1.1 Data Collection Process

We approached the data collection phase
methodically to ensure the richness and diversity
of the feedback. The data was sourced over a two-
year period, resulting in a comprehensive
collection of around 100,000 customer feedback
entries. This dataset spanned various aspects of
banking services, including online banking, in-
branch experiences, credit and loan services,
mobile app functionality, and customer support
interactions. Our aim was to cover both structured
feedback (like survey responses) and unstructured
feedback (such as free-form comments on social
media and emails).

The feedback was gathered from customers across
different demographics and geographical regions,
providing us with insights into how customer
experiences and sentiments varied by location,
age, and service type. Additionally, we ensured the
inclusion of a range of banking services, which
allowed us to target specific service areas that
might need improvement.

1.2 Data Challenges

Collecting and preparing data for sentiment
analysis posed several challenges, particularly
with the unstructured nature of the customer
feedback. A significant portion of the comments
contained informal language, abbreviations,
emoticons, and even mixed languages, particularly
when dealing with social media data. Furthermore,
many reviews were either too short, offering little
context (e.g., "bad service" or "great app"), or too
complex, with customers expressing multiple
sentiments within a single review (e.g., "The
mobile app is great, but customer service is slow").

To address these issues, we implemented a multi-
step data cleaning and preprocessing pipeline that
allowed us to structure the unstructured data in a
meaningful way, ensuring that we could maximize
the quality of the analysis.

1.3 Preprocessing Steps

We recognized that quality preprocessing was
essential to extracting actionable insights from the
raw feedback data. Our preprocessing pipeline
consisted of several stages:

• Text Cleaning: The feedback contained various forms of noise, such as URLs, special characters, HTML tags, numbers, and emojis. We removed these elements to focus on the core textual content. Additionally, feedback with a minimal word count (e.g., single-word reviews) was filtered out, as it provided insufficient sentiment context.

• Tokenization: We broke the sentences down into individual words, or tokens, to analyze them more efficiently. This step was a prerequisite for separating the components of complex sentences in which customers expressed different sentiments in a single review. For instance, if a customer said, "The loan process was difficult, but the customer support was helpful," tokenization, combined with the clause-level segmentation described in Section 1.4, allowed us to treat "loan process was difficult" and "customer support was helpful" as separate sentiments.

• Lemmatization and Stemming: After tokenization, we applied lemmatization to reduce words to their base or dictionary form, so that variations of the same word were not treated as separate entities. For example, the words "banking," "bank," and "banks" were all reduced to the base form "bank." We found that lemmatization improved the accuracy of sentiment classification compared with stemming, which often produced distorted word forms. However, stemming was still employed for some models, depending on their requirements, and we compared the two approaches to evaluate the performance differences.

• Stop Word Removal: We identified and removed common stop words such as "and," "the," "is," and "of," which did not contribute to the sentiment. However, we retained certain frequent domain-specific terms relevant to banking, such as "loan," "branch," and "transaction," to ensure that key features of customer experiences were captured.

• Handling Negations: One of the challenges we encountered was properly processing negations. A simple comment like "not good" could easily be misclassified as positive without proper handling of negation. To address this, we created a rule-based system that concatenated negation terms with the words that followed, thus transforming phrases like "not happy" into "not_happy," ensuring that the model could accurately capture the negative sentiment.
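The cleaning, tokenization, stop-word, and negation steps above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the stop-word list, the negation list, the `min_tokens` threshold, and all function names are assumptions introduced here. Lemmatization is omitted because it needs an external lexical resource.

```python
import re

# Hypothetical word lists, for illustration only.
STOP_WORDS = {"and", "the", "is", "of", "a", "was", "to"}
NEGATIONS = {"not", "no", "never"}

def clean_text(text):
    """Lowercase, then strip HTML tags, URLs, and every non-letter character."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # digits, emojis, punctuation
    return re.sub(r"\s+", " ", text).strip()

def join_negations(tokens):
    """Merge a negation term with the token after it: not happy -> not_happy."""
    merged, i = [], 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def preprocess(text, min_tokens=2):
    """Return the processed tokens, or None when the entry is too short to keep."""
    tokens = clean_text(text).split()   # whitespace tokenization
    if len(tokens) < min_tokens:
        return None
    tokens = join_negations(tokens)     # before stop-word removal, so the pair stays together
    return [t for t in tokens if t not in STOP_WORDS]
```

Negations are joined before stop words are removed so that a stop-word list containing negation terms cannot discard them first.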

1.4 Handling Mixed and Complex Sentiments

A significant portion of the feedback we
encountered contained mixed sentiments, where a
single customer comment included both positive
and negative aspects. For example, a customer
might say, "The loan process was complicated, but
the bank staff were very helpful." This presented a
challenge since traditional sentiment analysis
models often classify such sentences as neutral,
missing the opportunity to extract both
sentiments.

To address this, we employed sentence
segmentation techniques, splitting each feedback
entry into distinct sentences or clauses. By doing
this, we ensured that each sentiment was treated
independently, allowing us to capture the nuance
of customer feedback more effectively. Sentences
were categorized based on their service context,
such as loan services, customer support, or online
banking, which helped us pinpoint specific areas
needing improvement.
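One way to picture this segmentation step is a rule-based splitter that cuts feedback at sentence punctuation and contrastive conjunctions, then tags each clause with a service context by keyword lookup. The boundary pattern and the keyword map below are hypothetical; the study does not specify which segmentation tool or categorization rules were used.

```python
import re

# Hypothetical keyword map from service area to trigger words (illustrative only).
SERVICE_KEYWORDS = {
    "loan services": {"loan", "credit", "mortgage"},
    "customer support": {"support", "staff", "service"},
    "online banking": {"online", "app", "website", "transfer"},
}

# Split at sentence-ending punctuation or at a contrastive conjunction.
CLAUSE_BOUNDARY = re.compile(
    r"[.!?;]+|,?\s+\b(?:but|however|although)\b,?", re.IGNORECASE
)

def split_clauses(feedback):
    """Return the non-empty clauses of one feedback entry."""
    parts = CLAUSE_BOUNDARY.split(feedback)
    return [p.strip(" ,") for p in parts if p and p.strip(" ,")]

def service_context(clause):
    """Return the first service area whose keywords appear in the clause."""
    words = set(re.findall(r"[a-z]+", clause.lower()))
    for area, keywords in SERVICE_KEYWORDS.items():
        if words & keywords:
            return area
    return "general"

def segment(feedback):
    """Pair every clause with its inferred service context."""
    return [(clause, service_context(clause)) for clause in split_clauses(feedback)]
```

Each clause can then be classified on its own, so that one comment contributes a negative label to loan services and a positive label to customer support.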

1.5 Dealing with Data Imbalance

As is common in sentiment analysis tasks, we
encountered an imbalance in the distribution of
sentiments across different categories. For
instance, online banking feedback was
overwhelmingly positive, while feedback related
to loan services tended to skew more negative.
This imbalance posed a challenge, particularly for
our machine learning models, as they might
become biased toward predicting the majority
sentiment.

To mitigate this, we experimented with various
techniques, including the Synthetic Minority
Over-sampling Technique (SMOTE) to artificially
generate samples of the underrepresented classes,
such as negative feedback on online banking or
positive feedback on loan services. This allowed
our models to train more effectively across all
sentiment categories and reduced bias
toward majority sentiment classes. We also used
under-sampling for certain service areas where an
overwhelming amount of positive feedback risked
drowning out the insights from the negative
feedback.
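The effect of resampling on class counts can be illustrated with a small sketch. Note that this is plain random resampling, not SMOTE: SMOTE synthesizes new minority points by interpolating between neighbours in feature space, whereas the code below only duplicates or drops existing entries. The function name and the `target_per_class` parameter are assumptions made for the example.

```python
import random
from collections import defaultdict

def rebalance(samples, target_per_class, seed=0):
    """Bring every class to target_per_class entries.

    samples: list of (text, label) pairs.
    Large classes are randomly under-sampled; small classes are
    over-sampled by drawing duplicates with replacement.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    balanced = []
    for label, items in by_label.items():
        if len(items) >= target_per_class:
            balanced.extend(rng.sample(items, target_per_class))      # under-sampling
        else:
            extra = rng.choices(items, k=target_per_class - len(items))
            balanced.extend(items + extra)                            # over-sampling
    rng.shuffle(balanced)
    return balanced
```

A common precaution is to resample only the training split, so that evaluation still reflects the original class distribution.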

1.6 Feature Engineering and Extraction

To enhance the performance of our machine
learning models, we engaged in several feature
engineering tasks that allowed us to extract more
meaningful insights from the customer feedback
data:

• N-grams Analysis: We incorporated n-grams (bigrams and trigrams) to capture phrases that frequently appeared in the feedback. This enabled us to identify recurring themes or issues, such as "customer service delay" or "quick mobile transfer." The use of n-grams helped the models understand not just individual word sentiment but also contextual sentiment from phrases and word pairs.

• TF-IDF (Term Frequency-Inverse Document Frequency): We employed the TF-IDF technique to weigh the importance of words in the feedback. This helped the model distinguish between commonly used words and words that carried unique sentiment significance. For example, words like "problem" or "excellent" were given higher importance than words like "bank" or "account," which were present in almost every review.

• Part-of-Speech (POS) Tagging: To improve our sentiment classification, we leveraged POS tagging to identify adjectives, verbs, and adverbs that carried strong sentiment. Words like "quick" (adjective) or "solved" (verb) were crucial in determining the tone of feedback, especially when combined with customer experiences related to service speed and problem resolution.
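The n-gram and TF-IDF steps above can be sketched in a few lines. The weighting used here, tf = count / document length and idf = ln(N / df), is one of several common variants and is an assumption; the study does not state which formula or library was used. Function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=3):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs, n_max=3):
    """Compute TF-IDF weights over unigrams up to n_max-grams.

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    term_lists = [ngrams(d, n_max) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights
```

Under this variant a term that occurs in every document, such as "bank" in a banking corpus, receives a weight of zero, which matches the intuition described above.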

1.7 Final Preprocessed Dataset

By the end of our preprocessing pipeline, we had a
clean, tokenized, and well-structured dataset that
was ready for sentiment classification. Each
feedback entry was categorized into service areas
(e.g., loan services, online banking, customer
support), ensuring that the sentiment analysis
could provide granular insights into specific
banking functions.

The final dataset consisted of the following:

• Total Feedback Entries: Approximately 100,000

• Positive Feedback: 58,000 entries (58%)

• Negative Feedback: 30,000 entries (30%)

• Neutral Feedback: 12,000 entries (12%)

• Service-Specific Categorization: Online banking (30%), in-branch services (20%), loan services (15%), mobile app feedback (25%), and customer support (10%).

Our preprocessed data was now ready for the next
phase, where we implemented various machine
learning and Natural Language Processing (NLP)
models to classify sentiment and generate
actionable insights for improving banking services.

RESULT

2.1 Logistic Regression (Baseline Model)

Logistic Regression (LR) is a widely used
classification algorithm that applies a linear model
to estimate the probability of a class (positive,
negative, neutral) based on input features. As a
baseline model, LR was chosen due to its simplicity
and interpretability.

• Feature Extraction: TF-IDF (Term Frequency-Inverse Document Frequency) vectors were used to convert the textual feedback into numerical features.

• Strengths: Fast, easy to interpret, and able to mitigate overfitting through regularization (L1/L2 penalties).

• Limitations: Logistic Regression learns linear decision boundaries, which may be too simple for the complex language patterns in customer feedback.
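For reference, a three-class logistic model with an L2 penalty can be written as below. The notation (weights \mathbf{w}_k, biases b_k, penalty strength \lambda, TF-IDF vector \mathbf{x}) is introduced here for illustration, and the multinomial (softmax) form is an assumption: the study does not state whether a multinomial or one-vs-rest scheme was used.

```latex
P(y = k \mid \mathbf{x}) =
  \frac{\exp(\mathbf{w}_k^{\top}\mathbf{x} + b_k)}
       {\sum_{j \in \{\mathrm{pos},\,\mathrm{neg},\,\mathrm{neu}\}} \exp(\mathbf{w}_j^{\top}\mathbf{x} + b_j)},
\qquad
\min_{\mathbf{W},\,\mathbf{b}} \;
  -\sum_{i=1}^{N} \log P(y_i \mid \mathbf{x}_i)
  + \lambda \sum_{k} \lVert \mathbf{w}_k \rVert_2^{2}
```

Each class score is a linear function of the features, which is why the decision boundaries between classes are linear.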

2.2 Naive Bayes

Naive Bayes (NB) is another classical ML algorithm
that works well for text classification
tasks. It assumes that features are conditionally
independent given the class label. We used
Multinomial Naive Bayes (MNB) due to its
popularity in text-based sentiment analysis.

• Feature Extraction: TF-IDF vectors were also used here to represent the customer feedback data.

• Strengths: Works well with high-dimensional data, especially in cases where the independence assumption roughly holds. Fast and efficient.

• Limitations: Naive Bayes struggles with complex relationships between words, such as word order or context, leading to potential misclassification of sentiment.
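To make the independence assumption concrete, here is a minimal Multinomial Naive Bayes sketch with Laplace smoothing. It works on raw token counts rather than the TF-IDF vectors used in the study, and the class name, the `alpha` default, and the choice to ignore unseen terms are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over token counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        """docs: list of token lists; labels: one class label per document."""
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def log_scores(self, tokens):
        """Return the unnormalized log posterior of each class."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            score = math.log(count / n_docs)  # class prior
            for t in tokens:
                if t in self.vocab:  # terms never seen in training are ignored
                    likelihood = (self.term_counts[label][t] + self.alpha) / (total + self.alpha * v)
                    score += math.log(likelihood)  # each token contributes independently
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Because each token adds its log-likelihood independently, reordering the tokens leaves the score unchanged, which is the source of the word-order limitation noted above.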

2.3 Support Vector Machine (SVM)

SVM is a powerful classification algorithm that
attempts to find the hyperplane that best separates
different classes in the feature space. In sentiment
analysis, SVM is well-regarded for handling
high-dimensional data and dealing with noise in the
dataset.

• Feature Extraction: We used TF-IDF vectors for input features.

• Strengths: SVM is effective in high-dimensional spaces and is robust to overfitting, especially in text classification tasks.

• Limitations: SVM can be computationally expensive, especially for large datasets. Choosing the right kernel and regularization parameter can be challenging.
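For reference, the soft-margin objective of a linear SVM for a binary problem with labels y_i \in \{-1, +1\} is shown below. The notation is introduced here for illustration. How the three sentiment classes were handled (for example, one-vs-rest) and which kernel was selected are not stated in the study, so the linear binary form is an assumption.

```latex
\min_{\mathbf{w},\, b} \;
  \frac{1}{2} \lVert \mathbf{w} \rVert_2^{2}
  + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\bigr)
```

The constant C is the regularization parameter mentioned in the limitations: a larger C penalizes margin violations more heavily, while a smaller C favors a wider margin.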

2.4 Random Forest

Random Forest (RF) is an ensemble learning
algorithm that builds multiple decision trees and
combines their outputs to make a final prediction.
It is popular for its ability to handle non-linear data
and complex decision boundaries.

• Feature Extraction: TF-IDF vectors were used to feed the feedback into the Random Forest model.

• Strengths: Random Forest is less prone to overfitting compared to individual decision trees and can capture complex patterns in the data.

• Limitations: While Random Forest can handle complex data, it tends to require substantial computational resources and may struggle with the high-dimensional, sparse data typical in text analysis.

2.5 Recurrent Neural Networks (RNN) with LSTM

Recurrent Neural Networks (RNNs) with Long
Short-Term Memory (LSTM) units are designed to
capture temporal dependencies and context in
sequential data, making them well-suited for
text-based tasks like sentiment analysis. LSTM
networks can remember long-term dependencies
between words, overcoming limitations of
traditional ML algorithms in NLP.

• Feature Extraction: Unlike traditional ML algorithms, LSTM models do not require manual feature extraction. Instead, we used word embeddings (Word2Vec and GloVe) to transform the text into dense vector representations.

• Strengths: LSTM networks capture context and word order, making them excellent for understanding complex sentiments in long customer reviews.

• Limitations: LSTM models are computationally intensive and require more time for training. Overfitting can be a concern if the model is not regularized.

2.6 BERT (Bidirectional Encoder Representations from Transformers)

BERT is a transformer-based pre-trained language
model that has achieved state-of-the-art
performance on many NLP tasks, including
sentiment analysis. BERT considers the context of
each word from both directions (left-to-right and
right-to-left) in a sentence, which allows it to
understand nuanced meaning and relationships
between words.

• Feature Extraction: BERT uses its pre-trained embedding layers to encode textual feedback into contextualized vectors. We fine-tuned BERT on our specific dataset for sentiment classification.

• Strengths: BERT excels at understanding complex language patterns, including context, syntax, and sentiment polarity. It has demonstrated superior performance compared to traditional models in many NLP applications.

• Limitations: BERT is computationally expensive and requires large memory resources. Fine-tuning BERT can be time-consuming, especially with large datasets.

COMPARATIVE STUDY

To evaluate the performance of each machine
learning model, we conducted a thorough
comparative study using the following metrics:

• Accuracy: The ratio of correctly predicted instances to the total number of instances.

• Precision: The ratio of true positives to the sum of true positives and false positives. It measures how relevant the positive predictions are.

• Recall (Sensitivity): The ratio of true positives to the sum of true positives and false negatives. It measures how well the model captures the actual positives.

• F1 Score: The harmonic mean of precision and recall, providing a balance between the two.

• AUC-ROC Curve: Measures the ability of the model to distinguish between classes (positive vs. negative).

• Training Time: The amount of time required to train the model, important for scalability and real-time applications.
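The count-based metrics above can be computed directly from predicted and true labels, as in the sketch below. With three sentiment classes, precision, recall, and F1 are defined per class and then averaged. The study does not say which averaging scheme was used, so the macro average here is an assumption, as are the function names. AUC-ROC is left out because it requires predicted scores rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def macro_average(y_true, y_pred):
    """Unweighted mean of per-class (precision, recall, f1)."""
    labels = sorted(set(y_true))
    rows = [per_class_metrics(y_true, y_pred, lab) for lab in labels]
    return tuple(sum(col) / len(labels) for col in zip(*rows))
```

Macro averaging weights every class equally, so poor performance on a small class such as neutral feedback lowers the score even when overall accuracy is high.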

3.1 Results Summary

Algorithm            Accuracy  Precision  Recall  F1 Score  AUC-ROC  Training Time
Logistic Regression  0.80      0.78       0.77    0.77      0.82     Fast
Naive Bayes          0.79      0.76       0.75    0.75      0.80     Very Fast
SVM                  0.82      0.80       0.78    0.79      0.84     Moderate
Random Forest        0.83      0.81       0.80    0.80      0.85     Moderate
LSTM                 0.85      0.83       0.82    0.83      0.87     High
BERT                 0.88      0.87       0.86    0.86      0.90     Very High

[Figure: Evaluation of Machine Learning and NLP Algorithm. Chart comparing Accuracy, Precision, Recall, F1 Score, and AUC-ROC across Logistic Regression, Naive Bayes, SVM, Random Forest, LSTM, and BERT.]

3.2 Analysis of Results

1. Logistic Regression and Naive Bayes: These baseline models provided decent performance but were outperformed by more sophisticated models. While both models are easy to interpret and computationally efficient, they struggled with complex language and failed to capture context, especially in reviews containing mixed or nuanced sentiments. The accuracy for both hovered around 80%, but their F1 scores indicate they are less effective in handling imbalanced classes.

2. SVM: SVM outperformed the baseline models with an accuracy of 82%. It demonstrated stronger performance due to its ability to find a better decision boundary between classes, especially when sentiment classes (positive, negative, neutral) were not linearly separable. However, the trade-off was the longer training time, especially when tuning the kernel.

3. Random Forest: Random Forest achieved better accuracy (83%) and F1 score than Logistic Regression and Naive Bayes. Its ability to capture non-linear patterns helped it perform well, especially on mixed sentiment reviews. However, the model was slower and required more memory, making it less feasible for real-time feedback analysis.

4. LSTM: The LSTM model provided significant improvements, especially in its ability to capture the sequence of words and context within the customer reviews. With an accuracy of 85% and high recall and precision, LSTM handled longer, complex reviews effectively. However, the model required substantial computational resources and took a long time to train.

5. BERT: BERT emerged as the best-performing model, with an accuracy of 88% and the highest F1 score of 0.86. Its ability to understand the context of words in both directions enabled it to excel in capturing nuanced sentiments. The AUC-ROC of 0.90 indicated that BERT was highly effective in distinguishing between sentiment classes. However, the downside of BERT was its high computational cost and long training time, making it less suitable for quick, real-time analysis unless sufficient resources are available.

CONCLUSION AND DISCUSSION

In this study, we conducted a comprehensive
sentiment analysis of customer feedback in the
banking sector, employing a diverse range of
machine learning and natural language processing
(NLP) models. The findings underscore the
importance of understanding customer sentiments
to improve banking services and enhance
customer experiences. Our extensive dataset,
comprising approximately 100,000 entries
collected from various sources, provided a solid
foundation for evaluating different sentiment
classification algorithms.

The comparative analysis revealed that advanced
models, particularly BERT and LSTM,
outperformed traditional approaches like Logistic
Regression and Naive Bayes in capturing complex
sentiments expressed in customer feedback.
BERT's ability to analyze context by considering
words bidirectionally allowed it to excel in
identifying nuances in customer sentiments,
leading to an impressive accuracy of 88% and an
F1 score of 0.86. This is significant, especially in a
domain where customer sentiment can be
multifaceted and deeply intertwined with their
experiences.

On the other hand, while Logistic Regression and
Naive Bayes served as useful baseline models, their
limitations became evident, particularly in
handling nuanced and mixed sentiments. These
models achieved reasonable performance but
struggled with the complexity inherent in
customer reviews, as seen in their lower F1 scores
and their difficulty with imbalanced sentiment classes.

The study also highlights the challenges
encountered during data collection and
preprocessing, particularly with unstructured
feedback, informal language, and mixed
sentiments. Our multi-step preprocessing pipeline
effectively addressed these challenges, ensuring a
high-quality dataset for model training. The
application of techniques such as n-grams analysis,
TF-IDF weighting, and POS tagging enriched our
feature extraction process, further enhancing
model performance.

The implications of our findings are significant for
banking institutions. By adopting advanced
sentiment analysis techniques, banks can gain
deeper insights into customer feedback, identify
service areas that require improvement, and
develop targeted strategies to enhance customer
satisfaction. For instance, understanding the
reasons behind negative sentiments related to loan
services can guide banks in streamlining their
processes and training their staff, ultimately
leading to improved customer experiences.

However, it is essential to acknowledge the
computational demands of models like BERT and
LSTM, which may pose challenges for real-time
sentiment analysis in environments with limited
resources. Future research could explore
optimization strategies to balance accuracy with
computational efficiency, ensuring that insights
derived from sentiment analysis can be leveraged
in a timely manner.

In conclusion, our study underscores the
transformative potential of sentiment analysis in
the banking sector. By utilizing advanced machine
learning models, banks can not only improve
service quality but also foster stronger
relationships with their customers. The
continuous evolution of NLP technologies offers
exciting prospects for further research, which can
expand the boundaries of customer sentiment
understanding and its applications in various
domains beyond banking.

Acknowledgement: All authors contributed
equally.

REFERENCES

1. Akhtar, P., Salim, A., & Ahmad, M. (2022). A comprehensive review of sentiment analysis: Techniques, tools, and applications. Journal of Business Research, 123, 344-355.

2. Mozumder, M. A. S., Nguyen, T. N., Devi, S., Arif, M., Ahmed, M. P., Ahmed, E., ... & Uddin, A. (2024). Enhancing Customer Satisfaction Analysis Using Advanced Machine Learning Techniques in Fintech Industry. Journal of Computer Science and Technology Studies, 6(3), 35-41.

3. Modak, C., Ghosh, S. K., Sarkar, M. A. I., Sharif, M. K., Arif, M., Bhuiyan, M., ... & Devi, S. (2024). Machine Learning Model in Digital Marketing Strategies for Customer Behavior: Harnessing CNNs for Enhanced Customer Satisfaction and Strategic Decision-Making. Journal of Economics, Finance and Accounting Studies, 6(3), 178-186.

4. Sarkar, M. A. I., Reja, M. M. S., Arif, M., Uddin, A., Sharif, K. S., Tusher, M. I., Devi, S., Ahmed, M. P., Bhuiyan, M., Rahman, M. H., Mamun, A. A., Rahman, T., Asaduzzaman, M., & Ahmmed, M. J. (2024). Credit risk assessment using statistical and machine learning: Basic methodology and risk modeling applications. International Journal on Computational Engineering, 1(3), 62-67. https://www.comien.org/index.php/comien

5. Arif, M., Hasan, M., Al Shiam, S. A., Ahmed, M. P., Tusher, M. I., Hossan, M. Z., ... & Imam, T. (2024). Predicting Customer Sentiment in Social Media Interactions: Analyzing Amazon Help Twitter Conversations Using Machine Learning. International Journal of Advanced Science Computing and Engineering, 6(2), 52-56.

6. Shahid, R., Mozumder, M. A. S., Sweet, M. M. R., Hasan, M., Alam, M., Rahman, M. A., ... & Islam, M. R. (2024). Predicting Customer Loyalty in the Airline Industry: A Machine Learning Approach Integrating Sentiment Analysis and User Experience. International Journal on Computational Engineering, 1(2), 50-54.

7. Mozumder, M. A. S., Mahmud, F., Shak, M. S., Sultana, N., Rodrigues, G. N., Al Rafi, M., ... & Bhuiyan, M. S. M. (2024). Optimizing Customer Segmentation in the Banking Sector: A Comparative Analysis of Machine Learning Algorithms. Journal of Computer Science and Technology Studies, 6(4), 01-07.

8. Chowdhury, M. S., Shak, M. S., Devi, S., Miah, M. R., Al Mamun, A., Ahmed, E., ... & Mozumder, M. S. A. (2024). Optimizing E-Commerce Pricing Strategies: A Comparative Analysis of Machine Learning Models for Predicting Customer Satisfaction. The American Journal of Engineering and Technology, 6(09), 6-17.

9. Md Abu Sayed, Badruddowza, Md Shohail Uddin Sarker, Abdullah Al Mamun, Norun Nabi, Fuad Mahmud, Md Khorshed Alam, Md Tarek Hasan, Md Rashed Buiya, & Mashaeikh Zaman Md. Eftakhar Choudhury. (2024). Comparative Analysis of Machine Learning Algorithms for Predicting Cybersecurity Attack Success: A Performance Evaluation. The American Journal of Engineering and Technology, 6(09), 81-91. https://doi.org/10.37547/tajet/Volume06Issue09-10

10. Md Al-Imran, Salma Akter, Md Abu Sufian Mozumder, Rowsan Jahan Bhuiyan, Tauhedur Rahman, Md Jamil Ahmmed, Md Nazmul Hossain Mir, Md Amit Hasan, Ashim Chandra Das, & Md. Emran Hossen. (2024). Evaluating Machine Learning Algorithms for Breast Cancer Detection: A Study on Accuracy and Predictive Performance. The American Journal of Engineering and Technology, 6(09), 22-33. https://doi.org/10.37547/tajet/Volume06Issue09-04

11. Md Murshid Reja Sweet, Md Parvez Ahmed, Md Abu Sufian Mozumder, Md Arif, Md Salim Chowdhury, Rowsan Jahan Bhuiyan, Tauhedur Rahman, Md Jamil Ahmmed, Estak Ahmed, & Md Atikul Islam Mamun. (2024). Comparative Analysis of Machine Learning Techniques for Accurate Lung Cancer Prediction. The American Journal of Engineering and Technology, 6(09), 92-103. https://doi.org/10.37547/tajet/Volume06Issue09-11

12. Bahl, S., Kumar, P., & Agarwal, A. (2021). Sentiment analysis in banking services: A review of techniques and challenges. International Journal of Information Management, 57, 102317.

13. Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

14. Cambria, E., Schuller, B., Liu, B., & Zhang, J. (2017). Knowledge-based systems for sentiment analysis: A survey. Knowledge-Based Systems, 119, 30-45.

15. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

16. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

17. Joachims, T. (1999). Support vector machines for text categorization. In Proceedings of the 10th European Conference on Machine Learning (ECML-99) (pp. 137-142).

18. Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.

19. Rish, I. (2001). An empirical comparison of supervised learning algorithms. In Proceedings of the 2001 AAAI Fall Symposium on Artificial Intelligence in the Real World.

20. Sinha, A., & Kaur, A. (2020). Impact of digital transformation on customer service in the banking sector. Journal of Business Research, 117, 542-551.

21. Yin, H., Huang, H., & Zhang, Y. (2016). Text classification based on logistic regression and naive bayes. Journal of Software, 11(5), 507-515.

22. Ashim Chandra Das, Md Shahin Alam Mozumder, Md Amit Hasan, Maniruzzaman Bhuiyan, Md Rasibul Islam, Md Nur Hossain, Salma Akter, & Md Imdadul Alam. (2024). Machine Learning Approaches for Demand Forecasting: The Impact of Customer Satisfaction on Prediction Accuracy. The American Journal of Engineering and Technology, 6(10), 42-53. https://doi.org/10.37547/tajet/Volume06Issue10-06

23. Rowsan Jahan Bhuiyan, Salma Akter, Aftab Uddin, Md Shujan Shak, Md Rasibul Islam, S M Shadul Islam Rishad, Farzana Sultana, & Md. Hasan-Or-Rashid. (2024). Sentiment Analysis of Customer Feedback in the Banking Sector: A Comparative Study of Machine Learning Models. The American Journal of Engineering and Technology, 6(10), 54-66. https://doi.org/10.37547/tajet/Volume06Issue10-07

References

Akhtar, P., Salim, A., & Ahmad, M. (2022). A comprehensive review of sentiment analysis: Techniques, tools, and applications. Journal of Business Research, 123, 344-355.

Mozumder, M. A. S., Nguyen, T. N., Devi, S., Arif, M., Ahmed, M. P., Ahmed, E., ... & Uddin, A. (2024). Enhancing Customer Satisfaction Analysis Using Advanced Machine Learning Techniques in Fintech Industry. Journal of Computer Science and Technology Studies, 6(3), 35-41.

Modak, C., Ghosh, S. K., Sarkar, M. A. I., Sharif, M. K., Arif, M., Bhuiyan, M., ... & Devi, S. (2024). Machine Learning Model in Digital Marketing Strategies for Customer Behavior: Harnessing CNNs for Enhanced Customer Satisfaction and Strategic Decision-Making. Journal of Economics, Finance and Accounting Studies, 6(3), 178-186.

Sarkar, M. A. I., Reja, M. M. S., Arif, M., Uddin, A., Sharif, K. S., Tusher, M. I., Devi, S., Ahmed, M. P., Bhuiyan, M., Rahman, M. H., Mamun, A. A., Rahman, T., Asaduzzaman, M., & Ahmmed, M. J. (2024). Credit risk assessment using statistical and machine learning: Basic methodology and risk modeling applications. International Journal on Computational Engineering, 1(3), 62-67. https://www.comien.org/index.php/comien

Arif, M., Hasan, M., Al Shiam, S. A., Ahmed, M. P., Tusher, M. I., Hossan, M. Z., ... & Imam, T. (2024). Predicting Customer Sentiment in Social Media Interactions: Analyzing Amazon Help Twitter Conversations Using Machine Learning. International Journal of Advanced Science Computing and Engineering, 6(2), 52-56.

Shahid, R., Mozumder, M. A. S., Sweet, M. M. R., Hasan, M., Alam, M., Rahman, M. A., ... & Islam, M. R. (2024). Predicting Customer Loyalty in the Airline Industry: A Machine Learning Approach Integrating Sentiment Analysis and User Experience. International Journal on Computational Engineering, 1(2), 50-54.

Mozumder, M. A. S., Mahmud, F., Shak, M. S., Sultana, N., Rodrigues, G. N., Al Rafi, M., ... & Bhuiyan, M. S. M. (2024). Optimizing Customer Segmentation in the Banking Sector: A Comparative Analysis of Machine Learning Algorithms. Journal of Computer Science and Technology Studies, 6(4), 01-07.

Chowdhury, M. S., Shak, M. S., Devi, S., Miah, M. R., Al Mamun, A., Ahmed, E., ... & Mozumder, M. S. A. (2024). Optimizing E-Commerce Pricing Strategies: A Comparative Analysis of Machine Learning Models for Predicting Customer Satisfaction. The American Journal of Engineering and Technology, 6(09), 6-17.

Md Abu Sayed, Badruddowza, Md Shohail Uddin Sarker, Abdullah Al Mamun, Norun Nabi, Fuad Mahmud, Md Khorshed Alam, Md Tarek Hasan, Md Rashed Buiya, & Mashaeikh Zaman Md. Eftakhar Choudhury. (2024). COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS FOR PREDICTING CYBERSECURITY ATTACK SUCCESS: A PERFORMANCE EVALUATION. The American Journal of Engineering and Technology, 6(09), 81–91. https://doi.org/10.37547/tajet/Volume06Issue09-10

Md Al-Imran, Salma Akter, Md Abu Sufian Mozumder, Rowsan Jahan Bhuiyan, Tauhedur Rahman, Md Jamil Ahmmed, Md Nazmul Hossain Mir, Md Amit Hasan, Ashim Chandra Das, & Md. Emran Hossen. (2024). EVALUATING MACHINE LEARNING ALGORITHMS FOR BREAST CANCER DETECTION: A STUDY ON ACCURACY AND PREDICTIVE PERFORMANCE. The American Journal of Engineering and Technology, 6(09), 22–33. https://doi.org/10.37547/tajet/Volume06Issue09-04

Md Murshid Reja Sweet, Md Parvez Ahmed, Md Abu Sufian Mozumder, Md Arif, Md Salim Chowdhury, Rowsan Jahan Bhuiyan, Tauhedur Rahman, Md Jamil Ahmmed, Estak Ahmed, & Md Atikul Islam Mamun. (2024). COMPARATIVE ANALYSIS OF MACHINE LEARNING TECHNIQUES FOR ACCURATE LUNG CANCER PREDICTION. The American Journal of Engineering and Technology, 6(09), 92–103. https://doi.org/10.37547/tajet/Volume06Issue09-11

Bahl, S., Kumar, P., & Agarwal, A. (2021). Sentiment analysis in banking services: A review of techniques and challenges. International Journal of Information Management, 57, 102317.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

Cambria, E., Schuller, B., Liu, B., & Zhang, J. (2017). Knowledge-based systems for sentiment analysis: A survey. Knowledge-Based Systems, 119, 30-45.

Chawla, N. V., De'Aprati, C. A., & Wang, G. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Joachims, T. (1999). Support vector machines for text categorization. In Proceedings of the 10th European Conference on Machine Learning (ECML-99) (pp. 137-142).

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human-Centered Informatics, 5(1), 1-167.

Rish, I. (2001). An empirical comparison of supervised learning algorithms. In Proceedings of the 2001 AAAI Fall Symposium on Artificial Intelligence in the Real World.

Sinha, A., & Kaur, A. (2020). Impact of digital transformation on customer service in the banking sector. Journal of Business Research, 117, 542-551.

Yin, H., Huang, H., & Zhang, Y. (2016). Text classification based on logistic regression and naive bayes. Journal of Software, 11(5), 507-515.

Ashim Chandra Das, Md Shahin Alam Mozumder, Md Amit Hasan, Maniruzzaman Bhuiyan, Md Rasibul Islam, Md Nur Hossain, Salma Akter, & Md Imdadul Alam. (2024). MACHINE LEARNING APPROACHES FOR DEMAND FORECASTING: THE IMPACT OF CUSTOMER SATISFACTION ON PREDICTION ACCURACY. The American Journal of Engineering and Technology, 6(10), 42–53. https://doi.org/10.37547/tajet/Volume06Issue10-06

Rowsan Jahan Bhuiyan, Salma Akter, Aftab Uddin, Md Shujan Shak, Md Rasibul Islam, S M Shadul Islam Rishad, Farzana Sultana, & Md. Hasan-Or-Rashid. (2024). SENTIMENT ANALYSIS OF CUSTOMER FEEDBACK IN THE BANKING SECTOR: A COMPARATIVE STUDY OF MACHINE LEARNING MODELS. The American Journal of Engineering and Technology, 6(10), 54–66. https://doi.org/10.37547/tajet/Volume06Issue10-07