Comparative Performance of Large Language Models for Sentiment Analysis of Consumer Feedback in the Banking Sector: Accuracy, Efficiency, and Practical Deployment
In the rapidly evolving banking sector, understanding consumer sentiment is crucial for informed decision-making and enhancing customer experiences. This study investigates the efficacy of large language models (LLMs) for sentiment analysis of consumer feedback within the banking domain. We systematically evaluate five state-of-the-art LLMs—DistilBERT, BERT-base, RoBERTa-base, GPT-3.5, and GPT-4—on a domain-specific dataset of 10,000 consumer feedback entries collected from online banking forums and customer reviews. Each model is rigorously assessed in terms of accuracy, precision, recall, F1-score, and computational cost. Our findings reveal that GPT-4 delivers the best performance across all evaluation metrics but demands substantial computational resources, making it less feasible for real-time deployment in cost-sensitive scenarios. In contrast, RoBERTa-base and BERT-base strike a balance between accuracy and resource efficiency, while DistilBERT emerges as the most cost-effective and computationally efficient option. These results highlight the trade-offs between predictive performance and practical deployment constraints in real-world banking environments. The study underscores the transformative potential of LLM-driven sentiment analysis in the financial sector, offering valuable insights for banks and financial institutions aiming to leverage AI for strategic decision-making and customer satisfaction improvements.
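As a generic illustration of the evaluation metrics named above (not the authors' evaluation harness), the following minimal sketch computes accuracy, per-class precision and recall, and macro-F1 for a hypothetical three-class sentiment task; the labels and predictions shown are invented for demonstration only.

```python
def classification_metrics(y_true, y_pred, labels):
    """Compute accuracy, macro-F1, and per-class precision/recall/F1.

    Uses one-vs-rest counts per class; macro-F1 is the unweighted
    mean of per-class F1 scores, as is typical in multi-class
    sentiment evaluation.
    """
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        per_class[c] = (prec, rec, f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return accuracy, macro_f1, per_class

# Hypothetical gold labels and model predictions for six feedback entries
y_true = ["neg", "neg", "neu", "pos", "pos", "pos"]
y_pred = ["neg", "neu", "neu", "pos", "pos", "neg"]
acc, macro_f1, per_class = classification_metrics(
    y_true, y_pred, ["neg", "neu", "pos"]
)
```

On this toy example the accuracy is 4/6 and the macro-F1 averages the three per-class F1 scores, illustrating why macro averaging penalizes models that neglect minority sentiment classes.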