Awesome Sentiment Analysis

A curated list of awesome sentiment analysis frameworks, libraries, software (by language), and of course academic papers and methods. In addition NLP lib useful in sentiment analysis. Inspired by awesome-machine-learning.

Latest Update (April 2026): Comprehensive update covering 2021-2026 advances including:

Large Language Models (GPT-4, Claude, Llama, Gemini, Mixtral, DeepSeek)
Modern Transformers (RoBERTa, DistilBERT, ALBERT, XLM-RoBERTa, ModernBERT)
Multimodal Sentiment Analysis (vision-language models)
Multilingual and Cross-lingual Methods (Brand24/MMS — NeurIPS 2023, SemEval-2026)
NEW: LLM Techniques — Prompt Engineering, CoT, RAG, LoRA/QLoRA, RLHF, DPO
NEW: LLM Evaluation & Benchmarks — SentiEval, stability metrics, model leaderboard
NEW: Explainable Sentiment Analysis — SHAP, LIME, ModernBERT-XAI, attention viz
NEW: LLM Reliability & Safety — Hallucination, bias, uncertainty quantification
Recent Benchmarks and Datasets (2023-2026)
Domain-Specific Applications (Financial, Healthcare, Social Media)

If you want to contribute to this list (please do), send me a pull request or contact me @luk_augustyniak

Libraries
Resources
Multimodal Sentiment Analysis
Multilingual and Cross-lingual Sentiment Analysis
LLM Techniques for Sentiment Analysis
LLM Evaluation & Benchmarks for Sentiment Analysis
Explainable Sentiment Analysis
- Methods & Tools
- Survey Papers
LLM Reliability & Safety in Sentiment Analysis
International Workshops
Papers
Tutorials
Books
Demos
API
Related Studies

Libraries

Modern Transformer-based Libraries (2023-2026)

Python, Hugging Face Transformers - State-of-the-art Natural Language Processing library with 215+ sentiment analysis models. Supports BERT, RoBERTa, DistilBERT, ALBERT, XLNet, and all modern transformer architectures, with simple integration for sentiment analysis using pre-trained models via an easy-to-use API.
Python, cardiffnlp/twitter-roberta-base-sentiment-latest - RoBERTa model fine-tuned for Twitter sentiment analysis, achieving state-of-the-art performance on social media text (updated 2024).
Python, ModernFinBERT - Financial sentiment analysis model based on ModernBERT architecture (released July 2025), specialized for financial texts, earnings calls, and analyst reports, and reporting improved performance over earlier FinBERT variants on multiple financial sentiment benchmarks (see model card for details).
Python, tabularisai/multilingual-sentiment-analysis - Multilingual sentiment analysis project targeting support for multiple languages; see the Hugging Face page for current availability and details.
Python, Flair - Modern NLP framework with multilingual support and state-of-the-art sentiment analysis models, particularly strong for cross-lingual tasks.
Python, Stanza - Stanford NLP library with multilingual support for 60+ languages, includes sentiment analysis capabilities.

Traditional Libraries

Python, Textlytics - set of sentiment analysis examples based on Amazon Data, SemEval, IMDB etc.
Java, Polish Sentiment Model - Sentiment analysis for polish language using SVM and BoW - within Docker.
Python, Spacy - Industrial-Strength Natural Language Processing in Python, one of the best and the fastest libs for NLP. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.
Python, TextBlob - TextBlob allows you to specify which algorithms you want to use under the hood of its simple API.
Python, pattern - The pattern.en module contains a fast part-of-speech tagger for English (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.
Java, CoreNLP by Stanford - NLP toolkit with Deeply Moving: Deep Learning for Sentiment Analysis.
R, TM - R text mining module including tm.plugin.sentiment.
Software, GATE - GATE is open source software capable of solving almost any text processing problem.
Java, LingPipe - LingPipe is tool kit for processing text using computational linguistics.
Python, NLTK - Natural Language Toolkit.
C++, MITIE - MIT Information Extraction.
Software, KNIME - KNIME® Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. Our enterprise-grade, open source platform is fast to deploy, easy to scale and intuitive to learn. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist. Our steady course on unrestricted open source is your passport to a global community of data scientists, their expertise, and their active contributions.
Software, RapidMiner - software capable of solving almost any text processing problem. processing text using computational linguistics.
JAVA, OpenNLP - The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
Dragon Sentiment Classifier C# - Dragon Sentiment API is a C# implementation of the Naive Bayes Sentiment Classifier to analyze the sentiment of a text corpus.
sentiment: Tools for Sentiment Analysis in R - sentiment is an R package with tools for sentiment analysis including bayesian classifiers for positivity/negativity and emotion classification.
ASUM Java - Aspect and Sentiment Unification Model for Online Review Analysis.
AFINN-based sentiment analysis for Node.js - Sentiment is a Node.js module that uses the AFINN-165 wordlist and Emoji Sentiment Ranking to perform sentiment analysis on arbitrary blocks of input text.
SentiMental - Putting the Mental in Sentimental in js - Sentiment analysis tool for node.js based on the AFINN-111 wordlist. Version 1.0 introduces performance improvements making it both the first, and now fastest, AFINN backed Sentiment Analysis tool for node.

Back to Top

Aspect-based Sentiment Analysis

Twitter-sent-dnn - Deep Neural Network for Sentiment Analysis on Twitter.
Aspect Based Sentiment Analysis - System that participated in Semeval 2014 task 4: Aspect Based Sentiment Analysis.
Aspect Based Sentiment Analysis using End-to-End Memory Networks - TensorFlow implementation of Tang et al.'s EMNLP 2016 work.
Generating Reviews and Discovering Sentiment - Code for Learning to Generate Reviews and Discovering Sentiment (Alec Radford, Rafal Jozefowicz, Ilya Sutskever).
Sentiment Analysis with Social Attention - Code for the TACL paper Overcoming Language Variation in Sentiment Analysis with Social Attention
Neural Sentiment Classification - Neural Sentiment Classification aims to classify the sentiment in a document with neural models, which has been the state-of-the-art methods for sentiment classification. In this project, we provide our implementations of NSC, NSC+LA and NSC+UPA [Chen et al., 2016] in which user and product information is considered via attentions over different semantic levels.

Back to Top

Resources

Lexicons

Multidomain Sentiment Lexicons - lexicons from 10 domains based on Amazon Product Dataset extracted using method described in paper and used in paper.
AFINN - AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive). The words have been manually labeled by Finn Årup Nielsen in 2009-2011.
SentiWordNet [paper] - Lexical resource based on WordNet
SentiWords - Collection of 155,000 English words with a sentiment score included between -1 and 1. Words are in the form lemma#PoS and are aligned with WordNet lists that include adjectives, nouns, verbs and adverbs.
SenticNet [API] - Words with a sentiment score included between -1 and 1.
WordStat - Context-specific sentiment analysis dictionary with categories Negative, Positive, Uncertainty, Litigiousness and Modal. This dataset is inspired from two papers, written by Loughran and McDonald (2011) and Young and Soroka (2011).
MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon - The MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon is a list of subjectivity clues that is part of OpinionFinder and also helps to determine text polarity.
NRC-Canada Lexicons - the web page lists various word association lexicons that capture word-sentiment, word-emotion, and word-colour associations.
Sentiment140 - One of the NRC-Canada team lexicon - the Sentiment140 Lexicon is a list of words and their associations with positive and negative sentiment. The lexicon is provides sentiment score for unigrams, bigrams and unigram-bigram pairs.
MSOL - Macquarie Semantic Orientation Lexicon.
SemEval-2015 English Twitter Sentiment Lexicon - The lexicon was used as an official test set in the SemEval-2015 shared Task #10: Subtask E. The phrases in this lexicon include at least one of these negators.
SemEval-2016 Arabic Twitter Sentiment Lexicon - The lexicon was used as an official test set in the SemEval-2016 shared Task #7: Detecting Sentiment Intensity of English and Arabic Phrases. The phrases in this lexicon include at least one of these negators.
SemEval-2016 English Twitter Mixed Polarity Lexicon - This SCL, referred to as the Sentiment Composition Lexicon of Opposing Polarity Phrases (SCL-OPP), includes phrases that have at least one positive and at least one negative word—for example, phrases such as happy accident, best winter break, couldn’t stop smiling, and lazy sundays. We refer to such phrases as opposing polarity phrases. SCL-OPP has 265 trigrams, 311 bigrams, and 602 unigrams annotated with real-valued sentiment association scores through Best-Worst scaling (aka MaxDiff).
SemEval-2016 General English Sentiment Modifiers Lexicon - Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA). Negators, modals, and degree adverbs can significantly affect the sentiment of the words they modify. We manually annotate a set of phrases that include negators (such as no and cannot), modals (such as would have been and could), degree adverbs (such as quite and less), and their combinations. Both the phrases and their constituent content words are annotated with real-valued scores of sentiment intensity using the technique Best–Worst Scaling (aka MaxDiff), which provides reliable annotations. We refer to the resulting lexicon as Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA). The lexicon was used as an official test set in the SemEval-2016 shared Task #7: Detecting Sentiment Intensity of English and Arabic Phrases. The objective of that task was to automatically predict sentiment intensity scores for multi-word phrases.
The NRC Valence, Arousal, and Dominance Lexicon - The NRC Valence, Arousal, and Dominance (VAD) Lexicon includes a list of more than 20,000 English words and their valence, arousal, and dominance scores. For a given word and a dimension (V/A/D), the scores range from 0 (lowest V/A/D) to 1 (highest V/A/D). The lexicon with its fine-grained real-valued scores was created by manual annotation using Best--Worst Scaling.
EmoLex NRC Word-Emotion Association Lexicon - the NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.
WN-Affect emotion Lexicon - WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words. Similarly to our method for domain labels, we assigned to a number of WordNet synsets one or more affective labels (a-labels). In particular, the affective concepts representing emotional state are individuated by synsets marked with the a-label emotion. There are also other a-labels for those concepts representing moods, situations eliciting emotions, or emotional responses.
EmoLex NRC Word-Emotion Association Lexicon - the NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.
Multidimensional Stance Lexicon - A Multidimensional Lexicon for Interpersonal Stancetaking. Pavalanathan, Fitzpatrick, Kiesling, and Eisenstein. ACL 2017.
WN-Affect emotion Lexicon - WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words. Similarly to our method for domain labels, we assigned to a number of WordNet synsets one or more affective labels (a-labels). In particular, the affective concepts representing emotional state are individuated by synsets marked with the a-label emotion. There are also other a-labels for those concepts representing moods, situations eliciting emotions, or emotional responses.

Back to Top

Datasets

Classic Benchmarks

Stanford Sentiment Treebank [paper] - Sentiment dataset with fine-grained sentiment annotations. The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. This competition presents a chance to benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset. You are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.
Amazon Product Dataset - This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). The updated version of dataset - update as for 2018 is availalbe here https://nijianmo.github.io/amazon/index.html.
IMDB Movies Reviews Dataset - This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. Authors provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.
Sentiment Labelled Sentences Dataset The dataset contains sentences labelled with positive or negative sentiment. This dataset was created for the following paper. It contains sentences labelled with positive or negative sentiment. Score is either 1 (for positive) or 0 (for negative) The sentences come from three different websites/fields: imdb.com, amazon .com, yelp.com. For each website, there exist 500 positive and 500 negative sentences. Those were selected randomly for larger datasets of reviews. We attempted to select sentences that have a clearly positive or negative connotaton, the goal was for no neutral sentences to be selected.
sentic.net - concept-level sentiment analysis, that is, performing tasks such as polarity detection and emotion recognition by leveraging on semantics and linguistics in stead of solely relying on word co-occurrence frequencies.

Recent Datasets (2023-2026)

Brand24/MMS - Massively Multilingual Sentiment Corpus [arXiv] [NeurIPS 2023] [pdf] [github] [benchmark] - The most extensive open massively multilingual corpus for training sentiment models. Accepted to NeurIPS 2023 Datasets and Benchmarks Track. Contains 79 manually selected high-quality datasets from over 350 sources covering 27 languages across 6 language families with 6,164,762 training samples. Features rich linguistic metadata including morphological, syntactic, and functional properties, plus data quality confidence scores. Presents multi-faceted sentiment classification benchmark with hundreds of experiments on different base models, training objectives, and fine-tuning strategies. Languages include: Arabic, Bulgarian, Chinese, Czech, Dutch, English, Spanish, French, Japanese, Polish, Portuguese, Russian, and 15 others. Class distribution: Positive (56.7%), Neutral (21.8%), Negative (21.6%); percentages are rounded and may not sum exactly to 100%. License: CC BY-NC 4.0.
TweetEval - Part of ACL initiative for semantic evaluation. Widely used benchmark for Twitter sentiment analysis and text classification tasks (2020-2025).
TweetFinSent - Financial sentiment dataset from Twitter. State-of-the-art models achieve 69.54% accuracy and 65.72% macro F1-score with adversarial training (2023-2024).
IMDB Deep Context Reviews - Extended version capturing movie reviews with richer contextual information from IMDB's vast user base (2024-2025).
Large-scale English Comment Dataset - Collection of 241,000+ English-language comments from various online platforms (updated 2025).
MLDoc Dataset - Multilingual document classification corpus used for cross-lingual sentiment analysis. State-of-the-art adversarial training achieves 88.48% average accuracy (2024).
PAWS-X - Paraphrase Adversaries from Word Scrambling, cross-lingual dataset achieving 86.63% accuracy with recent methods (2024).
Kurdish Medical Corpus - Specialized medical sentiment dataset for Kurdish text classification achieving 92% accuracy and 92% F1-score with multilingual BERT [paper] (Badawi, 2023).

Domain-Specific Datasets

Financial Sentiment
- Financial PhraseBank - Sentences from financial news categorized by sentiment
- TweetFinSent - Twitter financial sentiment with 69.54% SOTA accuracy (2023)
Healthcare/Mental Health
- Mental Health sentiment datasets for student wellbeing analysis (2024-2025)
- Clinical sentiment corpora for patient feedback analysis
Restaurant Reviews
- Multilingual restaurant review datasets achieving 91.9% accuracy with XLM-RSA (2024)

Back to Top

Word Embeddings

WordNet2Vec - Corpora Agnostic Word Vectorization Method based on WordNet.
GloVe [paper] - Algorithm for obtaining word vectors. Pretrained word vectors available for download.
Word2Vec by Mikolov [paper] - Google's original code and pretrained word embeddings.
Word2Vec Python lib - Google's word2vec reimplementation written in Python (cython). There are also doc2vec and topic modelling method.

Back to Top

Pretrained Language Models

Large Language Models (2023-2026)

GPT Family (OpenAI)
- GPT-4 - Advanced large language model with strong sentiment analysis capabilities, particularly for complex emotional nuances and context-dependent sentiment (2023-2024)
- GPT-4o - Multimodal version with enhanced performance (2024)
- GPT-3.5 Turbo - Cost-effective alternative for sentiment analysis tasks
Claude Family (Anthropic)
- Claude 4.5 - Advanced large language model widely used for sentiment and emotion analysis tasks, with strong performance on contemporary benchmarks
- Claude 3.5 Sonnet - High-performance model for nuanced sentiment understanding
Llama Family (Meta)
- Llama 3.1 - Open-source LLM with strong sentiment analysis performance in multilingual contexts (2024)
- Llama 2 - Widely used for fine-tuning on domain-specific sentiment tasks (2023)
Gemini (Google)
- Gemini Pro - Multimodal LLM with sentiment analysis capabilities across text and images (2024-2025)
Mixtral (Mistral AI)
- Mixtral 8x7B - Mixture-of-experts model showing competitive performance in sentiment classification (2024)
Grok (xAI)
- Grok 4 - Large language model by xAI that can be applied to sentiment and trend analysis tasks, including social media data.

Encoder-based Transformers (BERT Family)

BERT (Bidirectional Encoder Representations from Transformers)
- BERT-base, BERT-large - Original TensorFlow implementation (Google, 2018)
- Multilingual BERT (mBERT) - Supports 104 languages
- Typical performance: 87.8% accuracy on sentiment tasks
RoBERTa (Robustly Optimized BERT)
- RoBERTa-base, RoBERTa-large - Improved BERT training achieving 88.5-96.30% accuracy (Facebook AI, 2019)
- twitter-roberta-base-sentiment - Fine-tuned for social media (69.54% on TweetFinSent)
- Often outperforms BERT on sentiment benchmarks; reported F1-scores can exceed 90% and approach 98% on specific datasets and tasks (results are highly dataset- and setup-dependent)
DistilBERT
- DistilBERT - 40% smaller, 60% faster than BERT while retaining 97% of performance (Hugging Face, 2019; as reported in the original paper Sanh et al., 2019)
ALBERT (A Lite BERT)
- ALBERT - Parameter-efficient version of BERT with reduced memory consumption (Google, 2019)
XLNet
- XLNet - Generalized autoregressive pretraining outperforming BERT on several benchmarks (Google/CMU, 2019)

Multilingual Transformers

XLM-RoBERTa
- XLM-RoBERTa - Trained on 100 languages, achieves 91.9% accuracy on multilingual sentiment tasks (Facebook AI, 2020)
- Outperforms other cross-lingual approaches by 3%+ in zero-shot settings on XNLI and MLQA benchmarks (Conneau et al., 2020)
mBERT (Multilingual BERT)
- mBERT - Supports cross-lingual sentiment analysis with 92% accuracy on specialized corpora

Domain-Specific Models

Financial Sentiment
- FinBERT - BERT fine-tuned on financial texts (ProsusAI)
- ModernFinBERT - Latest financial sentiment model based on the ModernBERT architecture; reports improved performance over earlier FinBERT variants (see model card for benchmarks, accessed July 2025)
- BloombergGPT - 50B parameter LLM for financial NLP including sentiment analysis
Healthcare/Mental Health
- MentalRoBERTa - RoBERTa-base model trained on mental health-related posts from Reddit for multi-label classification of mental health topics (e.g., depression, anxiety, PTSD, self-harm, suicidal ideation); intended for research use only and not as a diagnostic tool or crisis service

Decoder-based Models

GPT Family
- GPT-2 - Decoder-based transformer (OpenAI, 2019)
- GPT-Neo, GPT-J - Open-source GPT alternatives (EleutherAI, 2021)

Hybrid Architectures (2023-2025)

BERT-LSTM Hybrid - Combining BERT contextual embeddings with BiLSTM for improved sequence dependencies
RoBERTa-GRU - Hybrid models combining transformers with recurrent networks achieving 96.77% accuracy
BERT-Attention - Multi-layered attention mechanisms with BERT for comprehensive sentiment dissection

Back to Top

Multimodal Sentiment Analysis

Multimodal sentiment analysis combines text, images, video, and audio to understand sentiment more comprehensively than text-only approaches.

Overview

Multimodal Aspect-based Sentiment Analysis (MABSA) has become a core NLP task as user-generated content increasingly includes multiple modalities (text, images, video) (2024-2025)
Vision-language models demonstrate remarkable potential by integrating visual and textual information to enhance sentiment classification accuracy
Critical challenges include capturing key information across modalities, achieving cross-modal alignment, and narrowing the semantic gap between image and text

Recent Models and Frameworks (2024-2025)

Sentiment Analysis Engine (SAE) - End-to-end multimodal model addressing challenges in capturing emotional changes across modalities [paper]
RoBERTa-AOBERT Multi-modal Model - Combines RoBERTa with aspect-oriented BERT for image-text sentiment analysis [paper]
Multimodal GRU with Directed Pairwise Cross-Modal Attention - Advanced architecture for cross-modal sentiment understanding [paper]
FDR-MSA (Feature Disentanglement and Reconstruction) - Novel approach to multimodal sentiment analysis through feature separation and reconstruction [paper]
Image-Text Sentiment Analysis with Multi-Channel Multi-Modal Joint Learning - Advanced fusion techniques for analyzing sentiment across image-text pairs [paper]

Multimodal LLMs for Sentiment Analysis

LLaVA (Large Language and Vision Assistant) - Demonstrates strong capabilities in multimodal aspect-based sentiment analysis
GPT-4V (Vision) - Multimodal GPT-4 variant for analyzing sentiment in images and text
Gemini Pro - Google's multimodal LLM with sentiment analysis across modalities

Key Research Findings (2024-2025)

Survey: "Large language models meet text-centric multimodal sentiment analysis" - comprehensive review of LLM applications to multimodal SA [paper]
Uncertainty exists about LLM adaptability to multimodal aspect-based sentiment analysis (MABSA), though recent advances show promise
Multimodal models with multi-layer feature fusion and multi-task learning achieve state-of-the-art results [paper]

Applications

Social media sentiment analysis (Twitter, Instagram, TikTok)
Video content sentiment detection
Customer feedback analysis with images and text
Product review analysis combining text and product images

Back to Top

Multilingual and Cross-lingual Sentiment Analysis

Analysis of sentiment across multiple languages and transfer of sentiment models between languages.

State-of-the-Art Models (2024-2025)

XLM-RoBERTa (XLM-R) - Large multilingual Transformer for cross-lingual and zero-shot sentiment classification [Hugging Face]
- Trained on 100 languages
- Achieves 91.9% accuracy on multilingual sentiment tasks
XLM-RSA - Novel multilingual model based on XLM-RoBERTa with Aspect-Focused Attention
- 91.9% accuracy on restaurant reviews (2024)
- Surpasses BERT (87.8%) and RoBERTa (88.5%)
Multilingual BERT (mBERT) - Pretrained on 104 Wikipedia languages [Devlin et al., 2019]
- 92% accuracy on specialized corpora (e.g., Kurdish Medical Corpus)
- Effective for cross-lingual embedding with MUSE, BiCVM, BiSkip

Recent Approaches and Techniques

Ensemble Methods - Combining transformers and LLMs for cross-lingual sentiment by translating to base language (English) [paper]
Prompt-based Fine-tuning - Language-independent sentiment analysis using prompt engineering with multilingual transformers [paper]
Adaptive Self-alignment - Bridging resource gaps with data augmentation and transfer learning [paper]
Zero-shot and Few-shot Learning - Small Multilingual Language Models (SMLMs) show superior zero-shot performance vs LLMs; LLMs demonstrate enhanced adaptive potential in few-shot settings (2024)

Performance Benchmarks

Brand24/MMS Benchmark: Large-scale multilingual benchmark with 79 datasets, 27 languages, 6.16M samples (NeurIPS 2023) [dataset] [interactive benchmark] [arXiv] [NeurIPS]
MLDoc Dataset: 88.48% average accuracy with adversarial training (2024)
PAWS-X Dataset: 86.63% accuracy for cross-lingual paraphrase detection (2024)
Restaurant Reviews: 91.9% with XLM-RSA across multiple languages (2024)

Supported Languages

Recent models support extensive language coverage including:

Brand24/MMS covers 27 languages: Arabic, Bulgarian, Chinese, Czech, Dutch, English, Spanish, French, Japanese, Polish, Portuguese, Russian, and 15 others across 6 language families
Major languages: English, Chinese, Spanish, Arabic, French, German, Italian, Portuguese, Russian, Japanese
Specialized models: Hindi, Korean, Turkish, Kurdish, Polish, and 90+ additional languages

Applications

Social media monitoring across global markets
Customer sentiment analysis for international brands
Multilingual chatbot sentiment understanding
Cross-border e-commerce review analysis

Back to Top

LLM Techniques for Sentiment Analysis

A comprehensive guide to applying Large Language Models to sentiment analysis using modern prompting, retrieval, and fine-tuning strategies.

Prompt Engineering

Designing effective prompts is the fastest route to high-accuracy sentiment classification with LLMs—no retraining required.

Techniques

Zero-Shot Prompting — Ask the model to classify sentiment directly with no examples. Surprisingly competitive on simple polarity tasks.
Few-Shot Prompting — Prepend 3–8 labeled examples. GPT-4o with few-shot + CoT achieves 84.54% F1 (text classification) and 99% F1 (sentiment analysis) [source].
Chain-of-Thought (CoT) — Instruct the model to reason step-by-step before producing a label. Boosts irony detection by up to 46% on Gemini-1.5-flash [paper].
Multi-Chain CoT — Aggregates multiple reasoning paths to resolve ambiguous sentiment cues [paper].
Domain Knowledge CoT (DK-CoT) — Injects domain knowledge (e.g. financial terminology) into the reasoning chain before classification [paper].
Self-Consistency — Sample multiple completions and take the majority vote. Reduces variance caused by stochastic decoding.
Sentiment-Controlled Prompts — Steer output emotion via prompt phrasing; few-shot with human-written examples is the most effective control strategy [paper].

Key Findings (2025-2026)

GPT-4o without CoT outperforms all tested models on zero-shot financial sentiment (GPT-4o, GPT-4.1, o3-mini comparison) [paper].
Negative prompts reduce factual accuracy and amplify bias; positive prompts increase verbosity [paper].
Accuracy can fluctuate ±10% across identical runs — prompt stability matters as much as prompt design.

Tools & Guides

In-Context Learning & Few-Shot Methods

Zero-shot SLM Ensembles — Combining multiple Small Language Models rivals proprietary LLMs at a fraction of the cost [paper].
Multi-Agent LLMs — Route different sentiment sub-tasks (coarse polarity, fine-grained emotion, irony) to specialist agents; demonstrated for social media in 2026 [paper].
LLM-Infused Multi-Module Transformer — Injects LLM representations into a smaller model for few-shot emotion-aware sentiment [paper].

Retrieval-Augmented Generation (RAG)

RAG grounds LLM sentiment predictions in external knowledge, reducing hallucinations and enabling domain-specific adaptation without retraining.

Architectures

Naive RAG — Retrieve → Read → Generate. Baseline architecture; FAISS or Elasticsearch for retrieval.
Modular RAG — Separate, swappable retrieval, reranking, and generation modules.
Self-RAG / Corrective RAG (CRAG) — Model iteratively decides when to retrieve and critiques its own output before producing a label (2025).
Agentic RAG — Embeds autonomous agents into the pipeline for planning, multi-hop retrieval, and tool use [paper].

Frameworks & Tools

LangChain — Modular LLM application framework; LCEL pipeline syntax makes sentiment pipelines composable. Introduced LangGraph for complex reasoning workflows (2025).
LlamaIndex — Data framework for LLM apps; 300+ integrations, 35% retrieval accuracy boost in 2025. Best for document-heavy sentiment pipelines.
LangGraph — Graph-based workflow orchestration for multi-step and agentic sentiment reasoning (2025).

Key Statistics (2025)

1,200+ RAG papers published on arXiv in 2024 alone vs. <100 in 2023.
63.6% of enterprise RAG deployments use GPT-based models.
RAG evaluation survey: [arXiv 2504.14891].

Parameter-Efficient Fine-Tuning (PEFT)

Fine-tune LLMs for sentiment without updating all parameters — dramatically reduces memory and compute.

Methods

LoRA (Low-Rank Adaptation) — Freezes base weights, trains low-rank decomposition matrices. ~27–30 GB training memory.
- LLaMA-3 + LoRA: 86.89% accuracy on Financial PhraseBank [paper].
- PEFT Library (Hugging Face)
QLoRA (Quantized LoRA) — Quantizes backbone to 4-bit, trains LoRA adapters. ~17–18 GB training memory. Enables 65B models on a single 48 GB GPU.
- LLaMA-3 + QLoRA: 91.2% accuracy / 0.908 F1 on IMDB, 85.6% / 0.849 F1 on Twitter [paper].
- QLoRA for Financial SA: up to 48% accuracy improvement over baseline [paper].
- QLoRA Repository
LoRAFusion — Kernel-level QLoRA optimizations targeting 4-bit inference efficiency (EuroSys 2026) [paper].
Multimodal LoRA — Applies LoRA fine-tuning to vision-language LLMs (VLCLNet) for multimodal sentiment analysis [paper].

Tutorials

Instruction Tuning & Alignment

Aligning LLMs to produce correctly-formatted sentiment labels and reliable confidence scores.

Methods

Supervised Fine-Tuning (SFT) — Trains on (instruction, sentiment-label) pairs to steer output format.
RLHF (Reinforcement Learning from Human Feedback) — PPO-based training. SOTA on complex tasks; only 8% unsafe outputs under adversarial testing.
DPO (Direct Preference Optimization) — Simpler, no reward model needed. Outperforms RLHF for sentiment-controlled generation. [paper]
RLAIF — Replaces human annotators with an AI judge; scales sentiment preference data cheaply.

Key Survey

Comprehensive Survey of LLM Alignment: RLHF, RLAIF, PPO, DPO and More (2024)

Back to Top

LLM Evaluation & Benchmarks for Sentiment Analysis

Benchmark Frameworks

SentiEval — Proposed comprehensive LLM evaluation benchmark covering 13 SA task types on 26 datasets. Highlights gap between LLM and fine-tuned SLM on complex tasks. [paper]
TruthfulQA — Tests whether LLMs produce truthful answers; used to cross-reference hallucination rates in sentiment contexts.
HallucinationEval — Dedicated benchmark for measuring LLM hallucination across NLP tasks including sentiment.
SemEval-2025 Task 10 — Multilingual characterization of subjectivity in news articles [proceedings].
SemEval-2026 Task 3 — Dimensional Aspect-Based Sentiment Analysis on Customer Reviews (valence-arousal framework). Co-located with ACL 2026, San Diego. [call for participation]

Evaluation Metrics

Metric	Use Case
Accuracy / F1	Standard classification performance
Macro-F1	Class-balanced evaluation (important for skewed SA datasets)
TARr@N / TARa@N	Inference stability — measures output variance across N identical runs
Confidence Calibration	Whether model confidence correlates with actual accuracy
ROUGE / BLEU	For rationale/explanation quality in generative SA
Perplexity	Language model fit on sentiment corpora

Model Performance Leaderboard (2025-2026)

Model	Overall SA Accuracy	Notes
GPT-4o (few-shot + CoT)	~99% F1	Best on structured tasks [NAACL 2024]
Claude 3.7	79%	Best overall accuracy in 2025 benchmark
Claude 4.5	75% avg / 82% emotion detection	2025 benchmark
GPT-4.1	~75–78%	Varies by domain
GPT-4o (zero-shot)	Best on financial SA	No CoT outperforms CoT variants [paper]
DeepSeek V3	70%	Competitive open-weight model
LLaMA-3 + QLoRA	91.2% on IMDB / 85.6% Twitter	Fine-tuned, not zero-shot

Explainable Sentiment Analysis Dataset

Explainable Sentiment Analysis Dataset — Released February 2025 on IEEE DataPort. Includes Amazon Reviews and IMDB, annotated with ground-truth sentiment labels, model predictions (GPT-4o, GPT-4o-mini, DeepSeek-R1), and fine-grained classifications for explainability evaluation.

Back to Top

Explainable Sentiment Analysis

Understanding why a model produced a sentiment label — essential for production reliability, regulatory compliance, and debugging.

Methods & Tools

Post-hoc Explanations

SHAP (SHapley Additive Explanations)
- Provides both global (feature importance across dataset) and local (single-prediction) explanations.
- Applied layer-by-layer across LLM components (embedding → encoder → decoder → attention) for granular sentiment attribution.
- SHAP Library
- Recent benchmark: SHAP outperforms LIME on consistency and faithfulness [paper].
LIME (Local Interpretable Model-Agnostic Explanations)
- Perturbs input text and fits a local surrogate model to explain individual predictions.
- Widely used to explain chatbot responses and customer sentiment decisions.
- LIME Library
- Limitation: local explanations only; no global view.
ModernBERT-XAI — Fine-tunes ModernBERT on IMDb and integrates SHAP + LIME for interpretable sentiment analysis. Released December 2025. [paper]
Attention Visualization
- Maps which tokens most influenced the sentiment decision.
- Sentence-level attention visualization for LLMs: [NAACL 2025 Demo].

Causal and Counterfactual Methods

Counterfactual Testing — Generate minimally-modified inputs that flip the sentiment label to identify causal features.
Causal Reasoning — Grounding predictions in causal graphs reduces both bias and hallucination.

Survey Papers

LLMs for Explainable AI: A Comprehensive Survey — April 2025. Covers how LLMs can themselves serve as explainers.
Integration of XAI Techniques with LLMs for Enhanced Interpretability for Sentiment Analysis — March 2025.
SHAP and LIME: A Perspective on XAI Methods — 2025.

Practical Guides

Back to Top

LLM Reliability & Safety in Sentiment Analysis

Critical considerations before deploying LLM-based sentiment classifiers in production.

Hallucination

Model	Hallucination Rate	Source
GPT-4	~28.6%	medical systematic reviews
GPT-3.5	~39.6%	medical systematic reviews
Bard	~91.4%	medical systematic reviews
Sentiment SA tasks	Lower	Pre-defined labels constrain generation

Mitigation Strategies:

RAG — Grounds predictions in retrieved evidence.
Multi-LLM Consensus — Vote across 3+ models; agreement increases reliability.
Knowledge Graphs — Inject structured facts at pretraining or inference time.
Self-Consistency Decoding — Sample multiple completions, take majority.
Chain-of-Thought + Verification — Have model verify its own reasoning step.

Survey: Large Language Models Hallucination: A Comprehensive Survey (October 2025)

Bias & Fairness

Five key bias-detection metrics applied to sentiment models:

Counterfactual Testing — Swap demographic attributes; check if sentiment label changes.
Stereotype Detection — Probe for systematically biased associations.
Sentiment & Toxicity Analysis — Measure polarity asymmetry across demographic groups.
Acceptance/Rejection Rates — Track differential response rates per group.
Embedding-Based Metrics — Measure cosine distance between group-specific embeddings.

Survey: Bias in Large Language Models: Origin, Evaluation, and Mitigation (November 2024) Paper: Towards Trustworthy LLMs: Debiasing and Dehallucinating (2024)

Uncertainty & Variability

Model Variability Problem (MVP) — Identical prompts produce different sentiment labels across runs (up to ±10% accuracy).
Epistemic Uncertainty — Model uncertainty due to lack of knowledge; mitigated by larger training sets or RAG.
Aleatoric Uncertainty — Irreducible noise in ambiguous or contradictory sentiment texts.
Stability Metrics — TARr@N and TARa@N measure inference stability across N runs [paper].

Key Paper: Model Uncertainty and Variability in LLM-Based Sentiment Analysis: Challenges, Mitigation Strategies, and the Role of Explainability — Frontiers in AI, 2025.

Domain Instability

LLMs exhibit 12–18% higher accuracy degradation on specialized domains (finance, healthcare, legal) vs. general text. Key causes:

Technical jargon misinterpreted as neutral
Sarcasm and irony patterns differ by domain
Context-dependent sentiment cues absent from training data

Mitigation: Domain-specific fine-tuning (FinBERT, QF-LLM), knowledge-augmented prompting, domain-aware RAG.

Paper: QF-LLM: Financial Sentiment Analysis with Quantized LLM (2025)

Back to Top

International Workshops

SemEval Challenges International Workshop on Semantic Evaluation [site]
SemEval [2014] [2015] [2016] [2017] [2018] -- New challenge for 2018 year, waiting for confirmation about tasks etc.

Back to Top

Papers

Language Models

Sentiment Analysis in the Era of Large Language Models: A Reality Check -- authors evaluate performance across 13 tasks on 26 datasets and compare the large language models (LLMs) such as ChatGPT with the results against small language models (SLMs) trained on domain-specific datasets, and highlight the limitations of current evaluation practices in assessing LLMs’ SA abilities.
XLNet: Generalized Autoregressive Pretraining for Language Understanding -- is a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation
How to Fine-Tune BERT for Text Classification? -- authors conduct exhaustive experiments to investigate different fine-tuning methods of BERT (Bidirectional Encoder Representations from Transformers) on text classification task and provide a general solution for BERT fine-tuning

Prompt Engineering & LLM Methods (2025-2026)

Enhancing Sentiment Classification and Irony Detection through Advanced Prompt Engineering Techniques — Evaluates few-shot, CoT, and self-consistency prompting; CoT boosts irony detection by 46% on Gemini-1.5-flash. arXiv January 2026.
Evaluating Prompt Engineering Strategies for Sentiment Control in AI-Generated Texts — Compares zero-shot, few-shot, and CoT for emotion steering; few-shot with human examples is most effective. arXiv February 2026.
Enhancing Granular Sentiment Classification with Chain-of-Thought Prompting in Large Language Models — Focuses on fine-grained (multi-class) sentiment with CoT prompting. arXiv May 2025.
Prompt Sentiment: The Catalyst for LLM Change — Shows prompt sentiment itself influences model accuracy: negative prompts reduce factual accuracy, positive prompts increase verbosity. arXiv March 2025.
Reasoning or Overthinking: Evaluating LLMs on Financial Sentiment Analysis — GPT-4o, GPT-4.1, o3-mini comparison; GPT-4o without CoT achieves best performance. ACM AI in Finance 2025.
Leveraging LLM as News Sentiment Predictor: A Knowledge-Enhanced Strategy — Domain Knowledge CoT (DK-CoT) improves financial news sentiment prediction with GLM. Springer Discover Computing 2025.
Designing Multi-Agent LLMs for Fine-Grained User Sentiment Detection on Social Media — Routes sub-tasks to specialist agents; PLOS ONE February 2026.
Sentiment Analysis in the Era of Large Language Models: A Reality Check — Benchmarks LLMs across 26 datasets, 13 tasks; introduces SentiEval. NAACL Findings 2024.
Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis — LREC-COLING 2024; compares Flan-T5, GPT-4, Bloomz.
Exploring Zero-Shot SLM Ensembles as an Alternative to LLMs for Sentiment Analysis — Small Language Model ensembles rival proprietary LLMs at lower cost. Information Fusion 2025.
LLM-Infused Multi-Module Transformer for Emotion-Aware Sentiment Analysis in Few-Shot Scenarios — Injects LLM representations into a smaller model for efficient few-shot SA. Information Fusion 2025.
Multi-Chain of Thought Prompt Learning for Aspect-Based Sentiment Analysis — Multi-path reasoning patterns for nuanced ABSA. Applied Sciences 2025.
QF-LLM: Financial Sentiment Analysis with Quantized LLM — Quantization for cost-effective financial SA deployment. ACM AIDF 2025.

Parameter-Efficient Fine-Tuning (2025-2026)

Parameter-Efficient Fine-Tuning of LLaMA Models for Financial Sentiment Classification — LLaMA-3 + LoRA achieves 86.89% accuracy on Financial PhraseBank. Cluster Computing 2025.
Sentiment Analysis with LLMs: Evaluating QLoRA Fine-Tuning, Instruction Strategies, and Prompt Sensitivity — QLoRA delivers up to 48% accuracy improvement. 2025.
Benchmarking QLoRA-Fine-Tuned LLaMA and DeepSeek Models for Sentiment Analysis — LLaMA-3 QLoRA: 91.2% IMDB, 85.6% Twitter F1. CSAI 2026.
Multimodal Large Language Model with LoRA Fine-Tuning for Multimodal Sentiment Analysis — VLCLNet applies LoRA to vision-language LLMs for MABSA. ACM TIST 2025.
LoRAFusion: Efficient LoRA Fine-Tuning for LLMs — Kernel-level optimizations for 4-bit QLoRA (EuroSys 2026).
Analyzing LLaMA3 Performance on Classification Using LoRA and QLoRA Techniques — Comprehensive LoRA vs QLoRA ablation study. MDPI Applied Sciences March 2025.
QLoRA: Efficient Finetuning of Quantized LLMs — Original QLoRA paper enabling 65B parameter fine-tuning on a single 48 GB GPU.

Explainability & Interpretability (2025-2026)

ModernBERT-XAI: Sentiment Analysis with Layer-Wise Learning and SHAP-LIME Interpretability — Fine-tunes ModernBERT on IMDb with integrated SHAP + LIME. December 2025.
Integration of Explainable AI Techniques with LLMs for Enhanced Interpretability for Sentiment Analysis — Applies SHAP layer-by-layer across LLM components. arXiv March 2025.
LLMs for Explainable AI: A Comprehensive Survey — LLMs as explainers; covers XAI methods across NLP tasks. arXiv April 2025.
A Perspective on Explainable AI Methods: SHAP and LIME — Comparative evaluation of SHAP vs LIME faithfulness and consistency. Advanced Intelligent Systems 2025.

Reliability, Safety & Evaluation (2025-2026)

Model Uncertainty and Variability in LLM-Based Sentiment Analysis: Challenges, Mitigation Strategies, and the Role of Explainability — Introduces TARr@N stability metrics; Frontiers in AI 2025.
Large Language Models Hallucination: A Comprehensive Survey — GPT-4: 28.6%, GPT-3.5: 39.6% hallucination rates. arXiv October 2025.
Bias in Large Language Models: Origin, Evaluation, and Mitigation — Five bias-detection metrics applicable to sentiment models. arXiv November 2024.
Towards Trustworthy LLMs: Debiasing and Dehallucinating — Causal reasoning reduces both bias and hallucinations. AIR 2024.
Comparing LLMs and Human Annotators in Latent Content Analysis of Sentiment, Political Leaning, Emotional Intensity and Sarcasm — Multi-model comparison (GPT-3.5, GPT-4, GPT-4o, Llama-3.1, Mixtral). Scientific Reports 2025.
Evaluating LLMs for Sentiment Analysis on Vaccine Posts from Social Media — Healthcare/public health domain evaluation. JMIR Formative Research 2025.

RAG & Retrieval Methods (2024-2026)

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG — Covers Self-RAG, Corrective RAG, and agent-driven retrieval pipelines for LLM applications. arXiv January 2025.
RAG Evaluation in the Era of LLMs: A Comprehensive Survey — Reviews factual accuracy, safety, and computational efficiency metrics for RAG. arXiv April 2025.
Retrieval-Augmented Generation for Large Language Models: A Survey — The foundational RAG survey (2,000+ citations); covers naive, advanced, and modular RAG.

Transformer Models and RoBERTa (2023-2025)

Improving sentiment classification using a RoBERTa-based hybrid model - Hybrid RoBERTa-GRU model achieving superior performance on sentiment classification (2023)
Advancing Sentiment Analysis: Evaluating RoBERTa against Traditional and Deep Learning Models - Comprehensive comparison showing RoBERTa achieving 96.30% accuracy and F1-scores of 98.11% on specific sentiment benchmarks (see paper for datasets and experimental setup) (2024)
Exploring transformer models for sentiment classification: A comparison of BERT, RoBERTa, ALBERT, DistilBERT, and XLNet - Comparative study of transformer models with RoBERTa consistently outperforming others (2024)
A BERT–LSTM–Attention Framework for Robust Multi-Class Sentiment Analysis on Twitter Data - Hybrid architecture combining BERT with BiLSTM and attention mechanisms for Twitter sentiment (2024)
Emotion-Aware RoBERTa enhanced with emotion-specific attention and TF-IDF gating for fine-grained emotion recognition - Enhanced RoBERTa achieving 96.77% accuracy and weighted F1-score of 0.97 (2025)
Generalizing sentiment analysis: a review of progress, challenges, and emerging directions - Comprehensive review covering advances from traditional ML to Transformers and hybrid architectures (2025)

Multimodal Sentiment Analysis (2024-2025)

Large language models meet text-centric multimodal sentiment analysis: a survey - Comprehensive survey on applying LLMs to multimodal sentiment analysis (2024)
Whether Current Large Language Models is Suitable for Multimodal Aspect-based Sentiment Analysis? - Investigation of LLM adaptability to MABSA tasks including Llama2, LLaVA, and ChatGPT (2024)
Multimodal sentiment analysis based on multi-layer feature fusion and multi-task learning - Novel approach using multi-layer feature fusion for multimodal SA (2025)
FDR-MSA: Enhancing multimodal sentiment analysis through feature disentanglement and reconstruction - Advanced feature processing for multimodal sentiment (2024)
SAE: A Multimodal Sentiment Analysis Large Language Model - End-to-end LLM for multimodal sentiment analysis (2025)

Multilingual and Cross-lingual Sentiment Analysis (2024-2025)

Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark - Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz. NeurIPS 2023 Datasets and Benchmarks Track. [NeurIPS proceedings] [PDF] - The most extensive open massively multilingual corpus with 79 high-quality datasets covering 27 languages (6 language families) and 6.16M training samples. Presents multi-faceted sentiment classification benchmark summarizing hundreds of experiments on different base models, training objectives, dataset collections, and fine-tuning strategies. Addresses challenges in multilingual sentiment analysis with rich linguistic metadata including morphological, syntactic, and functional properties. Available on HuggingFace with interactive benchmark. GitHub
A multimodal approach to cross-lingual sentiment analysis with ensemble of transformer and LLM - Ensemble method combining transformers and LLMs for cross-lingual sentiment (2024)
Prompt-based fine-tuning with multilingual transformers for language-independent sentiment analysis - Novel prompt-based approach for multilingual sentiment (2025)
Bridging resource gaps in cross-lingual sentiment analysis: adaptive self-alignment with data augmentation and transfer learning - Addressing resource constraints in cross-lingual SA (2024)
Multilingual sentiment analysis in restaurant reviews using aspect focused learning - XLM-RSA achieving 91.9% accuracy on multilingual restaurant reviews (2025)
The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models - Comparative study of LLMs vs SMLMs in cross-lingual settings (2024)

Aspect-Based Sentiment Analysis (2024-2025)

A systematic review of aspect-based sentiment analysis: domains, methods, and trends - Comprehensive systematic review of ABSA methods and trends (2024)
Large-Scale Aspect-Based Sentiment Analysis with Reasoning-Infused LLMs - Incorporating reasoning techniques into LLMs for ABSA (2025)
Aspect-based Sentiment Analysis via Synthetic Image Generation - Novel approach generating sentimental images for ABSA (EMNLP 2025)
Unifying aspect-based sentiment analysis BERT and multi-layered graph convolutional networks for comprehensive sentiment dissection - Multi-layered Enhanced Graph Convolutional Networks (MLEGCN) for ABSA (2024)
Triple dimensional psychology knowledge encouraging graph attention networks to exploit aspect-based sentiment analysis - Psychology-informed graph attention networks (VADGAT) for ABSA (2025)
Local interpretation of deep learning models for Aspect-Based Sentiment Analysis - Addressing interpretability in deep learning ABSA models (2025)

Domain-Specific Applications (2024-2025)

An overview of model uncertainty and variability in LLM-based sentiment analysis: challenges, mitigation strategies, and the role of explainability - LLM challenges in specialized domains (finance, healthcare, legal) (2025)
Analyzing student mental health with RoBERTa-Large: a sentiment analysis and data analytics approach - Healthcare application for mental health monitoring (2025)
Comparative analysis of transformer models for sentiment classification of UK CBDC discourse on X - Financial sentiment analysis on social media (2025)

Neural Network based Models

Convolutional Neural Networks for Sentence Classification - convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.

Lexicon-based Ensembles

Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis - Comparison of several lexicon, supervised learning and ensemble methods for sentiment analysis.
Simpler is better? Lexicon-based ensemble sentiment classification beats supervised methods - lexicon-based ensemble can beat supervised learning.

Back to Top

Tutorials

GPT2 For Text Classification using Hugging Face Transformers - GPT model application for sentiment analysis task
SAS2015 iPython Notebook brief introduction to Sentiment Analysis in Python @ Sentiment Analysis Symposium 2015. Scikit-learn + BoW + SemEval Data.
LingPipe Sentiment - This tutorial covers assigning sentiment to movie reviews using language models. There are many other approaches to sentiment. One we use fairly often is sentence based sentiment with a logistic regression classifier. Contact us if you need more information. For movie reviews we focus on two types of classification problem: Subjective (opinion) vs. Objective (fact) sentences Positive (favorable) vs. Negative (unfavorable) movie reviews
Stanford's cs224d lectures on Deep Learning for Natural Language Processing - course provided by Richard Socher.

Back to Top

Books

Sentiment Analysis: mining sentiments, opinions, and emotions - This book is suitable for students, researchers, and practitioners interested in natural language processing in general, and sentiment analysis, opinion mining, emotion analysis, debate analysis, and intention mining in specific. Lecturers can use the book in class.

Back to Top

Demos

Sentiment TreeBank - demo of Stanford's Treebank Sentiment Analysis
NLTK Demo
GATE Brexit Analyzer
Vivekn's sentiment model and web example
FormTitan

Back to Top

API

Back to Top

Related Studies

Benchmarking Multimodal Sentiment Analysis - multimodal sentiment analysis and emotion detection (text, audio and video).

Back to Top

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Awesome Sentiment Analysis

Table of Contents

Libraries

Modern Transformer-based Libraries (2023-2026)

Traditional Libraries

Aspect-based Sentiment Analysis

Resources

Lexicons

Datasets

Classic Benchmarks

Recent Datasets (2023-2026)

Domain-Specific Datasets

Word Embeddings

Pretrained Language Models

Large Language Models (2023-2026)

Encoder-based Transformers (BERT Family)

Multilingual Transformers

Domain-Specific Models

Decoder-based Models

Hybrid Architectures (2023-2025)

Multimodal Sentiment Analysis

Overview

Recent Models and Frameworks (2024-2025)

Multimodal LLMs for Sentiment Analysis

Key Research Findings (2024-2025)

Applications

Multilingual and Cross-lingual Sentiment Analysis

State-of-the-Art Models (2024-2025)

Recent Approaches and Techniques

Performance Benchmarks

Supported Languages

Applications

LLM Techniques for Sentiment Analysis

Prompt Engineering

Techniques

Key Findings (2025-2026)

Tools & Guides

In-Context Learning & Few-Shot Methods

Retrieval-Augmented Generation (RAG)

Architectures

Frameworks & Tools

Key Statistics (2025)

Parameter-Efficient Fine-Tuning (PEFT)

Methods

Tutorials

Instruction Tuning & Alignment

Methods

Key Survey

LLM Evaluation & Benchmarks for Sentiment Analysis

Benchmark Frameworks

Evaluation Metrics

Model Performance Leaderboard (2025-2026)

Explainable Sentiment Analysis Dataset

Explainable Sentiment Analysis

Methods & Tools

Post-hoc Explanations

Causal and Counterfactual Methods

Survey Papers

Practical Guides

LLM Reliability & Safety in Sentiment Analysis

Hallucination

Bias & Fairness

Uncertainty & Variability

Domain Instability

International Workshops

Papers

Language Models

Prompt Engineering & LLM Methods (2025-2026)

Parameter-Efficient Fine-Tuning (2025-2026)

Explainability & Interpretability (2025-2026)

Reliability, Safety & Evaluation (2025-2026)

RAG & Retrieval Methods (2024-2026)

Transformer Models and RoBERTa (2023-2025)

Multimodal Sentiment Analysis (2024-2025)

Multilingual and Cross-lingual Sentiment Analysis (2024-2025)

Aspect-Based Sentiment Analysis (2024-2025)

Packages