Sentiment Analysis Case Study
Sample Solution
To begin our analysis, let's load the customer feedback data into a pandas DataFrame.
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('cc_customer_feedback.csv')
The DataFrame consists of two columns: 'feedback_text' and 'date'. The 'feedback_text' column contains the customer feedback, while the 'date' column indicates when the feedback was submitted.
Data Overview
df.head()
Output:
feedback_text date
0 "Thank you for the prompt delivery!" 2023-10-01
1 "Great service!" 2023-10-01
2 "The delivery guy was so friendly and helpful. Thanks!" 2023-10-01
3 "I'm always satisfied with CouchPotato Couriers. :)" 2023-10-01
4 "My order arrived late and cold." 2023-10-01
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 674 entries, 0 to 673
Data columns (total 2 columns):
# Column Non-null Count Dtype
--- ------ -------------- -----
0 feedback_text 674 non-null object
1 date 674 non-null object
dtypes: object(2)
memory usage: 107.2 KB
As shown above, the DataFrame contains 674 observations with no missing values. Both columns, 'feedback_text' and 'date', are of object data type.
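Because 'date' is stored as an object (string) column, any date-based grouping or filtering would first need it parsed. A minimal sketch of that conversion, using a hypothetical two-row sample in place of the real CSV and assuming the dates follow the YYYY-MM-DD pattern seen in the output above:

```python
import pandas as pd

# Hypothetical two-row sample standing in for cc_customer_feedback.csv
sample = pd.DataFrame({
    "feedback_text": ["Great service!", "My order arrived late and cold."],
    "date": ["2023-10-01", "2023-10-02"],
})

# Parse the ISO-style date strings into datetime64 values
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

print(sample.dtypes)
```

Once parsed, the column supports the `.dt` accessor (for example `sample["date"].dt.day`).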
Text Preprocessing
Before performing sentiment analysis, it's essential to preprocess the text data to clean and standardize the feedback. This involves steps like removing punctuation, converting text to lowercase, removing stop words, and stemming or lemmatization.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
# Download the stop-word list (needed once per environment)
nltk.download('stopwords')
# Remove punctuation (anything that is not a word character or whitespace)
df['feedback_text'] = df['feedback_text'].apply(lambda text: re.sub(r'[^\w\s]', '', text))
# Convert text to lowercase
df['feedback_text'] = df['feedback_text'].apply(lambda text: text.lower())
# Remove stop words; each entry becomes a list of tokens from here on
stop_words = set(stopwords.words('english'))
df['feedback_text'] = df['feedback_text'].apply(lambda text: [word for word in text.split() if word not in stop_words])
# Stemming: reduce each token to its stem
stemmer = PorterStemmer()
df['feedback_text'] = df['feedback_text'].apply(lambda text: [stemmer.stem(word) for word in text])
After these preprocessing steps, each entry in 'feedback_text' is a list of lowercase, stemmed tokens with punctuation and stop words removed, which is the form the scoring step expects.
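The cleaning steps can also be bundled into a single function, which makes them easier to test on one sentence at a time. The sketch below is illustrative only: `tokenize_feedback` and `ILLUSTRATIVE_STOP_WORDS` are hypothetical names, the stop-word set is a tiny stand-in for NLTK's English list, and stemming is left out so the block runs without NLTK.

```python
import re

# Assumption: a tiny illustrative stop-word set, not NLTK's full English list
ILLUSTRATIVE_STOP_WORDS = {"the", "was", "so", "and", "for", "you", "my", "with"}

def tokenize_feedback(text, stop_words=ILLUSTRATIVE_STOP_WORDS):
    """Strip punctuation, lowercase, split on whitespace, drop stop words."""
    no_punct = re.sub(r"[^\w\s]", "", text)
    return [word for word in no_punct.lower().split() if word not in stop_words]

tokens = tokenize_feedback("The delivery guy was so friendly and helpful. Thanks!")
print(tokens)  # ['delivery', 'guy', 'friendly', 'helpful', 'thanks']
```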
Sentiment Analysis using Lexicon-Based Sentiment Analysis
Lexicon-based sentiment analysis involves using a predefined dictionary or lexicon of words with associated sentiment scores. For each word in the text, we add its sentiment score to obtain an overall sentiment score for the text.
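The summation described above can be sketched as a small function. The name `score_tokens` and the two-word `toy_lexicon` are placeholders for illustration; words missing from the lexicon are assumed to contribute nothing.

```python
def score_tokens(tokens, lexicon):
    """Sum the lexicon scores of the tokens; unknown words add nothing."""
    return sum(lexicon.get(token, 0) for token in tokens)

# Hypothetical two-word lexicon, used only for illustration
toy_lexicon = {"great": 4, "good": 3}

total = score_tokens(["great", "service", "good"], toy_lexicon)
print(total)  # 7
```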
Sentiment Dictionary
sentiment_dict = {
'excellent': 5,
'great': 4,
'good': 3,
'ok': 2,
'bad': 1,
'terrible': 0
}
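This sentiment dictionary assigns scores ranging from 0 (most negative) to 5 (most positive) to a small set of common sentiment-bearing words. Because the feedback tokens were stemmed during preprocessing, the keys must match the stemmed forms to be found; the Porter stemmer reduces 'terrible' to 'terribl', for example, so that key would need to be stored in its stemmed form.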
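On a 0 to 5 scale, a text whose only lexicon word is 'terrible' sums to 0, the same total as a text with no lexicon words at all. One possible way to tell the two cases apart, offered as a sketch and not as the method used in this solution, is to average the matched scores and return None when nothing matches. The helper name `mean_matched_score` is hypothetical.

```python
# The dictionary from the text, copied here so the block runs on its own
sentiment_dict = {"excellent": 5, "great": 4, "good": 3, "ok": 2, "bad": 1, "terrible": 0}

def mean_matched_score(tokens, lexicon=sentiment_dict):
    """Average score of the tokens found in the lexicon, or None if none match."""
    matched = [lexicon[token] for token in tokens if token in lexicon]
    return sum(matched) / len(matched) if matched else None

negative = mean_matched_score(["terrible", "service"])  # one match, score 0
unknown = mean_matched_score(["order", "arrived"])       # no match at all
print(negative, unknown)  # 0.0 None
```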
Calculating Sentiment Scores
df['sentiment_score'] = 0
for index, row in df.iterrows():
    for word in row['feedback_text']:
        if word in sentiment_dict:
            df.loc[index, 'sentiment_score'] += sentiment_dict[word]
We iterate through each feedback text and add the sentiment score of each word to the overall sentiment score for that observation.
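The same per-row sum can be written without an explicit `iterrows` loop by applying a function to the token column. This is a sketch of an equivalent alternative; the three tokenized rows are hypothetical sample data.

```python
import pandas as pd

# The dictionary from the text, copied here so the block runs on its own
sentiment_dict = {"excellent": 5, "great": 4, "good": 3, "ok": 2, "bad": 1, "terrible": 0}

# Hypothetical, already-tokenized feedback rows
sample = pd.DataFrame({
    "feedback_text": [["great", "service"], ["order", "arrived", "late"], ["good", "great"]],
})

# Sum the scores of each row's tokens; words outside the dictionary count as 0
sample["sentiment_score"] = sample["feedback_text"].apply(
    lambda tokens: sum(sentiment_dict.get(word, 0) for word in tokens)
)

print(sample["sentiment_score"].tolist())  # [4, 0, 7]
```

Using `dict.get` with a default of 0 replaces the `if word in sentiment_dict` check, and `apply` avoids the repeated `df.loc` writes inside the loop.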
df.head()
Output:
feedback_text date sentiment_score
0 "Thank you for the prompt delivery!" 2023-10-01 5
1 "Great service!" 2023-10-01 4
2 "The delivery guy was so friendly and helpful. Thanks!" 2023-10-01 5
3 "I'm always satisfied with