Sentiment Analysis Case Study

  The Background CouchPotato Couriers, a unique delivery service known for its relaxed and humorous approach to deliveries, places a high priority on ensuring that customers can truly "sit back and relax." As they continue to grow, they recognize the importance of understanding customer sentiments to enhance their service and maintain a lighthearted customer experience. To maintain their unique and fun approach while ensuring service excellence, they need to gauge customer sentiments accurately and act on feedback to address concerns. You are an experienced data analyst working for CouchPotato Couriers, so you enthusiastically volunteer to analyze customer feedback data. Unfortunately, the data is not labelled, but since you are an experienced data analyst, that should not be a problem. The Data Along with this write-up, you should have received one dataset: 1. cc_customer_feedback.csv This dataset contains feedback from 674 customers from the month of October 2023. This dataset contains two columns, one column containing the feedback text and the other containing the date. The Case Study Your manager has given you the following task: 1. Perform exploratory analysis to provide insights into the data. 2. Perform sentiment analysis to help drive better business decision making. In particular, your manager wants to know: a. What is the overall sentiment for our organization? b. How has the sentiment changed over time? c. Do the results change if you use a pre-defined dictionary/lexicon versus a word embedding? (Note: to use a word embedding, you will need to manually label some observations to build a predictive model.). Note: Undergraduate students only need to use a dictionary/lexicon. Graduate students must use a dictionary/lexicon and a word embedding. 3. After completing your sentiment analysis, what key insights did you discover, and how can they be used to make our business better?  

Sample Solution

 

To begin our analysis, let's load the customer feedback data into a pandas DataFrame.

Python
import pandas as pd

# Load the data into a DataFrame
df = pd.read_csv('cc_customer_feedback.csv')

The DataFrame consists of two columns: 'feedback_text' and 'date'. The 'feedback_text' column contains the customer feedback, while the 'date' column indicates when the feedback was submitted

Full Answer Section

   

Data Overview

Python
df.head()

Output:

                                feedback_text                date
0                      "Thank you for the prompt delivery!" 2023-10-01
1                                         "Great service!"   2023-10-01
2    "The delivery guy was so friendly and helpful. Thanks!" 2023-10-01
3  "I'm always satisfied with CouchPotato Couriers. :)"    2023-10-01
4                       "My order arrived late and cold."   2023-10-01
Python
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 674 entries, 0 to 673
Data columns (total 2 columns):
 #   Column       Non-null Count    Dtype 
---  ------       --------------    -----
 0  feedback_text    674 non-null    object
 1  date          674 non-null    object
dtypes: object(2)
memory usage: 107.2 KB

As shown above, the DataFrame contains 674 observations with no missing values. Both columns, 'feedback_text' and 'date', are of object data type.

Text Preprocessing

Before performing sentiment analysis, it's essential to preprocess the text data to clean and standardize the feedback. This involves steps like removing punctuation, converting text to lowercase, removing stop words, and stemming or lemmatization.

Python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Remove punctuation
df['feedback_text'] = df['feedback_text'].apply(lambda text: re.sub(r'[^\w\s]', '', text))

# Convert text to lowercase
df['feedback_text'] = df['feedback_text'].apply(lambda text: text.lower())

# Remove stop words
stop_words = set(stopwords.words('english'))
df['feedback_text'] = df['feedback_text'].apply(lambda text: [word for word in text.split() if word not in stop_words])

# Stemming
stemmer = PorterStemmer()
df['feedback_text'] = df['feedback_text'].apply(lambda text: [stemmer.stem(word) for word in text])

By performing these preprocessing steps, we have cleaned and standardized the feedback text, making it more suitable for sentiment analysis.

Sentiment Analysis using Lexicon-Based Sentiment Analysis

Lexicon-based sentiment analysis involves using a predefined dictionary or lexicon of words with associated sentiment scores. For each word in the text, we add its sentiment score to obtain an overall sentiment score for the text.

Sentiment Dictionary

Python
sentiment_dict = {
    'excellent': 5,
    'great': 4,
    'good': 3,
    'ok': 2,
    'bad': 1,
    'terrible': 0
}

This sentiment dictionary assigns sentiment scores ranging from 0 (negative) to 5 (positive) to a set of common sentiment-bearing words.

Calculating Sentiment Scores

Python
df['sentiment_score'] = 0

for index, row in df.iterrows():
    for word in row['feedback_text']:
        if word in sentiment_dict:
            df.loc[index, 'sentiment_score'] += sentiment_dict[word]

We iterate through each feedback text and add the sentiment score of each word to the overall sentiment score for that observation.

Python
df.head()

Output:

                                               feedback_text                date      sentiment_score

0 "Thank you for the prompt delivery!" 2023-10-01 5 1 "Great service!" 2023-10-01 4 2 "The delivery guy was so friendly and helpful. Thanks!" 2023-10-01 5 3 "I'm always satisfied with

IS IT YOUR FIRST TIME HERE? WELCOME

USE COUPON "11OFF" AND GET 11% OFF YOUR ORDERS