Machine Learning Project 7: Best ChatGPT Reviews Analysis

Hey there, everyone! Welcome to our cozy corner of the internet where we’re delving deep into the realm of ChatGPT reviews.

That’s right, we’re exploring what people are saying about ChatGPT and how it fares among AI chatbots. Grab a drink, get settled in, and let’s jump right in!

Why Does It Matter?

Ever been curious about how ChatGPT is holding up in the vast online world? Well, we’ve got the inside scoop. We’re breaking it down:

  • Average Perplexity: Sounds fancy, right? It’s all about how unpredictable the chatbot can be. More surprises, more excitement!
  • Burstiness Scores: You know that friend who talks a mile a minute and then suddenly goes silent? That’s burstiness. We’re seeing if ChatGPT has that same vibe.
  • Predictability: How easy is it to guess what ChatGPT will say next? Spoiler alert: it’s not always what you anticipate!
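
To make the first two terms a little more concrete, here is a minimal sketch. It is not part of the analysis below. It assumes perplexity means the exponential of the average negative log-probability a model gives each token, and it uses the coefficient of variation of sentence lengths as one possible (assumed) burstiness measure. The function names and sample numbers are made up for illustration.

```python
import math
import statistics

def perplexity(token_probs):
    """Exponential of the mean negative log-probability of the observed tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def burstiness(sentence_lengths):
    """Coefficient of variation of sentence lengths (an assumed definition)."""
    return statistics.pstdev(sentence_lengths) / statistics.mean(sentence_lengths)

# A model that gives every token probability 0.25 is as uncertain as a
# uniform choice among four options, so its perplexity is 4.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])

# Equal sentence lengths have zero burstiness; mixed lengths score higher.
flat = burstiness([10, 10, 10])
mixed = burstiness([2, 25, 3, 30])
```

Lower perplexity means the text was easier for the model to predict, so "predictability" in the last bullet is really the flip side of the first one.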

In-Depth Analysis of ChatGPT Reviews

Importing Libraries 📥

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import string
import emoji
import nltk
import spacy
from tqdm import tqdm
from nltk.corpus import stopwords
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.preprocessing import MinMaxScaler, LabelEncoder, OrdinalEncoder
Importing Data

Dataset Link:

df = pd.read_csv('/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv')
df = df.sample(20000)
        reviewId                              userName            content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
19863   d7f901e2-3032-4e7c-9ba0-cd8d1443b32d  Arshavir Mandegar   kinda weird and scary but fascinating at the s…      5              0  1.2024.073            2024-03-23 03:08:40  1.2024.073
67801   e05d7901-c968-4f5a-bd00-fbaf4a87e355  ayan khalid         great app, always been helpin me out.                5              0  1.0.0039              2023-09-04 16:39:37  1.0.0039
71308   76e6efef-5daa-4a2f-8bc8-aa698f9d5048  Archana Rajput      kep it in well done                                  5              0  NaN                   2023-12-17 16:16:52  NaN
122762  9a1d150d-ff0b-4686-b017-499d46047db3  Rakibul Hasan       Good app                                             5              0  NaN                   2023-12-08 17:03:22  NaN
50227   cf94f0c6-186a-4787-8bee-bcd0b5216345  Aasim Saquafi       Sometime it doesn’t work                             1              0  1.2023.313            2024-01-05 15:41:41  1.2023.313

Header View of ChatGPT Reviews

        content
19863   kinda weird and scary but fascinating at the s…
67801   great app, always been helpin me out.
71308   kep it in well done
122762  Good app
50227   Sometime it doesn’t work

Overview of Data

Distribution of Rows and Columns


(20000, 8)

Data Information
<class 'pandas.core.frame.DataFrame'>
Index: 20000 entries, 19863 to 108190
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   reviewId              20000 non-null  object
 1   userName              19999 non-null  object
 2   content               19998 non-null  object
 3   score                 20000 non-null  int64 
 4   thumbsUpCount         20000 non-null  int64 
 5   reviewCreatedVersion  18313 non-null  object
 6   at                    20000 non-null  object
 7   appVersion            18313 non-null  object
dtypes: int64(2), object(6)
memory usage: 1.4+ MB

Data Description


Sum of Null Values

reviewId                   0
userName                   1
content                    2
score                      0
thumbsUpCount              0
reviewCreatedVersion    1687
at                         0
appVersion              1687
dtype: int64

Data Cleaning 🧹

Dropping Duplicates

df = df.drop_duplicates()

Dropping Rows with Null Values

df = df.dropna()

Revised Data Shape

(18214, 8)

Score Value Counts

5    14008
4     2218
1      969
3      714
2      305
Name: count, dtype: int64
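
Those counts are heavily tilted toward 5 stars, which matters once we start training classifiers. As a quick sanity check, here is a small sketch that turns the counts above into shares and works out how accurate a hypothetical "always predict the most common score" baseline would be on this sample.

```python
# Score counts copied from the value_counts output above.
score_counts = {5: 14008, 4: 2218, 1: 969, 3: 714, 2: 305}

total = sum(score_counts.values())
shares = {score: count / total for score, count in score_counts.items()}

# Accuracy of a hypothetical baseline that always predicts the most common score.
majority_score = max(score_counts, key=score_counts.get)
baseline_accuracy = shares[majority_score]

print(total)                                      # 18214, matching the revised data shape
print(majority_score, round(baseline_accuracy, 3))  # 5 0.769
```

Any model we build later has to beat roughly 77% accuracy before it tells us anything the class balance alone does not.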

ThumbsUpCount Value Counts

0      17444
1        420
2        118
3         47
5         24
128        1
338        1
126        1
47         1
152        1
Name: count, Length: 66, dtype: int64

Visualizing Data

Histogram of Numerical Columns

df.hist(figsize=(15, 5))

Scatter Plot of Numerical Columns

sns.scatterplot(x='score', y='thumbsUpCount', data=df)
Feature Engineering with Machine Learning

Selecting Relevant Features

df.drop(columns=['reviewId', 'userName', 'at'], inplace=True)
        content                                             score  thumbsUpCount  reviewCreatedVersion  appVersion
19863   kinda weird and scary but fascinating at the s…         5              0  1.2024.073            1.2024.073
67801   great app, always been helpin me out.                   5              0  1.0.0039              1.0.0039
50227   Sometime it doesn’t work                                1              0  1.2023.313            1.2023.313
1650    thank you chatgpt                                       5              0  1.2024.122            1.2024.122
63566   It is the best chat AI ..👌👌👌 but no pic or videos        4              0  1.2023.263            1.2023.263

Encoding Columns

le = LabelEncoder()
df['reviewCreatedVersion'] = le.fit_transform(df['reviewCreatedVersion'])
oe = OrdinalEncoder()
df['appVersion'] = oe.fit_transform(df[['appVersion']])
        content                                             score  thumbsUpCount  reviewCreatedVersion  appVersion
19863   kinda weird and scary but fascinating at the s…         5              0  41                    41.0
67801   great app, always been helpin me out.                   5              0  8                     8.0
50227   Sometime it doesn’t work                                1              0  22                    22.0
1650    thank you chatgpt                                       5              0  48                    48.0
63566   It is the best chat AI ..👌👌👌 but no pic or videos        4              0  12                    12.0
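
One thing worth knowing about this step: LabelEncoder numbers the distinct strings in sorted order, so the codes follow text order, not numeric version order. The sketch below mimics that with plain Python. The `label_encode` and `version_key` helpers and the three version strings are hypothetical, picked only to show where the two orderings disagree.

```python
def label_encode(values):
    """Map each distinct string to its rank in sorted order, like LabelEncoder does."""
    classes = sorted(set(values))
    lookup = {value: code for code, value in enumerate(classes)}
    return [lookup[value] for value in values]

def version_key(version):
    """Split '1.2023.10' into (1, 2023, 10) so comparison is numeric."""
    return tuple(int(part) for part in version.split('.'))

# Hypothetical version strings, chosen to show the difference.
versions = ['1.2023.9', '1.2023.10', '1.2024.2']

string_order = sorted(versions)                    # text order puts '.10' before '.9'
numeric_order = sorted(versions, key=version_key)  # numeric order puts 9 before 10
codes = label_encode(versions)                     # [1, 0, 2]
```

The version strings in this dataset look zero-padded (for example 1.2024.073), which keeps text order and numeric order closer together, but it is worth checking before reading the codes as "older" or "newer".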

Applying NLP to Review Content

19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              Sometime it doesn't work
1650                                      thank you chatgpt
63566     It is the best chat AI ..👌👌👌 but no pic or videos
38261     this app is very helpful for containing study ...
78273                                         it's the Best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Converting to Lowercase

df['content'] = df['content'].str.lower()
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing HTML Tags

def remove_html_tags(text):
    clean_text = re.sub('<.*?>', '', text)
    return clean_text
df['content'] = df['content'].apply(remove_html_tags)
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing URLs

def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    clean_text = re.sub(url_pattern, '', text)
    return clean_text
df['content'] = df['content'].apply(remove_urls)
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing Punctuation

def remove_punctuation(text):
    punctuation = string.punctuation
    clean_text = text.translate(str.maketrans('', '', punctuation))
    return clean_text
df['content'] = df['content'].apply(remove_punctuation)
19863     kinda weird and scary but fascinating at the s...
67801                   great app always been helpin me out
50227                               sometime it doesnt work
1650                                      thank you chatgpt
63566       it is the best chat ai 👌👌👌 but no pic or videos
38261     this app is very helpful for containing study ...
78273                                          its the best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
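
To see what the four cleaning steps so far do to a single review, here is a small sketch that chains them in the same order. The `clean_review` name and the sample sentence are made up for illustration, and the final whitespace collapse is an extra step that the cells above do not perform.

```python
import re
import string

def clean_review(text):
    """Lowercase, then strip HTML tags, URLs and ASCII punctuation, in that order."""
    text = text.lower()
    text = re.sub(r'<.*?>', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    return ' '.join(text.split())  # collapse leftover whitespace (extra step)

sample = "Great app! <b>Really</b> helpful, see https://example.com It doesn't crash."
cleaned = clean_review(sample)
print(cleaned)  # great app really helpful see it doesnt crash
```

Notice that "doesn't" comes out as "doesnt", exactly like the "sometime it doesnt work" row above. That detail comes back when we remove stop words.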
Chat Word Treatment

chat_words_mapping = {
    "lol": "laughing out loud",
    "brb": "be right back",
    "btw": "by the way",
    "afk": "away from keyboard",
    "rofl": "rolling on the floor laughing",
    "ttyl": "talk to you later",
    "np": "no problem",
    "thx": "thanks",
    "omg": "oh my god",
    "idk": "I don't know",
    "gg": "good game",
    "g2g": "got to go",
    "b4": "before",
    "cu": "see you",
    "yw": "you're welcome",
    "wtf": "what the f*ck",
    "imho": "in my humble opinion",
    "jk": "just kidding",
    "gf": "girlfriend",
    "bf": "boyfriend",
    "u": "you",
    "r": "are",
    "2": "to",
    "4": "for",
    "b": "be",
    "c": "see",
    "y": "why",
    "tho": "though",
    "smh": "shaking my head",
    "lolz": "laughing out loud",
    "h8": "hate",
    "luv": "love",
    "pls": "please",
    "sry": "sorry",
    "tbh": "to be honest",
    "omw": "on my way",
    "omw2syg": "on my way to see your girlfriend",
}

def expand_chat_words(text):
    words = text.split()
    expanded_words = [chat_words_mapping.get(word.lower(), word) for word in words]
    return ' '.join(expanded_words)
df['content'] = df['content'].apply(expand_chat_words)
19863     kinda weird and scary but fascinating at the s...
67801                   great app always been helpin me out
50227                               sometime it doesnt work
1650                                      thank you chatgpt
63566       it is the best chat ai 👌👌👌 but no pic or videos
38261     this app is very helpful for containing study ...
78273                                          its the best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
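
A quick word of caution on this step. Because it runs after lowercasing and punctuation removal, some expansions put capitals and apostrophes back into the text, and the digit entries rewrite real numbers. The sketch below shows both effects using a small subset of the mapping above; the sample inputs are made up.

```python
# A small subset of the mapping above, for illustration.
chat_words = {"thx": "thanks", "u": "you", "idk": "I don't know", "4": "for"}

def expand_chat_words(text, mapping):
    """Replace each whitespace-separated word that appears in the mapping."""
    return ' '.join(mapping.get(word.lower(), word) for word in text.split())

expanded = expand_chat_words("thx u idk", chat_words)
rating = expand_chat_words("4 stars", chat_words)

print(expanded)  # thanks you I don't know
print(rating)    # for stars
```

If that matters for your use case, one option is to expand chat words before lowercasing and stripping punctuation, or to drop the single-digit entries.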

Removing Stop Words

stop_words = set(stopwords.words('english'))  # build the set once, not once per review

def remove_stop_words(text):
    tokens = nltk.word_tokenize(text)
    filtered_tokens = [token for token in tokens if token not in stop_words]
    return ' '.join(filtered_tokens)
df['content'] = df['content'].apply(remove_stop_words)
19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566                           best chat ai 👌👌👌 pic videos
38261     app helpful containing study material types qu...
78273                                                  best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
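
You may have noticed that "doesnt" survived in the output above. The order of the steps is the likely reason: punctuation was stripped first, so the contraction no longer matches a stop-word entry spelled with an apostrophe. Here is a simplified sketch of that effect. It uses a tiny hand-picked stop-word set and plain `split()` instead of NLTK, so treat it as an illustration only.

```python
# Tiny hand-picked stop-word set for illustration; NLTK's English list is much longer.
stop_words = {"it", "is", "the", "but", "no", "or", "doesn't"}

def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated token found in the stop-word set."""
    return ' '.join(token for token in text.split() if token not in stop_words)

with_apostrophe = remove_stop_words("sometime it doesn't work", stop_words)
without_apostrophe = remove_stop_words("sometime it doesnt work", stop_words)

print(with_apostrophe)     # sometime work
print(without_apostrophe)  # sometime doesnt work
```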

Replacing Emojis with Meanings

def replace_emojis_with_meanings(text):
    # emoji.demojize swaps every emoji in the string for its ':name:' alias,
    # for example a thumbs-up becomes ':thumbs_up:'.
    return emoji.demojize(text)
df['content'] = df['content'].apply(replace_emojis_with_meanings)
19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566     best chat ai :OK_hand::OK_hand::OK_hand: pic v...
38261     app helpful containing study material types qu...
78273                                                  best
118182                             mind blowing :thumbs_up:
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
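
If you want to try the same idea without installing the emoji package, the standard library can get you most of the way. This is only a rough stand-in, not what the notebook uses: the hypothetical `describe_emojis` helper builds the label from the official Unicode character name, so the labels differ slightly from the ones above (`:thumbs_up_sign:` instead of `:thumbs_up:`).

```python
import unicodedata

def describe_emojis(text):
    """Replace symbol characters with ':name:' built from their Unicode names."""
    parts = []
    for ch in text:
        # 'So' is the Unicode category "Symbol, other", which covers most emoji.
        if unicodedata.category(ch) == 'So':
            name = unicodedata.name(ch, 'unknown').lower().replace(' ', '_')
            parts.append(f':{name}:')
        else:
            parts.append(ch)
    return ''.join(parts)

# '\U0001F44D' is the thumbs-up emoji.
described = describe_emojis('mind blowing \U0001F44D')
print(described)  # mind blowing :thumbs_up_sign:
```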

Word Tokenization

def word_tokenization(text):
    return nltk.word_tokenize(text)
df['token_content'] = df['content'].apply(word_tokenization)
19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566     best chat ai :OK_hand::OK_hand::OK_hand: pic v...
38261     app helpful containing study material types qu...
78273                                                  best
118182                             mind blowing :thumbs_up:
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
POS Tagging

nlp = spacy.load('en_core_web_sm', disable=['ner', 'textcat'])

def batch_pos_tagging(texts):
    docs = list(nlp.pipe(texts, batch_size=50))
    return [[(token.text, token.pos_) for token in doc] for doc in docs]

batch_size = 50
num_batches = len(df) // batch_size + 1

pos_tags = []
for i in tqdm(range(num_batches)):
    start = i * batch_size
    end = start + batch_size
    batch_texts = df['content'].iloc[start:end].tolist()
    pos_tags.extend(batch_pos_tagging(batch_texts))

df['POS_Tags'] = pos_tags
100%|██████████| 365/365 [00:18<00:00, 19.53it/s]
19863     [(kinda, INTJ), (weird, ADJ), (scary, ADJ), (f...
67801     [(great, ADJ), (app, NOUN), (always, ADV), (he...
50227     [(sometime, ADV), (does, AUX), (nt, PART), (wo...
1650                       [(thank, VERB), (chatgpt, NOUN)]
63566     [(best, ADJ), (chat, NOUN), (ai, VERB), (:, PU...
38261     [(app, PROPN), (helpful, ADJ), (containing, VE...
78273                                         [(best, ADJ)]
118182    [(mind, NOUN), (blowing, VERB), (:, PUNCT), (t...
97082                                      [(awesome, ADJ)]
108190                                        [(good, ADJ)]
Name: POS_Tags, Length: 18214, dtype: object
        content                                          score  thumbsUpCount  reviewCreatedVersion  appVersion  token_content                                    POS_Tags
19863   kinda weird scary fascinating time                   5              0  41                    41.0        [kinda, weird, scary, fascinating, time]         [(kinda, INTJ), (weird, ADJ), (scary, ADJ), (f…
67801   great app always helpin                              5              0  8                     8.0         [great, app, always, helpin]                     [(great, ADJ), (app, NOUN), (always, ADV), (he…
50227   sometime doesnt work                                 1              0  22                    22.0        [sometime, doesnt, work]                         [(sometime, ADV), (does, AUX), (nt, PART), (wo…
1650    thank chatgpt                                        5              0  48                    48.0        [thank, chatgpt]                                 [(thank, VERB), (chatgpt, NOUN)]
63566   best chat ai :OK_hand::OK_hand::OK_hand: pic v…      4              0  12                    12.0        [best, chat, ai, :, OK_hand, :, :OK_hand, :, :…  [(best, ADJ), (chat, NOUN), (ai, VERB), (:, PU…
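
A small note on the batching arithmetic in the POS tagging loop. `len(df) // batch_size + 1` gives 365 batches for 18,214 reviews, which matches the progress bar, but the same formula produces one extra, empty batch whenever the row count is an exact multiple of the batch size. Here is a sketch of an alternative that steps through the data with `range`; the `batch_bounds` helper is hypothetical.

```python
import math

def batch_bounds(n_items, batch_size):
    """Yield (start, end) index pairs that cover n_items with no empty final batch."""
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)

n_reviews, batch_size = 18214, 50
bounds = list(batch_bounds(n_reviews, batch_size))

# For this dataset both ways of counting batches agree.
floor_plus_one = n_reviews // batch_size + 1
ceiling = math.ceil(n_reviews / batch_size)

# With an exact multiple, floor-plus-one overshoots by one.
exact_floor_plus_one = 100 // 50 + 1
exact_ceiling = math.ceil(100 / 50)
```

The empty batch is harmless here, since tagging an empty list just returns an empty list, but the `range` version avoids the wasted iteration.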

Bag of Words

df['content'] = df['content'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
df['token_content'] = df['token_content'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
df['POS_Tags'] = df['POS_Tags'].apply(lambda x: ' '.join(str(i) for i in x) if isinstance(x, list) else x)

vectorizer = CountVectorizer(ngram_range=(2, 2))

bow_c = vectorizer.fit_transform(df['content'])
bow_t = vectorizer.fit_transform(df['token_content'])
bow_pos = vectorizer.fit_transform(df['POS_Tags'])

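# A 2-D count matrix cannot be assigned to a single DataFrame column, so keep the
# counts sparse and stack them next to the numeric columns instead.
from scipy.sparse import csr_matrix, hstack

numeric = csr_matrix(df[['thumbsUpCount', 'reviewCreatedVersion', 'appVersion']].to_numpy(dtype=float))
X = hstack([numeric, bow_c, bow_t, bow_pos]).tocsr()

Predictive Modeling

Train-Test Split of Data

X_train, X_test, y_train, y_test = train_test_split(X, df['score'], test_size=0.2, random_state=41)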
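
Before moving on to the models, it helps to picture what the bigram setting in the Bag of Words step (`ngram_range=(2, 2)`) actually counts. The sketch below is a simplified stand-in that splits on whitespace; CountVectorizer applies its own tokenization rules, so the real features can differ. The `bigram_counts` helper is hypothetical.

```python
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs, the unit a (2, 2) n-gram range works with."""
    words = text.split()
    return Counter(zip(words, words[1:]))

counts = bigram_counts("great app always helpin")
single = bigram_counts("awesome")

print(dict(counts))  # {('great', 'app'): 1, ('app', 'always'): 1, ('always', 'helpin'): 1}
print(len(single))   # 0
```

One-word reviews such as "awesome", "good" or "best" produce no bigrams at all, so with this setting they reach the model with no text features.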

Decision Tree Model and Evaluation

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)


y_pred = dt.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
              precision    recall  f1-score   support

           1       0.34      0.07      0.12       202
           2       0.00      0.00      0.00        67
           3       0.00      0.00      0.00       161
           4       0.00      0.00      0.00       430
           5       0.77      0.99      0.87      2783

    accuracy                           0.76      3643
   macro avg       0.22      0.21      0.20      3643
weighted avg       0.61      0.76      0.67      3643
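
If the summary rows of that report look mysterious, they can be rebuilt by hand. The sketch below recomputes the macro and weighted precision from the per-class numbers above, and compares the accuracy with a hypothetical model that always answers 5 stars.

```python
# Per-class precision and support copied from the decision tree report above.
precision = {1: 0.34, 2: 0.00, 3: 0.00, 4: 0.00, 5: 0.77}
support = {1: 202, 2: 67, 3: 161, 4: 430, 5: 2783}

total = sum(support.values())

# Macro average treats every class equally; weighted average scales by support.
macro_precision = sum(precision.values()) / len(precision)
weighted_precision = sum(precision[c] * support[c] for c in precision) / total

# Accuracy of a hypothetical model that always predicts 5 stars.
always_five_accuracy = support[5] / total

print(round(macro_precision, 2))       # 0.22
print(round(weighted_precision, 2))    # 0.61
print(round(always_five_accuracy, 2))  # 0.76
```

The reported accuracy of 0.76 is the same as the always-5 baseline, so on this test split the tree is doing little more than predicting the majority class.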

cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)

fig, ax = plt.subplots(figsize=(5, 5))
cm_display.plot(ax=ax)  # draw the matrix on the axes created above
plt.title('Confusion Matrix')

Random Forest Model and Evaluation

rf = RandomForestClassifier()
rf.fit(X_train, y_train)


y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
              precision    recall  f1-score   support

           1       0.37      0.05      0.09       202
           2       0.25      0.03      0.05        67
           3       0.00      0.00      0.00       161
           4       0.00      0.00      0.00       430
           5       0.77      0.99      0.87      2783

    accuracy                           0.76      3643
   macro avg       0.28      0.21      0.20      3643
weighted avg       0.61      0.76      0.67      3643

cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)

fig, ax = plt.subplots(figsize=(5, 5))
cm_display.plot(ax=ax)  # draw the matrix on the axes created above
plt.title('Confusion Matrix')

ChatGPT reviews

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
        reviewId                              userName            content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
0       1ea528a6-6d5d-4c9a-b266-9df306f20ed7  abdulwaheed aminat  amazing app,easy to navigate.                        5              0  1.2024.101            2024-05-12 23:38:52  1.2024.101
1       9df43688-8a80-419e-b36d-61c95fd17d2a  Benedette Morison   The app is recommendable and reliable, especia…      5              0  1.2024.115            2024-05-12 23:35:02  1.2024.115
2       80b9806a-c5eb-44da-9f0d-6cd864c1f4cf  Android Trigger     Superb ai app                                        5              0  1.2024.122            2024-05-12 23:27:05  1.2024.122
3       ec5ea9e5-86bc-4de0-b170-f2871833ce74  Brian Peters        Best thing that ever happened to me.                 5              0  1.2024.122            2024-05-12 23:17:46  1.2024.122
4       1e396118-3934-4ce6-8390-b2d56771e343  Gautam kumar Patel  this is very good app                                5              0  1.2024.108            2024-05-12 23:12:56  1.2024.108
...
113904  462686ff-e500-413c-a6b4-2badc2e3b21d  m.santhosh Kumar    Update 2023                                          5              0  NaN                   2023-07-27 16:26:31  NaN
113905  f10e0d48-ecb6-42db-b103-46c0046f9be9  Andrew Bourgeois    its grear                                            5              0  NaN                   2023-09-23 16:25:18  NaN
113906  df909a49-90b5-4dac-9b89-c4bd5a7c2f75  Dern Bob            Funtastic App                                        5              0  NaN                   2023-11-08 13:57:14  NaN
113907  abe43878-973f-4e96-a765-c4af5c7f7b20  Abdur rahman arif   hi all                                               5              0  NaN                   2023-07-25 15:32:57  NaN
113908  0151001d-b81c-41b5-8927-f56738989625  Tushar Deran        expert application                                   5              0  NaN                   2023-11-30 18:11:41  NaN

113909 rows × 8 columns

Index(['reviewId', 'userName', 'content', 'score', 'thumbsUpCount',
       'reviewCreatedVersion', 'at', 'appVersion'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113909 entries, 0 to 113908
Data columns (total 8 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   reviewId              113909 non-null  object
 1   userName              113908 non-null  object
 2   content               113905 non-null  object
 3   score                 113909 non-null  int64 
 4   thumbsUpCount         113909 non-null  int64 
 5   reviewCreatedVersion  103913 non-null  object
 6   at                    113909 non-null  object
 7   appVersion            103913 non-null  object
dtypes: int64(2), object(6)
memory usage: 7.0+ MB
reviewId                   0
userName                   1
content                    4
score                      0
thumbsUpCount              0
reviewCreatedVersion    9996
at                         0
appVersion              9996
dtype: int64
sns.heatmap(df.isna())
plt.figure(figsize=(10, 6))
sns.histplot(df['score'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Scores')
from wordcloud import WordCloud

reviews_text = ' '.join(df['content'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(reviews_text)
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Reviews')

ChatGPT Chronicles: Daily Dive into User Opinions

This analysis report provides an in-depth examination of user reviews for the ChatGPT Android App. The dataset consists of user reviews that are updated daily, containing valuable information like review ID, user name, review content, score, thumbs up count, review creation version, timestamp, and app version.

The main objective of this analysis is to extract valuable insights regarding user sentiment, identify patterns, and gain a better understanding of user satisfaction levels. These findings will serve as a basis for potential app improvements and enhancements.

The analysis produced several findings. First, the distribution of ratings shows that most users give high scores, which suggests they are generally satisfied with the app. Lower ratings do appear, though, and they point to areas that could be improved.

Additionally, when looking at average scores for each app version, we can see differences in user satisfaction. Understanding these differences can help prioritize bug fixes and enhancements.

Furthermore, by conducting a correlation analysis between factors such as score, thumbs up count, and review length, we can gain insights into what influences user satisfaction and engagement. Lastly, analyzing scores over time using a time series approach can reveal trends and fluctuations in user sentiment.
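
The time series idea is easy to sketch even without plotting anything. The snippet below groups a handful of reviews by calendar month and averages their scores. It is a minimal illustration, assuming timestamps in the `YYYY-MM-DD HH:MM:SS` format used by the `at` column; the `monthly_mean_scores` helper is hypothetical, and the four rows are taken from the sample shown earlier in this post.

```python
from collections import defaultdict
from datetime import datetime

def monthly_mean_scores(rows):
    """Average score per calendar month from (timestamp, score) pairs."""
    buckets = defaultdict(list)
    for timestamp, score in rows:
        month = datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S').strftime('%Y-%m')
        buckets[month].append(score)
    return {month: sum(scores) / len(scores) for month, scores in sorted(buckets.items())}

# Timestamps and scores taken from rows shown earlier in this post.
rows = [
    ('2023-12-17 16:16:52', 5),
    ('2023-12-08 17:03:22', 5),
    ('2024-01-05 15:41:41', 1),
    ('2024-03-23 03:08:40', 5),
]
trend = monthly_mean_scores(rows)
print(trend)  # {'2023-12': 5.0, '2024-01': 1.0, '2024-03': 5.0}
```

With only four reviews the monthly averages are not meaningful, of course. On the full dataset the same grouping would show whether sentiment drifts as new app versions roll out.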

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
data = pd.read_csv("/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv")
        reviewId                              userName            content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
0       1ea528a6-6d5d-4c9a-b266-9df306f20ed7  abdulwaheed aminat  amazing app,easy to navigate.                        5              0  1.2024.101            2024-05-12 23:38:52  1.2024.101
1       9df43688-8a80-419e-b36d-61c95fd17d2a  Benedette Morison   The app is recommendable and reliable, especia…      5              0  1.2024.115            2024-05-12 23:35:02  1.2024.115
2       80b9806a-c5eb-44da-9f0d-6cd864c1f4cf  Android Trigger     Superb ai app                                        5              0  1.2024.122            2024-05-12 23:27:05  1.2024.122
3       ec5ea9e5-86bc-4de0-b170-f2871833ce74  Brian Peters        Best thing that ever happened to me.                 5              0  1.2024.122            2024-05-12 23:17:46  1.2024.122
4       1e396118-3934-4ce6-8390-b2d56771e343  Gautam kumar Patel  this is very good app                                5              0  1.2024.108            2024-05-12 23:12:56  1.2024.108
plt.figure(figsize=(10, 6))
sns.histplot(data['score'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Scores')
avg_score_by_version = data.groupby('appVersion')['score'].mean().reset_index()
plt.figure(figsize=(12, 6))
sns.barplot(x='appVersion', y='score', data=avg_score_by_version, palette='viridis')
plt.title('Average Score by App Version')
plt.xlabel('App Version')
plt.ylabel('Average Score')
sns.boxplot(x='appVersion', y='score', data=data, palette='pastel')
plt.title('Boxplot of Scores by App Version')
plt.xlabel('App Version')
# Select only numeric columns
numeric_data = data.select_dtypes(include=[np.number])

plt.figure(figsize=(10, 8))
corr = numeric_data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.figure(figsize=(10, 6))
sns.histplot(data['thumbsUpCount'], bins=30, kde=True, color='green')
plt.title('Distribution of Thumbs Up Count')
plt.xlabel('Thumbs Up Count')
plt.figure(figsize=(10, 6))
sns.scatterplot(x='score', y='thumbsUpCount', data=data, color='purple', alpha=0.5)
plt.title('Score vs Thumbs Up Count')
plt.ylabel('Thumbs Up Count')
review_count_by_user = data['userName'].value_counts().reset_index()
review_count_by_user.columns = ['User Name', 'Review Count']
plt.figure(figsize=(12, 6))
sns.barplot(x='Review Count', y='User Name', data=review_count_by_user.head(10), palette='magma')
plt.title('Top 10 Users by Review Count')
plt.xlabel('Review Count')
plt.ylabel('User Name')
data['review_length'] = data['content'].apply(lambda x: len(str(x)))
plt.figure(figsize=(10, 6))
sns.histplot(data['review_length'], bins=50, kde=True, color='brown')
plt.title('Distribution of Review Length')
plt.xlabel('Review Length')
from wordcloud import WordCloud

reviews_text = ' '.join(data['content'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(reviews_text)
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Reviews')


We’ve come to the end of our ChatGPT review journey.

Main Takeaways:

  • Average Perplexity: ChatGPT adds excitement to conversations.
  • Burstiness: It has a unique rhythm, like a talkative pal.
  • Predictability: Keeps you on your toes with surprises.

User feedback reveals the ups and downs of ChatGPT. It serves as a useful assistant and a fun chat companion. These observations shed light on the future of AI and how it’s enhancing our online interactions.

Thanks for being a part of this journey. More AI adventures coming soon!


