Machine Learning Project 7: Best ChatGPT Reviews Analysis

Introduction

Hey there, everyone! Welcome to our cozy corner of the internet where we’re delving deep into the realm of ChatGPT reviews.

That’s right, we’re exploring what people are saying about the wonders of ChatGPT and how it fares in the realm of AI chatbots. Grab a drink, get settled in, and let’s jump right in!

Why Does It Matter?

Ever been curious about how ChatGPT is holding up in the vast online world? Well, we’ve got the inside scoop. We’re breaking it down:

Average Perplexity: Sounds fancy, right? It measures how hard a piece of text is for a language model to predict. Higher perplexity means more surprises!
Burstiness Scores: You know that friend who talks a mile a minute and then suddenly goes silent? That’s burstiness. We’re seeing if ChatGPT has that same vibe.
Predictability: How easy is it to guess what ChatGPT will say next? Spoiler alert: it’s not always what you anticipate! 🌟📋🔍🌟

In-Depth Analysis of ChatGPT Reviews

Importing Libraries 📥

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import string
import emoji
import nltk
import spacy
from tqdm import tqdm
from nltk.corpus import stopwords
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.preprocessing import MinMaxScaler, LabelEncoder, OrdinalEncoder

Importing Data

Source Notebook Link (the dataset itself is read from the Kaggle input path below): https://www.kaggle.com/code/zain280/in-depth-analysis-of-chatgpt-reviews

df = pd.read_csv('/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv')
df = df.sample(20000)
df.head()

	reviewId	userName	content	score	reviewCreatedVersion	at	appVersion
19863	d7f901e2-3032-4e7c-9ba0-cd8d1443b32d	Arshavir Mandegar	kinda weird and scary but fascinating at the s…	5	1.2024.073	2024-03-23 03:08:40	1.2024.073
67801	e05d7901-c968-4f5a-bd00-fbaf4a87e355	ayan khalid	great app, always been helpin me out.	5	1.0.0039	2023-09-04 16:39:37	1.0.0039
71308	76e6efef-5daa-4a2f-8bc8-aa698f9d5048	Archana Rajput	kep it in well done	5	NaN	2023-12-17 16:16:52	NaN
122762	9a1d150d-ff0b-4686-b017-499d46047db3	Rakibul Hasan	Good app	5	NaN	2023-12-08 17:03:22	NaN
50227	cf94f0c6-186a-4787-8bee-bcd0b5216345	Aasim Saquafi	Sometime it doesn’t work	1	1.2023.313	2024-01-05 15:41:41	1.2023.313
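
One thing to keep in mind: `df.sample(20000)` draws a different random subset on every run, so the row indices and counts shown in this post will not come out exactly the same for you. Here is a minimal sketch of a repeatable draw, assuming a small stand-in DataFrame (`toy` is hypothetical) and an arbitrary seed of 42:

```python
import pandas as pd

# Hypothetical stand-in for the reviews table; the real file is much larger.
toy = pd.DataFrame({
    "content": [f"review {i}" for i in range(100)],
    "score": [1 + i % 5 for i in range(100)],
})

# Passing random_state fixes the random draw, so repeated runs agree.
sample_a = toy.sample(10, random_state=42)
sample_b = toy.sample(10, random_state=42)

print(sample_a.index.equals(sample_b.index))
```

In the notebook this would simply become `df.sample(20000, random_state=42)`; the seed value itself does not matter.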

Header View of ChatGPT Reviews

pd.DataFrame(df['content']).head()

	content
19863	kinda weird and scary but fascinating at the s…
67801	great app, always been helpin me out.
71308	kep it in well done
122762	Good app
50227	Sometime it doesn’t work

Overview of Data

Distribution of Rows and Columns

df.shape

(20000, 8)

Data Information

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 20000 entries, 19863 to 108190
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   reviewId              20000 non-null  object
 1   userName              19999 non-null  object
 2   content               19998 non-null  object
 3   score                 20000 non-null  int64 
 4   thumbsUpCount         20000 non-null  int64 
 5   reviewCreatedVersion  18313 non-null  object
 6   at                    20000 non-null  object
 7   appVersion            18313 non-null  object
dtypes: int64(2), object(6)
memory usage: 1.4+ MB

Data Description

df.describe()

	score	thumbsUpCount
count	20000.000000	20000.000000
mean	4.502300	0.477100
std	1.083953	12.118301
min	1.000000	0.000000
25%	5.000000	0.000000
50%	5.000000	0.000000
75%	5.000000	0.000000
max	5.000000	1193.000000

Sum of Null Values

df.isnull().sum()

reviewId                   0
userName                   1
content                    2
score                      0
thumbsUpCount              0
reviewCreatedVersion    1687
at                         0
appVersion              1687
dtype: int64

Data Cleaning 🧹

Dropping Duplicates

df = df.drop_duplicates()

Dropping Rows with Null Values

df = df.dropna()

Revised Data Shape

df.shape

(18214, 8)
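
A plain `df.dropna()` removes every row that has at least one missing value. Judging by the null counts above, nearly all of the gaps sit in the two version columns, so reviews with perfectly usable text get discarded too. If only the review text matters, a gentler option is the `subset` argument. A small sketch on made-up rows (`toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one is missing its text, two are missing the app version.
toy = pd.DataFrame({
    "content": ["good app", None, "crashes often", "love it"],
    "score": [5, 4, 1, 5],
    "appVersion": ["1.2024.073", "1.0.0039", np.nan, np.nan],
})

all_complete = toy.dropna()                     # drops any row with a gap
text_present = toy.dropna(subset=["content"])   # drops only rows lacking text

print(len(all_complete), len(text_present))
```

Which variant is right depends on whether the version columns are actually used as features later on.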

Score Value Counts

df['score'].value_counts()

score
5    14008
4     2218
1      969
3      714
2      305
Name: count, dtype: int64
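
Raw counts hide how lopsided the classes are. A quick sketch that turns the counts above into shares (the numbers are copied from the output, nothing else is assumed):

```python
# Score counts copied from the value_counts() output above.
score_counts = {5: 14008, 4: 2218, 1: 969, 3: 714, 2: 305}

total = sum(score_counts.values())
shares = {score: count / total for score, count in score_counts.items()}

# Print the share of each rating, highest rating first.
for score in sorted(shares, reverse=True):
    print(f"{score} stars: {shares[score]:.1%}")
```

Roughly three out of four reviews are 5 stars, which matters later when judging model accuracy. In pandas the same figures come from `df['score'].value_counts(normalize=True)`.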

ThumbsUpCount Value Counts

df['thumbsUpCount'].value_counts()

thumbsUpCount
0      17444
1        420
2        118
3         47
5         24
       ...  
128        1
338        1
126        1
47         1
152        1
Name: count, Length: 66, dtype: int64

Visualizing Data

Histogram of Numerical Columns

df.hist(figsize=(15, 5))
plt.show()

Scatter Plot of Numerical Columns

sns.scatterplot(x='score', y='thumbsUpCount', data=df)
plt.show()

Feature Engineering with Machine Learning

Selecting Relevant Features

df.drop(columns=['reviewId', 'userName', 'at'], inplace=True)

df.head()

	content	score	reviewCreatedVersion	appVersion
19863	kinda weird and scary but fascinating at the s…	5	1.2024.073	1.2024.073
67801	great app, always been helpin me out.	5	1.0.0039	1.0.0039
50227	Sometime it doesn’t work	1	1.2023.313	1.2023.313
1650	thank you chatgpt	5	1.2024.122	1.2024.122
63566	It is the best chat AI ..👌👌👌 but no pic or videos	4	1.2023.263	1.2023.263

Encoding Columns

le = LabelEncoder()
df['reviewCreatedVersion'] = le.fit_transform(df['reviewCreatedVersion'])

oe = OrdinalEncoder()
df['appVersion'] = oe.fit_transform(df[['appVersion']])

df.head()

	content	score	reviewCreatedVersion	appVersion
19863	kinda weird and scary but fascinating at the s…	5	41	41.0
67801	great app, always been helpin me out.	5	8	8.0
50227	Sometime it doesn’t work	1	22	22.0
1650	thank you chatgpt	5	48	48.0
63566	It is the best chat AI ..👌👌👌 but no pic or videos	4	12	12.0
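
Notice that both encoded columns carry the same codes in the rows above (41 and 41.0, 8 and 8.0, and so on). That is expected when the underlying strings match: both encoders sort the distinct values and number them. A short sketch using the version strings from the sample rows (the variable names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Version strings taken from the sample rows shown above.
versions = np.array(["1.2024.073", "1.0.0039", "1.2023.313",
                     "1.2024.122", "1.2023.263"])

# LabelEncoder takes a 1-D array, OrdinalEncoder takes a 2-D array.
label_codes = LabelEncoder().fit_transform(versions)
ordinal_codes = OrdinalEncoder().fit_transform(versions.reshape(-1, 1)).ravel()

print(label_codes)
print(ordinal_codes)
print(np.array_equal(label_codes, ordinal_codes))
```

Two columns with identical codes add no extra information for a model, so keeping just one of them is probably enough. Also note that scikit-learn documents `LabelEncoder` as a tool for target labels, while `OrdinalEncoder` is the one meant for feature columns.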

Applying NLP to Review Content

df['content']

19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              Sometime it doesn't work
1650                                      thank you chatgpt
63566     It is the best chat AI ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the Best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Converting to Lowercase

df['content'] = df['content'].str.lower()

df['content']

19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing HTML Tags

def remove_html_tags(text):
    clean_text = re.sub('<.*?>', '', text)
    return clean_text

df['content'] = df['content'].apply(remove_html_tags)

df['content']

19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing URLs

def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    clean_text = re.sub(url_pattern, '', text)
    return clean_text

df['content'] = df['content'].apply(remove_urls)

df['content']

19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing Punctuation

def remove_punctuation(text):
    punctuation = string.punctuation
    clean_text = text.translate(str.maketrans('', '', punctuation))
    return clean_text

df['content'] = df['content'].apply(remove_punctuation)

df['content']

19863     kinda weird and scary but fascinating at the s...
67801                   great app always been helpin me out
50227                               sometime it doesnt work
1650                                      thank you chatgpt
63566       it is the best chat ai 👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                          its the best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
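
The four cleaning steps so far can be folded into one helper, which makes the order explicit. This is only a sketch: the name `clean_review` and the final whitespace cleanup are additions for illustration, and the sample sentence is made up.

```python
import re
import string

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
TAG_PATTERN = re.compile(r"<.*?>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_review(text):
    """Apply the cleaning steps above in the same order."""
    text = text.lower()
    text = TAG_PATTERN.sub("", text)      # strip HTML tags
    text = URL_PATTERN.sub("", text)      # strip URLs while '://' is still intact
    text = text.translate(PUNCT_TABLE)    # strip ASCII punctuation
    return " ".join(text.split())         # collapse leftover whitespace (extra step)

print(clean_review("Great <b>App</b>! Visit https://example.com now."))
```

The order matters: if punctuation were removed first, the `://` and dots would be gone and the URL pattern would no longer match. Keep in mind that `string.punctuation` only covers ASCII characters, so typographic quotes such as ’ pass through unchanged.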

Chat Word Treatment

chat_words_mapping = {
    "lol": "laughing out loud",
    "brb": "be right back",
    "btw": "by the way",
    "afk": "away from keyboard",
    "rofl": "rolling on the floor laughing",
    "ttyl": "talk to you later",
    "np": "no problem",
    "thx": "thanks",
    "omg": "oh my god",
    "idk": "I don't know",
    "gg": "good game",
    "g2g": "got to go",
    "b4": "before",
    "cu": "see you",
    "yw": "you're welcome",
    "wtf": "what the f*ck",
    "imho": "in my humble opinion",
    "jk": "just kidding",
    "gf": "girlfriend",
    "bf": "boyfriend",
    "u": "you",
    "r": "are",
    "2": "to",
    "4": "for",
    "b": "be",
    "c": "see",
    "y": "why",
    "tho": "though",
    "smh": "shaking my head",
    "lolz": "laughing out loud",
    "h8": "hate",
    "luv": "love",
    "pls": "please",
    "sry": "sorry",
    "tbh": "to be honest",
    "omw": "on my way",
    "omw2syg": "on my way to see your girlfriend",
}

def expand_chat_words(text):
    words = text.split()
    expanded_words = [chat_words_mapping.get(word.lower(), word) for word in words]
    return ' '.join(expanded_words)

df['content'] = df['content'].apply(expand_chat_words)

df['content']

19863     kinda weird and scary but fascinating at the s...
67801                   great app always been helpin me out
50227                               sometime it doesnt work
1650                                      thank you chatgpt
63566       it is the best chat ai 👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                          its the best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
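
A word of caution about the single-character entries in the mapping: keys such as "2" and "4" also match genuine numbers. The sketch below uses a small subset of the mapping to show the effect, plus one possible workaround; `expand_chat_words_safe` and the sample sentence are hypothetical.

```python
# A small subset of the mapping above, enough to show the pitfall.
chat_subset = {"thx": "thanks", "u": "you", "4": "for", "2": "to"}

def expand_chat_words(text, mapping):
    """Same logic as the function above, with the mapping passed in."""
    return " ".join(mapping.get(word.lower(), word) for word in text.split())

def expand_chat_words_safe(text, mapping):
    """Same expansion, but purely numeric tokens are left alone (an assumption)."""
    return " ".join(
        word if word.isdigit() else mapping.get(word.lower(), word)
        for word in text.split()
    )

sample = "thx u gpt 4 is great"
print(expand_chat_words(sample, chat_subset))
print(expand_chat_words_safe(sample, chat_subset))
```

The first call turns "gpt 4" into "gpt for", while the second keeps the number. Whether that trade-off is worth it depends on how often reviewers write "4" to mean "for".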

Removing Stop Words

# The NLTK tokenizer and stop word data must be downloaded once (via nltk.download).
stop_words = set(stopwords.words('english'))  # build the set once, not per review

def remove_stop_words(text):
    tokens = nltk.word_tokenize(text)
    filtered_tokens = [token for token in tokens if token not in stop_words]
    return ' '.join(filtered_tokens)

df['content'] = df['content'].apply(remove_stop_words)

df['content']

19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566                           best chat ai 👌👌👌 pic videos
                                ...                        
38261     app helpful containing study material types qu...
78273                                                  best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
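
Look at review 63566 in the output: "but no pic or videos" became "pic videos", so the complaint lost its negation. For a rating prediction task that can flip the meaning. Here is a sketch of one way around it, using a tiny illustrative stop list instead of the full NLTK list; keeping "no" and "not" is an assumption about what helps, not something tested in this post.

```python
# Illustrative stop list only; the NLTK English list is much longer.
STOP_WORDS = {"it", "is", "the", "but", "no", "not", "or", "and", "a"}
NEGATIONS = {"no", "not"}

def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated token found in stop_words."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)

review = "it is the best chat ai but no pic or videos"
print(remove_stop_words(review, STOP_WORDS))
print(remove_stop_words(review, STOP_WORDS - NEGATIONS))
```

With the notebook's own code, the equivalent change would be to subtract a set of negation words from `stopwords.words('english')` before filtering.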

Replacing Emojis with Meanings

def replace_emojis_with_meanings(text):
    # emoji.demojize replaces every emoji in the string with its ':name:' alias,
    # so a hand-written pattern of Unicode ranges is not needed.
    return emoji.demojize(text)

df['content'] = df['content'].apply(replace_emojis_with_meanings)

df['content']

19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566     best chat ai :OK_hand::OK_hand::OK_hand: pic v...
                                ...                        
38261     app helpful containing study material types qu...
78273                                                  best
118182                             mind blowing :thumbs_up:
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Word Tokenization

def word_tokenization(text):
    return nltk.word_tokenize(text)

df['token_content'] = df['content'].apply(word_tokenization)

df['content']

19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566     best chat ai :OK_hand::OK_hand::OK_hand: pic v...
                                ...                        
38261     app helpful containing study material types qu...
78273                                                  best
118182                             mind blowing :thumbs_up:
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

POS Tagging

nlp = spacy.load('en_core_web_sm', disable=['ner', 'textcat'])

def batch_pos_tagging(texts):
    docs = list(nlp.pipe(texts, batch_size=50))
    return [[(token.text, token.pos_) for token in doc] for doc in docs]

batch_size = 50
num_batches = len(df) // batch_size + 1

pos_tags = []
for i in tqdm(range(num_batches)):
    start = i * batch_size
    end = start + batch_size
    batch_texts = df['content'].iloc[start:end].tolist()  # .iloc makes the slice positional
    pos_tags.extend(batch_pos_tagging(batch_texts))

df['POS_Tags'] = pos_tags

100%|██████████| 365/365 [00:18<00:00, 19.53it/s]

df['POS_Tags']

19863     [(kinda, INTJ), (weird, ADJ), (scary, ADJ), (f...
67801     [(great, ADJ), (app, NOUN), (always, ADV), (he...
50227     [(sometime, ADV), (does, AUX), (nt, PART), (wo...
1650                       [(thank, VERB), (chatgpt, NOUN)]
63566     [(best, ADJ), (chat, NOUN), (ai, VERB), (:, PU...
                                ...                        
38261     [(app, PROPN), (helpful, ADJ), (containing, VE...
78273                                         [(best, ADJ)]
118182    [(mind, NOUN), (blowing, VERB), (:, PUNCT), (t...
97082                                      [(awesome, ADJ)]
108190                                        [(good, ADJ)]
Name: POS_Tags, Length: 18214, dtype: object

df.head()

	content	score	reviewCreatedVersion	appVersion	token_content	POS_Tags
19863	kinda weird scary fascinating time	5	41	41.0	[kinda, weird, scary, fascinating, time]	[(kinda, INTJ), (weird, ADJ), (scary, ADJ), (f…
67801	great app always helpin	5	8	8.0	[great, app, always, helpin]	[(great, ADJ), (app, NOUN), (always, ADV), (he…
50227	sometime doesnt work	1	22	22.0	[sometime, doesnt, work]	[(sometime, ADV), (does, AUX), (nt, PART), (wo…
1650	thank chatgpt	5	48	48.0	[thank, chatgpt]	[(thank, VERB), (chatgpt, NOUN)]
63566	best chat ai :OK_hand::OK_hand::OK_hand: pic v…	4	12	12.0	[best, chat, ai, :, OK_hand, :, :OK_hand, :, :…	[(best, ADJ), (chat, NOUN), (ai, VERB), (:, PU…

Bag of Words

df['content'] = df['content'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
df['token_content'] = df['token_content'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
df['POS_Tags'] = df['POS_Tags'].apply(lambda x: ' '.join(str(i) for i in x) if isinstance(x, list) else x)

vectorizer = CountVectorizer(ngram_range=(2, 2))

bow_c = vectorizer.fit_transform(df['content'])
bow_t = vectorizer.fit_transform(df['token_content'])
bow_pos = vectorizer.fit_transform(df['POS_Tags'])

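# A 2-D bag-of-words matrix cannot be stored in a single DataFrame column,
# and converting it to a dense array would be very memory hungry.
# The sparse matrices stay in bow_c, bow_t and bow_pos; the text columns are
# dropped here so that df matches the output shown below.
df = df.drop(columns=['content', 'token_content', 'POS_Tags'])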

df.head()

	score	reviewCreatedVersion	appVersion
19863	5	41	41.0
67801	5	8	8.0
50227	1	22	22.0
1650	5	48	48.0
63566	4	12	12.0

Predictive Modeling

Train-Test Split of Data

X_train, X_test, y_train, y_test = train_test_split(df.drop(columns=['score']), df['score'], test_size=0.2, random_state=41)
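
With ratings this imbalanced, a plain random split can leave the rare classes over- or under-represented in the test set. The `stratify` argument of `train_test_split` keeps the class proportions the same on both sides. A sketch on made-up labels:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 five-star and 20 one-star reviews.
y = [5] * 80 + [1] * 20
X = [[i] for i in range(len(y))]

# stratify=y preserves the 80/20 class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=41, stratify=y
)

print(Counter(y_test))
```

In the notebook this would mean adding `stratify=df['score']` to the existing call.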

Decision Tree Model and Evaluation

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

DecisionTreeClassifier()

y_pred = dt.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))

              precision    recall  f1-score   support

           1       0.34      0.07      0.12       202
           2       0.00      0.00      0.00        67
           3       0.00      0.00      0.00       161
           4       0.00      0.00      0.00       430
           5       0.77      0.99      0.87      2783

    accuracy                           0.76      3643
   macro avg       0.22      0.21      0.20      3643
weighted avg       0.61      0.76      0.67      3643

cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)

fig, ax = plt.subplots(figsize=(5, 5))
cm_display.plot(ax=ax)
plt.title('Confusion Matrix')
plt.show()

Random Forest Model and Evaluation

rf = RandomForestClassifier()

rf.fit(X_train, y_train)

RandomForestClassifier()

y_pred = rf.predict(X_test)

print(classification_report(y_test, y_pred, zero_division=0))

              precision    recall  f1-score   support

           1       0.37      0.05      0.09       202
           2       0.25      0.03      0.05        67
           3       0.00      0.00      0.00       161
           4       0.00      0.00      0.00       430
           5       0.77      0.99      0.87      2783

    accuracy                           0.76      3643
   macro avg       0.28      0.21      0.20      3643
weighted avg       0.61      0.76      0.67      3643

cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)

fig, ax = plt.subplots(figsize=(5, 5))
cm_display.plot(ax=ax)
plt.title('Confusion Matrix')
plt.show()
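
Both reports show 0.76 accuracy, which sounds decent until it is compared with a trivial baseline. The check below uses only the support column from the reports above:

```python
# Test-set support per class, copied from the classification reports above.
support = {1: 202, 2: 67, 3: 161, 4: 430, 5: 2783}

total = sum(support.values())
majority_class = max(support, key=support.get)

# Accuracy of a "model" that always predicts the most common rating.
baseline_accuracy = support[majority_class] / total

print(majority_class, round(baseline_accuracy, 3))
```

Always answering "5 stars" would score about 0.76 as well, so neither model is clearly beating the class imbalance. One likely reason, going by the `df.head()` output after the Bag of Words step, is that the training frame no longer contains any text-derived columns. The macro-average F1 of 0.20 is the more honest number here.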

ChatGPT reviews

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

df=pd.read_csv('/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv')
df

	reviewId	userName	content	score	thumbsUpCount	reviewCreatedVersion	at	appVersion
0	1ea528a6-6d5d-4c9a-b266-9df306f20ed7	abdulwaheed aminat	amazing app,easy to navigate.	5	0	1.2024.101	2024-05-12 23:38:52	1.2024.101
1	9df43688-8a80-419e-b36d-61c95fd17d2a	Benedette Morison	The app is recommendable and reliable, especia…	5	0	1.2024.115	2024-05-12 23:35:02	1.2024.115
2	80b9806a-c5eb-44da-9f0d-6cd864c1f4cf	Android Trigger	Superb ai app	5	0	1.2024.122	2024-05-12 23:27:05	1.2024.122
3	ec5ea9e5-86bc-4de0-b170-f2871833ce74	Brian Peters	Best thing that ever happened to me.	5	0	1.2024.122	2024-05-12 23:17:46	1.2024.122
4	1e396118-3934-4ce6-8390-b2d56771e343	Gautam kumar Patel	this is very good app	5	0	1.2024.108	2024-05-12 23:12:56	1.2024.108
…	…	…	…	…	…	…	…	…
113904	462686ff-e500-413c-a6b4-2badc2e3b21d	m.santhosh Kumar	Update 2023	5	0	NaN	2023-07-27 16:26:31	NaN
113905	f10e0d48-ecb6-42db-b103-46c0046f9be9	Andrew Bourgeois	its grear	5	0	NaN	2023-09-23 16:25:18	NaN
113906	df909a49-90b5-4dac-9b89-c4bd5a7c2f75	Dern Bob	Funtastic App	5	0	NaN	2023-11-08 13:57:14	NaN
113907	abe43878-973f-4e96-a765-c4af5c7f7b20	Abdur rahman arif	hi all	5	0	NaN	2023-07-25 15:32:57	NaN
113908	0151001d-b81c-41b5-8927-f56738989625	Tushar Deran	expert application	5	0	NaN	2023-11-30 18:11:41	NaN

113909 rows × 8 columns

df.head()

	reviewId	userName	content	score	reviewCreatedVersion	at	appVersion
0	1ea528a6-6d5d-4c9a-b266-9df306f20ed7	abdulwaheed aminat	amazing app,easy to navigate.	5	1.2024.101	2024-05-12 23:38:52	1.2024.101
1	9df43688-8a80-419e-b36d-61c95fd17d2a	Benedette Morison	The app is recommendable and reliable, especia…	5	1.2024.115	2024-05-12 23:35:02	1.2024.115
2	80b9806a-c5eb-44da-9f0d-6cd864c1f4cf	Android Trigger	Superb ai app	5	1.2024.122	2024-05-12 23:27:05	1.2024.122
3	ec5ea9e5-86bc-4de0-b170-f2871833ce74	Brian Peters	Best thing that ever happened to me.	5	1.2024.122	2024-05-12 23:17:46	1.2024.122
4	1e396118-3934-4ce6-8390-b2d56771e343	Gautam kumar Patel	this is very good app	5	1.2024.108	2024-05-12 23:12:56	1.2024.108

df.columns

Index(['reviewId', 'userName', 'content', 'score', 'thumbsUpCount',
       'reviewCreatedVersion', 'at', 'appVersion'],
      dtype='object')

df.describe().T

	count	mean	std	min	25%	50%	75%	max
score	113909.0	4.494544	1.094733	1.0	5.0	5.0	5.0	5.0
thumbsUpCount	113909.0	0.611277	13.717219	0.0	0.0	0.0	0.0	1193.0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113909 entries, 0 to 113908
Data columns (total 8 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   reviewId              113909 non-null  object
 1   userName              113908 non-null  object
 2   content               113905 non-null  object
 3   score                 113909 non-null  int64 
 4   thumbsUpCount         113909 non-null  int64 
 5   reviewCreatedVersion  103913 non-null  object
 6   at                    113909 non-null  object
 7   appVersion            103913 non-null  object
dtypes: int64(2), object(6)
memory usage: 7.0+ MB

df.isna().sum()

reviewId                   0
userName                   1
content                    4
score                      0
thumbsUpCount              0
reviewCreatedVersion    9996
at                         0
appVersion              9996
dtype: int64

sns.heatmap(df.isna())
plt.show()

plt.figure(figsize=(10, 6))
sns.histplot(df['score'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

from wordcloud import WordCloud

reviews_text = ' '.join(df['content'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(reviews_text)
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Reviews')
plt.axis('off')
plt.show()

ChatGPT Chronicles: Daily Dive into User Opinions

This analysis report provides an in-depth examination of user reviews for the ChatGPT Android App. The dataset consists of user reviews that are updated daily, containing valuable information like review ID, user name, review content, score, thumbs up count, review creation version, timestamp, and app version.

The main objective of this analysis is to extract valuable insights regarding user sentiment, identify patterns, and gain a better understanding of user satisfaction levels. These findings will serve as a basis for potential app improvements and enhancements.

The analysis uncovered several important discoveries. To begin with, the distribution of ratings shows that most users give high scores, suggesting they are generally satisfied with the app. However, there are also instances of lower ratings, indicating areas that could be improved.

Additionally, when looking at average scores for each app version, we can see differences in user satisfaction. Understanding these differences can help prioritize bug fixes and enhancements.

Furthermore, by conducting a correlation analysis between factors such as score, thumbs up count, and review length, we can gain insights into what influences user satisfaction and engagement. Lastly, analyzing scores over time using a time series approach can reveal trends and fluctuations in user sentiment.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

data = pd.read_csv("/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv")

data.head()

	reviewId	userName	content	score	reviewCreatedVersion	at	appVersion
0	1ea528a6-6d5d-4c9a-b266-9df306f20ed7	abdulwaheed aminat	amazing app,easy to navigate.	5	1.2024.101	2024-05-12 23:38:52	1.2024.101
1	9df43688-8a80-419e-b36d-61c95fd17d2a	Benedette Morison	The app is recommendable and reliable, especia…	5	1.2024.115	2024-05-12 23:35:02	1.2024.115
2	80b9806a-c5eb-44da-9f0d-6cd864c1f4cf	Android Trigger	Superb ai app	5	1.2024.122	2024-05-12 23:27:05	1.2024.122
3	ec5ea9e5-86bc-4de0-b170-f2871833ce74	Brian Peters	Best thing that ever happened to me.	5	1.2024.122	2024-05-12 23:17:46	1.2024.122
4	1e396118-3934-4ce6-8390-b2d56771e343	Gautam kumar Patel	this is very good app	5	1.2024.108	2024-05-12 23:12:56	1.2024.108

plt.figure(figsize=(10, 6))
sns.histplot(data['score'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

avg_score_by_version = data.groupby('appVersion')['score'].mean().reset_index()
plt.figure(figsize=(12, 6))
sns.barplot(x='appVersion', y='score', data=avg_score_by_version, palette='viridis')
plt.title('Average Score by App Version')
plt.xlabel('App Version')
plt.ylabel('Average Score')
plt.xticks(rotation=90)
plt.show()

plt.figure(figsize=(12, 6))
sns.boxplot(x='appVersion', y='score', data=data, palette='pastel')
plt.title('Boxplot of Scores by App Version')
plt.xlabel('App Version')
plt.ylabel('Score')
plt.xticks(rotation=90)
plt.show()

# Select only numeric columns
numeric_data = data.select_dtypes(include=[np.number])

plt.figure(figsize=(10, 8))
corr = numeric_data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()
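
At this point the data has only two numeric columns, `score` and `thumbsUpCount`, so the heatmap cannot say anything about review length yet. For the length to show up in the correlation matrix, the feature has to be created first. A sketch on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical rows standing in for the reviews table.
toy = pd.DataFrame({
    "content": ["good", "crashes every time I open it", "nice app", None],
    "score": [5, 1, 4, 3],
    "thumbsUpCount": [0, 12, 1, 0],
})

# Create the length feature first so that corr() can include it.
toy["review_length"] = toy["content"].fillna("").str.len()
corr = toy[["score", "thumbsUpCount", "review_length"]].corr()

print(corr.round(2))
```

Filling missing text with an empty string gives those reviews a length of 0. The `len(str(x))` approach used elsewhere in this post would count a missing review as 3 characters, because `str` turns NaN into "nan".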

plt.figure(figsize=(10, 6))
sns.histplot(data['thumbsUpCount'], bins=30, kde=True, color='green')
plt.title('Distribution of Thumbs Up Count')
plt.xlabel('Thumbs Up Count')
plt.ylabel('Frequency')
plt.show()

plt.figure(figsize=(10, 6))
sns.scatterplot(x='score', y='thumbsUpCount', data=data, color='purple', alpha=0.5)
plt.title('Score vs Thumbs Up Count')
plt.xlabel('Score')
plt.ylabel('Thumbs Up Count')
plt.show()

review_count_by_user = data['userName'].value_counts().reset_index()
review_count_by_user.columns = ['User Name', 'Review Count']
plt.figure(figsize=(12, 6))
sns.barplot(x='Review Count', y='User Name', data=review_count_by_user.head(10), palette='magma')
plt.title('Top 10 Users by Review Count')
plt.xlabel('Review Count')
plt.ylabel('User Name')
plt.show()

data['review_length'] = data['content'].apply(lambda x: len(str(x)))
plt.figure(figsize=(10, 6))
sns.histplot(data['review_length'], bins=50, kde=True, color='brown')
plt.title('Distribution of Review Length')
plt.xlabel('Review Length')
plt.ylabel('Frequency')
plt.show()

from wordcloud import WordCloud

reviews_text = ' '.join(data['content'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(reviews_text)
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Reviews')
plt.axis('off')
plt.show()
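
The report mentions analyzing scores over time, but the code above stops short of it. A minimal sketch of that idea, assuming the `at` column holds timestamp strings like the ones shown in the tables (the rows here are made up):

```python
import pandas as pd

# Hypothetical rows; in the real data 'at' is stored as a string column.
toy = pd.DataFrame({
    "at": ["2023-09-04 16:39:37", "2023-09-20 08:00:00",
           "2023-12-17 16:16:52", "2024-03-23 03:08:40"],
    "score": [5, 3, 5, 1],
})

toy["at"] = pd.to_datetime(toy["at"])   # parse strings into datetimes

# Average score per calendar month.
monthly = toy.groupby(toy["at"].dt.to_period("M"))["score"].mean()

print(monthly)
```

On the full dataset the resulting series could be plotted as a line chart to spot dips after particular releases.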

Conclusion

We’ve come to the end of our ChatGPT review journey.

Main Takeaways:

Average Perplexity: ChatGPT adds excitement to conversations.
Burstiness: It has a unique rhythm, like a talkative pal.
Predictability: Keeps you on your toes with surprises.

User feedback reveals the ups and downs of ChatGPT. It serves as a useful assistant and a fun chat companion. These observations shed light on the future of AI and how it’s enhancing our online interactions.

Thanks for being a part of this journey. More AI adventures coming soon!

Published by Writer1 on May 27, 2024
