Machine Learning Project 7: Best ChatGPT Reviews Analysis

Introduction

Hey there, everyone! Welcome to our cozy corner of the internet where we’re delving deep into the realm of ChatGPT reviews.

That’s right, we’re exploring what people are saying about ChatGPT and how it fares among AI chatbots. Grab a drink, get settled in, and let’s jump right in!

Why Does It Matter?

Ever been curious about how ChatGPT is holding up in the vast online world? Well, we’ve got the inside scoop. We’re breaking it down:

  • Average Perplexity: Sounds fancy, right? It’s all about how unpredictable the chatbot can be. More surprises, more excitement!
  • Burstiness Scores: You know that friend who talks a mile a minute and then suddenly goes silent? That’s burstiness. We’re seeing if ChatGPT has that same vibe.
  • Predictability: How easy is it to guess what ChatGPT will say next? Spoiler alert: it’s not always what you anticipate!

In-Depth Analysis of ChatGPT Reviews

Importing Libraries 📥

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import string
import emoji
import nltk
import spacy
from tqdm import tqdm
from nltk.corpus import stopwords
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.preprocessing import MinMaxScaler, LabelEncoder, OrdinalEncoder

Importing Data

Notebook Link: https://www.kaggle.com/code/zain280/in-depth-analysis-of-chatgpt-reviews

df = pd.read_csv('/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv')
df = df.sample(20000)
df.head()
        reviewId                              userName           content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
19863   d7f901e2-3032-4e7c-9ba0-cd8d1443b32d  Arshavir Mandegar  kinda weird and scary but fascinating at the s…  5      0              1.2024.073            2024-03-23 03:08:40  1.2024.073
67801   e05d7901-c968-4f5a-bd00-fbaf4a87e355  ayan khalid        great app, always been helpin me out.            5      0              1.0.0039              2023-09-04 16:39:37  1.0.0039
71308   76e6efef-5daa-4a2f-8bc8-aa698f9d5048  Archana Rajput     kep it in well done                              5      0              NaN                   2023-12-17 16:16:52  NaN
122762  9a1d150d-ff0b-4686-b017-499d46047db3  Rakibul Hasan      Good app                                         5      0              NaN                   2023-12-08 17:03:22  NaN
50227   cf94f0c6-186a-4787-8bee-bcd0b5216345  Aasim Saquafi      Sometime it doesn’t work                         1      0              1.2023.313            2024-01-05 15:41:41  1.2023.313

Header View of ChatGPT Reviews

pd.DataFrame(df['content']).head()
        content
19863   kinda weird and scary but fascinating at the s…
67801   great app, always been helpin me out.
71308   kep it in well done
122762  Good app
50227   Sometime it doesn’t work

Overview of Data

Distribution of Rows and Columns

df.shape

(20000, 8)

Data Information

df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 20000 entries, 19863 to 108190
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   reviewId              20000 non-null  object
 1   userName              19999 non-null  object
 2   content               19998 non-null  object
 3   score                 20000 non-null  int64 
 4   thumbsUpCount         20000 non-null  int64 
 5   reviewCreatedVersion  18313 non-null  object
 6   at                    20000 non-null  object
 7   appVersion            18313 non-null  object
dtypes: int64(2), object(6)
memory usage: 1.4+ MB

Data Description

df.describe()
              score  thumbsUpCount
count  20000.000000   20000.000000
mean       4.502300       0.477100
std        1.083953      12.118301
min        1.000000       0.000000
25%        5.000000       0.000000
50%        5.000000       0.000000
75%        5.000000       0.000000
max        5.000000    1193.000000

Sum of Null Values

df.isnull().sum()
reviewId                   0
userName                   1
content                    2
score                      0
thumbsUpCount              0
reviewCreatedVersion    1687
at                         0
appVersion              1687
dtype: int64

Data Cleaning 🧹

Dropping Duplicates

df = df.drop_duplicates()

Dropping Rows with Null Values

df = df.dropna()

Revised Data Shape

df.shape
(18214, 8)
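
Note that `dropna()` with no arguments drops a row if any column is missing, so reviews whose text is intact but whose version is missing are discarded too; the sample shrinks from 20,000 to 18,214 rows. A minimal sketch of a narrower alternative, on a made-up three-row frame (the toy data is an assumption, not the Kaggle file), keeps those reviews by requiring only `content` and `score`:

```python
import pandas as pd

# Toy stand-in for the reviews frame (hypothetical values).
toy = pd.DataFrame({
    "content": ["good app", None, "crashes a lot"],
    "score": [5, 4, 1],
    "appVersion": ["1.2024.073", "1.2024.073", None],
})

# Drop rows only when the columns we actually model on are missing.
kept = toy.dropna(subset=["content", "score"])

# dropna() with no arguments also discards the row with a missing version.
strict = toy.dropna()

print(len(kept), len(strict))
```

Whether the version columns are worth the lost rows depends on how much they help the model, which this post does not measure.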

Score Value Counts

df['score'].value_counts()
score
5    14008
4     2218
1      969
3      714
2      305
Name: count, dtype: int64
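
The counts above are heavily skewed toward 5 stars. As a quick check, this sketch turns them into shares using only the numbers printed above. The share of the largest class is also the accuracy a model would get by always predicting 5, which is a useful reference point for the classifiers later on.

```python
# Counts copied from the value_counts() output above.
score_counts = {5: 14008, 4: 2218, 1: 969, 3: 714, 2: 305}

total = sum(score_counts.values())
shares = {score: count / total for score, count in score_counts.items()}

# Accuracy of an "always predict the most common score" baseline.
majority_score = max(score_counts, key=score_counts.get)
baseline_accuracy = shares[majority_score]

for score in sorted(shares, reverse=True):
    print(f"{score} stars: {shares[score]:.1%}")
print(f"majority baseline accuracy: {baseline_accuracy:.3f}")
```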

ThumbsUpCount Value Counts

df['thumbsUpCount'].value_counts()
thumbsUpCount
0      17444
1        420
2        118
3         47
5         24
       ...  
128        1
338        1
126        1
47         1
152        1
Name: count, Length: 66, dtype: int64

Visualizing Data

Histogram of Numerical Columns

df.hist(figsize=(15, 5))
plt.show()

Scatter Plot of Numerical Columns

sns.scatterplot(x='score', y='thumbsUpCount', data=df)
plt.show()

Feature Engineering with Machine Learning

Selecting Relevant Features

df.drop(columns=['reviewId', 'userName', 'at'], inplace=True)
df.head()
       content                                            score  thumbsUpCount  reviewCreatedVersion  appVersion
19863  kinda weird and scary but fascinating at the s…    5      0              1.2024.073            1.2024.073
67801  great app, always been helpin me out.              5      0              1.0.0039              1.0.0039
50227  Sometime it doesn’t work                           1      0              1.2023.313            1.2023.313
1650   thank you chatgpt                                  5      0              1.2024.122            1.2024.122
63566  It is the best chat AI ..👌👌👌 but no pic or videos  4      0              1.2023.263            1.2023.263

Encoding Columns

le = LabelEncoder()
df['reviewCreatedVersion'] = le.fit_transform(df['reviewCreatedVersion'])
oe = OrdinalEncoder()
df['appVersion'] = oe.fit_transform(df[['appVersion']])
df.head()
       content                                            score  thumbsUpCount  reviewCreatedVersion  appVersion
19863  kinda weird and scary but fascinating at the s…    5      0              41                    41.0
67801  great app, always been helpin me out.              5      0              8                     8.0
50227  Sometime it doesn’t work                           1      0              22                    22.0
1650   thank you chatgpt                                  5      0              48                    48.0
63566  It is the best chat AI ..👌👌👌 but no pic or videos  4      0              12                    12.0
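
One caveat with the encoding above: both encoders assign codes from the sorted string values, so the integer order is lexicographic and may not match release order if a version component is not zero-padded. A hedged alternative is to parse the dotted version into integers before ranking. The helper name `version_key` and the last two sample strings are illustrative assumptions:

```python
def version_key(version: str) -> tuple:
    """Split a dotted version such as '1.2024.073' into integers for sorting."""
    return tuple(int(part) for part in version.split("."))

# The first three strings appear in the output above; the last two are made up.
versions = ["1.2024.073", "1.0.0039", "1.2023.313", "1.2024.9", "1.2024.10"]

lexicographic = sorted(versions)
numeric = sorted(versions, key=version_key)

# Map each version to its rank in numeric order, similar in spirit to an ordinal code.
version_rank = {v: rank for rank, v in enumerate(numeric)}

print(lexicographic)
print(numeric)
```

When every component is zero-padded to the same width, the two orderings agree and the original encoding is fine as it is.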

Applying NLP to Review Content

df['content']
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              Sometime it doesn't work
1650                                      thank you chatgpt
63566     It is the best chat AI ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the Best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Converting to Lowercase

df['content'] = df['content'].str.lower()
df['content']
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing HTML Tags

def remove_html_tags(text):
    clean_text = re.sub('<.*?>', '', text)
    return clean_text
df['content'] = df['content'].apply(remove_html_tags)
df['content']
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing URLs

def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    clean_text = re.sub(url_pattern, '', text)
    return clean_text
df['content'] = df['content'].apply(remove_urls)
df['content']
19863     kinda weird and scary but fascinating at the s...
67801                 great app, always been helpin me out.
50227                              sometime it doesn't work
1650                                      thank you chatgpt
63566     it is the best chat ai ..👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                         it's the best
118182                                    mind blowing... 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing Punctuation

def remove_punctuation(text):
    punctuation = string.punctuation
    clean_text = text.translate(str.maketrans('', '', punctuation))
    return clean_text
df['content'] = df['content'].apply(remove_punctuation)
df['content']
19863     kinda weird and scary but fascinating at the s...
67801                   great app always been helpin me out
50227                               sometime it doesnt work
1650                                      thank you chatgpt
63566       it is the best chat ai 👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                          its the best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object
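
The four cleaning steps so far (lowercasing, HTML tags, URLs, punctuation) can also be applied in one pass. This is a minimal sketch using the same regular expressions as above; `clean_text` is a hypothetical helper name and the sample string is made up. The order matters, because stripping punctuation first would remove the `://` and dots that the URL pattern relies on:

```python
import re
import string

HTML_TAG = re.compile(r"<.*?>")
URL = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_text(text: str) -> str:
    """Lowercase, then strip HTML tags, URLs, and ASCII punctuation, in that order."""
    text = text.lower()
    text = HTML_TAG.sub("", text)
    text = URL.sub("", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse any double spaces left behind by the removals.
    return " ".join(text.split())

sample = "Great app!<br> See https://example.com for more, it DOESN'T crash."
print(clean_text(sample))
```

On the real data this would be applied with `df['content'].apply(clean_text)`, replacing the four separate `apply` calls.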

Chat Word Treatment

chat_words_mapping = {
    "lol": "laughing out loud",
    "brb": "be right back",
    "btw": "by the way",
    "afk": "away from keyboard",
    "rofl": "rolling on the floor laughing",
    "ttyl": "talk to you later",
    "np": "no problem",
    "thx": "thanks",
    "omg": "oh my god",
    "idk": "I don't know",
    "gg": "good game",
    "g2g": "got to go",
    "b4": "before",
    "cu": "see you",
    "yw": "you're welcome",
    "wtf": "what the f*ck",
    "imho": "in my humble opinion",
    "jk": "just kidding",
    "gf": "girlfriend",
    "bf": "boyfriend",
    "u": "you",
    "r": "are",
    "2": "to",
    "4": "for",
    "b": "be",
    "c": "see",
    "y": "why",
    "tho": "though",
    "smh": "shaking my head",
    "lolz": "laughing out loud",
    "h8": "hate",
    "luv": "love",
    "pls": "please",
    "sry": "sorry",
    "tbh": "to be honest",
    "omw": "on my way",
    "omw2syg": "on my way to see your girlfriend",
}

def expand_chat_words(text):
    words = text.split()
    expanded_words = [chat_words_mapping.get(word.lower(), word) for word in words]
    return ' '.join(expanded_words)
df['content'] = df['content'].apply(expand_chat_words)
df['content']
19863     kinda weird and scary but fascinating at the s...
67801                   great app always been helpin me out
50227                               sometime it doesnt work
1650                                      thank you chatgpt
63566       it is the best chat ai 👌👌👌 but no pic or videos
                                ...                        
38261     this app is very helpful for containing study ...
78273                                          its the best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Removing Stop Words

# Build the stop-word set once instead of on every call
stop_words = set(stopwords.words('english'))

def remove_stop_words(text):
    tokens = nltk.word_tokenize(text)
    filtered_tokens = [token for token in tokens if token not in stop_words]
    return ' '.join(filtered_tokens)
df['content'] = df['content'].apply(remove_stop_words)
df['content']
19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566                           best chat ai 👌👌👌 pic videos
                                ...                        
38261     app helpful containing study material types qu...
78273                                                  best
118182                                       mind blowing 👍
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Replacing Emojis with Meanings

def replace_emojis_with_meanings(text):
    def replace(match):
        emoji_char = match.group()
        emoji_meaning = emoji.demojize(emoji_char)
        return emoji_meaning

    emoji_pattern = re.compile("["
                            u"\U0001F600-\U0001F64F"
                            u"\U0001F300-\U0001F5FF"
                            u"\U0001F680-\U0001F6FF"
                            u"\U0001F1E0-\U0001F1FF"
                            u"\U00002500-\U00002BEF"
                            u"\U00002702-\U000027B0"
                            u"\U000024C2-\U0001F251"
                            u"\U0001f926-\U0001f937"
                            u"\U00010000-\U0010ffff"
                            u"\u2640-\u2642"
                            u"\u2600-\u2B55"
                            u"\u200d"
                            u"\u23cf"
                            u"\u23e9"
                            u"\u231a"
                            u"\ufe0f"
                            u"\u3030"
                            "]+", flags=re.UNICODE)
    text_with_meanings = emoji_pattern.sub(replace, text)
    return text_with_meanings
df['content'] = df['content'].apply(replace_emojis_with_meanings)
df['content']
19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566     best chat ai :OK_hand::OK_hand::OK_hand: pic v...
                                ...                        
38261     app helpful containing study material types qu...
78273                                                  best
118182                             mind blowing :thumbs_up:
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

Word Tokenization

def word_tokenization(text):
    return nltk.word_tokenize(text)
df['token_content'] = df['content'].apply(word_tokenization)
df['content']
19863                    kinda weird scary fascinating time
67801                               great app always helpin
50227                                  sometime doesnt work
1650                                          thank chatgpt
63566     best chat ai :OK_hand::OK_hand::OK_hand: pic v...
                                ...                        
38261     app helpful containing study material types qu...
78273                                                  best
118182                             mind blowing :thumbs_up:
97082                                               awesome
108190                                                 good
Name: content, Length: 18214, dtype: object

POS Tagging

nlp = spacy.load('en_core_web_sm', disable=['ner', 'textcat'])

def batch_pos_tagging(texts):
    docs = list(nlp.pipe(texts, batch_size=50))
    return [[(token.text, token.pos_) for token in doc] for doc in docs]

batch_size = 50
num_batches = len(df) // batch_size + 1

pos_tags = []
for i in tqdm(range(num_batches)):
    start = i * batch_size
    end = start + batch_size
    batch_texts = df['content'].iloc[start:end].tolist()  # positional slice; the index is not 0..n-1
    pos_tags.extend(batch_pos_tagging(batch_texts))

df['POS_Tags'] = pos_tags
100%|██████████| 365/365 [00:18<00:00, 19.53it/s]
df['POS_Tags']
19863     [(kinda, INTJ), (weird, ADJ), (scary, ADJ), (f...
67801     [(great, ADJ), (app, NOUN), (always, ADV), (he...
50227     [(sometime, ADV), (does, AUX), (nt, PART), (wo...
1650                       [(thank, VERB), (chatgpt, NOUN)]
63566     [(best, ADJ), (chat, NOUN), (ai, VERB), (:, PU...
                                ...                        
38261     [(app, PROPN), (helpful, ADJ), (containing, VE...
78273                                         [(best, ADJ)]
118182    [(mind, NOUN), (blowing, VERB), (:, PUNCT), (t...
97082                                      [(awesome, ADJ)]
108190                                        [(good, ADJ)]
Name: POS_Tags, Length: 18214, dtype: object
df.head()
       content                                          score  thumbsUpCount  reviewCreatedVersion  appVersion  token_content                                    POS_Tags
19863  kinda weird scary fascinating time               5      0              41                    41.0        [kinda, weird, scary, fascinating, time]         [(kinda, INTJ), (weird, ADJ), (scary, ADJ), (f…
67801  great app always helpin                          5      0              8                     8.0         [great, app, always, helpin]                     [(great, ADJ), (app, NOUN), (always, ADV), (he…
50227  sometime doesnt work                             1      0              22                    22.0        [sometime, doesnt, work]                         [(sometime, ADV), (does, AUX), (nt, PART), (wo…
1650   thank chatgpt                                    5      0              48                    48.0        [thank, chatgpt]                                 [(thank, VERB), (chatgpt, NOUN)]
63566  best chat ai :OK_hand::OK_hand::OK_hand: pic v…  4      0              12                    12.0        [best, chat, ai, :, OK_hand, :, :OK_hand, :, :…  [(best, ADJ), (chat, NOUN), (ai, VERB), (:, PU…

Bag of Words

df['content'] = df['content'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
df['token_content'] = df['token_content'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
df['POS_Tags'] = df['POS_Tags'].apply(lambda x: ' '.join(str(i) for i in x) if isinstance(x, list) else x)

vectorizer = CountVectorizer(ngram_range=(2, 2))

bow_c = vectorizer.fit_transform(df['content'])
bow_t = vectorizer.fit_transform(df['token_content'])
bow_pos = vectorizer.fit_transform(df['POS_Tags'])

df['content'] = bow_c.toarray()
df['token_content'] = bow_t.toarray()
df['POS_Tags'] = bow_pos.toarray()
df.head()
       content  score  thumbsUpCount  reviewCreatedVersion  appVersion  token_content  POS_Tags
19863  0        5      0              41                    41.0        0              0
67801  0        5      0              8                     8.0         0              0
50227  0        1      0              22                    22.0        0              0
1650   0        5      0              48                    48.0        0              0
63566  0        4      0              12                    12.0        0              0

Predictive Modeling

Train-Test Split of Data

X_train, X_test, y_train, y_test = train_test_split(df.drop(columns=['score']), df['score'], test_size=0.2, random_state=41)

Decision Tree Model and Evaluation

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

DecisionTreeClassifier()
y_pred = dt.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
              precision    recall  f1-score   support

           1       0.34      0.07      0.12       202
           2       0.00      0.00      0.00        67
           3       0.00      0.00      0.00       161
           4       0.00      0.00      0.00       430
           5       0.77      0.99      0.87      2783

    accuracy                           0.76      3643
   macro avg       0.22      0.21      0.20      3643
weighted avg       0.61      0.76      0.67      3643

cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)

fig, ax = plt.subplots(figsize=(5, 5))
cm_display.plot(ax=ax)
plt.title('Confusion Matrix')
plt.show()

Random Forest Model and Evaluation

rf = RandomForestClassifier()
rf.fit(X_train, y_train)

RandomForestClassifier()
y_pred = rf.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
              precision    recall  f1-score   support

           1       0.37      0.05      0.09       202
           2       0.25      0.03      0.05        67
           3       0.00      0.00      0.00       161
           4       0.00      0.00      0.00       430
           5       0.77      0.99      0.87      2783

    accuracy                           0.76      3643
   macro avg       0.28      0.21      0.20      3643
weighted avg       0.61      0.76      0.67      3643

cm = confusion_matrix(y_test, y_pred)
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)

fig, ax = plt.subplots(figsize=(5, 5))
cm_display.plot(ax=ax)
plt.title('Confusion Matrix')
plt.show()
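
Both reports show the same pattern: 0.76 accuracy but a macro-average F1 of only 0.20, because nearly every prediction is 5. Two standard scikit-learn options that may help are a stratified split and `class_weight='balanced'`. The sketch below uses synthetic data from `make_classification` as a stand-in (an assumption; it is not the reviews dataset), so the number it prints says nothing about ChatGPT reviews:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class data standing in for the review features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3,
    weights=[0.8, 0.1, 0.1], random_state=0,
)

# stratify=y keeps the class proportions the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight='balanced' reweights classes inversely to their frequency.
rf = RandomForestClassifier(class_weight="balanced", random_state=0)
rf.fit(X_train, y_train)

# Macro F1 weighs every class equally, so ignoring rare classes is penalised.
macro_f1 = f1_score(y_test, rf.predict(X_test), average="macro")
print(f"macro F1: {macro_f1:.2f}")
```

Macro F1 is a more telling headline metric than accuracy here, since accuracy alone rewards always predicting the majority class.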

ChatGPT reviews

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv')
df
        reviewId                              userName            content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
0       1ea528a6-6d5d-4c9a-b266-9df306f20ed7  abdulwaheed aminat  amazing app,easy to navigate.                    5      0              1.2024.101            2024-05-12 23:38:52  1.2024.101
1       9df43688-8a80-419e-b36d-61c95fd17d2a  Benedette Morison   The app is recommendable and reliable, especia…  5      0              1.2024.115            2024-05-12 23:35:02  1.2024.115
2       80b9806a-c5eb-44da-9f0d-6cd864c1f4cf  Android Trigger     Superb ai app                                    5      0              1.2024.122            2024-05-12 23:27:05  1.2024.122
3       ec5ea9e5-86bc-4de0-b170-f2871833ce74  Brian Peters        Best thing that ever happened to me.             5      0              1.2024.122            2024-05-12 23:17:46  1.2024.122
4       1e396118-3934-4ce6-8390-b2d56771e343  Gautam kumar Patel  this is very good app                            5      0              1.2024.108            2024-05-12 23:12:56  1.2024.108
...
113904  462686ff-e500-413c-a6b4-2badc2e3b21d  m.santhosh Kumar    Update 2023                                      5      0              NaN                   2023-07-27 16:26:31  NaN
113905  f10e0d48-ecb6-42db-b103-46c0046f9be9  Andrew Bourgeois    its grear                                        5      0              NaN                   2023-09-23 16:25:18  NaN
113906  df909a49-90b5-4dac-9b89-c4bd5a7c2f75  Dern Bob            Funtastic App                                    5      0              NaN                   2023-11-08 13:57:14  NaN
113907  abe43878-973f-4e96-a765-c4af5c7f7b20  Abdur rahman arif   hi all                                           5      0              NaN                   2023-07-25 15:32:57  NaN
113908  0151001d-b81c-41b5-8927-f56738989625  Tushar Deran        expert application                               5      0              NaN                   2023-11-30 18:11:41  NaN

113909 rows × 8 columns

df.head()
   reviewId                              userName            content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
0  1ea528a6-6d5d-4c9a-b266-9df306f20ed7  abdulwaheed aminat  amazing app,easy to navigate.                    5      0              1.2024.101            2024-05-12 23:38:52  1.2024.101
1  9df43688-8a80-419e-b36d-61c95fd17d2a  Benedette Morison   The app is recommendable and reliable, especia…  5      0              1.2024.115            2024-05-12 23:35:02  1.2024.115
2  80b9806a-c5eb-44da-9f0d-6cd864c1f4cf  Android Trigger     Superb ai app                                    5      0              1.2024.122            2024-05-12 23:27:05  1.2024.122
3  ec5ea9e5-86bc-4de0-b170-f2871833ce74  Brian Peters        Best thing that ever happened to me.             5      0              1.2024.122            2024-05-12 23:17:46  1.2024.122
4  1e396118-3934-4ce6-8390-b2d56771e343  Gautam kumar Patel  this is very good app                            5      0              1.2024.108            2024-05-12 23:12:56  1.2024.108
df.columns
Index(['reviewId', 'userName', 'content', 'score', 'thumbsUpCount',
       'reviewCreatedVersion', 'at', 'appVersion'],
      dtype='object')
df.describe().T
                  count      mean        std  min  25%  50%  75%     max
score          113909.0  4.494544   1.094733  1.0  5.0  5.0  5.0     5.0
thumbsUpCount  113909.0  0.611277  13.717219  0.0  0.0  0.0  0.0  1193.0
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113909 entries, 0 to 113908
Data columns (total 8 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   reviewId              113909 non-null  object
 1   userName              113908 non-null  object
 2   content               113905 non-null  object
 3   score                 113909 non-null  int64 
 4   thumbsUpCount         113909 non-null  int64 
 5   reviewCreatedVersion  103913 non-null  object
 6   at                    113909 non-null  object
 7   appVersion            103913 non-null  object
dtypes: int64(2), object(6)
memory usage: 7.0+ MB
df.isna().sum()
reviewId                   0
userName                   1
content                    4
score                      0
thumbsUpCount              0
reviewCreatedVersion    9996
at                         0
appVersion              9996
dtype: int64
sns.heatmap(df.isna())
plt.show()
plt.figure(figsize=(10, 6))
sns.histplot(df['score'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
from wordcloud import WordCloud

reviews_text = ' '.join(df['content'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(reviews_text)
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Reviews')
plt.axis('off')
plt.show()

ChatGPT Chronicles: Daily Dive into User Opinions

This analysis report provides an in-depth examination of user reviews for the ChatGPT Android App. The dataset consists of user reviews that are updated daily, containing valuable information like review ID, user name, review content, score, thumbs up count, review creation version, timestamp, and app version.

The main objective of this analysis is to extract valuable insights regarding user sentiment, identify patterns, and gain a better understanding of user satisfaction levels. These findings will serve as a basis for potential app improvements and enhancements.

The analysis uncovered several important discoveries. To begin with, the distribution of ratings shows that most users give high scores, suggesting they are generally satisfied with the app. However, there are also instances of lower ratings, indicating areas that could be improved.

Additionally, when looking at average scores for each app version, we can see differences in user satisfaction. Understanding these differences can help prioritize bug fixes and enhancements.

Furthermore, by conducting a correlation analysis between factors such as score, thumbs up count, and review length, we can gain insights into what influences user satisfaction and engagement. Lastly, analyzing scores over time using a time series approach can reveal trends and fluctuations in user sentiment.
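
The time-series idea mentioned above is not implemented in the code that follows, so here is a minimal sketch of one way to approach it: parse the `at` column as datetimes and average the score per calendar month. The four rows are made up for illustration; on the real data, `toy` would be replaced by the loaded frame.

```python
import pandas as pd

# Toy rows in the same shape as the dataset's 'at' and 'score' columns (hypothetical values).
toy = pd.DataFrame({
    "at": ["2023-09-04 16:39:37", "2023-09-20 08:00:00",
           "2023-12-08 17:03:22", "2023-12-17 16:16:52"],
    "score": [5, 3, 4, 5],
})

# Parse timestamps, then group by calendar month and average the scores.
toy["at"] = pd.to_datetime(toy["at"])
monthly_mean = toy.groupby(toy["at"].dt.to_period("M"))["score"].mean()

print(monthly_mean)
```

Only months that contain at least one review appear in the result, so gaps in the data show up as missing periods rather than zeros.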

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')
data = pd.read_csv("/kaggle/input/chatgpt-reviews-daily-updated/chatgpt_reviews.csv")
data.head()
   reviewId                              userName            content                                          score  thumbsUpCount  reviewCreatedVersion  at                   appVersion
0  1ea528a6-6d5d-4c9a-b266-9df306f20ed7  abdulwaheed aminat  amazing app,easy to navigate.                    5      0              1.2024.101            2024-05-12 23:38:52  1.2024.101
1  9df43688-8a80-419e-b36d-61c95fd17d2a  Benedette Morison   The app is recommendable and reliable, especia…  5      0              1.2024.115            2024-05-12 23:35:02  1.2024.115
2  80b9806a-c5eb-44da-9f0d-6cd864c1f4cf  Android Trigger     Superb ai app                                    5      0              1.2024.122            2024-05-12 23:27:05  1.2024.122
3  ec5ea9e5-86bc-4de0-b170-f2871833ce74  Brian Peters        Best thing that ever happened to me.             5      0              1.2024.122            2024-05-12 23:17:46  1.2024.122
4  1e396118-3934-4ce6-8390-b2d56771e343  Gautam kumar Patel  this is very good app                            5      0              1.2024.108            2024-05-12 23:12:56  1.2024.108
plt.figure(figsize=(10, 6))
sns.histplot(data['score'], bins=20, kde=True, color='skyblue')
plt.title('Distribution of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
avg_score_by_version = data.groupby('appVersion')['score'].mean().reset_index()
plt.figure(figsize=(12, 6))
sns.barplot(x='appVersion', y='score', data=avg_score_by_version, palette='viridis')
plt.title('Average Score by App Version')
plt.xlabel('App Version')
plt.ylabel('Average Score')
plt.xticks(rotation=90)
plt.show()
plt.figure(figsize=(12, 6))
sns.boxplot(x='appVersion', y='score', data=data, palette='pastel')
plt.title('Boxplot of Scores by App Version')
plt.xlabel('App Version')
plt.ylabel('Score')
plt.xticks(rotation=90)
plt.show()
# Select only numeric columns
numeric_data = data.select_dtypes(include=[np.number])

plt.figure(figsize=(10, 8))
corr = numeric_data.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()
plt.figure(figsize=(10, 6))
sns.histplot(data['thumbsUpCount'], bins=30, kde=True, color='green')
plt.title('Distribution of Thumbs Up Count')
plt.xlabel('Thumbs Up Count')
plt.ylabel('Frequency')
plt.show()
plt.figure(figsize=(10, 6))
sns.scatterplot(x='score', y='thumbsUpCount', data=data, color='purple', alpha=0.5)
plt.title('Score vs Thumbs Up Count')
plt.xlabel('Score')
plt.ylabel('Thumbs Up Count')
plt.show()
review_count_by_user = data['userName'].value_counts().reset_index()
review_count_by_user.columns = ['User Name', 'Review Count']
plt.figure(figsize=(12, 6))
sns.barplot(x='Review Count', y='User Name', data=review_count_by_user.head(10), palette='magma')
plt.title('Top 10 Users by Review Count')
plt.xlabel('Review Count')
plt.ylabel('User Name')
plt.show()
data['review_length'] = data['content'].apply(lambda x: len(str(x)))
plt.figure(figsize=(10, 6))
sns.histplot(data['review_length'], bins=50, kde=True, color='brown')
plt.title('Distribution of Review Length')
plt.xlabel('Review Length')
plt.ylabel('Frequency')
plt.show()
from wordcloud import WordCloud

reviews_text = ' '.join(data['content'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(reviews_text)
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Reviews')
plt.axis('off')
plt.show()

Conclusion

We’ve come to the end of our ChatGPT review journey.

Main Takeaways:

  • Average Perplexity: ChatGPT adds excitement to conversations.
  • Burstiness: It has a unique rhythm, like a talkative pal.
  • Predictability: Keeps you on your toes with surprises.

User feedback reveals the ups and downs of ChatGPT. It serves as a useful assistant and a fun chat companion. These observations shed light on the future of AI and how it’s enhancing our online interactions.

Thanks for being a part of this journey. More AI adventures coming soon!

