Suicide Tweet prediction analysis project

This project, Suicide Tweet Prediction Analysis, applies text classification to tweets in order to identify posts that may contain indicators of self-harm. 📊📝🧠

The approach converts each tweet into TF-IDF features and trains several machine learning classifiers to label it as either "Not Suicide post" or "Potential Suicide post". The aim is to flag concerning content so that it can support timely intervention for people in need and contribute to ongoing mental health research. Join us in our mission to create a safer and more supportive online community. 💙🤖🌟

📌 Dataset:

https://www.kaggle.com/datasets/aunanya875/suicidal-tweet-detection-dataset/code

📌 1. Data import

In [1]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
df = pd.read_csv('/kaggle/input/suicidal-tweet-detection-dataset/Suicide_Ideation_Dataset(Twitter-based).csv')
df

Out[1]:

	Tweet	Suicide
0	making some lunch	Not Suicide post
1	@Alexia You want his money.	Not Suicide post
2	@dizzyhrvy that crap took me forever to put to…	Potential Suicide post
3	@jnaylor #kiwitweets Hey Jer! Since when did y…	Not Suicide post
4	Trying out "Delicious Library 2" wit…	Not Suicide post
…	…	…
1782	i have forgotten how much i love my Nokia N95-1	Not Suicide post
1783	Starting my day out with a positive attitude! …	Not Suicide post
1784	@belledame222 Hey, it’s 5 am…give a girl som…	Not Suicide post
1785	2 drunken besties stumble into my room and we …	Not Suicide post
1786	@dancingbonita "I friggin love you!!!&quo…	Not Suicide post

1787 rows × 2 columns

📌 2. Data check

In [2]:

df.isnull().sum()

Out[2]:

Tweet      2
Suicide    0
dtype: int64

In [3]:

df.Suicide.value_counts()

Out[3]:

Not Suicide post           1127
Potential Suicide post      660
Name: Suicide, dtype: int64
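
The counts above show a moderately imbalanced dataset, which gives a useful reference point for the accuracy scores reported later. The sketch below is not part of the original notebook: it re-enters the two counts by hand and computes the accuracy a trivial model would get by always predicting the majority class.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series(
    {"Not Suicide post": 1127, "Potential Suicide post": 660},
    name="Suicide",
)

# Share of each class in the dataset.
proportions = counts / counts.sum()

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = proportions.max()

print(proportions.round(3))
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any model in the following sections should be judged against this baseline rather than against 50%.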

📌 3. Processing data

In [4]:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

df.Suicide = le.fit_transform(df.Suicide )
df.Suicide

Out[4]:

0       0
1       0
2       1
3       0
4       0
       ..
1782    0
1783    0
1784    0
1785    0
1786    0
Name: Suicide, Length: 1787, dtype: int64
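
The output shows integer codes but not which label each code stands for. `LabelEncoder` assigns codes in sorted order of the class names, so the mapping can be read from `classes_`. A minimal sketch on a hand-written list of labels (not the notebook's DataFrame):

```python
from sklearn.preprocessing import LabelEncoder

# A few labels in the same format as the dataset's "Suicide" column.
labels = ["Not Suicide post", "Potential Suicide post", "Not Suicide post"]

le = LabelEncoder()
encoded = le.fit_transform(labels)

# classes_ is sorted, so the position of each label is its integer code.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform converts codes back to the original strings.
print(le.inverse_transform([0, 1]))
```

Here 1 corresponds to "Potential Suicide post", which is the class treated as positive in the rest of the analysis.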

📌 4. TfidfVectorizer

In [5]:

from sklearn.feature_extraction.text import TfidfVectorizer
df = df.dropna()
vectorizer = TfidfVectorizer(max_features=2000)  # Adjust to your needs
X = vectorizer.fit_transform(df['Tweet'])
y = df.Suicide
X = X.toarray()

📌 5. Split data

In [6]:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
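
Because the classes are imbalanced, a plain random split can leave the train and test sets with slightly different class ratios. One option, not used in the original notebook, is the `stratify` argument of `train_test_split`. A minimal sketch on synthetic labels with roughly the dataset's 63% / 37% ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 63 negative and 37 positive labels.
y_toy = np.array([0] * 63 + [1] * 37)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# stratify=y_toy keeps the class ratio (almost) equal in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

print(f"Positive share, train: {y_tr.mean():.2f}")
print(f"Positive share, test:  {y_te.mean():.2f}")
```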

📌 6. Model & Prediction

In [7]:

import tensorflow as tf
from sklearn.metrics import accuracy_score

# Small feed-forward network on the TF-IDF features
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='gelu', input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation='gelu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # Change according to the problem type
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5, batch_size=32)
y_pred = model.predict(X_test)
y_pred = np.round(y_pred)  # turn sigmoid probabilities into 0/1 labels
score = accuracy_score(y_test, y_pred)  # arguments are (y_true, y_pred)
print(f"--------------------------------------\nAccuracy Score: {score:.2f}")

Epoch 1/5
42/42 [==============================] - 2s 6ms/step - loss: 0.6235 - accuracy: 0.6607
Epoch 2/5
42/42 [==============================] - 0s 5ms/step - loss: 0.2937 - accuracy: 0.9178
Epoch 3/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0910 - accuracy: 0.9776
Epoch 4/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0386 - accuracy: 0.9925
Epoch 5/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0184 - accuracy: 0.9970
14/14 [==============================] - 0s 2ms/step
--------------------------------------
Accuracy Score: 0.93
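
The training log reaches 0.9970 accuracy while the test score is 0.93, and accuracy alone does not say which class the errors fall on. For this task, missed "Potential Suicide post" tweets (false negatives) matter most, so a confusion matrix with precision and recall is a natural complement. The sketch below uses hypothetical labels and probabilities, not the notebook's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and sigmoid outputs, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.2, 0.7, 0.9, 0.6, 0.3, 0.8, 0.05, 0.55])

# Threshold at 0.5 to obtain hard 0/1 predictions.
y_hat = (y_prob > 0.5).astype(int)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_hat)
precision = precision_score(y_true, y_hat)
recall = recall_score(y_true, y_hat)

print(cm)
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}")
```

In the notebook, the same calls could be applied to `y_test` and the rounded `y_pred`.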

📌 7. Multi-models

In [8]:

from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Define a list of model names and their corresponding instances
models = [
    ('XGBClassifier', XGBClassifier()),
    ('CatBoostClassifier', CatBoostClassifier(verbose=0)),
    ('LGBMClassifier', LGBMClassifier()),
    ('RandomForestClassifier', RandomForestClassifier()),
    ('SVC', SVC(probability=True))
    # Add other classifiers if desired
]

In [9]:

# Fit each classifier on the training split and report its test accuracy
for model_name, model in models:
    print(f"Training {model_name}...")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred)  # arguments are (y_true, y_pred)
    print(f"{model_name} Accuracy Score: {score:.2f}")
    print("--------------------------------------")

Training XGBClassifier...
XGBClassifier Accuracy Score: 0.93
--------------------------------------
Training CatBoostClassifier...
CatBoostClassifier Accuracy Score: 0.94
--------------------------------------
Training LGBMClassifier...
LGBMClassifier Accuracy Score: 0.94
--------------------------------------
Training RandomForestClassifier...
RandomForestClassifier Accuracy Score: 0.93
--------------------------------------
Training SVC...
SVC Accuracy Score: 0.93
--------------------------------------
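
All five scores fall between 0.93 and 0.94 on a single split, so the differences may be within noise. In addition, step 4 fits the vectorizer on every tweet before the split, so vocabulary and IDF weights are learned from rows that later land in the test set. One way to address both points is to wrap the vectorizer and classifier in a pipeline and cross-validate it. The sketch below is an assumption-laden illustration: it uses a tiny hypothetical corpus and `LogisticRegression` (imported above but not part of the model list) to keep it fast.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; in the notebook this would be df['Tweet'] and df.Suicide.
texts = [
    "making some lunch", "love my new phone", "great day with friends",
    "starting the day happy", "watching a movie tonight", "coffee and sunshine",
    "i feel hopeless and alone", "i cannot go on anymore", "nothing matters anymore",
    "i feel so alone tonight", "i want it all to end", "so hopeless and tired",
]
labels = [0] * 6 + [1] * 6

# The vectorizer is refitted inside each fold, on that fold's training part only.
pipeline = make_pipeline(
    TfidfVectorizer(max_features=2000),
    LogisticRegression(max_iter=1000),
)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

The spread across folds gives a rough idea of how much a 0.01 difference between two models can be trusted.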

📌 8. VotingClassifier

In [10]:

from sklearn.ensemble import VotingClassifier

# Soft voting averages the predicted class probabilities of all estimators
voting_classifier = VotingClassifier(estimators=models, voting='soft')
print("Training voting classifier...")

voting_classifier.fit(X_train, y_train)

y_pred2 = voting_classifier.predict(X_test)

score = accuracy_score(y_test, y_pred2)  # arguments are (y_true, y_pred)
print(f"Voting Ensemble Accuracy Score: {score:.2f}")

Training voting classifier...
Voting Ensemble Accuracy Score: 0.94
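
With `voting='soft'`, the ensemble averages the class probabilities returned by each estimator's `predict_proba` and picks the class with the highest mean. That is why `SVC` is created with `probability=True`. The NumPy sketch below uses hypothetical probabilities from three classifiers to show how soft voting can differ from a simple majority (hard) vote.

```python
import numpy as np

# Hypothetical P(class = 1): one row per tweet, one column per classifier.
probs = np.array([
    [0.90, 0.40, 0.45],
    [0.20, 0.30, 0.10],
    [0.55, 0.60, 0.20],
    [0.45, 0.45, 0.95],
])

# Soft voting: average the probabilities, then threshold the mean.
soft_avg = probs.mean(axis=1)
soft_vote = (soft_avg > 0.5).astype(int)

# Hard voting: each classifier casts a 0/1 vote and the majority wins.
votes_for_one = (probs > 0.5).sum(axis=1)
hard_vote = (votes_for_one > probs.shape[1] / 2).astype(int)

print("Mean probability:", soft_avg.round(3))
print("Soft vote:", soft_vote)
print("Hard vote:", hard_vote)
```

A single confident classifier can outweigh two uncertain ones under soft voting, which is useful only if the individual probabilities are reasonably well calibrated.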



Published by Writer1 on August 26, 2023
