Suicide Tweet prediction analysis project

Introducing our project: Suicide Tweet Prediction Analysis. 📊🧠 With a commitment to mental health and well-being, we use data analysis and machine learning to identify and analyze tweets that may contain indicators of self-harm.

Our project combines TF-IDF text features with machine learning classifiers (a small neural network, gradient-boosted trees, a random forest, an SVM, and a voting ensemble) to predict and flag concerning content; on the held-out test split the models reach 93–94% accuracy. By leveraging this technology, we aim to support timely interventions for individuals in need, while also contributing to ongoing research in the field of mental health. With the potential to save lives and make a meaningful impact, the Suicide Tweet Prediction Analysis project shows how technology can be applied for the greater good. Join us in our mission to create a safer and more supportive online community. Together, we can make a difference. 💙🤖🌟

📌 Dataset:

https://www.kaggle.com/datasets/aunanya875/suicidal-tweet-detection-dataset/code

📌 1. Data import

In [1]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
df = pd.read_csv('/kaggle/input/suicidal-tweet-detection-dataset/Suicide_Ideation_Dataset(Twitter-based).csv')
df

Out[1]:

      Tweet                                               Suicide
0     making some lunch                                   Not Suicide post
1     @Alexia You want his money.                         Not Suicide post
2     @dizzyhrvy that crap took me forever to put to…     Potential Suicide post
3     @jnaylor #kiwitweets Hey Jer! Since when did y…     Not Suicide post
4     Trying out "Delicious Library 2" wit…               Not Suicide post
...   ...                                                 ...
1782  i have forgotten how much i love my Nokia N95-1     Not Suicide post
1783  Starting my day out with a positive attitude! …     Not Suicide post
1784  @belledame222 Hey, it’s 5 am…give a girl som…       Not Suicide post
1785  2 drunken besties stumble into my room and we …     Not Suicide post
1786  @dancingbonita "I friggin love you!!!&quo…          Not Suicide post

1787 rows × 2 columns

📌 2. Data check

In [2]:

df.isnull().sum()

Out[2]:

Tweet      2
Suicide    0
dtype: int64

In [3]:

df.Suicide.value_counts()

Out[3]:

Not Suicide post           1127
Potential Suicide post      660
Name: Suicide, dtype: int64
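
The class counts above give a useful reference point for the accuracy scores reported later: a classifier that always predicts the majority class would already be right about 63% of the time. A minimal sketch of that calculation, using only the counts printed in Out[3] (these counts still include the two rows with a missing tweet, so the figure is approximate):

```python
# Majority-class baseline computed from the counts shown in Out[3].
counts = {"Not Suicide post": 1127, "Potential Suicide post": 660}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(f"Total tweets: {total}")
print(f"Majority class: {majority_label}")
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

Any model worth keeping should beat this baseline by a clear margin.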

📌 3. Processing data

In [4]:

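from sklearn.preprocessing import LabelEncoder

# Encode the string labels as integers. Classes are sorted alphabetically,
# so "Not Suicide post" -> 0 and "Potential Suicide post" -> 1.
le = LabelEncoder()
df['Suicide'] = le.fit_transform(df['Suicide'])
df['Suicide']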

Out[4]:

0       0
1       0
2       1
3       0
4       0
       ..
1782    0
1783    0
1784    0
1785    0
1786    0
Name: Suicide, Length: 1787, dtype: int64
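
To see which integer stands for which class, it can help to inspect the fitted encoder. The sketch below uses a three-item toy Series (an assumption for illustration, not the real DataFrame) with the two label strings from the dataset:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy labels using the two class names from the dataset.
labels = pd.Series([
    "Not Suicide post",
    "Potential Suicide post",
    "Not Suicide post",
])

le = LabelEncoder()
encoded = le.fit_transform(labels)

# classes_ is sorted, so position 0 is "Not Suicide post"
# and position 1 is "Potential Suicide post".
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the integer codes.
decoded = le.inverse_transform(encoded)
print(list(decoded))
```

Keeping the fitted `le` around makes it possible to turn model predictions back into readable labels later.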

📌 4. TfidfVectorizer

In [5]:

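from sklearn.feature_extraction.text import TfidfVectorizer

# Drop the two rows whose Tweet is missing (see Out[2]).
df = df.dropna()

# Keep at most 2,000 terms, ranked by corpus frequency; adjust to your needs.
vectorizer = TfidfVectorizer(max_features=2000)
X = vectorizer.fit_transform(df['Tweet'])
y = df.Suicide

# Convert the sparse matrix to a dense array for the Keras model below.
X = X.toarray()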
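
The cell above fits the vectorizer on every tweet before the data is split, so the vocabulary and IDF weights also reflect the test tweets. One possible alternative, sketched here under assumptions, is to wrap the vectorizer and a classifier in a scikit-learn `Pipeline` so that both are fitted on training text only. The toy tweets and the choice of `LogisticRegression` are illustrative and are not what this notebook uses:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data; in the notebook these would come from
# df['Tweet'] and df['Suicide'] after the train/test split.
train_texts = ["making some lunch", "i feel hopeless and alone",
               "love my new phone", "i cannot go on anymore"]
train_labels = [0, 1, 0, 1]
test_texts = ["lunch with my phone", "hopeless and alone again"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=2000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# fit() learns the vocabulary and IDF weights from the training texts only.
pipeline.fit(train_texts, train_labels)

# predict() transforms the test texts with the already-fitted vectorizer.
predictions = pipeline.predict(test_texts)
print(predictions)
```

Words that appear only in the test texts (such as "again" here) never enter the vocabulary, which mirrors how the model would see genuinely new tweets.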

📌 5. Split data

In [6]:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
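
Because the two classes are not balanced, it may be worth passing `stratify=y` so that the train and test partitions keep roughly the same class proportions. The sketch below uses synthetic data (an assumption; the array sizes and the 63/37 ratio are only stand-ins for the real features and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and labels (63% / 37% classes).
rng = np.random.default_rng(42)
X_demo = rng.random((200, 5))
y_demo = np.array([0] * 126 + [1] * 74)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# With stratify, the positive rate is nearly identical in both partitions.
print(f"Train positive rate: {y_tr.mean():.3f}")
print(f"Test positive rate:  {y_te.mean():.3f}")
```

Without `stratify`, a random split can leave the test set with noticeably more or fewer positive examples than the training set, which makes scores harder to compare.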

📌 6. Model & Prediction

In [7]:

import tensorflow as tf
from sklearn.metrics import accuracy_score

# Small feed-forward network on the TF-IDF features.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='gelu', input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation='gelu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # Change according to the problem type
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5, batch_size=32)

# Round the sigmoid probabilities to 0/1 labels and flatten to match y_test.
y_pred = np.round(model.predict(X_test)).ravel()
score = accuracy_score(y_test, y_pred)
print(f"--------------------------------------\nAccuracy Score: {score:.2f}")
Epoch 1/5
42/42 [==============================] - 2s 6ms/step - loss: 0.6235 - accuracy: 0.6607
Epoch 2/5
42/42 [==============================] - 0s 5ms/step - loss: 0.2937 - accuracy: 0.9178
Epoch 3/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0910 - accuracy: 0.9776
Epoch 4/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0386 - accuracy: 0.9925
Epoch 5/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0184 - accuracy: 0.9970
14/14 [==============================] - 0s 2ms/step
--------------------------------------
Accuracy Score: 0.93
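
The network's final sigmoid layer outputs one probability per tweet, and rounding turns it into a 0/1 label. Two details are easy to miss: `np.round` rounds halves to the nearest even value, so a probability of exactly 0.5 becomes 0, and the result keeps the `(n, 1)` shape that Keras returns. A small sketch with made-up probabilities (not model output) compares rounding with an explicit threshold:

```python
import numpy as np

# Hypothetical sigmoid outputs with the (n, 1) shape that Keras predict() returns.
probs = np.array([[0.20], [0.50], [0.51], [0.90]])

rounded = np.round(probs)                        # shape (4, 1), float values
thresholded = (probs > 0.5).astype(int).ravel()  # shape (4,), integer labels

print(rounded.ravel())
print(thresholded)
```

An explicit threshold gives the same labels here while making the cutoff visible, and it can be lowered if missing a concerning tweet is considered worse than raising a false alarm.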

📌 7. Multi-models

In [8]:

from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Define a list of model names and their corresponding instances
models = [
    ('XGBClassifier', XGBClassifier()),
    ('CatBoostClassifier', CatBoostClassifier(verbose=0)),
    ('LGBMClassifier', LGBMClassifier()),
    ('RandomForestClassifier', RandomForestClassifier()),
    ('SVC', SVC(probability=True))
    # Add other classifiers if desired
]

In [9]:

for model_name, model in models:
    print(f"Training {model_name}...")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    print(f"{model_name} Accuracy Score: {score:.2f}")
    print("--------------------------------------")
Training XGBClassifier...
XGBClassifier Accuracy Score: 0.93
--------------------------------------
Training CatBoostClassifier...
CatBoostClassifier Accuracy Score: 0.94
--------------------------------------
Training LGBMClassifier...
LGBMClassifier Accuracy Score: 0.94
--------------------------------------
Training RandomForestClassifier...
RandomForestClassifier Accuracy Score: 0.93
--------------------------------------
Training SVC...
SVC Accuracy Score: 0.93
--------------------------------------
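
Accuracy alone does not show how many "Potential Suicide post" tweets a model misses, which is arguably the costliest error for this task. A confusion matrix and per-class precision and recall make that visible. The labels and predictions below are invented for illustration and are not results from the models above:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and predictions (1 = "Potential Suicide post").
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

print(classification_report(
    y_true_demo, y_pred_demo,
    target_names=["Not Suicide post", "Potential Suicide post"],
))
```

In this toy case the accuracy is 70%, yet only two of the four positive tweets are caught (recall of 0.5 for the positive class). The same calls could be applied to `y_test` and each model's `y_pred`.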

📌 8. VotingClassifier

In [10]:

from sklearn.ensemble import VotingClassifier

# Soft voting averages the predicted class probabilities of all estimators,
# which is why SVC was created with probability=True.
voting_classifier = VotingClassifier(estimators=models, voting='soft')
print("Training voting classifier...")

voting_classifier.fit(X_train, y_train)

y_pred2 = voting_classifier.predict(X_test)

score = accuracy_score(y_test, y_pred2)
print(f"Voting Ensemble Accuracy Score: {score:.2f}")
Training voting classifier...
Voting Ensemble Accuracy Score: 0.94
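
To make concrete what `voting='soft'` does, the sketch below checks that the ensemble's probabilities equal the plain average of its members' probabilities. It uses synthetic data and two scikit-learn estimators as stand-ins (both are assumptions chosen to keep the example small; they are not the five models used above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the TF-IDF features.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

voting = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ],
    voting="soft",
)
voting.fit(X_demo, y_demo)

# Soft voting averages the per-class probabilities of the fitted estimators...
manual_avg = np.mean([est.predict_proba(X_demo) for est in voting.estimators_], axis=0)
ensemble_proba = voting.predict_proba(X_demo)

# ...and predicts the class with the highest averaged probability.
manual_pred = voting.classes_[np.argmax(manual_avg, axis=1)]
print(np.allclose(manual_avg, ensemble_proba))
print(np.array_equal(manual_pred, voting.predict(X_demo)))
```

Because every member contributes a probability rather than a single vote, a confident model can outweigh several uncertain ones, which is one reason soft voting is often preferred when all estimators expose `predict_proba`.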

