Introducing our project: Suicide Tweet Prediction Analysis. Motivated by a commitment to mental health and well-being, we use data analysis to build models that identify tweets containing potential self-harm indicators.
Our project combines TF-IDF text features with several machine learning classifiers (a small neural network, gradient-boosted trees, a random forest, an SVM, and a soft-voting ensemble) to flag concerning content. On a held-out 25% test split, the models reach accuracy scores of 0.93 to 0.94. With this technology, we aim to support timely interventions for individuals in need while contributing to ongoing research in the field of mental health. Join us in our mission to create a safer and more supportive online community. Together, we can make a difference.
Dataset:
https://www.kaggle.com/datasets/aunanya875/suicidal-tweet-detection-dataset/code
1. Data import
In [1]:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

df = pd.read_csv('/kaggle/input/suicidal-tweet-detection-dataset/Suicide_Ideation_Dataset(Twitter-based).csv')
df
Out[1]:
| | Tweet | Suicide |
---|---|---|
0 | making some lunch | Not Suicide post |
1 | @Alexia You want his money. | Not Suicide post |
2 | @dizzyhrvy that crap took me forever to put to… | Potential Suicide post |
3 | @jnaylor #kiwitweets Hey Jer! Since when did y… | Not Suicide post |
4 | Trying out "Delicious Library 2" wit… | Not Suicide post |
… | … | … |
1782 | i have forgotten how much i love my Nokia N95-1 | Not Suicide post |
1783 | Starting my day out with a positive attitude! … | Not Suicide post |
1784 | @belledame222 Hey, it’s 5 am…give a girl som… | Not Suicide post |
1785 | 2 drunken besties stumble into my room and we … | Not Suicide post |
1786 | @dancingbonita "I friggin love you!!!&quo… | Not Suicide post |
1787 rows × 2 columns
2. Data check
In [2]:
df.isnull().sum()
Out[2]:
Tweet      2
Suicide    0
dtype: int64
In [3]:
df.Suicide.value_counts()
Out[3]:
Not Suicide post          1127
Potential Suicide post     660
Name: Suicide, dtype: int64
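The class counts above give a useful reference point for the accuracy scores reported later: a model that always predicts the majority class would already be right about 63% of the time. A minimal sketch of that arithmetic, using the counts from the output above (they include the two rows with a missing tweet, which are dropped later):

```python
# Class counts copied from the value_counts() output above
counts = {"Not Suicide post": 1127, "Potential Suicide post": 660}

total = sum(counts.values())
majority_label = max(counts, key=counts.get)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = counts[majority_label] / total

print(f"Majority class: {majority_label}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```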
3. Processing data
In [4]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df.Suicide = le.fit_transform(df.Suicide)
df.Suicide
Out[4]:
0       0
1       0
2       1
3       0
4       0
       ..
1782    0
1783    0
1784    0
1785    0
1786    0
Name: Suicide, Length: 1787, dtype: int64
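The encoded output does not say which label became 0 and which became 1. LabelEncoder sorts the class names, so the mapping can be read back from its `classes_` attribute. A small sketch on a hypothetical three-row sample (the `labels` list is made up for illustration and is not taken from the dataset):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of label strings, in the same format as the dataset
labels = ["Not Suicide post", "Potential Suicide post", "Not Suicide post"]

le = LabelEncoder()
encoded = le.fit_transform(labels)

# classes_ is sorted; each class is encoded as its index in that array
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the integer codes
print(le.inverse_transform(encoded))
```

With this mapping, a prediction of 1 corresponds to "Potential Suicide post".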
4. TfidfVectorizer
In [5]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Drop the two rows with a missing tweet before vectorizing
df = df.dropna()

vectorizer = TfidfVectorizer(max_features=2000)  # Adjust to your needs
X = vectorizer.fit_transform(df['Tweet'])
y = df.Suicide
X = X.toarray()
5. Split data
In [6]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
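Because the two classes are not balanced, one option worth considering is a stratified split, which keeps the class ratio roughly equal in the train and test sets. The sketch below is not what the notebook does; it uses hypothetical toy labels with about the same 63% / 37% balance and the `stratify` parameter of `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels with roughly the notebook's class balance
y_toy = np.array([0] * 63 + [1] * 37)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

print(f"Positive share overall:  {y_toy.mean():.2f}")
print(f"Positive share in train: {y_tr.mean():.2f}")
print(f"Positive share in test:  {y_te.mean():.2f}")
```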
6. Model & Prediction
In [7]:
import tensorflow as tf
from sklearn.metrics import accuracy_score

# Small feed-forward network on the TF-IDF features
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, activation='gelu', input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation='gelu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # Change according to the problem type
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=32)

# Round the sigmoid outputs to get 0/1 class labels
y_pred = model.predict(X_test)
y_pred = np.round(y_pred)

score = accuracy_score(y_test, y_pred)
print(f"--------------------------------------\nAccuracy Score: {score:.2f}")
Epoch 1/5
42/42 [==============================] - 2s 6ms/step - loss: 0.6235 - accuracy: 0.6607
Epoch 2/5
42/42 [==============================] - 0s 5ms/step - loss: 0.2937 - accuracy: 0.9178
Epoch 3/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0910 - accuracy: 0.9776
Epoch 4/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0386 - accuracy: 0.9925
Epoch 5/5
42/42 [==============================] - 0s 5ms/step - loss: 0.0184 - accuracy: 0.9970
14/14 [==============================] - 0s 2ms/step
--------------------------------------
Accuracy Score: 0.93
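Accuracy alone can hide how well the positive class is detected, which matters for a task like this one. As a rough sketch, precision, recall, and F1 for the positive class can be computed from the predictions as below. The helper `binary_metrics` and the toy label lists are hypothetical and not part of the notebook:

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy labels: 7 of 10 predictions are correct,
# yet only half of the positive cases are found
y_true_toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

precision, recall, f1 = binary_metrics(y_true_toy, y_pred_toy)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In the notebook, the same numbers could be obtained by passing `y_test` and the rounded predictions to scikit-learn's `classification_report`.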
7. Multi-models
In [8]:
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Define a list of model names and their corresponding instances
models = [
    ('XGBClassifier', XGBClassifier()),
    ('CatBoostClassifier', CatBoostClassifier(verbose=0)),
    ('LGBMClassifier', LGBMClassifier()),
    ('RandomForestClassifier', RandomForestClassifier()),
    ('SVC', SVC(probability=True))
    # Add other classifiers if you wish
]
In [9]:
for model_name, model in models:
    print(f"Training {model_name}...")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    print(f"{model_name} Accuracy Score: {score:.2f}")
    print("--------------------------------------")
Training XGBClassifier...
XGBClassifier Accuracy Score: 0.93
--------------------------------------
Training CatBoostClassifier...
CatBoostClassifier Accuracy Score: 0.94
--------------------------------------
Training LGBMClassifier...
LGBMClassifier Accuracy Score: 0.94
--------------------------------------
Training RandomForestClassifier...
RandomForestClassifier Accuracy Score: 0.93
--------------------------------------
Training SVC...
SVC Accuracy Score: 0.93
--------------------------------------
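The scores above come from a single train/test split and differ by only about one point, so the ranking between models may not be stable. One way to check is cross-validation with the vectorizer inside a pipeline, so that TF-IDF is fitted only on each training fold. The sketch below is an illustration rather than the notebook's method: the `texts` and `labels` lists are hypothetical toy data, and LogisticRegression stands in for any of the classifiers:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 6 neutral and 6 concerning example texts
texts = [
    "making some lunch", "love my new phone", "great day at work",
    "watching a movie tonight", "coffee with friends", "starting the day happy",
    "i feel hopeless tonight", "nothing matters anymore", "i cannot go on",
    "so tired of everything", "i feel completely alone", "everything feels hopeless",
]
labels = [0] * 6 + [1] * 6

# The vectorizer is refitted inside every fold, so no test text leaks into it
pipeline = make_pipeline(
    TfidfVectorizer(max_features=2000),
    LogisticRegression(max_iter=1000),
)

# For classifiers, an integer cv uses stratified folds
scores = cross_val_score(pipeline, texts, labels, cv=3, scoring="accuracy")
print(scores)
print(f"Mean accuracy: {scores.mean():.2f}")
```

On data this small the scores themselves mean little; the point is the structure, which carries over to the full dataset by passing `df['Tweet']` and `df.Suicide` instead.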
8. VotingClassifier
In [10]:
from sklearn.ensemble import VotingClassifier

# Soft voting averages the predicted class probabilities of all models
voting_classifier = VotingClassifier(estimators=models, voting='soft')

print("Training voting classifier...")
voting_classifier.fit(X_train, y_train)
y_pred2 = voting_classifier.predict(X_test)
score = accuracy_score(y_test, y_pred2)
print(f"Voting Ensemble Accuracy Score: {score:.2f}")
Training voting classifier...
Voting Ensemble Accuracy Score: 0.94
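With `voting='soft'` and no weights, the ensemble averages each model's predicted class probabilities and picks the class with the highest average. The pure-Python sketch below illustrates the idea on made-up probabilities (the helpers `soft_vote` and `hard_vote` and the three `model_*` lists are hypothetical) and shows a case where soft and hard voting disagree:

```python
def soft_vote(probas):
    """Average per-model [p_class0, p_class1] rows and pick the likelier class."""
    n_models = len(probas)
    n_samples = len(probas[0])
    predictions = []
    for i in range(n_samples):
        avg = [sum(model[i][c] for model in probas) / n_models for c in range(2)]
        predictions.append(avg.index(max(avg)))
    return predictions


def hard_vote(probas):
    """Let each model vote for its most likely class and take the majority."""
    n_samples = len(probas[0])
    predictions = []
    for i in range(n_samples):
        votes = [model[i].index(max(model[i])) for model in probas]
        predictions.append(1 if sum(votes) > len(votes) / 2 else 0)
    return predictions


# Hypothetical predicted probabilities for 3 samples from 3 models
model_a = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
model_b = [[0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
model_c = [[0.7, 0.3], [0.6, 0.4], [0.6, 0.4]]

soft = soft_vote([model_a, model_b, model_c])
hard = hard_vote([model_a, model_b, model_c])
print("soft:", soft)
print("hard:", hard)
```

For the third sample, two models lean slightly toward class 0 while one is confident in class 1, so the averaged probability favors class 1 even though the majority vote does not. This is also why SVC is created with `probability=True` in the model list: soft voting needs `predict_proba` from every estimator.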