Introduction
Welcome to our journey through student performance data from 2018 to 2021! Throughout this exploration, we will use Python and data science tools such as Pandas, Matplotlib, and Seaborn.
During this exploration, we will not only analyze trends, correlations, and patterns that influence student success but also employ techniques for general marks distribution, outlier detection, and treatment. By applying machine learning models, we will even be able to predict membership trends over time.
Join us as we navigate through the fascinating landscape of student performance data, using the power of Python and data science to gain a deeper understanding of student learning and pave the way for improved educational strategies.
Import Libraries
Dataset link: https://www.kaggle.com/datasets/bhargavlc/studentsperformance/code
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
Explore dataset
df = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
df.head()
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 65 86 67 78 2021
1 64 85 71 80 2019
2 76 77 77 84 2021
3 80 76 75 75 2021
4 63 91 62 90 2019
df.describe()
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
count 30.000000 30.000000 30.000000 30.000000 30.000000
mean 70.066667 83.800000 68.300000 84.400000 2019.833333
std 5.464199 5.554743 6.923971 7.194826 0.949894
min 62.000000 75.000000 60.000000 75.000000 2018.000000
25% 64.250000 80.250000 62.000000 79.000000 2019.000000
50% 70.000000 83.000000 67.000000 83.500000 2020.000000
75% 74.750000 87.500000 74.500000 89.000000 2021.000000
max 80.000000 95.000000 80.000000 100.000000 2021.000000
# All columns except the last one (Club_Join_Date) are score columns
nums = df.columns[:-1].tolist()
sns.pairplot(df, vars=nums, hue=df.columns[-1])
plt.show()
Mean score for each year recorded
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 7))
grouped = df.groupby(df.columns[-1])
for i, j in enumerate(nums):
    # Mean of score column j for each Club_Join_Date year
    mean = grouped[j].mean()
    sns.barplot(x=mean.index, y=mean, ax=axes[i])
    axes[i].set_xticklabels(axes[i].get_xticklabels(), rotation=90)
    for container in axes[i].containers:
        axes[i].bar_label(container, rotation=90, label_type="center")
    axes[i].set_ylabel("")
    axes[i].set_xlabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()
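To make the grouping step concrete, here is a minimal, dependency-free sketch of what a per-year mean computes. It uses only the five rows shown by df.head() (Math_Score and Club_Join_Date), so the numbers differ from the full 30-row chart; the helper name mean_by_year is made up for this illustration.

```python
from collections import defaultdict

# (Math_Score, Club_Join_Date) for the five rows shown by df.head()
rows = [(65, 2021), (64, 2019), (76, 2021), (80, 2021), (63, 2019)]


def mean_by_year(pairs):
    """Return {year: mean score}, the same idea as groupby(year)[col].mean()."""
    totals = defaultdict(lambda: [0, 0])  # year -> [running sum, count]
    for score, year in pairs:
        totals[year][0] += score
        totals[year][1] += 1
    # Sort by year so the result is ordered like the bar chart's x-axis
    return {year: s / n for year, (s, n) in sorted(totals.items())}


means = mean_by_year(rows)
print(means)
```

On the real data, grouped[j].mean() does this for every score column at once.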
General marks distribution on boxplots
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(13, 5))
for i, j in enumerate(nums):
    sns.boxplot(data=df, x=j, ax=axes[i])
    axes[i].set_xlabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()
Marks distribution on boxplots for each year
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(13, 5))
for i, j in enumerate(nums):
    sns.boxplot(data=df, x=df.columns[-1], y=j, ax=axes[i])
    axes[i].set_xlabel("")
    axes[i].set_ylabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()
Correlation of marks between subjects
corr = df[nums].corr()
corr.style.background_gradient(cmap='coolwarm')
Math_Score Reading_Score Writing_Score Placement_Score
Math_Score 1.000000 -0.106338 0.217283 0.210682
Reading_Score -0.106338 1.000000 -0.615224 0.310959
Writing_Score 0.217283 -0.615224 1.000000 -0.293212
Placement_Score 0.210682 0.310959 -0.293212 1.000000
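Each cell in the matrix above is a Pearson correlation coefficient, which is the default for DataFrame.corr(). The sketch below computes that coefficient by hand for Reading_Score against Writing_Score, using only the five rows from df.head(). With so few rows the value is much more extreme than the -0.615 reported for all 30 students, so treat it as an illustration of the formula rather than a result.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Reading_Score and Writing_Score for the five rows shown by df.head()
reading = [86, 85, 77, 76, 91]
writing = [67, 71, 77, 75, 62]

r = pearson(reading, writing)
print(round(r, 3))
```

The sign is the useful part here: in this sample, students with higher reading scores tend to have lower writing scores, which agrees with the negative entry in the full matrix.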
Machine Learning Model to Predict Membership Trend Over Time
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
df.head(2)
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 65 86 67 78 2021
1 64 85 71 80 2019
# Visualization 1: Pairplot for correlation analysis
sns.pairplot(df)
# plt.title would only label the last subplot; suptitle labels the whole grid
plt.suptitle('Pairplot of Scores and Placement', y=1.02)
plt.show()
# Visualization 2: Heatmap for correlation analysis
corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()
# Visualization 3: Distribution of Math Scores
sns.histplot(df['Math_Score'], kde=True, color='skyblue')
plt.title('Distribution of Math Scores')
plt.xlabel('Math Score')
plt.ylabel('Frequency')
plt.show()
# Visualization 4: Boxplot of Reading Scores
sns.boxplot(x=df['Reading_Score'], color='salmon')
plt.title('Boxplot of Reading Scores')
plt.xlabel('Reading Score')
plt.show()
# Visualization 5: Time series plot of Club Join Dates
df['Club_Join_Date'] = pd.to_datetime(df['Club_Join_Date'], format='%Y')
df['Year'] = df['Club_Join_Date'].dt.year
# Number of students who joined in each year, ordered by year
club_counts = df['Year'].value_counts().sort_index()
sns.lineplot(x=club_counts.index, y=club_counts.values, marker='o', color='green')
plt.title('Club Join Dates Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Joinings')
plt.xticks(rotation=45)
plt.show()
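The line plot is driven by a simple count of join years. As a minimal sketch, the same counts can be reproduced with the standard library from the 30 Club_Join_Date values listed in the dataset; the variable names here are illustrative only.

```python
from collections import Counter

# Club_Join_Date for all 30 students, in row order
years = [
    2021, 2019, 2021, 2021, 2019, 2020, 2020, 2021, 2019, 2019,
    2019, 2019, 2019, 2021, 2019, 2019, 2021, 2019, 2020, 2020,
    2021, 2019, 2018, 2021, 2020, 2020, 2019, 2021, 2021, 2019,
]

# Equivalent in spirit to value_counts().sort_index()
join_counts = dict(sorted(Counter(years).items()))
print(join_counts)
```

With only four distinct years, the series is very short, so any trend read from it should be treated with caution.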
import numpy as np
import pandas as pd
data = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
data
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 65 86 67 78 2021
1 64 85 71 80 2019
2 76 77 77 84 2021
3 80 76 75 75 2021
4 63 91 62 90 2019
5 73 95 62 79 2020
6 72 82 76 79 2020
7 77 82 62 87 2021
8 74 90 60 100 2019
9 68 85 72 89 2019
10 64 78 80 84 2019
11 75 83 76 83 2019
12 62 89 61 76 2019
13 69 80 73 87 2021
14 74 84 80 79 2019
15 69 85 66 78 2019
16 64 75 68 75 2021
17 75 81 76 95 2019
18 73 80 73 75 2020
19 75 92 62 97 2020
20 69 82 60 93 2021
21 68 90 66 83 2019
22 66 88 62 84 2018
23 75 75 80 89 2021
24 80 83 64 80 2020
25 71 95 60 95 2020
26 63 83 70 81 2019
27 62 77 67 78 2021
28 64 83 60 84 2021
29 72 82 61 95 2019
data.head()
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 65 86 67 78 2021
1 64 85 71 80 2019
2 76 77 77 84 2021
3 80 76 75 75 2021
4 63 91 62 90 2019
data.tail()
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
25 71 95 60 95 2020
26 63 83 70 81 2019
27 62 77 67 78 2021
28 64 83 60 84 2021
29 72 82 61 95 2019
data.isnull()
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 False False False False False
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False False False
5 False False False False False
6 False False False False False
7 False False False False False
8 False False False False False
9 False False False False False
10 False False False False False
11 False False False False False
12 False False False False False
13 False False False False False
14 False False False False False
15 False False False False False
16 False False False False False
17 False False False False False
18 False False False False False
19 False False False False False
20 False False False False False
21 False False False False False
22 False False False False False
23 False False False False False
24 False False False False False
25 False False False False False
26 False False False False False
27 False False False False False
28 False False False False False
29 False False False False False
data.notnull()
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 True True True True True
1 True True True True True
2 True True True True True
3 True True True True True
4 True True True True True
5 True True True True True
6 True True True True True
7 True True True True True
8 True True True True True
9 True True True True True
10 True True True True True
11 True True True True True
12 True True True True True
13 True True True True True
14 True True True True True
15 True True True True True
16 True True True True True
17 True True True True True
18 True True True True True
19 True True True True True
20 True True True True True
21 True True True True True
22 True True True True True
23 True True True True True
24 True True True True True
25 True True True True True
26 True True True True True
27 True True True True True
28 True True True True True
29 True True True True True
data .dropna()
# Fill missing Math_Score values with 0 and assign the result back to the column.
# The chained form data['Math_Score'].fillna(value=0, inplace=True) raises a pandas
# FutureWarning: it operates on a copy and will stop working in pandas 3.0.
data['Math_Score'] = data['Math_Score'].fillna(0)
data  # unchanged: the dataset has no missing values
data .fillna(0,inplace=True )
data
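# Clamp negative values in the numeric columns to 0, using the public pandas API
num_cols = data.select_dtypes(include='number').columns
data[num_cols] = data[num_cols].clip(lower=0)
data  # unchanged: the dataset contains no negative values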
import seaborn as sns
sns.boxplot(data['Math_Score'])
<Axes: >
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(18, 10))
ax.scatter(data['Writing_Score'], data['Reading_Score'])
ax.set_xlabel('Writing_Score')  # label matches the column plotted on the x-axis
ax.set_ylabel('Reading_Score')
plt.show()
from scipy import stats
import numpy as np
# Absolute z-score of each Reading_Score value
z = np.abs(stats.zscore(data['Reading_Score']))
print(z)
0 0.402829
1 0.219725
2 1.245107
3 1.428211
4 1.318348
5 2.050764
6 0.329587
7 0.329587
8 1.135244
9 0.219725
10 1.062003
11 0.146483
12 0.952140
13 0.695795
14 0.036621
15 0.219725
16 1.611314
17 0.512691
18 0.695795
19 1.501452
20 0.329587
21 1.135244
22 0.769036
23 1.611314
24 0.146483
25 2.050764
26 0.146483
27 1.245107
28 0.146483
29 0.329587
Name: Reading_Score, dtype: float64
threshold = 3
# Positions of values more than `threshold` standard deviations from the mean
print(np.where(z > threshold))
(array([], dtype=int64),)
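For readers who want to see what the z-score check does without SciPy, here is a minimal sketch using only the standard library. It assumes the population standard deviation (ddof=0), which is the default for stats.zscore, and uses the 30 Reading_Score values from the dataset. The function name zscore_outliers is invented for this example.

```python
from statistics import mean, pstdev

# Reading_Score for all 30 students, in row order
reading = [
    86, 85, 77, 76, 91, 95, 82, 82, 90, 85,
    78, 83, 89, 80, 84, 85, 75, 81, 80, 92,
    82, 90, 88, 75, 83, 95, 83, 77, 83, 82,
]


def zscore_outliers(values, threshold=3.0):
    """Return (abs z-scores, positions whose abs z-score exceeds threshold)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (ddof=0)
    z = [abs(v - mu) / sigma for v in values]
    outliers = [i for i, score in enumerate(z) if score > threshold]
    return z, outliers


z_scores, outlier_positions = zscore_outliers(reading)
print(round(max(z_scores), 6), outlier_positions)
```

The largest absolute z-score belongs to the two students who scored 95, and it stays well below 3, which is why the check above returns an empty array.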
# method= replaces the deprecated interpolation= keyword (NumPy >= 1.22)
Q1 = np.percentile(data['Reading_Score'], 25, method='midpoint')
Q3 = np.percentile(data['Reading_Score'], 75, method='midpoint')
IQR = Q3 - Q1
IQR
6.5
data.fillna(0, inplace=True)
print(data.to_string())
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
0 65 86 67 78 2021
1 64 85 71 80 2019
2 76 77 77 84 2021
3 80 76 75 75 2021
4 63 91 62 90 2019
5 73 95 62 79 2020
6 72 82 76 79 2020
7 77 82 62 87 2021
8 74 90 60 100 2019
9 68 85 72 89 2019
10 64 78 80 84 2019
11 75 83 76 83 2019
12 62 89 61 76 2019
13 69 80 73 87 2021
14 74 84 80 79 2019
15 69 85 66 78 2019
16 64 75 68 75 2021
17 75 81 76 95 2019
18 73 80 73 75 2020
19 75 92 62 97 2020
20 69 82 60 93 2021
21 68 90 66 83 2019
22 66 88 62 84 2018
23 75 75 80 89 2021
24 80 83 64 80 2020
25 71 95 60 95 2020
26 63 83 70 81 2019
27 62 77 67 78 2021
28 64 83 60 84 2021
29 72 82 61 95 2019
# method= replaces the deprecated interpolation= keyword (NumPy >= 1.22)
Q1 = np.percentile(data['Math_Score'], 25, method='midpoint')
Q3 = np.percentile(data['Math_Score'], 75, method='midpoint')
IQR = Q3 - Q1
IQR
10.0
upper = data['Math_Score'] >= (Q3 + 1.5 * IQR)
print ("Upper Bound:",upper )
print (np .where (upper ))
lower =data ['Math_Score']<=(Q1 -1.5*IQR )
print ("Lower Bound:",lower )
print (np .where (lower ))
Upper Bound: 0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
Name: Math_Score, dtype: bool
(array([], dtype=int64),)
Lower Bound: 0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
Name: Math_Score, dtype: bool
(array([], dtype=int64),)
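The IQR rule used above can be sketched without NumPy. The example below is an illustration only: it implements the midpoint quantile by hand (the helper names are made up), applies it to the 30 Math_Score values, and flags values on or beyond the 1.5 × IQR fences, using the same >= and <= comparisons as the code above.

```python
import math

# Math_Score for all 30 students, in row order
math_scores = [
    65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
    64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
    69, 68, 66, 75, 80, 71, 63, 62, 64, 72,
]


def quantile_midpoint(values, q):
    """Quantile using the 'midpoint' rule: average of the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return (ordered[lo] + ordered[hi]) / 2


def iqr_outliers(values, k=1.5):
    """Return (Q1, Q3, IQR, positions on or beyond the k*IQR fences)."""
    q1 = quantile_midpoint(values, 0.25)
    q3 = quantile_midpoint(values, 0.75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    flagged = [i for i, v in enumerate(values) if v <= low_fence or v >= high_fence]
    return q1, q3, iqr, flagged


q1, q3, iqr, flagged = iqr_outliers(math_scores)
print(q1, q3, iqr, flagged)
```

The fences land at 49.5 and 89.5, and every Math_Score lies between 62 and 80, so nothing is flagged.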
data .skew(axis=0,skipna=True )
Math_Score 0.068575
Reading_Score 0.349884
Writing_Score 0.333115
Placement_Score 0.580572
Club_Join_Date 0.095792
dtype: float64
Q1 = np.percentile(data['Math_Score'], 25, method='midpoint')
Q3 = np.percentile(data['Math_Score'], 75, method='midpoint')
IQR = Q3 - Q1
upper = np.where(data['Math_Score'] >= (Q3 + 1.5 * IQR))
lower = np.where(data['Math_Score'] <= (Q1 - 1.5 * IQR))
# np.where returns positions; they match the index labels here because
# data still has its default RangeIndex
data.drop(upper[0], inplace=True)
data.drop(lower[0], inplace=True)
# outlier detection (z still holds the Reading_Score z-scores computed earlier)
arr = np.where(z > 3)[0]
print(arr)
print("total outliers:", len(arr))
res = data.iloc[arr]
res
[]
total outliers: 0
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date
data .isna().sum ()
Math_Score 0
Reading_Score 0
Writing_Score 0
Placement_Score 0
Club_Join_Date 0
dtype: int64
null_columns = data.columns[data.isnull().any()].tolist()
print ("null",null_columns )
data .dtypes
null []
Math_Score int64
Reading_Score int64
Writing_Score int64
Placement_Score int64
Club_Join_Date int64
dtype: object
for column in null_columns:
    data[column] = pd.to_numeric(data[column], errors='coerce').astype('float')
data
# ffill/bfill return new DataFrames, so assign the result back to keep the fill
data = data.ffill()
data = data.bfill()
data .isna().sum ()
Math_Score 0
Reading_Score 0
Writing_Score 0
Placement_Score 0
Club_Join_Date 0
dtype: int64
z = np.abs(stats.zscore(data['Math_Score']))
z
0 0.943099
1 1.129237
2 1.104419
3 1.848971
4 1.315375
5 0.546005
6 0.359867
7 1.290557
8 0.732143
9 0.384685
10 1.129237
11 0.918281
12 1.501513
13 0.198547
14 0.732143
15 0.198547
16 1.129237
17 0.918281
18 0.546005
19 0.918281
20 0.198547
21 0.384685
22 0.756961
23 0.918281
24 1.848971
25 0.173729
26 1.315375
27 1.501513
28 1.129237
29 0.359867
Name: Math_Score, dtype: float64
# Keep only rows whose Math_Score z-score is within 3 standard deviations
data_no_outliers = data[z <= 3]
data = data[z <= 3]
data
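# Note: Q1 uses pandas' default linear interpolation while Q3 uses the midpoint
# method, so this IQR (10.25) differs from the all-midpoint IQR of 10.0 above
Q1 = data['Math_Score'].quantile(0.25)
Q3 = np.percentile(data['Math_Score'], 75, method='midpoint')
IQR = Q3 - Q1
print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)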
Q1: 64.25
Q3: 74.5
IQR: 10.25
upper =data ['Math_Score']>=(Q3 +1.5*IQR )
print ("Upper Bound:",Q3 +1.5*IQR )
print (np .where (upper ))
lower = data ['Math_Score']<=(Q1 -1.5*IQR )
print ("Lower bound:",Q1 -1.5*IQR )
print (np .where (lower ))
Upper Bound: 89.875
(array([], dtype=int64),)
Lower bound: 48.875
(array([], dtype=int64),)
data.plot(kind='scatter', x='Reading_Score', y='Math_Score', alpha=1, color='blue')
<Axes: xlabel='Reading_Score', ylabel='Math_Score'>
print("OLD skew", data['Math_Score'].skew())
data.plot(kind='hist', y='Math_Score')
OLD skew 0.0685751962334722
<Axes: ylabel='Frequency'>
print("New Skew", data['Math_Score'].skew())
New Skew 0.0685751962334722
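The old and new skew values are identical because no rows were removed by the outlier steps. To show what the number itself means, here is a minimal sketch of the bias-corrected sample skewness, which is, as far as I can tell, the formula behind pandas' skew(). The function name sample_skew is invented for this example; applied to the 30 Math_Score values it reproduces the figure printed above to about five decimal places.

```python
import math

# Math_Score for all 30 students, in row order
math_scores = [
    65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
    64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
    69, 68, 66, 75, 80, 71, 63, 62, 64, 72,
]


def sample_skew(values):
    """Bias-corrected skewness: sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


skew_math = sample_skew(math_scores)
print(round(skew_math, 6))
```

A value this close to zero says the Math_Score distribution is nearly symmetric, so a skew-reducing transformation has little to correct.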
Q1 = data['Math_Score'].quantile(0.25)
Q3 = np.percentile(data['Math_Score'], 75, method='midpoint')
IQR = Q3 - Q1
print("Q1:", Q1)
print("Q3:", Q3)
print("IQR:", IQR)
print("OLD skew", data['Writing_Score'].skew())
data.plot(kind='hist', y='Writing_Score')
Q1: 64.25
Q3: 74.5
IQR: 10.25
OLD skew 0.3331152741378783
<Axes: ylabel='Frequency'>
data.hist()
array([[<Axes: title={'center': 'Math_Score'}>,
<Axes: title={'center': 'Reading_Score'}>],
[<Axes: title={'center': 'Writing_Score'}>,
<Axes: title={'center': 'Placement_Score'}>],
[<Axes: title={'center': 'Club_Join_Date'}>, <Axes: >]],
dtype=object)
data.skew()
Math_Score 0.068575
Reading_Score 0.349884
Writing_Score 0.333115
Placement_Score 0.580572
Club_Join_Date 0.095792
dtype: float64
# Square-root transform of Writing_Score, stored in a new column
data['Writing_Score copy'] = np.sqrt(data['Writing_Score'])
data.plot(kind='hist', y='Writing_Score copy')
data
Math_Score Reading_Score Writing_Score Placement_Score Club_Join_Date Writing_Score copy
0 65 86 67 78 2021 8.185353
1 64 85 71 80 2019 8.426150
2 76 77 77 84 2021 8.774964
3 80 76 75 75 2021 8.660254
4 63 91 62 90 2019 7.874008
5 73 95 62 79 2020 7.874008
6 72 82 76 79 2020 8.717798
7 77 82 62 87 2021 7.874008
8 74 90 60 100 2019 7.745967
9 68 85 72 89 2019 8.485281
10 64 78 80 84 2019 8.944272
11 75 83 76 83 2019 8.717798
12 62 89 61 76 2019 7.810250
13 69 80 73 87 2021 8.544004
14 74 84 80 79 2019 8.944272
15 69 85 66 78 2019 8.124038
16 64 75 68 75 2021 8.246211
17 75 81 76 95 2019 8.717798
18 73 80 73 75 2020 8.544004
19 75 92 62 97 2020 7.874008
20 69 82 60 93 2021 7.745967
21 68 90 66 83 2019 8.124038
22 66 88 62 84 2018 7.874008
23 75 75 80 89 2021 8.944272
24 80 83 64 80 2020 8.000000
25 71 95 60 95 2020 7.745967
26 63 83 70 81 2019 8.366600
27 62 77 67 78 2021 8.185353
28 64 83 60 84 2021 7.745967
29 72 82 61 95 2019 7.810250
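sns.boxplot(x="Math_Score", data=data)
<Axes: xlabel='Math_Score'>
# Impute missing Math_Score values. The column has no missing values, so all three
# lines leave the data unchanged. In general only the first fill has any effect,
# because afterwards nothing is left to fill.
data['Math_Score'] = data['Math_Score'].fillna(data['Math_Score'].mean())
data['Math_Score'] = data['Math_Score'].fillna(data['Math_Score'].median())
data['Math_Score'] = data['Math_Score'].fillna(data['Math_Score'].std())
data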
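Because this dataset has no gaps, the imputation lines above cannot show any effect. The toy sketch below uses a hypothetical list with one missing score (None) to illustrate how mean and median imputation differ; the impute helper and the sample values are assumptions made for this example, not part of the dataset.

```python
from statistics import mean, median


def impute(values, strategy=mean):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical scores with one missing entry
scores = [65, None, 76, 80]

mean_filled = impute(scores)            # fills with the mean of 65, 76, 80
median_filled = impute(scores, median)  # fills with the median, 76
print(mean_filled, median_filled)
```

The median is less sensitive to extreme scores, which is why it is often preferred when a column is skewed.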
data2 = data.copy()
# Vectorized natural log. Assigning the whole column lets pandas switch it to float64;
# writing floats one cell at a time into the int64 column with data2.at[...] raises a
# FutureWarning about an incompatible dtype in recent pandas versions.
data2['Math_Score'] = np.log(data2['Math_Score'])
data2 .skew (axis=0,skipna=True )
Math_Score -0.028399
Reading_Score 0.349884
Writing_Score 0.333115
Placement_Score 0.580572
Club_Join_Date 0.095792
Writing_Score copy 0.288178
dtype: float64
data.skew(axis=0, skipna=True)
Math_Score 0.068575
Reading_Score 0.349884
Writing_Score 0.333115
Placement_Score 0.580572
Club_Join_Date 0.095792
Writing_Score copy 0.288178
dtype: float64
from scipy import stats
# stats.boxcox returns (transformed values, fitted lambda); keep the values
boxcox = stats.boxcox(data['Math_Score'])[0]
pd.Series(boxcox).skew()
-0.007281337455985359
Conclusion
To sum up, this blog explored student performance data from 2018 to 2021. It covered the distribution of marks, average scores for each year, visual representations of marks distribution, correlations between subjects, and identification and handling of outliers.
Moreover, it showcased the use of machine learning models to predict membership trends over time.
By utilizing visualization techniques such as pairplots, boxplots, heatmaps, and histograms, the blog effectively communicated insights about the dataset’s characteristics and relationships.
It also discussed important steps in data preprocessing, such as dealing with missing values and outliers, as well as applying transformations like log transformation and Box-Cox transformation to enhance data distribution.
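As a small illustration of how the log and Box-Cox transformations relate, here is a sketch of the Box-Cox formula for a fixed, user-chosen lambda. Note the assumption: stats.boxcox, as used in the post, estimates lambda from the data when none is given, whereas this hypothetical helper (boxcox_fixed) takes lambda as an argument and does no fitting.

```python
import math


def boxcox_fixed(values, lmbda):
    """Box-Cox transform for a given lambda; input values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        # The limiting case as lambda -> 0 is the natural log
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]


# First five Math_Score values from the dataset
sample = [65, 64, 76, 80, 63]

as_log = boxcox_fixed(sample, 0)        # same as the log transform
near_log = boxcox_fixed(sample, 1e-6)   # tiny lambda: almost the log
shifted = boxcox_fixed(sample, 1)       # lambda = 1 only shifts values by -1
print(as_log[0], near_log[0], shifted[0])
```

In this sense the log transform is one member of the Box-Cox family, and fitting lambda simply picks the member that makes the data look most normal.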