Machine Learning Project 5: Best Students Performance EDA

Introduction

Welcome to our journey through student performance data from 2018 to 2021! Throughout this exploration, we will delve into the world of Python programming and utilize powerful data science tools such as Pandas, Matplotlib, and Seaborn.

During this exploration, we will analyze the trends, correlations, and patterns that influence student success, examine the overall distribution of marks, and detect and treat outliers. We will also look at how club membership changes over time and how that trend might be predicted.

Join us as we navigate through the fascinating landscape of student performance data, using the power of Python and data science to gain a deeper understanding of student learning and pave the way for improved educational strategies.

Import Libraries

Dataset Links: https://www.kaggle.com/datasets/bhargavlc/studentsperformance/code

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

Explore dataset

df = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")

df.head()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019

df.describe()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
count	30.000000	30.000000	30.000000	30.000000	30.000000
mean	70.066667	83.800000	68.300000	84.400000	2019.833333
std	5.464199	5.554743	6.923971	7.194826	0.949894
min	62.000000	75.000000	60.000000	75.000000	2018.000000
25%	64.250000	80.250000	62.000000	79.000000	2019.000000
50%	70.000000	83.000000	67.000000	83.500000	2020.000000
75%	74.750000	87.500000	74.500000	89.000000	2021.000000
max	80.000000	95.000000	80.000000	100.000000	2021.000000

nums = df.columns[:-1].tolist()

General marks distribution on pairplots showing students performance over 2018-2021 period

sns.pairplot(df, vars=nums, hue=df.columns[-1])
plt.show()

Mean score for each year recorded

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 7))
grouped = df.groupby(df.columns[-1])
for i, j in enumerate(nums):
    mean = grouped[j].mean()
    sns.barplot(x=mean.index, y=mean, ax=axes[i])
    axes[i].set_xticklabels(axes[i].get_xticklabels(), rotation=90)
    for container in axes[i].containers:
        axes[i].bar_label(container, rotation=90, label_type="center")
    axes[i].set_ylabel("")
    axes[i].set_xlabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()
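The bar charts show the per-year means visually; the same numbers can also be read as a table. Here is a minimal sketch, assuming only the five rows printed by df.head() (the `sample` frame is a hypothetical stand-in for the full dataset, so its means differ from the charts):

```python
import pandas as pd

# Hypothetical stand-in: the five rows printed by df.head() above.
sample = pd.DataFrame({
    "Math_Score": [65, 64, 76, 80, 63],
    "Reading_Score": [86, 85, 77, 76, 91],
    "Writing_Score": [67, 71, 77, 75, 62],
    "Placement_Score": [78, 80, 84, 75, 90],
    "Club_Join_Date": [2021, 2019, 2021, 2021, 2019],
})

# Mean of every score column per join year, one row per year.
yearly_means = sample.groupby("Club_Join_Date").mean().round(2)
print(yearly_means)
```

Running the same groupby on the full df should give the values annotated on the bars.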

General marks distribution on boxplots

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(13, 5))

for i, j in enumerate(nums):
    sns.boxplot(df, x=j, ax=axes[i])
    axes[i].set_xlabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()

Marks distribution on boxplots for each year

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(13, 5))

for i, j in enumerate(nums):
    sns.boxplot(df, x=df.columns[-1], y=j, ax=axes[i])
    axes[i].set_xlabel("")
    axes[i].set_ylabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()

Correlation of marks between subjects

corr = df[nums].corr()
corr.style.background_gradient(cmap='coolwarm')

	Math_Score	Reading_Score	Writing_Score	Placement_Score
Math_Score	1.000000	-0.106338	0.217283	0.210682
Reading_Score	-0.106338	1.000000	-0.615224	0.310959
Writing_Score	0.217283	-0.615224	1.000000	-0.293212
Placement_Score	0.210682	0.310959	-0.293212	1.000000
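To read the matrix quickly, it can help to pull out the strongest relationship programmatically. The sketch below copies the correlation values from the table above into an array (the names `corr_values`, `pair`, and `value` are illustrative, not part of the original notebook) and finds the off-diagonal pair with the largest absolute correlation:

```python
import numpy as np

# Correlation values copied from the table above.
names = ["Math_Score", "Reading_Score", "Writing_Score", "Placement_Score"]
corr_values = np.array([
    [1.000000, -0.106338, 0.217283, 0.210682],
    [-0.106338, 1.000000, -0.615224, 0.310959],
    [0.217283, -0.615224, 1.000000, -0.293212],
    [0.210682, 0.310959, -0.293212, 1.000000],
])

# Look only at the upper triangle so each pair is counted once
# and the diagonal of ones is ignored.
rows, cols = np.triu_indices(len(names), k=1)
strongest = np.argmax(np.abs(corr_values[rows, cols]))
pair = (names[rows[strongest]], names[cols[strongest]])
value = corr_values[rows[strongest], cols[strongest]]
print(pair, value)
```

In this table the strongest relationship is the negative one between reading and writing scores; every other pair has an absolute correlation below about 0.32.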

Machine Learning Model to Predict Membership Trend Over Time

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")

df.head(2)

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019

# Visualization 1: Pairplot for correlation analysis
g = sns.pairplot(df)
# pairplot returns a grid of axes, so set the title on the whole figure
g.fig.suptitle('Pairplot of Scores and Placement', y=1.02)
plt.show()

# Visualization 2: Heatmap for correlation analysis
corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()

# Visualization 3: Distribution of Math Scores
sns.histplot(df['Math_Score'], kde=True, color='skyblue')
plt.title('Distribution of Math Scores')
plt.xlabel('Math Score')
plt.ylabel('Frequency')
plt.show()

# Visualization 4: Boxplot of Reading Scores
sns.boxplot(x=df['Reading_Score'], color='salmon')
plt.title('Boxplot of Reading Scores')
plt.xlabel('Reading Score')
plt.show()

# Visualization 5: Time series plot of Club Join Dates
df['Club_Join_Date'] = pd.to_datetime(df['Club_Join_Date'], format='%Y')
df['Year'] = df['Club_Join_Date'].dt.year
club_counts = df['Year'].value_counts().sort_index()
sns.lineplot(x=club_counts.index, y=club_counts.values, marker='o', color='green')
plt.title('Club Join Dates Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Joinings')
plt.xticks(rotation=45)
plt.show()
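The line plot shows the yearly counts but does not extrapolate them. As one possible way to turn the counts into a prediction, here is a minimal sketch that fits a straight line by least squares. The counts are tallied by hand from the 30 rows printed in this post; the choice of a linear trend and the helper `predict_joinings` are assumptions made for illustration, and four data points are far too few for a reliable forecast.

```python
import numpy as np

# Joinings per year, counted from the Club_Join_Date column of the 30 rows.
years = np.array([2018, 2019, 2020, 2021])
joinings = np.array([1, 13, 6, 10])

# Center the years so the least-squares fit is numerically well behaved.
center = years.mean()
slope, intercept = np.polyfit(years - center, joinings, deg=1)

def predict_joinings(year):
    """Extrapolate the fitted straight line to a given year."""
    return slope * (year - center) + intercept

print(f"trend: {slope:.2f} joinings per year")
print(f"2022 estimate: {predict_joinings(2022):.1f}")
```

The fitted slope is only a rough summary: the counts jump from 1 to 13 and then drop to 6, so a straight line describes them poorly.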

import numpy as np
import pandas as pd
data=pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

data.head()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019

data.tail()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

data.isnull()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	False	False	False	False	False
1	False	False	False	False	False
2	False	False	False	False	False
3	False	False	False	False	False
4	False	False	False	False	False
5	False	False	False	False	False
6	False	False	False	False	False
7	False	False	False	False	False
8	False	False	False	False	False
9	False	False	False	False	False
10	False	False	False	False	False
11	False	False	False	False	False
12	False	False	False	False	False
13	False	False	False	False	False
14	False	False	False	False	False
15	False	False	False	False	False
16	False	False	False	False	False
17	False	False	False	False	False
18	False	False	False	False	False
19	False	False	False	False	False
20	False	False	False	False	False
21	False	False	False	False	False
22	False	False	False	False	False
23	False	False	False	False	False
24	False	False	False	False	False
25	False	False	False	False	False
26	False	False	False	False	False
27	False	False	False	False	False
28	False	False	False	False	False
29	False	False	False	False	False

data.notnull()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	True	True	True	True	True
1	True	True	True	True	True
2	True	True	True	True	True
3	True	True	True	True	True
4	True	True	True	True	True
5	True	True	True	True	True
6	True	True	True	True	True
7	True	True	True	True	True
8	True	True	True	True	True
9	True	True	True	True	True
10	True	True	True	True	True
11	True	True	True	True	True
12	True	True	True	True	True
13	True	True	True	True	True
14	True	True	True	True	True
15	True	True	True	True	True
16	True	True	True	True	True
17	True	True	True	True	True
18	True	True	True	True	True
19	True	True	True	True	True
20	True	True	True	True	True
21	True	True	True	True	True
22	True	True	True	True	True
23	True	True	True	True	True
24	True	True	True	True	True
25	True	True	True	True	True
26	True	True	True	True	True
27	True	True	True	True	True
28	True	True	True	True	True
29	True	True	True	True	True

data.dropna()

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

data['Math_Score']=data['Math_Score'].fillna(0)
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

data.fillna(0,inplace=True)
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019
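Because this dataset has no missing values, the calls above leave it unchanged. To see what they do when gaps exist, here is a small sketch on a hypothetical toy column (the `toy` frame and its values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two gaps; the real dataset has none.
toy = pd.DataFrame({"Math_Score": [65.0, np.nan, 76.0, np.nan, 63.0]})

missing_count = int(toy["Math_Score"].isna().sum())

# Option 1: drop incomplete rows (returns a new frame).
dropped = toy.dropna()

# Option 2: fill gaps with a constant.
zero_filled = toy["Math_Score"].fillna(0)

# Option 3: fill gaps with the column mean, computed from the observed values.
mean_filled = toy["Math_Score"].fillna(toy["Math_Score"].mean())

print(missing_count, len(dropped), mean_filled.tolist())
```

Filling with 0 drags a score column's mean down sharply, so a mean or median fill is usually the gentler choice for marks.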

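# Replace negative values in the numeric columns with 0
num_cols=data.select_dtypes(include='number').columns
data[num_cols]=data[num_cols].clip(lower=0)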
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

import seaborn as sns
sns.boxplot(data['Math_Score'])

<Axes: >

import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize = (18,10))
ax.scatter(data['Writing_Score'], data['Reading_Score'])

ax.set_xlabel('Writing_Score')
ax.set_ylabel('Reading_Score')
plt.show()

from scipy import stats
import numpy as np

z=np.abs(stats.zscore(data['Reading_Score']))
print(z)

0     0.402829
1     0.219725
2     1.245107
3     1.428211
4     1.318348
5     2.050764
6     0.329587
7     0.329587
8     1.135244
9     0.219725
10    1.062003
11    0.146483
12    0.952140
13    0.695795
14    0.036621
15    0.219725
16    1.611314
17    0.512691
18    0.695795
19    1.501452
20    0.329587
21    1.135244
22    0.769036
23    1.611314
24    0.146483
25    2.050764
26    0.146483
27    1.245107
28    0.146483
29    0.329587
Name: Reading_Score, dtype: float64

threshold=3
print(np.where(z>threshold))

(array([], dtype=int64),)
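The z-score check can be wrapped in a small reusable function. This is a sketch using NumPy only; the function name `zscore_outliers` is illustrative, and the Reading_Score values are copied from the dataset printed earlier in this post.

```python
import numpy as np

# Reading_Score values copied from the dataset printed above.
reading = np.array([86, 85, 77, 76, 91, 95, 82, 82, 90, 85,
                    78, 83, 89, 80, 84, 85, 75, 81, 80, 92,
                    82, 90, 88, 75, 83, 95, 83, 77, 83, 82])

def zscore_outliers(values, threshold=3.0):
    """Return positions whose absolute z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    # Population standard deviation (ddof=0), matching scipy.stats.zscore's default.
    z = np.abs((values - values.mean()) / values.std())
    return np.where(z > threshold)[0]

print(zscore_outliers(reading))        # threshold of 3
print(zscore_outliers(reading, 2.0))   # stricter threshold
```

With the usual threshold of 3 nothing is flagged; lowering it to 2 flags only the two scores of 95, whose z-score is about 2.05.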

Q1=np.percentile(data['Reading_Score'],25,method='midpoint')
Q3=np.percentile(data['Reading_Score'],75,method='midpoint')
IQR=Q3-Q1
IQR

6.5

data.fillna(0,inplace=True)
print(data.to_string())

    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0           65             86             67               78            2021
1           64             85             71               80            2019
2           76             77             77               84            2021
3           80             76             75               75            2021
4           63             91             62               90            2019
5           73             95             62               79            2020
6           72             82             76               79            2020
7           77             82             62               87            2021
8           74             90             60              100            2019
9           68             85             72               89            2019
10          64             78             80               84            2019
11          75             83             76               83            2019
12          62             89             61               76            2019
13          69             80             73               87            2021
14          74             84             80               79            2019
15          69             85             66               78            2019
16          64             75             68               75            2021
17          75             81             76               95            2019
18          73             80             73               75            2020
19          75             92             62               97            2020
20          69             82             60               93            2021
21          68             90             66               83            2019
22          66             88             62               84            2018
23          75             75             80               89            2021
24          80             83             64               80            2020
25          71             95             60               95            2020
26          63             83             70               81            2019
27          62             77             67               78            2021
28          64             83             60               84            2021
29          72             82             61               95            2019

Q1=np.percentile(data['Math_Score'],25,method='midpoint')
Q3=np.percentile(data['Math_Score'],75,method='midpoint')
IQR=Q3-Q1
IQR

10.0

upper= data['Math_Score']>=(Q3+1.5*IQR)
print("Upper Bound:",upper)
print(np.where(upper))

lower=data['Math_Score']<=(Q1-1.5*IQR)
print("Lower Bound:",lower)
print(np.where(lower))

Upper Bound: 0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
Name: Math_Score, dtype: bool
(array([], dtype=int64),)
Lower Bound: 0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
Name: Math_Score, dtype: bool
(array([], dtype=int64),)
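Rather than printing full boolean masks, the IQR rule can be summarized by its two fences. The sketch below assumes NumPy's default linear interpolation for the quartiles, which differs from the midpoint method used in the cell above, so the IQR comes out as 10.5 instead of 10.0. The function name `iqr_fences` is illustrative.

```python
import numpy as np

# Math_Score values copied from the dataset printed above.
math_scores = np.array([65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
                        64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
                        69, 68, 66, 75, 80, 71, 63, 62, 64, 72])

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Default linear interpolation; gives the same quartiles as df.describe().
    q1, q3 = np.quantile(values, [0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_fences(math_scores)
outliers = math_scores[(math_scores < lower) | (math_scores > upper)]
print(lower, upper, outliers)
```

All Math_Score values lie between 62 and 80, well inside the fences, which agrees with the empty results above.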

data.skew(axis=0,skipna=True)

Math_Score         0.068575
Reading_Score      0.349884
Writing_Score      0.333115
Placement_Score    0.580572
Club_Join_Date     0.095792
dtype: float64
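For readers wondering what the skew figures measure, here is a sketch of the bias-corrected (adjusted Fisher-Pearson) sample skewness written out with NumPy. To my understanding this is the estimator behind the values printed above, and it should reproduce the Math_Score figure; the function name `sample_skewness` is illustrative.

```python
import numpy as np

# Math_Score values copied from the dataset printed above.
math_scores = np.array([65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
                        64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
                        69, 68, 66, 75, 80, 71, 63, 62, 64, 72])

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (bias-corrected)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)   # second central moment
    m3 = np.mean(deviations ** 3)   # third central moment
    g1 = m3 / m2 ** 1.5             # biased moment estimate
    return g1 * np.sqrt(n * (n - 1)) / (n - 2)

print(round(sample_skewness(math_scores), 6))
```

Values near 0 indicate a roughly symmetric distribution; positive values, as for all columns here, indicate a longer right tail.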

Q1= np.percentile(data['Math_Score'],25,
                  method = 'midpoint')
Q3= np.percentile(data['Math_Score'],75,
                  method = 'midpoint')
IQR=Q3-Q1
upper = np.where(data['Math_Score']>=(Q3+1.5*IQR))

lower=np.where(data['Math_Score']<=(Q1-1.5*IQR))
data.drop(upper[0],inplace=True)
data.drop(lower[0],inplace=True)

# outlier detection
arr=np.where(z>3)[0]

print(arr)
print("total outliers:",len(arr))
res=data.iloc[arr]
res

[]
total outliers: 0

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date

data.isna().sum()

Math_Score         0
Reading_Score      0
Writing_Score      0
Placement_Score    0
Club_Join_Date     0
dtype: int64

null_columns=data.columns[data.isnull().any()].tolist()
print("null",null_columns)
data.dtypes

null []

Math_Score         int64
Reading_Score      int64
Writing_Score      int64
Placement_Score    int64
Club_Join_Date     int64
dtype: object

for column in null_columns:
    data[column]=pd.to_numeric(data[column],errors='coerce').astype('float')
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

data=data.ffill().bfill()
data.isna().sum()

Math_Score         0
Reading_Score      0
Writing_Score      0
Placement_Score    0
Club_Join_Date     0
dtype: int64

z=np.abs(stats.zscore(data['Math_Score']))
z

0     0.943099
1     1.129237
2     1.104419
3     1.848971
4     1.315375
5     0.546005
6     0.359867
7     1.290557
8     0.732143
9     0.384685
10    1.129237
11    0.918281
12    1.501513
13    0.198547
14    0.732143
15    0.198547
16    1.129237
17    0.918281
18    0.546005
19    0.918281
20    0.198547
21    0.384685
22    0.756961
23    0.918281
24    1.848971
25    0.173729
26    1.315375
27    1.501513
28    1.129237
29    0.359867
Name: Math_Score, dtype: float64

data_no_outliers= data[(z<=3)]
data=data[z<=3]
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date
0	65	86	67	78	2021
1	64	85	71	80	2019
2	76	77	77	84	2021
3	80	76	75	75	2021
4	63	91	62	90	2019
5	73	95	62	79	2020
6	72	82	76	79	2020
7	77	82	62	87	2021
8	74	90	60	100	2019
9	68	85	72	89	2019
10	64	78	80	84	2019
11	75	83	76	83	2019
12	62	89	61	76	2019
13	69	80	73	87	2021
14	74	84	80	79	2019
15	69	85	66	78	2019
16	64	75	68	75	2021
17	75	81	76	95	2019
18	73	80	73	75	2020
19	75	92	62	97	2020
20	69	82	60	93	2021
21	68	90	66	83	2019
22	66	88	62	84	2018
23	75	75	80	89	2021
24	80	83	64	80	2020
25	71	95	60	95	2020
26	63	83	70	81	2019
27	62	77	67	78	2021
28	64	83	60	84	2021
29	72	82	61	95	2019

# Q1 uses pandas' default linear interpolation; Q3 uses NumPy's midpoint method
Q1=data['Math_Score'].quantile(0.25)
Q3=np.percentile(data['Math_Score'],75,method='midpoint')

IQR=Q3-Q1

print("Q1:",Q1)
print("Q3:",Q3)
print("IQR:",IQR)

Q1: 64.25
Q3: 74.5
IQR: 10.25

upper=data['Math_Score']>=(Q3+1.5*IQR)
print("Upper Bound:",Q3+1.5*IQR)
print(np.where(upper))

lower= data['Math_Score']<=(Q1-1.5*IQR)
print("Lower bound:",Q1-1.5*IQR)
print(np.where(lower))

Upper Bound: 89.875
(array([], dtype=int64),)
Lower bound: 48.875
(array([], dtype=int64),)

data.plot(kind='scatter',x='Reading_Score',y='Math_Score',alpha=1,color='blue')

<Axes: xlabel='Reading_Score', ylabel='Math_Score'>

print("OLD skew",data['Math_Score'].skew())
data.plot(kind='hist',y='Math_Score')

OLD skew 0.0685751962334722

<Axes: ylabel='Frequency'>

print("New Skew",data['Math_Score'].skew())

New Skew 0.0685751962334722

Q1=data['Math_Score'].quantile(0.25)
Q3=np.percentile(data['Math_Score'],75,method='midpoint')

IQR=Q3-Q1

print("Q1:",Q1)
print("Q3:",Q3)
print("IQR:",IQR)

print("OLD skew",data['Writing_Score'].skew())
data.plot(kind='hist',y='Writing_Score')

Q1: 64.25
Q3: 74.5
IQR: 10.25
OLD skew 0.3331152741378783

<Axes: ylabel='Frequency'>

data.hist()

array([[<Axes: title={'center': 'Math_Score'}>,
        <Axes: title={'center': 'Reading_Score'}>],
       [<Axes: title={'center': 'Writing_Score'}>,
        <Axes: title={'center': 'Placement_Score'}>],
       [<Axes: title={'center': 'Club_Join_Date'}>, <Axes: >]],
      dtype=object)

data.skew()

Math_Score         0.068575
Reading_Score      0.349884
Writing_Score      0.333115
Placement_Score    0.580572
Club_Join_Date     0.095792
dtype: float64

data['Writing_Score copy']=np.sqrt(data['Writing_Score'])
data.plot(kind='hist',y='Writing_Score copy')
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date	Writing_Score copy
0	65	86	67	78	2021	8.185353
1	64	85	71	80	2019	8.426150
2	76	77	77	84	2021	8.774964
3	80	76	75	75	2021	8.660254
4	63	91	62	90	2019	7.874008
5	73	95	62	79	2020	7.874008
6	72	82	76	79	2020	8.717798
7	77	82	62	87	2021	7.874008
8	74	90	60	100	2019	7.745967
9	68	85	72	89	2019	8.485281
10	64	78	80	84	2019	8.944272
11	75	83	76	83	2019	8.717798
12	62	89	61	76	2019	7.810250
13	69	80	73	87	2021	8.544004
14	74	84	80	79	2019	8.944272
15	69	85	66	78	2019	8.124038
16	64	75	68	75	2021	8.246211
17	75	81	76	95	2019	8.717798
18	73	80	73	75	2020	8.544004
19	75	92	62	97	2020	7.874008
20	69	82	60	93	2021	7.745967
21	68	90	66	83	2019	8.124038
22	66	88	62	84	2018	7.874008
23	75	75	80	89	2021	8.944272
24	80	83	64	80	2020	8.000000
25	71	95	60	95	2020	7.745967
26	63	83	70	81	2019	8.366600
27	62	77	67	78	2021	8.185353
28	64	83	60	84	2021	7.745967
29	72	82	61	95	2019	7.810250

sns.boxplot(x="Math_Score",data=data)

<Axes: xlabel='Math_Score'>

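# Alternative imputation strategies for missing Math_Score values; use only one.
# The column has no missing values here, so none of these lines changes the data.
data['Math_Score']=data['Math_Score'].fillna(data['Math_Score'].mean())
data['Math_Score']=data['Math_Score'].fillna(data['Math_Score'].median())
data['Math_Score']=data['Math_Score'].fillna(data['Math_Score'].std())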
data

	Math_Score	Reading_Score	Writing_Score	Placement_Score	Club_Join_Date	Writing_Score copy
0	65	86	67	78	2021	8.185353
1	64	85	71	80	2019	8.426150
2	76	77	77	84	2021	8.774964
3	80	76	75	75	2021	8.660254
4	63	91	62	90	2019	7.874008
5	73	95	62	79	2020	7.874008
6	72	82	76	79	2020	8.717798
7	77	82	62	87	2021	7.874008
8	74	90	60	100	2019	7.745967
9	68	85	72	89	2019	8.485281
10	64	78	80	84	2019	8.944272
11	75	83	76	83	2019	8.717798
12	62	89	61	76	2019	7.810250
13	69	80	73	87	2021	8.544004
14	74	84	80	79	2019	8.944272
15	69	85	66	78	2019	8.124038
16	64	75	68	75	2021	8.246211
17	75	81	76	95	2019	8.717798
18	73	80	73	75	2020	8.544004
19	75	92	62	97	2020	7.874008
20	69	82	60	93	2021	7.745967
21	68	90	66	83	2019	8.124038
22	66	88	62	84	2018	7.874008
23	75	75	80	89	2021	8.944272
24	80	83	64	80	2020	8.000000
25	71	95	60	95	2020	7.745967
26	63	83	70	81	2019	8.366600
27	62	77	67	78	2021	8.185353
28	64	83	60	84	2021	7.745967
29	72	82	61	95	2019	7.810250

data2=data.copy()
# Vectorized natural log; the result is stored as float
data2['Math_Score']=np.log(data2['Math_Score'])

data2.skew(axis=0,skipna=True)

Math_Score           -0.028399
Reading_Score         0.349884
Writing_Score         0.333115
Placement_Score       0.580572
Club_Join_Date        0.095792
Writing_Score copy    0.288178
dtype: float64

data.skew(axis=0,skipna=True)

Math_Score            0.068575
Reading_Score         0.349884
Writing_Score         0.333115
Placement_Score       0.580572
Club_Join_Date        0.095792
Writing_Score copy    0.288178
dtype: float64

from scipy import stats
boxcox=stats.boxcox(data['Math_Score'])[0]
pd.Series(boxcox).skew()

-0.007281337455985359
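The square-root and log transforms used earlier are members of the same power family as Box-Cox (lambda = 0.5 and lambda = 0, up to a linear rescaling that does not affect skewness). The sketch below compares their effect on Math_Score skewness side by side. It uses NumPy only, with an illustrative `sample_skewness` helper implementing the bias-corrected skewness; it is not the original notebook's code.

```python
import numpy as np

# Math_Score values copied from the dataset printed above.
math_scores = np.array([65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
                        64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
                        69, 68, 66, 75, 80, 71, 63, 62, 64, 72])

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (bias-corrected)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    deviations = x - x.mean()
    g1 = np.mean(deviations ** 3) / np.mean(deviations ** 2) ** 1.5
    return g1 * np.sqrt(n * (n - 1)) / (n - 2)

# Skewness of the raw scores and of two power-family transforms.
skews = {
    "raw": sample_skewness(math_scores),
    "sqrt": sample_skewness(np.sqrt(math_scores)),
    "log": sample_skewness(np.log(math_scores)),
}
for name, value in skews.items():
    print(f"{name:>4}: {value:+.4f}")
```

Each step toward the log pulls the right tail in further. For a column that is already nearly symmetric, such as Math_Score, the log can overshoot into slightly negative skew, which is why Box-Cox, which searches for the best lambda, lands closer to zero.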

Conclusion

To sum up, this blog extensively explored student performance data from 2018 to 2021. It covered a wide range of aspects, including the distribution of marks, average scores for each year, visual representations of marks distribution, correlations between subjects, and identification and handling of outliers.

Moreover, it showcased the use of machine learning models to predict membership trends over time.

By utilizing visualization techniques such as pairplots, boxplots, heatmaps, and histograms, the blog effectively communicated insights about the dataset’s characteristics and relationships.

It also discussed important steps in data preprocessing, such as dealing with missing values and outliers, as well as applying transformations like log transformation and Box-Cox transformation to enhance data distribution.

Published by Writer1 on May 27, 2024
