Introduction

Welcome to our journey through student performance data from 2018 to 2021! Throughout this exploration, we will delve into the world of Python programming and utilize powerful data science tools such as Pandas, Matplotlib, and Seaborn.

During this exploration, we will analyze the trends, correlations, and patterns behind student success, look at the overall distribution of marks, and detect and treat outliers. We will also chart club membership by year as a first step toward predicting membership trends over time.

Join us as we navigate through the fascinating landscape of student performance data, using the power of Python and data science to gain a deeper understanding of student learning and pave the way for improved educational strategies.

Import Libraries

Dataset Links: https://www.kaggle.com/datasets/bhargavlc/studentsperformance/code

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

Explore dataset

df = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
df.head()
   Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0          65             86             67               78            2021
1          64             85             71               80            2019
2          76             77             77               84            2021
3          80             76             75               75            2021
4          63             91             62               90            2019
df.describe()
       Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
count   30.000000      30.000000      30.000000        30.000000       30.000000
mean    70.066667      83.800000      68.300000        84.400000     2019.833333
std      5.464199       5.554743       6.923971         7.194826        0.949894
min     62.000000      75.000000      60.000000        75.000000     2018.000000
25%     64.250000      80.250000      62.000000        79.000000     2019.000000
50%     70.000000      83.000000      67.000000        83.500000     2020.000000
75%     74.750000      87.500000      74.500000        89.000000     2021.000000
max     80.000000      95.000000      80.000000       100.000000     2021.000000
nums = df.columns[:-1].tolist()

General marks distribution on pairplots showing students performance over 2018-2021 period

sns.pairplot(df, vars=nums, hue=df.columns[-1])
plt.show()

Mean score for each year recorded

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 7))
grouped = df.groupby(df.columns[-1])
for i, j in enumerate(nums):
    mean = grouped[j].mean()
    sns.barplot(x=mean.index, y=mean, ax=axes[i])
    axes[i].set_xticklabels(axes[i].get_xticklabels(), rotation=90)
    for container in axes[i].containers:
        axes[i].bar_label(container, rotation=90, label_type="center")
    axes[i].set_ylabel("")
    axes[i].set_xlabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()
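
The bar charts plot one mean per subject and year. To read the same numbers as a table, a minimal sketch could look like the following. It uses only the five rows printed by df.head() as a stand-in sample (an assumption made to keep the example self-contained), so the values are not the full-dataset means; the names sample, score_cols, and yearly_means are ours.

```python
import pandas as pd

# Five-row sample copied from the df.head() output (not the full dataset).
sample = pd.DataFrame({
    "Math_Score": [65, 64, 76, 80, 63],
    "Reading_Score": [86, 85, 77, 76, 91],
    "Writing_Score": [67, 71, 77, 75, 62],
    "Placement_Score": [78, 80, 84, 75, 90],
    "Club_Join_Date": [2021, 2019, 2021, 2021, 2019],
})

# Every column except the last one is a score.
score_cols = sample.columns[:-1].tolist()

# Mean of each score column per join year, as a single table.
yearly_means = sample.groupby("Club_Join_Date")[score_cols].mean()
print(yearly_means.round(2))
```

On the full dataset the same call would be df.groupby(df.columns[-1])[nums].mean().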

General marks distribution on boxplots

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(13, 5))

for i, j in enumerate(nums):
    sns.boxplot(df, x=j, ax=axes[i])
    axes[i].set_xlabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()

Marks distribution on boxplots for each year

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(13, 5))

for i, j in enumerate(nums):
    sns.boxplot(df, x=df.columns[-1], y=j, ax=axes[i])
    axes[i].set_xlabel("")
    axes[i].set_ylabel("")
    axes[i].set_title(j.replace('_', ' '))
plt.tight_layout()
plt.show()

Correlation of marks between subjects

corr = df[nums].corr()
corr.style.background_gradient(cmap='coolwarm')
                 Math_Score  Reading_Score  Writing_Score  Placement_Score
Math_Score         1.000000      -0.106338       0.217283         0.210682
Reading_Score     -0.106338       1.000000      -0.615224         0.310959
Writing_Score      0.217283      -0.615224       1.000000        -0.293212
Placement_Score    0.210682       0.310959      -0.293212         1.000000
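
A correlation matrix lists every pair twice, so it can help to rank the unique pairs by strength. The sketch below does that on the values printed in the table; the matrix is typed in by hand rather than recomputed, and the names upper, pairs, and ranked are ours.

```python
import numpy as np
import pandas as pd

cols = ["Math_Score", "Reading_Score", "Writing_Score", "Placement_Score"]

# Correlation values copied from the table printed in this section.
corr = pd.DataFrame(
    [
        [1.000000, -0.106338, 0.217283, 0.210682],
        [-0.106338, 1.000000, -0.615224, 0.310959],
        [0.217283, -0.615224, 1.000000, -0.293212],
        [0.210682, 0.310959, -0.293212, 1.000000],
    ],
    index=cols,
    columns=cols,
)

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to a Series indexed by (row, column); dropna removes the masked cells.
pairs = upper.stack().dropna()

# Rank pairs by absolute correlation, strongest first.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

In this table the strongest relationship is the negative one between Reading_Score and Writing_Score (about -0.62); with only 30 students, all of these coefficients should be read with caution.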

Machine Learning Model to Predict Membership Trend Over Time

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
df.head(2)
   Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0          65             86             67               78            2021
1          64             85             71               80            2019
# Visualization 1: Pairplot for correlation analysis
g = sns.pairplot(df)
g.fig.suptitle('Pairplot of Scores and Placement', y=1.02)  # figure-level title; plt.title would only label the last subplot
plt.show()
# Visualization 2: Heatmap for correlation analysis
corr = df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap')
plt.show()
# Visualization 3: Distribution of Math Scores
sns.histplot(df['Math_Score'], kde=True, color='skyblue')
plt.title('Distribution of Math Scores')
plt.xlabel('Math Score')
plt.ylabel('Frequency')
plt.show()
# Visualization 4: Boxplot of Reading Scores
sns.boxplot(x=df['Reading_Score'], color='salmon')
plt.title('Boxplot of Reading Scores')
plt.xlabel('Reading Score')
plt.show()
# Visualization 5: Time series plot of Club Join Dates
df['Club_Join_Date'] = pd.to_datetime(df['Club_Join_Date'], format='%Y')
df['Year'] = df['Club_Join_Date'].dt.year
club_counts = df['Year'].value_counts().sort_index()
sns.lineplot(x=club_counts.index, y=club_counts.values, marker='o', color='green')
plt.title('Club Join Dates Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Joinings')
plt.xticks(rotation=45)
plt.show()
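
The line plot shows the yearly join counts but does not fit anything to them. As a minimal sketch of what a trend model could look like, the block below fits a straight line with np.polyfit and extrapolates one year ahead. The counts were tallied by hand from the Club_Join_Date column of the 30-row dataset printed in this post; the choice of a linear model and the names slope, intercept, and forecast are our assumptions. With only four data points, the result is illustrative and not a reliable forecast.

```python
import numpy as np

# Yearly join counts tallied by hand from the 30 rows of Club_Join_Date.
years = np.array([2018, 2019, 2020, 2021])
joins = np.array([1, 13, 6, 10])

# Shift the years to 0, 1, 2, 3 so the fit works on small, well-scaled numbers.
x = years - years[0]

# Least-squares straight line: joins is approximated by slope * x + intercept.
slope, intercept = np.polyfit(x, joins, deg=1)

# Extrapolate one year past the data; four points make this illustrative only.
next_year = 2022
forecast = slope * (next_year - years[0]) + intercept
print(f"slope={slope:.2f} joins/year, forecast for {next_year}: {forecast:.1f}")
```

The same counts can be produced in code with df['Year'].value_counts().sort_index(), as in the cell before this sketch.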
import numpy as np
import pandas as pd
data=pd.read_csv("/kaggle/input/studentsperformance/StudentsPerformance.csv")
data
    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0           65             86             67               78            2021
1           64             85             71               80            2019
2           76             77             77               84            2021
3           80             76             75               75            2021
4           63             91             62               90            2019
5           73             95             62               79            2020
6           72             82             76               79            2020
7           77             82             62               87            2021
8           74             90             60              100            2019
9           68             85             72               89            2019
10          64             78             80               84            2019
11          75             83             76               83            2019
12          62             89             61               76            2019
13          69             80             73               87            2021
14          74             84             80               79            2019
15          69             85             66               78            2019
16          64             75             68               75            2021
17          75             81             76               95            2019
18          73             80             73               75            2020
19          75             92             62               97            2020
20          69             82             60               93            2021
21          68             90             66               83            2019
22          66             88             62               84            2018
23          75             75             80               89            2021
24          80             83             64               80            2020
25          71             95             60               95            2020
26          63             83             70               81            2019
27          62             77             67               78            2021
28          64             83             60               84            2021
29          72             82             61               95            2019
data.head()
   Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0          65             86             67               78            2021
1          64             85             71               80            2019
2          76             77             77               84            2021
3          80             76             75               75            2021
4          63             91             62               90            2019
data.tail()
    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
25          71             95             60               95            2020
26          63             83             70               81            2019
27          62             77             67               78            2021
28          64             83             60               84            2021
29          72             82             61               95            2019
data.isnull()
    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0        False          False          False            False           False
1        False          False          False            False           False
2        False          False          False            False           False
3        False          False          False            False           False
4        False          False          False            False           False
5        False          False          False            False           False
6        False          False          False            False           False
7        False          False          False            False           False
8        False          False          False            False           False
9        False          False          False            False           False
10       False          False          False            False           False
11       False          False          False            False           False
12       False          False          False            False           False
13       False          False          False            False           False
14       False          False          False            False           False
15       False          False          False            False           False
16       False          False          False            False           False
17       False          False          False            False           False
18       False          False          False            False           False
19       False          False          False            False           False
20       False          False          False            False           False
21       False          False          False            False           False
22       False          False          False            False           False
23       False          False          False            False           False
24       False          False          False            False           False
25       False          False          False            False           False
26       False          False          False            False           False
27       False          False          False            False           False
28       False          False          False            False           False
29       False          False          False            False           False
data.notnull()
    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0         True           True           True             True            True
1         True           True           True             True            True
2         True           True           True             True            True
3         True           True           True             True            True
4         True           True           True             True            True
5         True           True           True             True            True
6         True           True           True             True            True
7         True           True           True             True            True
8         True           True           True             True            True
9         True           True           True             True            True
10        True           True           True             True            True
11        True           True           True             True            True
12        True           True           True             True            True
13        True           True           True             True            True
14        True           True           True             True            True
15        True           True           True             True            True
16        True           True           True             True            True
17        True           True           True             True            True
18        True           True           True             True            True
19        True           True           True             True            True
20        True           True           True             True            True
21        True           True           True             True            True
22        True           True           True             True            True
23        True           True           True             True            True
24        True           True           True             True            True
25        True           True           True             True            True
26        True           True           True             True            True
27        True           True           True             True            True
28        True           True           True             True            True
29        True           True           True             True            True
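
Because this dataset has no missing values, the isnull and notnull tables are all False and all True, and the fill steps that follow change nothing. To show what these calls do when gaps exist, here is a small sketch on a hypothetical sample: the first five rows of two columns with two values deliberately replaced by NaN. The injected gaps and the choice of a median fill are our assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: first five rows of two columns, with two gaps injected.
sample = pd.DataFrame({
    "Math_Score": [65, np.nan, 76, 80, 63],
    "Reading_Score": [86, 85, 77, np.nan, 91],
})

# Count missing values per column instead of printing a full True/False table.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Fill each gap with its own column's median; assigning the result to a new
# name keeps the original frame intact and avoids inplace pitfalls.
filled = sample.fillna(sample.median())
print(filled)
```

A per-column count like this scales better than a row-by-row boolean table once the data grows beyond a few dozen rows.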
data.dropna()
(Output: the same 30 rows as the full table above. No rows are dropped because the dataset has no missing values.)
data['Math_Score']=data['Math_Score'].fillna(0)  # assign back; chained inplace fillna triggers a FutureWarning and stops working in pandas 3.0
data
(Output: the same 30 rows as before. Math_Score has no missing values, so nothing is filled.)
data.fillna(0,inplace=True)
data
(Output: the same 30 rows, unchanged.)
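# Clip negative numeric values to zero using the public API instead of the private _get_numeric_data()
num_cols=data.select_dtypes(include='number').columns
data[num_cols]=data[num_cols].clip(lower=0)
data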
(Output: the same 30 rows, unchanged; no value in the dataset is negative.)
import seaborn as sns
sns.boxplot(data['Math_Score'])
<Axes: >
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize = (18,10))
ax.scatter(data['Writing_Score'], data['Reading_Score'])

ax.set_xlabel('Writing_Score')  # label the x-axis with the column that is actually plotted
ax.set_ylabel('Reading_Score')
plt.show()
from scipy import stats
import numpy as np

z=np.abs(stats.zscore(data['Reading_Score']))
print(z)
0     0.402829
1     0.219725
2     1.245107
3     1.428211
4     1.318348
5     2.050764
6     0.329587
7     0.329587
8     1.135244
9     0.219725
10    1.062003
11    0.146483
12    0.952140
13    0.695795
14    0.036621
15    0.219725
16    1.611314
17    0.512691
18    0.695795
19    1.501452
20    0.329587
21    1.135244
22    0.769036
23    1.611314
24    0.146483
25    2.050764
26    0.146483
27    1.245107
28    0.146483
29    0.329587
Name: Reading_Score, dtype: float64
threshold=3
print(np.where(z>threshold))
(array([], dtype=int64),)
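
To make the z-score step less of a black box, the sketch below recomputes it with plain NumPy. It assumes the population standard deviation (ddof=0), which is the default in scipy.stats.zscore; the helper name abs_zscores is ours, and the scores are copied from the dataset printed in this post.

```python
import numpy as np

# Reading_Score values for the 30 students, copied from the dataset in this post.
reading = np.array([
    86, 85, 77, 76, 91, 95, 82, 82, 90, 85,
    78, 83, 89, 80, 84, 85, 75, 81, 80, 92,
    82, 90, 88, 75, 83, 95, 83, 77, 83, 82,
])

def abs_zscores(values):
    """Absolute z-scores using the population standard deviation (ddof=0)."""
    values = np.asarray(values, dtype=float)
    return np.abs((values - values.mean()) / values.std(ddof=0))

z_reading = abs_zscores(reading)

# A common rule of thumb flags points more than 3 standard deviations from the mean.
threshold = 3
outlier_positions = np.where(z_reading > threshold)[0]

print(z_reading[:3].round(6))
print("largest |z|:", round(float(z_reading.max()), 6))
print("outliers:", outlier_positions)
```

The largest |z| in this column is about 2.05, so the threshold of 3 flags nothing, which matches the empty array printed by the cell.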
Q1=np.percentile(data['Reading_Score'],25,method='midpoint')  # NumPy 1.22+ renamed interpolation= to method=
Q3=np.percentile(data['Reading_Score'],75,method='midpoint')
IQR=Q3-Q1
IQR
6.5
data.fillna(0,inplace=True)
print(data.to_string())
    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
0           65             86             67               78            2021
1           64             85             71               80            2019
2           76             77             77               84            2021
3           80             76             75               75            2021
4           63             91             62               90            2019
5           73             95             62               79            2020
6           72             82             76               79            2020
7           77             82             62               87            2021
8           74             90             60              100            2019
9           68             85             72               89            2019
10          64             78             80               84            2019
11          75             83             76               83            2019
12          62             89             61               76            2019
13          69             80             73               87            2021
14          74             84             80               79            2019
15          69             85             66               78            2019
16          64             75             68               75            2021
17          75             81             76               95            2019
18          73             80             73               75            2020
19          75             92             62               97            2020
20          69             82             60               93            2021
21          68             90             66               83            2019
22          66             88             62               84            2018
23          75             75             80               89            2021
24          80             83             64               80            2020
25          71             95             60               95            2020
26          63             83             70               81            2019
27          62             77             67               78            2021
28          64             83             60               84            2021
29          72             82             61               95            2019
Q1=np.percentile(data['Math_Score'],25,method='midpoint')  # NumPy 1.22+ renamed interpolation= to method=
Q3=np.percentile(data['Math_Score'],75,method='midpoint')
IQR=Q3-Q1
IQR
10.0
upper= data['Math_Score']>=(Q3+1.5*IQR)
print("Upper Bound:",upper)
print(np.where(upper))

lower=data['Math_Score']<=(Q1-1.5*IQR)
print("Lower Bound:",lower)
print(np.where(lower))
Upper Bound: 0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
Name: Math_Score, dtype: bool
(array([], dtype=int64),)
Lower Bound: 0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
Name: Math_Score, dtype: bool
(array([], dtype=int64),)
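
The cell prints the full boolean masks, which hides the two numbers that matter: the fences themselves. A compact sketch of the same IQR rule is shown below. It assumes NumPy 1.22 or newer (where the keyword is method=), uses strict inequalities as is common for Tukey fences, and the helper name iqr_fences is ours.

```python
import numpy as np

# Math_Score values for the 30 students, copied from the dataset in this post.
math_scores = np.array([
    65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
    64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
    69, 68, 66, 75, 80, 71, 63, 62, 64, 72,
])

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences computed from midpoint quartiles."""
    # method= needs NumPy 1.22+; older releases call this argument interpolation=.
    q1 = np.percentile(values, 25, method="midpoint")
    q3 = np.percentile(values, 75, method="midpoint")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower_fence, upper_fence = iqr_fences(math_scores)

# Anything strictly outside the fences counts as an outlier.
outliers = math_scores[(math_scores < lower_fence) | (math_scores > upper_fence)]

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

With quartiles of 64.5 and 74.5 the fences land at 49.5 and 89.5, well outside the observed range of 62 to 80, so no Math score is flagged.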
data.skew(axis=0,skipna=True)
Math_Score         0.068575
Reading_Score      0.349884
Writing_Score      0.333115
Placement_Score    0.580572
Club_Join_Date     0.095792
dtype: float64
Q1= np.percentile(data['Math_Score'],25,
                  method = 'midpoint')  # NumPy 1.22+ renamed interpolation= to method=
Q3= np.percentile(data['Math_Score'],75,
                  method = 'midpoint')
IQR=Q3-Q1
upper = np.where(data['Math_Score']>=(Q3+1.5*IQR))

lower=np.where(data['Math_Score']<=(Q1-1.5*IQR))
data.drop(upper[0],inplace=True)
data.drop(lower[0],inplace=True)
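
Since this dataset has no outliers, the drop calls remove nothing. To show the treatment step doing real work, the sketch below uses a hypothetical series: eight Math scores from the dataset plus one injected extreme value of 120. It contrasts dropping rows with capping them at the fences; the injected value, the default pandas quantiles, and the names kept and capped are our assumptions.

```python
import pandas as pd

# Hypothetical scores: eight values from the dataset plus one injected extreme (120).
scores = pd.Series([65, 64, 76, 80, 63, 73, 72, 77, 120], name="Math_Score")

# Quartiles with pandas' default (linear) interpolation.
q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: drop rows outside the fences with a boolean mask.
kept = scores[scores.between(lower, upper)]

# Option 2: cap values at the fences so the row count stays the same.
capped = scores.clip(lower=lower, upper=upper)

print(len(scores), len(kept), capped.max())
```

Dropping is simpler but shrinks an already small sample; capping keeps every student in the data at the cost of altering the extreme value.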

                  
# outlier detection
arr=np.where(z>3)[0]

print(arr)
print("total outliers:",len(arr))
res=data.iloc[arr]
res
[]
total outliers: 0
   Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date
data.isna().sum()
Math_Score         0
Reading_Score      0
Writing_Score      0
Placement_Score    0
Club_Join_Date     0
dtype: int64
null_columns=data.columns[data.isnull().any()].tolist()
print("null",null_columns)
data.dtypes
null []
Math_Score         int64
Reading_Score      int64
Writing_Score      int64
Placement_Score    int64
Club_Join_Date     int64
dtype: object
for column in null_columns:
    data[column]=pd.to_numeric(data[column],errors='coerce').astype('float')
data
(Output: the same 30 rows, unchanged; null_columns is empty, so the loop body never runs.)
data=data.ffill().bfill()  # ffill/bfill return new frames, so the result must be assigned to take effect
data.isna().sum()
Math_Score         0
Reading_Score      0
Writing_Score      0
Placement_Score    0
Club_Join_Date     0
dtype: int64
z=np.abs(stats.zscore(data['Math_Score']))
z
0     0.943099
1     1.129237
2     1.104419
3     1.848971
4     1.315375
5     0.546005
6     0.359867
7     1.290557
8     0.732143
9     0.384685
10    1.129237
11    0.918281
12    1.501513
13    0.198547
14    0.732143
15    0.198547
16    1.129237
17    0.918281
18    0.546005
19    0.918281
20    0.198547
21    0.384685
22    0.756961
23    0.918281
24    1.848971
25    0.173729
26    1.315375
27    1.501513
28    1.129237
29    0.359867
Name: Math_Score, dtype: float64
data_no_outliers= data[(z<=3)]
data=data[z<=3]
data
(Output: all 30 rows are kept, since no |z| exceeds 3.)
Q1=data['Math_Score'].quantile(0.25)
Q3=np.percentile(data['Math_Score'],75,method='midpoint')  # NumPy 1.22+ renamed interpolation= to method=

IQR=Q3-Q1

print("Q1:",Q1)
print("Q3:",Q3)
print("IQR:",IQR)
Q1: 64.25
Q3: 74.5
IQR: 10.25
upper=data['Math_Score']>=(Q3+1.5*IQR)
print("Upper Bound:",Q3+1.5*IQR)
print(np.where(upper))

lower= data['Math_Score']<=(Q1-1.5*IQR)
print("Lower bound:",Q1-1.5*IQR)
print(np.where(lower))
Upper Bound: 89.875
(array([], dtype=int64),)
Lower bound: 48.875
(array([], dtype=int64),)
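
The IQR here is 10.25, while the earlier cell reported 10.0 for the same column. The difference comes from mixing two quartile definitions: quantile(0.25) uses linear interpolation by default, whereas the Q3 line uses the midpoint rule. The sketch below computes both quartiles under both rules so the effect is visible; it relies on the interpolation parameter of pandas Series.quantile, and the variable names are ours.

```python
import pandas as pd

# Math_Score values for the 30 students, copied from the dataset in this post.
math_scores = pd.Series([
    65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
    64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
    69, 68, 66, 75, 80, 71, 63, 62, 64, 72,
], name="Math_Score")

# pandas default: linear interpolation between the two neighbouring values.
q1_linear = math_scores.quantile(0.25)
q3_linear = math_scores.quantile(0.75)

# Midpoint rule: the average of the two neighbouring values.
q1_midpoint = math_scores.quantile(0.25, interpolation="midpoint")
q3_midpoint = math_scores.quantile(0.75, interpolation="midpoint")

print("linear:  ", q1_linear, q3_linear, q3_linear - q1_linear)
print("midpoint:", q1_midpoint, q3_midpoint, q3_midpoint - q1_midpoint)
```

Either rule is defensible; what matters is using the same one for Q1 and Q3 so the IQR and the fences derived from it are consistent.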
data.plot(kind='scatter',x='Reading_Score',y='Math_Score',alpha=1,color='blue')
<Axes: xlabel='Reading_Score', ylabel='Math_Score'>
print("OLD skew",data['Math_Score'].skew())
data.plot(kind='hist',y='Math_Score')
OLD skew 0.0685751962334722
<Axes: ylabel='Frequency'>
print("New Skew",data['Math_Score'].skew())
New Skew 0.0685751962334722
Q1=data['Math_Score'].quantile(0.25)
Q3=np.percentile(data['Math_Score'],75,method='midpoint')  # NumPy 1.22+ renamed interpolation= to method=

IQR=Q3-Q1

print("Q1:",Q1)
print("Q3:",Q3)
print("IQR:",IQR)

print("OLD skew",data['Writing_Score'].skew())
data.plot(kind='hist',y='Writing_Score')
Q1: 64.25
Q3: 74.5
IQR: 10.25
OLD skew 0.3331152741378783
<Axes: ylabel='Frequency'>
data.hist()
array([[<Axes: title={'center': 'Math_Score'}>,
        <Axes: title={'center': 'Reading_Score'}>],
       [<Axes: title={'center': 'Writing_Score'}>,
        <Axes: title={'center': 'Placement_Score'}>],
       [<Axes: title={'center': 'Club_Join_Date'}>, <Axes: >]],
      dtype=object)
data.skew()
Math_Score         0.068575
Reading_Score      0.349884
Writing_Score      0.333115
Placement_Score    0.580572
Club_Join_Date     0.095792
dtype: float64
data['Writing_Score copy']=np.sqrt(data['Writing_Score'])
data.plot(kind='hist',y='Writing_Score copy')
data
    Math_Score  Reading_Score  Writing_Score  Placement_Score  Club_Join_Date  Writing_Score copy
0           65             86             67               78            2021            8.185353
1           64             85             71               80            2019            8.426150
2           76             77             77               84            2021            8.774964
3           80             76             75               75            2021            8.660254
4           63             91             62               90            2019            7.874008
5           73             95             62               79            2020            7.874008
6           72             82             76               79            2020            8.717798
7           77             82             62               87            2021            7.874008
8           74             90             60              100            2019            7.745967
9           68             85             72               89            2019            8.485281
10          64             78             80               84            2019            8.944272
11          75             83             76               83            2019            8.717798
12          62             89             61               76            2019            7.810250
13          69             80             73               87            2021            8.544004
14          74             84             80               79            2019            8.944272
15          69             85             66               78            2019            8.124038
16          64             75             68               75            2021            8.246211
17          75             81             76               95            2019            8.717798
18          73             80             73               75            2020            8.544004
19          75             92             62               97            2020            7.874008
20          69             82             60               93            2021            7.745967
21          68             90             66               83            2019            8.124038
22          66             88             62               84            2018            7.874008
23          75             75             80               89            2021            8.944272
24          80             83             64               80            2020            8.000000
25          71             95             60               95            2020            7.745967
26          63             83             70               81            2019            8.366600
27          62             77             67               78            2021            8.185353
28          64             83             60               84            2021            7.745967
29          72             82             61               95            2019            7.810250
sns.boxplot(x="Math_Score",data=data)
<Axes: xlabel='Math_Score'>
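# Three alternative imputations shown side by side; in practice pick one.
# Once the first call has filled any gaps, the later calls find nothing left to fill.
data['Math_Score']=data['Math_Score'].fillna(data['Math_Score'].mean())
data['Math_Score']=data['Math_Score'].fillna(data['Math_Score'].median())
data['Math_Score']=data['Math_Score'].fillna(data['Math_Score'].std())  # std measures spread, so it is rarely a sensible fill value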
data
(Output: the same 30 rows and six columns as the previous table, unchanged.)
data2=data.copy()
# Vectorized natural log; replacing the whole column avoids the int64 dtype FutureWarning raised by per-cell assignment
data2['Math_Score']=np.log(data2['Math_Score'])
data2.skew(axis=0,skipna=True)
Math_Score           -0.028399
Reading_Score         0.349884
Writing_Score         0.333115
Placement_Score       0.580572
Club_Join_Date        0.095792
Writing_Score copy    0.288178
dtype: float64
data.skew(axis=0,skipna=True)
Math_Score            0.068575
Reading_Score         0.349884
Writing_Score         0.333115
Placement_Score       0.580572
Club_Join_Date        0.095792
Writing_Score copy    0.288178
dtype: float64
from scipy import stats
boxcox=stats.boxcox(data['Math_Score'])[0]
pd.Series(boxcox).skew()
-0.007281337455985359
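
To compare the transformations side by side, the sketch below computes the skewness of Math_Score before and after a square-root and a log transform. It uses pandas' skew(), which reports the bias-adjusted sample skewness (0 means symmetric). Box-Cox is left out to keep the example free of SciPy, and the name skews is ours.

```python
import numpy as np
import pandas as pd

# Math_Score values for the 30 students, copied from the dataset in this post.
math_scores = pd.Series([
    65, 64, 76, 80, 63, 73, 72, 77, 74, 68,
    64, 75, 62, 69, 74, 69, 64, 75, 73, 75,
    69, 68, 66, 75, 80, 71, 63, 62, 64, 72,
], name="Math_Score")

# Skewness of the raw column and of two common variance-stabilizing transforms.
skews = pd.Series({
    "raw": math_scores.skew(),
    "sqrt": np.sqrt(math_scores).skew(),
    "log": np.log(math_scores).skew(),
})
print(skews.round(6))
```

Math_Score is already close to symmetric (skew near 0.07), so the log transform pushes it slightly past zero rather than improving it much. Transformations like these pay off more on clearly skewed columns such as Placement_Score.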

Conclusion

To sum up, this blog extensively explored student performance data from 2018 to 2021. It covered a wide range of aspects, including the distribution of marks, average scores for each year, visual representations of marks distribution, correlations between subjects, and identification and handling of outliers.

Moreover, it charted club membership by year as a first step toward predicting membership trends over time.

By utilizing visualization techniques such as pairplots, boxplots, heatmaps, and histograms, the blog effectively communicated insights about the dataset’s characteristics and relationships.

It also discussed important steps in data preprocessing, such as dealing with missing values and outliers, as well as applying transformations like log transformation and Box-Cox transformation to enhance data distribution.

