Machine Learning Project 7: Machine Learning Engineer Salary in 2024

Introduction

Hello there! Welcome to our blog where we’re delving into the captivating realm of Machine Learning Engineer salaries for 2024. Interested in knowing how much these tech wizards are earning? You’ve come to the perfect place.

Machine Learning is currently one of the most sought-after fields in the tech industry. Whether you’re considering a career in this field or simply curious about the pay scale, we’ve got all the exciting details.

Here’s what we’ll cover:

  • Salary Ranges: We’ll provide a breakdown of the average salaries you can expect.
  • Industry Demand: Discover which sectors are offering top dollar for AI talent.
  • Location, Location, Location: Learn how geography can impact your paycheck.

So grab a cup of coffee and get cozy, because we’re about to unravel everything you need to know about Machine Learning Engineer salaries in 2024. Let’s dive in!


Machine Learning Engineer Salary in 2024 EDA

Dataset Info

Description of the features in the dataset:

  • work_year: The year in which the salary data was collected (e.g., 2024).
  • experience_level: The level of experience of the employee (e.g., MI for Mid-Level).
  • employment_type: The type of employment (e.g., FT for Full-Time).
  • job_title: The title of the job (e.g., Data Scientist).
  • salary: The salary amount.
  • salary_currency: The currency in which the salary is denominated (e.g., USD for US Dollars).
  • salary_in_usd: The salary amount converted to US Dollars.
  • employee_residence: The country of residence of the employee (e.g., AU for Australia).
  • remote_ratio: The ratio indicating the level of remote work (0 for no remote work).
  • company_location: The location of the company (e.g., AU for Australia).
  • company_size: The size of the company (e.g., S for Small).
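
The abbreviated codes become easier to work with once they are expanded into readable labels. Here is a minimal sketch of such lookup tables (illustrative only; they mirror the mappings used in the Data transformation step near the end of this post):

# Illustrative lookup tables for the abbreviated codes described above
experience_levels = {'EN': 'Entry-level', 'MI': 'Mid-level', 'SE': 'Senior-level', 'EX': 'Executive-level'}
employment_types = {'FT': 'Full-time', 'PT': 'Part-time', 'CT': 'Contract', 'FL': 'Freelance'}
company_sizes = {'S': 'Small', 'M': 'Medium', 'L': 'Large'}

print(experience_levels['MI'], '/', employment_types['FT'], '/', company_sizes['S'])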

Import Dependencies

import warnings
warnings.filterwarnings("ignore")

import os
import squarify
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import clear_output
from wordcloud import WordCloud
# Verify input 
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/machine-learning-engineer-salary-in-2024/salaries.csv

Load Dataset

Dataset Link: https://www.kaggle.com/code/maskara31/machine-learning-engineer-salary-in-2024-eda

data_salary = pd.read_csv('/kaggle/input/machine-learning-engineer-salary-in-2024/salaries.csv')
# show dataset
data_salary.head()
   work_year experience_level employment_type             job_title  salary salary_currency  salary_in_usd employee_residence  remote_ratio company_location company_size
0       2024               MI              FT        Data Scientist  120000             USD         120000                 AU             0               AU            S
1       2024               MI              FT        Data Scientist   70000             USD          70000                 AU             0               AU            S
2       2024               MI              CT        Data Scientist  130000             USD         130000                 US             0               US            M
3       2024               MI              CT        Data Scientist  110000             USD         110000                 US             0               US            M
4       2024               MI              FT  Data Science Manager  240000             USD         240000                 US             0               US            M
# Verify dataset info

data_salary.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16494 entries, 0 to 16493
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   work_year           16494 non-null  int64 
 1   experience_level    16494 non-null  object
 2   employment_type     16494 non-null  object
 3   job_title           16494 non-null  object
 4   salary              16494 non-null  int64 
 5   salary_currency     16494 non-null  object
 6   salary_in_usd       16494 non-null  int64 
 7   employee_residence  16494 non-null  object
 8   remote_ratio        16494 non-null  int64 
 9   company_location    16494 non-null  object
 10  company_size        16494 non-null  object
dtypes: int64(4), object(7)
memory usage: 1.4+ MB

Dataset has 16494 rows and 11 columns

Visualization

Top 10 Job Titles with Highest Salaries

top_job_titles = data_salary.groupby('job_title')['salary_in_usd'].median().sort_values(ascending=False).head(10)

plt.figure(figsize=(10, 6))
top_job_titles.plot(kind='bar', color='blue')
plt.title('Top 10 Job Titles with Highest Salaries')
plt.xlabel('Job Title')
plt.ylabel('Median Salary (USD)')
plt.xticks(rotation=45, ha='right')
plt.show()
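
One caveat with ranking job titles by median salary: titles that appear only a handful of times can dominate the chart. A hedged variation that keeps only titles with a reasonable number of postings might look like this (the threshold of 20 postings is an arbitrary illustration):

# Keep only job titles with at least 20 postings before ranking by median salary
counts = data_salary['job_title'].value_counts()
common_titles = counts[counts >= 20].index
top_common = (data_salary[data_salary['job_title'].isin(common_titles)]
              .groupby('job_title')['salary_in_usd']
              .median()
              .sort_values(ascending=False)
              .head(10))

top_common.plot(kind='bar', color='blue', figsize=(10, 6))
plt.title('Top 10 Job Titles with Highest Median Salaries (min. 20 postings)')
plt.xlabel('Job Title')
plt.ylabel('Median Salary (USD)')
plt.xticks(rotation=45, ha='right')
plt.show()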

Top 15 Job titles by Word Cloud

top_15_job_titles = data_salary['job_title'].value_counts().head(15)

title_counts = dict(top_15_job_titles)

wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(title_counts)

plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Top 15 Job Titles')
plt.show()

Salaries Distribution

plt.figure(figsize=(10, 6))
sns.histplot(data=data_salary, x='salary_in_usd', kde=True)
plt.title('Salaries Distribution')
plt.xlabel('Salary (USD)')
plt.ylabel('Frequency')
plt.show()
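
Because the salary distribution is strongly right-skewed, a log-scaled x-axis can make the bulk of the distribution easier to read. A quick optional sketch:

# Same histogram on a logarithmic x-axis to spread out the long right tail
plt.figure(figsize=(10, 6))
sns.histplot(data=data_salary, x='salary_in_usd', log_scale=True)
plt.title('Salaries Distribution (log scale)')
plt.xlabel('Salary (USD, log scale)')
plt.ylabel('Frequency')
plt.show()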

Salaries Distribution by company size

plt.figure(figsize=(10, 6))
sns.violinplot(data=data_salary, x='company_size', y='salary_in_usd')
plt.title('Salaries Distribution by company size')
plt.xlabel('Company size')
plt.ylabel('Salary (USD)')
plt.show()

Distribution of Salaries by Experience Level

plt.subplots(figsize=(10, 6))
sns.set_color_codes("pastel")
sns.barplot(x='experience_level', y='salary_in_usd', data=data_salary, order=['EN', 'MI', 'SE', 'EX'])  # explicit order so the tick labels match the bars
plt.title('Salary Distribution by experience level')
plt.xlabel('Experience level')
plt.ylabel('Salary (USD)')
plt.show()

Salaries Distribution by Experience Level and Employment Type

plt.figure(figsize=(12, 6))
sns.set_color_codes("pastel")
sns.barplot(x='experience_level', y='salary_in_usd', hue='employment_type', data=data_salary,)
plt.title('Salary Distribution by Experience Level and Employment Type')
plt.xlabel('Experience Level')
plt.ylabel('Salary (USD)')
plt.legend(title='Employment Type')
plt.show()

Average salaries by level of experience over the years

plt.figure(figsize=(10, 6))
sns.lineplot(data=data_salary, x='work_year', y='salary_in_usd', hue='experience_level', hue_order=['EN', 'MI', 'SE', 'EX'], estimator='mean', ci=None)
plt.title('Average salaries by level of experience over the years')
plt.xlabel('Year')
plt.ylabel('Salary (USD)')
plt.legend(title='Experience level')
plt.show()
plt.figure(figsize=(10, 6))
sns.lineplot(data=data_salary, x='work_year', y='salary_in_usd', estimator='mean', ci=None)
plt.title('Salary trends over the years')
plt.xlabel('Year')
plt.ylabel('Salary (USD)')
plt.show()

Average Salaries by Company Size Over the Years

data_salary.groupby(['work_year', 'company_size'])['salary_in_usd'].mean().unstack().plot(kind='line', marker='o', figsize=(10, 6))
plt.title('Average Salaries by Company Size Over the Years')
plt.xlabel('Year')
plt.ylabel('Average salary (USD)')
plt.legend(title='Company size')
plt.show()

2024 Data and AI Profession Salary Insights

Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans
from sklearn.preprocessing import RobustScaler
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_samples, silhouette_score

from scipy.cluster import hierarchy

Data Exploration

df = pd.read_csv('/kaggle/input/machine-learning-engineer-salary-in-2024/salaries.csv')
df.head()
   work_year experience_level employment_type             job_title  salary salary_currency  salary_in_usd employee_residence  remote_ratio company_location company_size
0       2024               MI              FT        Data Scientist  120000             USD         120000                 AU             0               AU            S
1       2024               MI              FT        Data Scientist   70000             USD          70000                 AU             0               AU            S
2       2024               MI              CT        Data Scientist  130000             USD         130000                 US             0               US            M
3       2024               MI              CT        Data Scientist  110000             USD         110000                 US             0               US            M
4       2024               MI              FT  Data Science Manager  240000             USD         240000                 US             0               US            M
df.shape
(16494, 11)
df.columns
Index(['work_year', 'experience_level', 'employment_type', 'job_title',
       'salary', 'salary_currency', 'salary_in_usd', 'employee_residence',
       'remote_ratio', 'company_location', 'company_size'],
      dtype='object')
df.dtypes
work_year              int64
experience_level      object
employment_type       object
job_title             object
salary                 int64
salary_currency       object
salary_in_usd          int64
employee_residence    object
remote_ratio           int64
company_location      object
company_size          object
dtype: object
df.isnull().sum()
work_year             0
experience_level      0
employment_type       0
job_title             0
salary                0
salary_currency       0
salary_in_usd         0
employee_residence    0
remote_ratio          0
company_location      0
company_size          0
dtype: int64

EDA

df.describe()
          work_year        salary  salary_in_usd  remote_ratio
count  16494.000000  1.649400e+04   16494.000000  16494.000000
mean    2023.224991  1.637878e+05  149713.575725     32.044986
std        0.713405  3.406017e+05   68516.136918     46.260201
min     2020.000000  1.400000e+04   15000.000000      0.000000
25%     2023.000000  1.020000e+05  101517.500000      0.000000
50%     2023.000000  1.422000e+05  141300.000000      0.000000
75%     2024.000000  1.873422e+05  185900.000000    100.000000
max     2024.000000  3.040000e+07  800000.000000    100.000000
df.describe(include='object')
       experience_level employment_type      job_title salary_currency employee_residence company_location company_size
count             16494           16494          16494           16494              16494            16494        16494
unique                4               4            155              23                 88               77            3
top                  SE              FT  Data Engineer             USD                 US               US            M
freq              10652           16414           3456           15254              14427            14478        15268

Univariate Analysis

def annotate_axes(ax, axis='y'):
    for p in ax.patches:
        if axis == 'y':
            ax.annotate(format(p.get_height(), '.0f'),
                        (p.get_x() + p.get_width() / 2., p.get_height()),
                        ha='center', va='center',
                        xytext=(0, 9),
                        textcoords='offset points')
        elif axis == 'x':
            ax.annotate(format(p.get_width(), '.0f'),
                        (p.get_width(), p.get_y() + p.get_height() / 2.),
                        ha='center', va='center',
                        xytext=(9, 0),
                        textcoords='offset points')
def plot_count_graph(col: str, title: str, xlabel: str, ylabel: str, _df: pd.DataFrame, axis: str, 
                     figsize=(10, 6), ordered: bool = False, asc: bool = False, 
                     sort_by: str = 'count'):
    plt.figure(figsize=figsize)
    
    order = _df[col].unique()
    if ordered:
        if sort_by == 'count':
            order = _df[col].value_counts().sort_values(ascending=asc).index
        elif sort_by == 'values':
            order = sorted(_df[col].unique(), reverse=not asc)
            
    if axis == 'x':
        ax = sns.countplot(data=_df, x=col, palette='Pastel1', order=order)
        annotate_axes(ax, axis='y')
        plt.grid(axis='y')
    elif axis == 'y':
        ax = sns.countplot(data=_df, y=col, palette='Pastel1', order=order)
        annotate_axes(ax, axis='x')
        plt.grid(axis='x')
        
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.xticks(rotation=45)
    plt.show()
plot_count_graph(col='work_year', 
                 title='Year Distribution', 
                 xlabel='Year', 
                 ylabel='Count', 
                 _df=df, 
                 axis='x', 
                 ordered=True, 
                 asc=True, 
                 sort_by='values')
plot_count_graph(col='experience_level', 
                 title='Experience Level Distribution', 
                 xlabel='Level of Experience', 
                 ylabel='Count', 
                 _df=df, 
                 axis='x')
plot_count_graph(col='employment_type', 
                 title='Employment Type Distribution', 
                 xlabel='Employment Type', 
                 ylabel='Count', 
                 _df=df, 
                 axis='x')
plot_count_graph(col='job_title', 
                 title='Job Title Distribution', 
                 xlabel='Job Title', 
                 ylabel='Count', 
                 _df=df, 
                 axis='y', 
                 figsize=(12, 24),
                 ordered=True, 
                 sort_by='count')
plot_count_graph(col='salary_currency', 
                 title='Salary Currencies', 
                 xlabel='Salary Currencies', 
                 ylabel='Count', 
                 _df=df, 
                 ordered=True,
                 asc=True,
                 axis='x')
plot_count_graph(col='company_location', 
                 title='Company Locations', 
                 xlabel='Location', 
                 ylabel='Count', 
                 _df=df, 
                 ordered=True,
                 asc=False,
                 figsize=(10, 16),
                 axis='y')
plot_count_graph(col='employee_residence', 
                 title='Residence Distribution', 
                 xlabel='Residence', 
                 ylabel='Count', 
                 _df=df, 
                 ordered=True,
                 asc=False,
                 figsize=(10, 16),
                 axis='y')
plot_count_graph(col='company_size', 
                 title='Company Size Distribution', 
                 xlabel='Size', 
                 ylabel='Count', 
                 _df=df, 
                 ordered=True,
                 asc=False,
                 sort_by='values',
                 axis='x')
plot_count_graph(col='remote_ratio', 
                 title='Remote Ratio Distribution', 
                 xlabel='Ratio', 
                 ylabel='Count', 
                 _df=df, 
                 ordered=True,
                 asc=True,
                 sort_by='values',
                 axis='x')
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x='salary', kde=True, color='skyblue', edgecolor='black')
plt.title('Distribution of Salary')
plt.grid(True)
plt.show()
plt.figure(figsize=(8, 3))
sns.histplot(data=df, x='salary_in_usd', kde=True, color='skyblue', edgecolor='black')
plt.title('Distribution of Salary in USD')
plt.grid(True)
plt.show()
plt.figure(figsize=(16, 3))
sns.boxplot(data=df, x='salary_in_usd', color='skyblue')
plt.title('Box Plot of Salary in USD')
plt.grid(True)
plt.show()

Bivariate Analysis

plt.figure(figsize=(16, 8))
sns.boxplot(data=df, y='salary_in_usd', x='experience_level', palette='Pastel1')
plt.title('Distribution of Salary in USD Across Different Experience Levels')
plt.grid(True)
plt.show()
plt.figure(figsize=(16, 8))
sns.boxplot(data=df, y='salary_in_usd', x='employment_type', palette='Pastel1')
plt.title('Distribution of Salary in USD Across Different Employment Type')
plt.grid(True)
plt.show()
plt.figure(figsize=(16, 8))
sns.boxplot(data=df, y='salary_in_usd', x='company_size', palette='Pastel1')
plt.title('Distribution of Salary in USD Across Different Company Size')
plt.grid(True)
plt.show()
plt.figure(figsize=(20, 40))
sns.boxplot(data=df, x='salary_in_usd', y='job_title', palette='Pastel1')
plt.title('Distribution of Salary in USD Across Different Job Titles')
plt.grid(True)
plt.show()
plt.figure(figsize=(20, 40))
sns.boxplot(data=df, x='salary_in_usd', y='company_location', palette='Pastel1', order=df['company_location'].value_counts().index)
plt.title('Distribution of Salary in USD Across Different Locations')
plt.grid(True)
plt.show()
stacked_data = df[~(df['company_location'] == 'US')].groupby(['company_location', 'employment_type']).size().unstack()

# Plot stacked bar graph
stacked_data.plot(kind='bar', stacked=True, figsize=(20, 10), colormap='Pastel1')
plt.title('Distribution of Employment Type Across Different Company Locations (Excluding US)')
plt.xlabel('Company Location')
plt.ylabel('Count')
plt.grid(True)
plt.legend(title='Employment Type')
plt.show()
stacked_data = df[(df['company_location'] == 'US')].groupby(['company_location', 'employment_type']).size().unstack()

# Plot stacked bar graph
stacked_data.plot(kind='bar', stacked=True, figsize=(20, 10), colormap='Pastel1')
plt.title('Distribution of Employment Type Across Different Company Locations (US Only)')
plt.xlabel('Company Location')
plt.ylabel('Count')
plt.grid(True)
plt.legend(title='Employment Type')
plt.show()
stacked_data = df[(df['employment_type']=='FT')].groupby(['experience_level', 'employment_type']).size().unstack()

# Plot stacked bar graph
stacked_data.plot(kind='bar', stacked=True, figsize=(20, 10), colormap='Pastel1')
plt.title('Distribution of Full-Time Employment Across Experience Levels')
plt.xlabel('Experience Level')
plt.ylabel('Count')
plt.grid(True)
plt.legend(title='Employment Type')
plt.show()
stacked_data = df[~(df['employment_type']=='FT')].groupby(['experience_level', 'employment_type']).size().unstack()

# Plot stacked bar graph
stacked_data.plot(kind='bar', stacked=True, figsize=(20, 10), colormap='Pastel1')
plt.title('Distribution of Employment Type Across Experience Levels (Excluding Full-Time)')
plt.xlabel('Experience Level')
plt.ylabel('Count')
plt.grid(True)
plt.legend(title='Employment Type')
plt.show()
le = LabelEncoder()
categorical_columns = ['experience_level', 'employment_type', 'job_title', 
                       'salary_currency', 'employee_residence', 'company_location', 'company_size']

df_encoded = df.copy().drop(columns=categorical_columns)
for column in categorical_columns:
    df_encoded[column + '_encoded'] = le.fit_transform(df[column])

p_corr = df_encoded.corr('spearman')
sns.set(rc={'figure.figsize':(20, 20)})
sns.heatmap(p_corr, annot=True)
<Axes: >
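
If the heatmap is hard to scan, the same Spearman matrix can be reduced to a ranked list of correlations with the target column. A small sketch using the p_corr matrix computed above:

# Rank features by the absolute value of their Spearman correlation with salary_in_usd
salary_corr = (p_corr['salary_in_usd']
               .drop('salary_in_usd')
               .sort_values(key=abs, ascending=False))
print(salary_corr)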
stacked_data = df.groupby(['job_title', 'experience_level']).size().unstack()
stacked_data['Total'] = stacked_data.sum(axis=1)
stacked_data = stacked_data.sort_values(by='Total', ascending=True)

stacked_data.drop(columns='Total').plot(kind='barh', stacked=True, figsize=(20, 40), colormap='Pastel1')
plt.title('Distribution of Experience Levels Across Job Titles')
plt.xlabel('Count')
plt.ylabel('Job Title')
plt.grid(True)
plt.legend(title='Experience Level')
plt.show()
_ = df.groupby(['work_year', 'job_title']).agg({'salary_in_usd': 'mean'})

_
                                      salary_in_usd
work_year job_title
2020      AI Scientist                 45896.000000
          Azure Data Engineer         100000.000000
          BI Data Analyst              98000.000000
          Big Data Engineer            97690.333333
          Business Data Analyst       110000.000000
...                                             ...
2024      Research Analyst            127518.121212
          Research Engineer           206586.567164
          Research Scientist          204206.865741
          Robotics Engineer           140416.666667
          Robotics Software Engineer  196625.000000

362 rows × 1 columns

reshaped_df = _.reset_index().pivot_table(index='job_title', columns='work_year', values='salary_in_usd')
reshaped_df = reshaped_df.sort_index(axis=1)
reshaped_df = reshaped_df.fillna(0)

reshaped_df
work_year                            2020      2021      2022           2023           2024
job_title
AI Architect                          0.0       0.0  180000.0  250328.000000  258753.125000
AI Developer                          0.0       0.0  275000.0  133266.823529   33333.000000
AI Engineer                           0.0       0.0  107093.0  161487.829787  164797.028571
AI Product Manager                    0.0       0.0       0.0  120000.000000  152650.000000
AI Programmer                         0.0       0.0   40000.0   72858.800000   30000.000000
...                                   ...       ...       ...            ...            ...
Sales Data Analyst                60000.0       0.0       0.0       0.000000       0.000000
Software Data Engineer                0.0       0.0       0.0  111627.666667       0.000000
Staff Data Analyst                29876.5       0.0       0.0  179998.000000       0.000000
Staff Data Scientist             164000.0  105000.0       0.0       0.000000       0.000000
Staff Machine Learning Engineer       0.0  185000.0       0.0       0.000000       0.000000

155 rows × 5 columns

Regional Disparities Analysis

average_salary_by_residence = df.groupby('employee_residence')['salary_in_usd'].mean().reset_index()
average_salary_by_residence.columns = ['Employee Residence', 'Average Salary']
print(average_salary_by_residence)
   Employee Residence  Average Salary
0                  AD    50745.000000
1                  AE    86000.000000
2                  AM    33500.000000
3                  AR    58461.538462
4                  AS    45555.000000
..                ...             ...
83                 UG    36000.000000
84                 US   157220.590351
85                 UZ    82000.000000
86                 VN    56733.333333
87                 ZA    53488.684211

[88 rows x 2 columns]
average_salary_by_location = df.groupby('company_location')['salary_in_usd'].mean().reset_index()
average_salary_by_location.columns = ['Company Location', 'Average Salary']
print(average_salary_by_location)
   Company Location  Average Salary
0                AD    50745.000000
1                AE    86000.000000
2                AM    50000.000000
3                AR    62444.444444
4                AS    31684.333333
..              ...             ...
72               TR    23094.666667
73               UA   105600.000000
74               US   156954.893355
75               VN    63000.000000
76               ZA    53488.684211

[77 rows x 2 columns]
import matplotlib.pyplot as plt

comparison = pd.merge(average_salary_by_residence, average_salary_by_location, left_on='Employee Residence', right_on='Company Location', how='outer', suffixes=('_Residence', '_Company'))

comparison_long = pd.melt(comparison, id_vars=['Employee Residence'], value_vars=['Average Salary_Residence', 'Average Salary_Company'],
                          var_name='Category', value_name='Average Salary')

plt.figure(figsize=(16, 32))
ax = sns.barplot(data=comparison_long, y='Employee Residence', x='Average Salary', hue='Category')

# Annotate each bar with its average salary (horizontal bars, so annotate at the bar width)
for p in ax.patches:
    ax.annotate(format(p.get_width(), '.0f'),
                (p.get_width(), p.get_y() + p.get_height() / 2.),
                ha='left', va='center',
                xytext=(3, 0),
                textcoords='offset points')

plt.title('Comparative Average Salary: Residence vs Company Location')
plt.xlabel('Average Salary in USD')
plt.ylabel('Employee Residence')
plt.legend(title='Category')
plt.show()
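
Another way to read the same comparison is to compute, per country, the difference between the residence-based and company-location-based averages. A minimal sketch built on the comparison DataFrame created above:

# Per-country gap between residence-based and company-location-based averages
# (positive values: residents of a country out-earn the average paid by companies located there)
comparison['Salary Gap'] = comparison['Average Salary_Residence'] - comparison['Average Salary_Company']
print(comparison[['Employee Residence', 'Salary Gap']].sort_values('Salary Gap', ascending=False).head(10))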

Clustering

import warnings

warnings.filterwarnings("ignore")
def elbow_method(data):
    inertia = []
    K = range(1, 11)
    for k in K:
        model = KMeans(n_clusters=k, random_state=1)
        model.fit(data)
        inertia.append(model.inertia_)

    # Plot the elbow
    sns.set(style='whitegrid')
    plt.figure(figsize=(12, 6))
    sns.lineplot(x=K, y=inertia, marker='o', color='red')
    plt.title('Elbow plot for KMeans clustering')
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('Inertia')
    plt.xticks(K)
    plt.show()
def silhouette_scores(data):
    K = range(2, 11)
    
    for k in K:
        model = KMeans(n_clusters=k, random_state=1)
        model.fit(data)
        s_score = silhouette_score(data, model.labels_)
        print("For k={}, silhouette score is {:.3f}".format(k, s_score))
def scale(data, scaler):
    scaled_data = scaler.fit_transform(data)
    return scaled_data, scaler
def visualize_centroid(data, model, x_col, y_col, x_center, y_center, title, x_label, y_label):
    plt.clf()
    fig, ax = plt.subplots(figsize=(10,7))
    sns.scatterplot(x=x_col, y=y_col, data=data, hue='Label', ax=ax, palette='deep')

    centers = model.cluster_centers_
    sns.scatterplot(x=centers[:,x_center], y=centers[:,y_center], s=500, alpha=0.8, marker='o',
                    ax=ax, legend=False, palette='deep')

    ax.set_title(title, fontsize=16)
    ax.set_xlabel(x_label, fontsize=12)
    ax.set_ylabel(y_label, fontsize=12)
    plt.show()
def train_kmeans(K, data, is_PCA=False, isScaled=False):
    columns = data.columns
    if isScaled:
        scaler = RobustScaler()
        data = scaler.fit_transform(data)
    else:
        scaler = None

    model = KMeans(n_clusters=K, random_state=1)
    model.fit(data)
    
    labels = model.labels_
    _data = pd.DataFrame(data, columns=columns).copy()
    _data['Label'] = labels

    if is_PCA:
        pca, _data_pca = pca_transform(_data.drop('Label', axis=1))
        _data_pca = pd.DataFrame(_data_pca)
        _data_pca['Label'] = labels
        return pca, model, _data_pca, scaler
    else:
        return model, _data, scaler
def pca_transform(dataset):
    pca = PCA(n_components=2)
    pca_df = pca.fit_transform(dataset)
    return pca, pca_df
elbow_method(df_encoded)
silhouette_scores(df_encoded)
For k=2, silhouette score is 0.983
For k=3, silhouette score is 0.975
For k=4, silhouette score is 0.961
For k=5, silhouette score is 0.561
For k=6, silhouette score is 0.564
For k=7, silhouette score is 0.537
For k=8, silhouette score is 0.538
For k=9, silhouette score is 0.538
For k=10, silhouette score is 0.537
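
The very high silhouette scores at small k are likely driven by the unscaled salary columns, which dwarf the label-encoded categorical features. It may be worth repeating the diagnostics on scaled data; a quick sketch using the scale helper and the RobustScaler imported above:

# Re-run the elbow and silhouette diagnostics on robust-scaled features
scaled_features, _ = scale(df_encoded, RobustScaler())
elbow_method(scaled_features)
silhouette_scores(scaled_features)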
df_encoded.columns
Index(['work_year', 'salary', 'salary_in_usd', 'remote_ratio',
       'experience_level_encoded', 'employment_type_encoded',
       'job_title_encoded', 'salary_currency_encoded',
       'employee_residence_encoded', 'company_location_encoded',
       'company_size_encoded'],
      dtype='object')
pca, model, clustered_data, scaler = train_kmeans(3, 
          df_encoded[['work_year', 'salary_in_usd', 'remote_ratio',
       'experience_level_encoded', 'employment_type_encoded',
       'job_title_encoded', 'salary_currency_encoded',
       'employee_residence_encoded', 'company_location_encoded',
       'company_size_encoded']], True)
visualize_centroid(clustered_data, model, 0, 1, 
                   0, 1, 
                   'K-Means Clusters of Salary Records (PCA projection)', 
                   'PCA Component 1', 'PCA Component 2')
df["Label"] = clustered_data['Label']
df.loc[(df['Label']) == 0].describe()
          work_year         salary  salary_in_usd  remote_ratio   Label
count   7704.000000    7704.000000    7704.000000   7704.000000  7704.0
mean    2023.228193  164109.469107  164190.948858     32.995846     0.0
std        0.661514   25136.066236   24868.952860     46.933011     0.0
min     2020.000000  104000.000000  126225.000000      0.000000     0.0
25%     2023.000000  142000.000000  142000.000000      0.000000     0.0
50%     2023.000000  160000.000000  160000.000000      0.000000     0.0
75%     2024.000000  185000.000000  185000.000000    100.000000     0.0
max     2024.000000  274965.000000  215300.000000    100.000000     0.0
df.loc[(df['Label']) == 1].describe()
          work_year        salary  salary_in_usd  remote_ratio   Label
count   6384.000000  6.384000e+03    6384.000000   6384.000000  6384.0
mean    2023.191416  1.245268e+05   88145.769893     33.505639     1.0
std        0.801073  5.400781e+05   25453.615700     46.316866     0.0
min     2020.000000  1.400000e+04   15000.000000      0.000000     1.0
25%     2023.000000  7.000000e+04   70000.000000      0.000000     1.0
50%     2023.000000  9.200000e+04   92000.000000      0.000000     1.0
75%     2024.000000  1.100000e+05  110000.000000    100.000000     1.0
max     2024.000000  3.040000e+07  126100.000000    100.000000     1.0
df.loc[(df['Label']) == 2].describe()
          work_year        salary  salary_in_usd  remote_ratio   Label
count   2406.000000  2.406000e+03    2406.000000   2406.000000  2406.0
mean    2023.303824  2.669317e+05  266719.057772     25.124688     2.0
std        0.613400  6.829962e+04   63748.333616     43.250047     0.0
min     2020.000000  2.000000e+05  215500.000000      0.000000     2.0
25%     2023.000000  2.300000e+05  230000.000000      0.000000     2.0
50%     2023.000000  2.500000e+05  250000.000000      0.000000     2.0
75%     2024.000000  2.810000e+05  281000.000000     50.000000     2.0
max     2024.000000  1.500000e+06  800000.000000    100.000000     2.0
df.loc[(df['Label']) == 0].describe(include='object')
       experience_level employment_type       job_title salary_currency employee_residence company_location company_size
count              7704            7704            7704            7704               7704             7704         7704
unique                4               3             115               6                 27               21            3
top                  SE              FT  Data Scientist             USD                 US               US            M
freq               5783            7694            1841            7633               7325             7333         7292
df.loc[(df['Label']) == 1].describe(include='object')
       experience_level employment_type     job_title salary_currency employee_residence company_location company_size
count              6384            6384          6384            6384               6384             6384         6384
unique                4               4           143              23                 85               75            3
top                  SE              FT  Data Analyst             USD                 US               US            M
freq               2959            6318          1708            5230               4810             4852         5769
df.loc[(df['Label']) == 2].describe(include='object')
       experience_level employment_type                  job_title salary_currency employee_residence company_location company_size
count              2406            2406                       2406            2406               2406             2406         2406
unique                4               3                         67               6                 16               15            3
top                  SE              FT  Machine Learning Engineer             USD                 US               US            M
freq               1910            2402                        519            2391               2292             2293         2207
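
To summarize the three clusters at a glance, the labelled DataFrame can be aggregated directly. A brief sketch (column names follow the DataFrame built above):

# Cluster sizes, median salary, and most common job title per cluster
cluster_summary = df.groupby('Label').agg(
    n_rows=('salary_in_usd', 'size'),
    median_salary_usd=('salary_in_usd', 'median'),
    most_common_title=('job_title', lambda s: s.mode().iloc[0]),
)
print(cluster_summary)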

AIML salaries 2022-2024 AutoViz+CatBoost+SHAP

Importing libraries and loading data

# Install Python packages using pip.

# The "!pip" command allows you to run shell commands in Jupyter Notebook or Colab cells.
# It is used here to install Python packages.
# The "-q" flag stands for "quiet," which means it will suppress output during installation.
# "feature_engine are the packages being installed.
# The "2>/dev/null" part redirects any error messages (stderr) to the null device, effectively silencing them.
# This is often used when you want to hide installation messages.
!pip install -q feature_engine "autoviz>=0.1.803" dataprep 2>/dev/null
# Import necessary libraries
import numpy as np  # Import NumPy for handling numerical operations
import pandas as pd  # Import Pandas for data manipulation and analysis
import warnings  # Import Warnings to suppress unnecessary warnings

# Suppress warning messages
warnings.filterwarnings("ignore")

# Import AutoViz from the autoviz library for automated visualization of data
from autoviz import AutoViz_Class

# Import load_dataset and create_report from the dataprep library for data loading and EDA
from dataprep.datasets import load_dataset
from dataprep.eda import create_report

# Import SHAP for interpreting model predictions
import shap

# Import matplotlib for data visualization
import matplotlib.pyplot as plt

# Import CatBoostRegressor for building a regression model
from catboost import Pool, CatBoostRegressor

# Import mean_squared_error for evaluating model performance
from sklearn.metrics import mean_squared_error

# Import train_test_split for splitting the data into training and testing sets
from sklearn.model_selection import train_test_split

# Import RareLabelEncoder from feature_engine.encoding for encoding categorical features
from feature_engine.encoding import RareLabelEncoder

# Import CountVectorizer from sklearn.feature_extraction.text for text feature extraction
from sklearn.feature_extraction.text import CountVectorizer

# Import ast and re for working with text and regular expressions
import ast
import re

# Set Pandas options to display a maximum of 1000 rows
pd.set_option('display.max_rows', 1000)
Imported v0.1.901. Please call AutoViz in this sequence:
    AV = AutoViz_Class()
    %matplotlib inline
    dfte = AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=1, lowess=False,
               chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30, save_plot_dir=None)
# Read the CSV file containing job salaries data into a DataFrame and remove duplicate rows
df0 = pd.read_csv("/kaggle/input/data-jobs-salaries/salaries.csv").drop_duplicates()

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df0 = df0[df0['work_year'] >= 2022]

# Print the shape of the DataFrame to display the number of rows and columns
print(df0.shape)

# Display a random sample of 5 rows from the DataFrame, transposed for better visibility
df0.sample(5).T
(11152, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2024, 2024, 2024, 2023
experience_level      SE, MI, EN, MI, MI
employment_type       FT, FT, FT, FT, FT
job_title             Applied Scientist, Research Scientist, Data Analyst, Machine Learning Engineer, Business Intelligence Analyst
salary                72000, 178200, 137100, 125100, 119000
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         72000, 178200, 137100, 125100, 119000
employee_residence    US, US, US, US, US
remote_ratio          0, 0, 0, 0, 0
company_location      US, US, US, US, US
company_size          L, M, M, M, M
# Read the CSV file containing data on data science salaries for 2023 into a DataFrame
df1 = pd.read_csv("/kaggle/input/data-science-salaries-2023/ds_salaries.csv").drop_duplicates()

# Filter the DataFrame to include only entries from the year 2022 and later
df1 = df1[df1['work_year'] >= 2022]

# Print the shape of the filtered DataFrame to show the number of rows and columns
print(df1.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df1.sample(5).T
(2281, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2022, 2023, 2022, 2022
experience_level      SE, SE, EX, MI, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Engineer, Data Architect, Head of Data Science, Data Scientist, Data Scientist
salary                241000, 230400, 195800, 150000, 220000
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         241000, 230400, 195800, 150000, 220000
employee_residence    US, US, US, US, US
remote_ratio          0, 0, 0, 100, 0
company_location      US, US, US, US, US
company_size          M, M, M, M, M
# Read the CSV file containing job salaries data into a Pandas DataFrame
df2 = pd.read_csv("/kaggle/input/data-jobs-salaries/salaries.csv").drop_duplicates()

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df2 = df2[df2['work_year'] >= 2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df2.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df2.sample(5).T
(11152, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2024, 2024, 2024, 2024, 2024
experience_level      EN, MI, EN, EN, EN
employment_type       FT, FT, FT, FT, FT
job_title             Data Analyst, Cloud Database Engineer, Machine Learning Engineer, Machine Learning Engineer, Data Analyst
salary                106875, 106000, 157900, 99500, 223500
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         106875, 106000, 157900, 99500, 223500
employee_residence    US, US, CA, US, US
remote_ratio          0, 0, 100, 0, 0
company_location      US, US, CA, US, US
company_size          M, M, M, M, M
# Read the CSV file containing job salaries data into a Pandas DataFrame
df3 = pd.read_csv("/kaggle/input/d/willianoliveiragibin/data-jobs-salaries/salaries.csv").drop_duplicates()

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df3 = df3[df3['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df3.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df3.sample(5).T
(4402, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2023, 2023, 2022, 2023
experience_level      SE, SE, EN, MI, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Engineer, Data Scientist, Data Engineer, Data Engineer, Data Engineer
salary                60000, 216100, 35000, 60000, 163625
salary_currency       GBP, USD, GBP, GBP, USD
salary_in_usd         73824, 216100, 43064, 73880, 163625
employee_residence    GB, US, GB, GB, US
remote_ratio          0, 0, 100, 0, 100
company_location      GB, US, GB, GB, US
company_size          M, M, M, M, M
# Read the CSV file containing job salaries data into a Pandas DataFrame
df4 = pd.read_csv("/kaggle/input/global-ai-ml-data-science-salary/salaries.csv").drop_duplicates()

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df4 = df4[df4['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df4.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df4.sample(5).T
(4838, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2023, 2023, 2022, 2023
experience_level      SE, EN, SE, SE, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Scientist, Data Scientist, Data Science Manager, Data Analyst, Data Science Consultant
salary                169000, 18000, 134236, 120600, 116000
salary_currency       USD, EUR, USD, USD, USD
salary_in_usd         169000, 19434, 134236, 120600, 116000
employee_residence    US, GR, US, US, US
remote_ratio          0, 100, 0, 100, 0
company_location      US, GR, US, US, US
company_size          M, L, M, M, M
# Read the CSV file containing job salaries data into a Pandas DataFrame
df5 = pd.read_csv("/kaggle/input/2023-data-scientists-salary/ds_salaries.csv").drop_duplicates()

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df5 = df5[df5['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df5.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df5.sample(5).T
(2281, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2022, 2022, 2022, 2023
experience_level      MI, SE, SE, MI, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Analyst, Research Engineer, Data Analyst, Data Scientist, Data Analyst
salary                35000, 249500, 150000, 1100000, 121600
salary_currency       GBP, USD, USD, INR, USD
salary_in_usd         42533, 249500, 150000, 13989, 121600
employee_residence    GB, US, US, IN, US
remote_ratio          0, 0, 0, 100, 100
company_location      GB, US, US, IN, US
company_size          M, M, M, L, M
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df6 = pd.read_csv('/kaggle/input/salary-data-analist/ds_salaries new.csv')

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df6 = df6[df6['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df6.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df6.sample(5).T
(3449, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2022, 2022, 2022, 2022
experience_level      SE, SE, SE, SE, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Analyst, Data Engineer, Data Science Manager, Data Engineer, Data Analyst
salary                122000, 78000, 249260, 130000, 175000
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         122000, 78000, 249260, 130000, 175000
employee_residence    US, US, US, US, US
remote_ratio          100, 0, 0, 0, 100
company_location      US, US, US, US, US
company_size          M, M, M, M, M
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df7 = pd.read_csv('/kaggle/input/data-science-job-salaries-2024/salaries.csv')

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df7 = df7[df7['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df7.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df7.sample(5).T
(13679, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2024, 2023, 2024, 2022
experience_level      SE, SE, SE, MI, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Architect, BI Developer, Applied Scientist, Data Science, Data Scientist
salary                150000, 80000, 136000, 86000, 243900
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         150000, 80000, 136000, 86000, 243900
employee_residence    US, IN, US, US, US
remote_ratio          0, 100, 0, 0, 100
company_location      US, IN, US, US, US
company_size          M, M, L, M, M
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df8 = pd.read_csv('/kaggle/input/latest-data-science-job-salaries-2024/DataScience_salaries_2024.csv')

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df8 = df8[df8['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df8.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df8.sample(5).T
(14545, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2024, 2023, 2023, 2023
experience_level      EN, SE, MI, MI, SE
employment_type       FT, FT, FT, FT, FT
job_title             Data Scientist, Data Manager, Data Specialist, Data Engineer, Research Scientist
salary                50000, 131200, 60000, 147100, 185000
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         50000, 131200, 60000, 147100, 185000
employee_residence    IN, US, US, US, US
remote_ratio          100, 100, 0, 0, 0
company_location      US, US, US, US, US
company_size          M, M, M, M, M
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df9 = pd.read_csv("/kaggle/input/machine-learning-engineer-salary-in-2024/salaries.csv")

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df9 = df9[df9['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df9.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df9.sample(5).T
(16201, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2024, 2023, 2023, 2023
experience_level      SE, SE, MI, SE, EX
employment_type       FT, FT, FT, FT, FT
job_title             Data Analyst, Business Intelligence Engineer, Data Scientist, Data Engineer, Data Engineer
salary                150000, 31200, 45000, 310000, 204500
salary_currency       USD, EUR, EUR, USD, USD
salary_in_usd         150000, 34666, 48585, 310000, 204500
employee_residence    US, LV, ES, US, US
remote_ratio          0, 0, 0, 0, 0
company_location      US, LV, ES, US, US
company_size          M, M, M, M, M
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df10 = pd.read_csv("/kaggle/input/data-engineer-salary-in-2024/salaries (2).csv")

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df10 = df10[df10['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df10.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df10.sample(5).T
(16241, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2023, 2023, 2023, 2024
experience_level      SE, SE, SE, SE, MI
employment_type       FT, FT, FT, FT, FT
job_title             Applied Scientist, Machine Learning Engineer, Machine Learning Engineer, Data Scientist, Applied Scientist
salary                159100, 280000, 204500, 140100, 222200
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         159100, 280000, 204500, 140100, 222200
employee_residence    US, US, US, US, US
remote_ratio          0, 0, 0, 0, 0
company_location      US, US, US, US, US
company_size          L, M, M, M, L
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df11 = pd.read_csv("/kaggle/input/ai-ml-salaries/salaries.csv")

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df11 = df11[df11['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df11.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df11.sample(5).T
(17763, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2023, 2024, 2024, 2024
experience_level      SE, SE, SE, MI, MI
employment_type       FT, FT, FT, FT, FT
job_title             Machine Learning Engineer, Data Engineer, Data Analyst, Prompt Engineer, Research Scientist
salary                144400, 385000, 111200, 500000, 96200
salary_currency       USD, USD, USD, USD, USD
salary_in_usd         144400, 385000, 111200, 500000, 96200
employee_residence    CA, US, US, US, US
remote_ratio          0, 0, 0, 0, 100
company_location      CA, US, US, US, US
company_size          M, M, M, M, M
#  Read the CSV file containing job salaries data into a Pandas DataFrame
df12 = pd.read_csv("/kaggle/input/data-science-salary-data/salaries.csv")

# Filter the DataFrame to include only rows where the 'work_year' is greater than or equal to 2022
df12 = df12[df12['work_year']>=2022]

# Print the shape of the DataFrame to show the number of rows and columns
print(df12.shape)

# Display a random sample of 5 rows from the filtered DataFrame, transposing for better readability
df12.sample(5).T
(13437, 11)
(five randomly sampled rows, transposed; row indices omitted)
work_year             2023, 2024, 2023, 2024, 2023
experience_level      SE, SE, SE, SE, SE
employment_type       FT, FT, FT, FT, FT
job_title             Machine Learning Engineer, Data Analyst, Machine Learning Engineer, Data Architect, Analytics Engineer
salary                142200, 110000, 280000, 90000, 155000
salary_currency       USD, USD, USD, GBP, USD
salary_in_usd         142200, 110000, 280000, 112500, 155000
employee_residence    US, US, US, GB, US
remote_ratio          0, 0, 0, 0, 0
company_location      US, US, US, GB, US
company_size          M, M, M, M, M
# Concatenating DataFrames vertically
# This combines the rows of the DataFrames to create a new DataFrame
df = pd.concat([df0, df1, df2, df3, df4, df5, df6, df7, df8, df9, df10, df11, df12])

# Removing duplicate rows from the concatenated DataFrame
# This helps ensure that each row is unique in the final DataFrame
df = df.drop_duplicates()

# Printing the shape of the resulting DataFrame
# This provides information about the number of rows and columns in the DataFrame
print(df.shape)
(11938, 11)
df[df['salary_in_usd']==115573]
       work_year experience_level employment_type                  job_title  salary salary_currency  salary_in_usd employee_residence  remote_ratio company_location company_size
18105       2022               SE              FT  Machine Learning Engineer  110000             EUR         115573                 FR           100               FR            M
18886       2022               MI              FT             Data Scientist  110000             EUR         115573                 NL             0               NL            M

Data visualisation

# An update taken from the nice work https://www.kaggle.com/code/anshtanwar/auto-eda-missing-migrants-interactive-charts 
# made by @anshtanwar

# Import the AutoViz_Class
# This class is used for automated exploratory data analysis and visualization.
AV = AutoViz_Class()

# Initialize variables
filename = ""  # Specify the filename of the dataset (empty in this case)
target_variable = 'salary_in_usd'  # Specify the target variable for analysis
custom_plot_dir = "custom_plot_directory"  # Specify the directory to save custom plots

# Perform automated EDA using the AutoViz library
# The following parameters are used:
# - filename: Empty in this case as the data is provided directly as 'df'
# - sep: Delimiter used in the data (comma in this case)
# - depVar: Target variable for analysis ('salary_in_usd' in this case)
# - dfte: DataFrame to be analyzed ('df' is assumed to be defined earlier)
# - header: Indicates that the first row contains column names (0 for True)
# - verbose: Verbosity level (1 for verbose output)
# - lowess: Smoothing using Lowess algorithm (False for no smoothing)
# - chart_format: Format in which charts will be generated (HTML format in this case)
# - max_rows_analyzed: Maximum number of rows to analyze (up to 10,000 rows)
# - max_cols_analyzed: Maximum number of columns to analyze (up to 50 columns)
# - save_plot_dir: Directory to save the generated plots ('custom_plot_directory' in this case)
try:
    dft = AV.AutoViz(
        filename,
        sep=",",
        depVar=target_variable,
        dfte=df,
        header=0,
        verbose=1,
        lowess=False,
        chart_format="html",
        max_rows_analyzed=min([df.shape[0], 10**4]),
        max_cols_analyzed=min([df.shape[1], 50]),
        save_plot_dir=custom_plot_dir
    )
    
    # Import the necessary library for displaying HTML content
    from IPython.core.display import display, HTML

    # Import the pathlib library to work with file paths
    from pathlib import Path
    
    # Initialize an empty list to store file names
    file_names = []

    # Use pathlib to iterate through HTML files in a specific directory
    for file in Path(f'/kaggle/working/{custom_plot_dir}/{target_variable}/').glob('*.html'):

        # Extract the filename from the full path and add it to the list
        filename = str(file).split('/')[-1]
        file_names.append(filename)

    # Iterate through the list of file names and display each HTML file
    for file_name in file_names:

        # Construct the full file path for each HTML file
        file_path = f'/kaggle/working/{custom_plot_dir}/{target_variable}/{file_name}'

        # Open the HTML file for reading
        with open(file_path, 'r') as file:

            # Read the content of the HTML file
            html_content = file.read()

            # Display the HTML content using IPython
            display(HTML(html_content))
except Exception as e:
    print(f"Exception: {e}")
    Since nrows is smaller than dataset, loading random sample of 10000 rows into pandas...
Shape of your Data Set loaded: (10000, 11)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  0
    Number of Integer-Categorical Columns =  2
    Number of String-Categorical Columns =  6
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  0
    Number of Numeric-Boolean Columns =  0
    Number of Discrete String Columns =  1
    Number of NLP String Columns =  0
    Number of Date Time Columns =  1
    Number of ID Columns =  0
    Number of Columns to Delete =  0
    10 Predictors classified...
        No variables removed since no ID or low-information variables found in data set
Since Number of Rows in data 10000 exceeds maximum, randomly sampling 10000 rows for EDA...

################ Regression problem #####################
Saving scatterplots in HTML format
Saving pair_scatters in HTML format
Saving distplots_cats in HTML format
Saving distplots_nums in HTML format
Saving kde_plots in HTML format
Saving violinplots in HTML format
Saving heatmaps in HTML format
Saving timeseries_plots in HTML format
Saving cat_var_plots in HTML format
Time to run AutoViz (in seconds) = 13


create_report(df.sample(10**3))

DataPrep Report

Overview

Dataset Statistics

Number of Variables: 11
Number of Rows: 1000
Missing Cells: 0
Missing Cells (%): 0.0%
Duplicate Rows: 0
Duplicate Rows (%): 0.0%
Total Size in Memory: 456.8 KB
Average Row Size in Memory: 467.8 B
Variable Types: Categorical: 7, Numerical: 2, GeoGraphy: 2

Dataset Insights

  • salary and salary_in_usd have similar distributions (Similar Distribution)
  • salary is skewed (Skewed)
  • job_title has a high cardinality: 88 distinct values (High Cardinality)
  • work_year has constant length 4 (Constant Length)
  • experience_level has constant length 2 (Constant Length)
  • employment_type has constant length 2 (Constant Length)
  • salary_currency has constant length 3 (Constant Length)
  • employee_residence has constant length 2 (Constant Length)
  • company_location has constant length 2 (Constant Length)
  • company_size has constant length 1 (Constant Length)

Variables


work_year

Approximate Distinct Count: 3
Approximate Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory Size: 67.4 KB

experience_level

Approximate Distinct Count: 4
Approximate Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory Size: 65.4 KB
  • The largest value (SE) is over 1.96 times larger than the second largest value (MI)

employment_type

Approximate Distinct Count: 4
Approximate Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory Size: 65.4 KB
  • The largest value (FT) is over 247.75 times larger than the second largest value (CT)

job_title

Approximate Distinct Count: 88
Approximate Unique (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory Size: 79.8 KB

salary

Approximate Distinct Count: 559
Approximate Unique (%): 55.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Memory Size: 15.6 KB
Mean: 166471.106
Minimum: 21000
Maximum: 5000000
Zeros: 0
Zeros (%): 0.0%
Negatives: 0
Negatives (%): 0.0%
  • salary is skewed right (γ1 = 14.2514)

salary_currency

Approximate Distinct Count: 10
Approximate Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory Size: 66.4 KB
  • The largest value (USD) is over 19.57 times larger than the second largest value (GBP)

salary_in_usd

Approximate Distinct Count: 608
Approximate Unique (%): 60.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Memory Size: 15.6 KB
Mean: 150069.385
Minimum: 17511
Maximum: 720000
Zeros: 0
Zeros (%): 0.0%
Negatives: 0
Negatives (%): 0.0%
  • salary_in_usd is skewed right (γ1 = 1.3168)

employee_residence

Approximate Distinct Count: 35
Approximate Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory Size: 65.4 KB
  • The largest value (US) is over 16.31 times larger than the second largest value (GB)

remote_ratio

Approximate Distinct Count: 3
Approximate Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory Size: 65.1 KB
  • The largest value (0) is over 2.1 times larger than the second largest value (100)

company_location

Approximate Distinct Count: 34
Approximate Unique (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory Size: 65.4 KB
  • The largest value (US) is over 16.08 times larger than the second largest value (GB)

company_size

Approximate Distinct Count: 3
Approximate Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory Size: 64.5 KB
  • The largest value (M) is over 18.43 times larger than the second largest value (L)

The report also includes interactive Interactions, Correlations (Pearson, Spearman, Kendall Tau) and Missing Values (Bar Chart, Spectrum, Heat Map, Dendrogram) views.

Report generated with DataPrep

Data transformation

# Convert 'salary_in_usd' column to thousands of dollars per year
label = 'salary_in_usd'
df[label] = df[label] * 1e-3

# Exclude 1% of smallest and 1% of highest salaries to remove outliers
P = np.percentile(df[label], [1, 99])
df = df[(df[label] > P[0]) & (df[label] < P[1])]

# Replace 'ML Engineer' with 'Machine Learning Engineer' in the 'job_title' column
df['job_title'].replace('ML Engineer', 'Machine Learning Engineer', inplace=True)

# Rename 'experience_level' based on a dictionary mapping
exp_dict = {'EN': 'Entry-level / Junior', 'MI': 'Mid-level / Intermediate', 'SE': 'Senior-level / Expert', 'EX': 'Executive-level / Director'}
df['experience_level'] = df['experience_level'].replace(exp_dict)

# Rename 'employment_type' based on a dictionary mapping
empl_dict = {'PT': 'Part-time', 'FT': 'Full-time', 'CT': 'Contract', 'FL': 'Freelance'}
df['employment_type'] = df['employment_type'].replace(empl_dict)

# Rename 'remote_ratio' based on a dictionary mapping
remote_dict = {0: 'No remote work (less than 20%)', 50: 'Partially remote', 100: 'Fully remote (more than 80%)'}
df['remote_ratio'] = df['remote_ratio'].replace(remote_dict)

# Rename 'company_size' based on a dictionary mapping
company_dict = {'S': 'Small', 'M': 'Medium', 'L': 'Large'}
df['company_size'] = df['company_size'].replace(company_dict)

# Combine 'employee_residence' and 'company_location' into a new 'residence_location' column
df['residence_location'] = df['employee_residence'] + '/' + df['company_location']

# Convert 'work_year' column to strings
df['work_year'] = df['work_year'].astype(str)

# Set up the rare label encoder for selected columns, limiting the number of categories
# and replacing rare categories with 'Other'
for col in ['job_title', 'residence_location', 'experience_level', 'employment_type']:
    encoder = RareLabelEncoder(n_categories=1, max_n_categories=50, replace_with='Other', tol=20/df.shape[0])
    df[col] = encoder.fit_transform(df[[col]])

# Drop unused columns
cols2drop = ['salary', 'employee_residence', 'company_location', 'salary_currency']
df = df.drop(cols2drop, axis=1)

# Display the shape of the resulting DataFrame
print(df.shape)
(11698, 8)
df.sample(10).T
(ten randomly sampled rows, transposed; row indices omitted)
work_year             2024, 2024, 2024, 2023, 2022, 2023, 2024, 2024, 2023, 2022
experience_level      Entry-level / Junior, Senior-level / Expert, Mid-level / Intermediate, Senior-level / Expert, Senior-level / Expert, Senior-level / Expert, Executive-level / Director, Mid-level / Intermediate, Mid-level / Intermediate, Senior-level / Expert
employment_type       Full-time, Full-time, Full-time, Full-time, Full-time, Full-time, Full-time, Full-time, Full-time, Full-time
job_title             Business Intelligence Analyst, Business Intelligence Engineer, Data Engineer, Business Intelligence Engineer, Data Engineer, Data Engineer, Data Engineer, Data Analyst, Data Engineer, Machine Learning Engineer
salary_in_usd         111.4, 204.5, 149.9, 180.0, 172.2, 143.0, 190.0, 59.469, 83.9, 131.3
remote_ratio          Fully remote (more than 80%), No remote work (less than 20%), No remote work (less than 20%), Fully remote (more than 80%), No remote work (less than 20%), No remote work (less than 20%), No remote work (less than 20%), No remote work (less than 20%), No remote work (less than 20%), Fully remote (more than 80%)
company_size          Medium, Medium, Medium, Medium, Medium, Medium, Medium, Medium, Medium, Large
residence_location    US/US, US/US, US/US, US/US, US/US, US/US, US/US, US/US, US/US, US/US
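
To confirm that the RareLabelEncoder grouped infrequent categories into 'Other' as intended, here is a quick sanity-check sketch of the resulting category counts:

# How many categories remain per encoded column, and how many rows fell into 'Other'
for col in ['job_title', 'residence_location', 'experience_level', 'employment_type']:
    n_other = (df[col] == 'Other').sum()
    print(f"{col}: {df[col].nunique()} categories, {n_other} rows labelled 'Other'")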

Machine learning

# Extract the target variable 'label' and features 'X' from the DataFrame
y = df[label].values.reshape(-1,)
X = df.drop([label], axis=1)

# Identify categorical columns in the feature set
cat_cols = df.select_dtypes(include=['object']).columns

# Get the indices of categorical columns in the feature set
cat_cols_idx = [list(X.columns).index(c) for c in cat_cols]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0, stratify=df[['residence_location']])

# Print the shapes of the training and testing sets to verify the split
print("Training set shapes - X_train: {}, y_train: {}".format(X_train.shape, y_train.shape))
print("Testing set shapes - X_test: {}, y_test: {}".format(X_test.shape, y_test.shape))
Training set shapes - X_train: (5849, 7), y_train: (5849,)
Testing set shapes - X_test: (5849, 7), y_test: (5849,)
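
Because the split is stratified on residence_location, each location should show up in roughly the same proportion in both halves. A quick sanity check like the one below (a small sketch, not part of the original notebook, reusing X_train and X_test from above) makes that easy to verify:

# Compare residence_location proportions in the training and test sets (sanity check for the stratified split)
split_check = pd.concat(
    {
        'train': X_train['residence_location'].value_counts(normalize=True),
        'test': X_test['residence_location'].value_counts(normalize=True),
    },
    axis=1,
)
print(split_check.head(10).round(3))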
# Initialize Pool: Creating CatBoost Pools for training and testing data, specifying categorical features.
train_pool = Pool(X_train, 
                  y_train, 
                  cat_features=cat_cols_idx)
test_pool = Pool(X_test,
                 y_test,
                 cat_features=cat_cols_idx)

# Specify Training Parameters: Configuring the CatBoostRegressor with specific hyperparameters.
model = CatBoostRegressor(iterations=1800, 
                          depth=6,
                          verbose=0,
                          early_stopping_rounds=100,
                          learning_rate=0.008, 
                          loss_function='RMSE')

# Train the Model: Fitting the model to the training data and using the test data for early stopping.
model.fit(train_pool, eval_set=test_pool)

# Make Predictions: Generating predictions on both the training and test sets.
y_train_pred = model.predict(train_pool)
y_test_pred = model.predict(test_pool)

# Evaluate Performance on Training and Test Sets: Calculating RMSE scores for both sets.
rmse_train = mean_squared_error(y_train, y_train_pred, squared=False)
rmse_test = mean_squared_error(y_test, y_test_pred, squared=False)

# Print Results: Displaying the RMSE scores for the training and test sets.
print(f"RMSE score for train {round(rmse_train, 1)} kUSD/year, and for test {round(rmse_test, 1)} kUSD/year")
RMSE score for train 51.4 kUSD/year, and for test 52.0 kUSD/year
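
Besides RMSE, CatBoost can report how much each feature contributed to the model's predictions. The short sketch below is not part of the original notebook; it simply pulls the built-in feature importances from the trained model, assuming the model and train_pool objects defined above.

# Built-in CatBoost feature importances for the trained model
importances = model.get_feature_importance(train_pool)
for feature, importance in sorted(zip(X_train.columns, importances), key=lambda x: -x[1]):
    print(f"{feature:<25} {importance:6.2f}")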
# Baseline scores (assuming the same prediction for all data samples)

# Calculating the root mean squared error (RMSE) for the training set based on the mean prediction
rmse_bs_train = mean_squared_error(y_train, [np.mean(y_train)] * len(y_train), squared=False)

# Calculating the root mean squared error (RMSE) for the test set based on the mean prediction
rmse_bs_test = mean_squared_error(y_test, [np.mean(y_train)] * len(y_test), squared=False)

# Printing the RMSE baseline scores for both the training and test sets
print(f"RMSE baseline score for train: {round(rmse_bs_train, 1)} kUSD/year, and for test: {round(rmse_bs_test, 1)} kUSD/year")
RMSE baseline score for train: 64.7 kUSD/year, and for test: 64.1 kUSD/year
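
Comparing the two numbers, the model cuts the baseline test error from about 64.1 to 52.0 kUSD/year. If you want that as a single figure, the relative improvement can be computed directly:

# Relative improvement of the model over the constant-mean baseline on the test set
improvement = 100 * (1 - rmse_test / rmse_bs_test)
print(f"Model reduces test RMSE by about {improvement:.1f}% versus the baseline")
# With the values above: 100 * (1 - 52.0 / 64.1) is roughly 18.9%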

Explanations with SHAP values

%matplotlib inline
# Initialize the SHAP JavaScript visualization library
shap.initjs()

# Create a SHAP TreeExplainer for the given 'model'
ex = shap.TreeExplainer(model)

# Compute SHAP values for the test dataset 'X_test' using the TreeExplainer
shap_values = ex.shap_values(X_test)

# Generate a summary plot of SHAP values to visualize feature contributions
shap.summary_plot(shap_values, X_test)
# The explainer's expected value is the SHAP base value, i.e. the model's average prediction
expected_values = ex.expected_value

# Printing the average predicted salary rounded to one decimal place in kilo USD per year.
print(f"Average predicted salary is {round(expected_values, 1)} kUSD/year")

# Calculating and printing the average actual salary from the 'y_test' array, rounded to one decimal place in kilo USD per year.
print(f"Average actual salary is {round(np.mean(y_test), 1)} kUSD/year")
Average predicted salary is 147.9 kUSD/year
Average actual salary is 146.5 kUSD/year
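
The summary plot shows feature contributions across the whole test set; SHAP can also explain a single prediction. Here is a small sketch (not part of the original notebook) for one test row, reusing the ex, shap_values, X_test and y_test_pred objects from above:

# Explain a single prediction: contributions that push row 0 away from the average prediction
i = 0
shap.force_plot(ex.expected_value, shap_values[i, :], X_test.iloc[i, :], matplotlib=True)

# The same information as text: base value plus the row's SHAP contributions equals its prediction
print(f"Base value: {ex.expected_value:.1f} kUSD/year")
print(f"Sum of SHAP contributions for row {i}: {shap_values[i, :].sum():.1f} kUSD/year")
print(f"Model prediction for row {i}: {y_test_pred[i]:.1f} kUSD/year")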
# Function to visualize SHAP values for a specific feature
def show_shap(col, shap_values=shap_values):
    # Create a copy of the test dataset to avoid modifying the original data
    df_infl = X_test.copy()
    
    # Add a new column for SHAP values corresponding to the specified feature
    df_infl['shap_'] = shap_values[:, df_infl.columns.tolist().index(col)]
    
    # Calculate the mean and standard deviation of SHAP values grouped by the specified feature
    gain = round(df_infl.groupby(col)['shap_'].mean(), 5)
    gain_std = round(df_infl.groupby(col)['shap_'].std(), 5)
    
    # Count the number of instances for each category of the specified feature
    cnt = df_infl.groupby(col)['shap_'].count()
    
    # Create a dictionary to store the results
    dd_dict = {'col': list(gain.index), 'gain': list(gain.values), 'gain_std': list(gain_std.values), 'count': cnt}
    
    # Create a DataFrame from the dictionary and sort it by 'gain' in descending order
    df_res = pd.DataFrame.from_dict(dd_dict).sort_values('gain', ascending=False).set_index('col')
    
    # Plotting SHAP values with error bars
    plt.figure(figsize=(9, 6))
    plt.errorbar(df_res.index, df_res['gain'], yerr=df_res['gain_std'], fmt="o", color="r")
    plt.title(f'SHAP values for {col}')
    plt.ylabel('kUSD/year')
    plt.tick_params(axis="x", rotation=90)
    plt.show()
    
    # Display the results DataFrame
    print(df_res)
    
    return

# Iterate through all columns in the test dataset
for col in X_test.columns:
    print()
    print(col)
    print()
    
    # Call the show_shap function for each feature
    show_shap(col, shap_values)
work_year

       gain    gain_std  count
col                           
2024  0.86390   1.69680  2874 
2023  0.29864   1.82858  2418 
2022 -6.28405   1.38722   557 

experience_level

                              gain    gain_std  count
col                                                  
Executive-level / Director  32.96379   4.27869   212 
Senior-level / Expert       10.19685   1.17599  3392 
Mid-level / Intermediate   -17.35785   1.30479  1655 
Entry-level / Junior       -27.45064   3.77944   590 

employment_type

             gain    gain_std  count
col                                 
Full-time  -0.02771   1.19619  5814 
Other      -7.58667   1.18367     5 
Contract   -9.80807   3.58763    11 
Part-time -11.75318   2.88548    19 

job_title

                                            gain    gain_std  count
col                                                                
Machine Learning Engineer                 27.57581   2.25142   657 
Computer Vision Engineer                  26.98830   3.90854    21 
Research Scientist                        26.01017   1.09076   209 
Data Science Engineer                     25.75089   2.08551    19 
Data Infrastructure Engineer              25.53922   1.28684    10 
Head of Data                              25.16594   1.96862    26 
Machine Learning Scientist                24.53864   0.86595    55 
Machine Learning Infrastructure Engineer  24.30945   1.31455    21 
Machine Learning Researcher               24.18054   3.23081    16 
Applied Scientist                         23.94772   2.22138    84 
Data Science Manager                      23.54227   2.51141    44 
Director of Data Science                  23.17807   4.01040    11 
AI Architect                              22.90402   2.52727    12 
Data Analytics Lead                       21.32742   1.94926    13 
Research Engineer                         19.02157   1.54630   132 
Data Science Lead                         11.94530   1.74500    14 
Data Scientist                             4.81905   1.68475  1173 
Data Science                               3.57735   1.41696    94 
Data Analytics Manager                     3.32220   0.85416    28 
Data Architect                             3.16300   0.83773   159 
AI Engineer                                2.55602   1.33826    73 
Data Engineer                              0.20027   1.68056  1005 
Data Product Manager                      -0.97918   0.67761    21 
Analytics Engineer                        -1.07788   0.80328   162 
MLOps Engineer                            -2.09056   0.72661    16 
ETL Developer                             -2.46553   0.44679    15 
AI Scientist                              -4.87655   1.39078    12 
Data Lead                                 -5.54521   1.49889    13 
Business Intelligence                     -6.58333   1.69320    56 
Data Modeler                             -10.56544   0.35704    17 
Other                                    -13.42975   1.41266   302 
Business Intelligence Manager            -13.67929   0.40962     7 
Business Intelligence Engineer           -13.74015   1.51243    63 
AI Developer                             -15.72698   2.82001    12 
Research Analyst                         -19.46592   2.14006    65 
Business Intelligence Analyst            -22.83291   2.80047   116 
Data Strategist                          -24.35387   0.70741    15 
Data Quality Analyst                     -24.93376   7.06758    10 
Data Science Consultant                  -26.41770   3.58909    24 
Insight Analyst                          -28.67048   4.51092    10 
Data Management Analyst                  -29.89674   3.70526    11 
Data Analyst                             -30.54220   4.40807   790 
BI Developer                             -31.72639   4.00586    38 
Data Management Specialist               -32.50803   3.76743    10 
Data Specialist                          -33.05490   4.40766    44 
Data Operations Analyst                  -33.05554   3.22688    13 
Data Manager                             -33.71244   3.41207    59 
BI Analyst                               -33.97450   4.80845    24 
Business Intelligence Developer          -34.48365   3.10161    31 
Data Developer                           -34.68734   1.58439    17 

remote_ratio

                                 gain    gain_std  count
col                                                     
No remote work (less than 20%)  0.52673   1.15238  3922 
Fully remote (more than 80%)   -1.42224   1.44277  1845 
Partially remote               -6.36967   1.72677    82 

company_size

         gain    gain_std  count
col                             
Medium -0.11769   0.98566  5532 
Large  -1.06947   2.54676   253 
Small  -5.99881   2.27519    64 

residence_location

         gain    gain_std  count
col                             
US/US   6.44578   1.08267  4908 
CA/CA  -3.74517   1.52402   236 
AU/AU -17.00825   3.28928    27 
DE/DE -35.03124   3.85496    45 
FR/FR -41.77947   3.72310    29 
IN/IN -41.83046   4.90604    18 
GB/GB -42.95816   5.21321   313 
Other -43.79319   5.34311   175 
NL/NL -48.00334   4.79647    15 
LT/LT -48.76303   5.22976    11 
ZA/ZA -49.16019   3.26999    10 
PT/PT -49.54464   5.33928    12 
ES/ES -50.48376   3.74628    40 
LV/LV -50.59173   4.38976    10 
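
Since SHAP values are additive, the base value plus a category's mean SHAP contribution gives a rough average predicted salary for that category: US-based employees of US companies come out around 154 kUSD/year (147.9 + 6.4), while GB/GB sits near 105 kUSD/year (147.9 - 43.0). The short sketch below (not part of the original notebook) computes this for every residence_location category using the expected_values, shap_values and X_test objects from above; it only accounts for the location contribution, so individual predictions will still differ.

# Approximate average predicted salary per residence_location: base value + mean SHAP contribution
loc_idx = list(X_test.columns).index('residence_location')
loc_shap = pd.Series(shap_values[:, loc_idx], index=X_test.index)
approx_pay = (expected_values + loc_shap.groupby(X_test['residence_location']).mean()).sort_values(ascending=False)
print(approx_pay.round(1))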

Additional analysis: Machine Learning Engineer vs Data Scientist gap analysis with SHAP values

def plot_gap(col, main_col="job_title", value1="Machine Learning Engineer", value2="Data Scientist"):
    # Work on a copy of the test set and attach the SHAP values of the main column (e.g. job_title)
    df_infl = X_test.copy()
    df_infl['shap_gd'] = shap_values[:, list(X_test.columns).index(main_col)]

    # Mean and standard deviation of those SHAP values for every (col, main_col) combination
    df1_mean = pd.pivot_table(df_infl, values=['shap_gd'], index=[col, main_col], aggfunc=np.mean)
    df1_std = pd.pivot_table(df_infl, values=['shap_gd'], index=[col, main_col], aggfunc=np.std)

    # Gap between the two job titles (value1 minus value2) within each category of 'col'
    df2_mean = pd.pivot(df1_mean.reset_index(), index=col, columns=main_col, values='shap_gd')[[value1, value2]].dropna(axis=0)
    df2_mean['gap'] = df2_mean[value1] - df2_mean[value2]

    # Combined uncertainty of the gap (standard deviations added in quadrature)
    df2_std = pd.pivot(df1_std.reset_index(), index=col, columns=main_col, values='shap_gd')[[value1, value2]]
    df2_std['std'] = np.sqrt(df2_std[value1]**2 + df2_std[value2]**2)

    df2 = df2_mean[['gap']].join(df2_std[['std']], how='inner')
    df2 = df2.dropna(axis=0).sort_values('gap', ascending=False)

    # Plot the absolute gap in kUSD/year
    plt.figure(figsize=(12, 8))
    plt.bar(x=df2.index, height=df2['gap'])
    plt.errorbar(df2.index, df2['gap'], yerr=df2['std'], fmt="o", color="r")
    plt.title(f'SHAP value of gap per {col}, yearly compensation')
    plt.ylabel('kUSD/year')
    plt.tick_params(axis="x", rotation=90)
    plt.show()

    # Plot the gap relative to the average pay of each category of 'col'
    df_infl['shap_'] = shap_values[:, list(X_test.columns).index(col)]
    df2['avg_pay'] = expected_values + df_infl.groupby(col)['shap_'].mean()
    df2['avg_pp'] = 100 * df2['gap'] / df2['avg_pay']
    df2 = df2.sort_values('avg_pp', ascending=False)

    plt.figure(figsize=(12, 8))
    plt.bar(x=df2.index, height=df2['avg_pp'])
    plt.errorbar(df2.index, df2['avg_pp'], yerr=100 * df2['std'] / df2['avg_pay'], fmt="o", color="r")
    plt.title(f'Gap per {col} relative to average pay')
    plt.ylabel('Percentage points')
    plt.tick_params(axis="x", rotation=90)
    plt.show()
    return

for col in X_test.columns:
    if col != 'job_title':
        print(col)
        plot_gap(col)
work_year
experience_level
employment_type
remote_ratio
company_size
residence_location
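
The bar charts produced by this loop don't carry over to this write-up, but the underlying numbers are easy to print. For instance, the Machine Learning Engineer vs Data Scientist gap per experience level can be tabulated directly from the job_title SHAP values. This is a small sketch reusing shap_values and X_test, not part of the original notebook:

# Tabulate the ML Engineer vs Data Scientist SHAP gap per experience level (text version of the plot above)
jt_idx = list(X_test.columns).index('job_title')
tmp = X_test.copy()
tmp['shap_jt'] = shap_values[:, jt_idx]
pivot = tmp.pivot_table(values='shap_jt', index='experience_level', columns='job_title', aggfunc='mean')
gap = (pivot['Machine Learning Engineer'] - pivot['Data Scientist']).dropna()
print(gap.round(1).sort_values(ascending=False))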

Additional analysis: 2024 vs 2023 year analysis with SHAP values

for col in X_test.columns:
    if col != 'work_year':
        print(col)
        plot_gap(col, main_col="work_year", value1="2024", value2="2023")
experience_level
employment_type
job_title
remote_ratio
company_size
residence_location
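
Again the plots themselves don't render here, but the headline number is easy to recover from the work_year SHAP values shown earlier: the average contribution is about +0.86 kUSD/year for 2024 rows versus +0.30 for 2023, a modest year-over-year shift of roughly 0.6 kUSD/year once the other features are accounted for. A minimal sketch of that calculation, reusing shap_values and X_test:

# Average work_year SHAP contribution per year: a quick year-over-year comparison
year_idx = list(X_test.columns).index('work_year')
year_shap = pd.Series(shap_values[:, year_idx], index=X_test.index)
mean_by_year = year_shap.groupby(X_test['work_year']).mean()
print(mean_by_year.round(2))
print(f"2024 vs 2023 shift: {mean_by_year['2024'] - mean_by_year['2023']:.2f} kUSD/year")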

Conclusion

Alright everyone, we’ve come to the conclusion of our in-depth analysis on Machine Learning Engineer salaries for 2024. Here’s what you need to know:

Key Points:

  • Salary Range: the model's average predicted salary lands around 148 kUSD/year, and the Machine Learning Engineer title alone contributes roughly +28 kUSD/year versus the average role, with experience level swinging pay from about -27 kUSD/year at entry level to +33 kUSD/year at executive level.
  • Industry Demand: AI-heavy titles (Machine Learning Engineer, Computer Vision Engineer, Research Scientist) sit at the top of the SHAP ranking, reflecting how much employers are paying for these skills.
  • Geographical Impact: where you live matters. US-based roles add about +6 kUSD/year relative to the model's average, while other locations in the data fall anywhere from a few kUSD/year (Canada) to roughly 50 kUSD/year (much of Europe) below it.

Machine Learning Engineers are set for success in 2024, with attractive salaries and abundant opportunities. Whether you’re just starting out or a seasoned expert, the future looks promising in the AI field.


Thank you for joining us on this journey. Stay tuned for more insights on the ever-changing tech industry!

