Dataset:
You can create a CSV file named sales_data.csv with the following columns:
- Order_ID: Unique identifier for each order.
- Customer_ID: Unique identifier for each customer.
- Order_Date: Date the order was placed.
- Product: Name of the product sold.
- Category: Category of the product (e.g., Electronics, Clothing, Books).
- Price: Price of the product.
- Quantity: Quantity of the product ordered.
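If you do not have real data at hand, one way to get started is to generate a small synthetic file with these seven columns. The sketch below is only an illustration: the function name make_sample_sales, the product catalogue, the prices, the 2023 date range and the number of customers are all made-up assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd


def make_sample_sales(n_rows=200, seed=0):
    """Build a synthetic sales table with the seven columns listed above."""
    rng = np.random.default_rng(seed)
    # Hypothetical catalogue: product name -> (category, unit price)
    catalogue = {
        'Laptop': ('Electronics', 899.00),
        'Headphones': ('Electronics', 59.90),
        'T-Shirt': ('Clothing', 15.00),
        'Jeans': ('Clothing', 45.50),
        'Novel': ('Books', 12.99),
        'Cookbook': ('Books', 24.00),
    }
    products = rng.choice(list(catalogue), size=n_rows)
    # Random order dates spread over the year 2023 (assumed range)
    days = rng.integers(0, 365, size=n_rows)
    dates = pd.Timestamp('2023-01-01') + pd.to_timedelta(days, unit='D')
    return pd.DataFrame({
        'Order_ID': np.arange(1, n_rows + 1),              # one row per order
        'Customer_ID': rng.integers(1, 41, size=n_rows),   # 40 made-up customers
        'Order_Date': dates.strftime('%Y-%m-%d'),
        'Product': products,
        'Category': [catalogue[p][0] for p in products],
        'Price': [catalogue[p][1] for p in products],
        'Quantity': rng.integers(1, 6, size=n_rows),       # 1 to 5 units
    })


if __name__ == '__main__':
    # Write the file that the analysis code expects to find
    make_sample_sales().to_csv('sales_data.csv', index=False)
```

Fixing the seed keeps the generated file reproducible, so results from later steps can be compared between runs.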
Project Goals:
- Load the dataset and perform basic data cleaning.
- Explore sales trends over time.
- Analyze popular products and categories.
- Calculate total revenue and average order value.
- Identify high-value customers.
Python Code
import pandas as pd
import matplotlib.pyplot as plt
# Load the dataset
data = pd.read_csv('sales_data.csv')
# Data Cleaning
# Add cleaning steps here, such as handling missing values and converting data types
# Explore sales trends over time
data['Order_Date'] = pd.to_datetime(data['Order_Date'])
data['Month'] = data['Order_Date'].dt.to_period('M')
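# Revenue per row is the unit price times the quantity ordered
data['Revenue'] = data['Price'] * data['Quantity']
monthly_sales = data.groupby('Month')['Revenue'].sum()
plt.figure(figsize=(10, 6))
# Convert the Period index to strings so matplotlib can plot it directly
plt.plot(monthly_sales.index.astype(str), monthly_sales.values, marker='o')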
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
# Analyze popular products and categories
popular_products = data['Product'].value_counts().head(10)
popular_categories = data['Category'].value_counts()
# Calculate total revenue and average order value
# Revenue counts every unit sold: unit price times quantity
total_revenue = (data['Price'] * data['Quantity']).sum()
average_order_value = total_revenue / data['Order_ID'].nunique()
print(f'Total Revenue: ${total_revenue:.2f}')
print(f'Average Order Value: ${average_order_value:.2f}')
# Identify high-value customers
# Spending per customer, counting quantity as well as unit price
customer_spending = (data['Price'] * data['Quantity']).groupby(data['Customer_ID']).sum()
high_value_customers = customer_spending[customer_spending > customer_spending.quantile(0.75)]
print('High-Value Customers:')
print(high_value_customers)
This is just a basic outline, and you can expand upon it by incorporating more complex analyses, visualizations, and even machine learning techniques depending on your skill level and interests.
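The data cleaning step is only hinted at in the code above. A minimal sketch of what it could look like follows, assuming the column names from the dataset description. The function name clean_sales and the specific rules (dropping rows with unparseable dates, negative prices or non-positive quantities) are illustrative assumptions; adapt them to what your data actually needs.

```python
import pandas as pd


def clean_sales(data):
    """Return a cleaned copy of the raw sales table (assumed rules, see comments)."""
    data = data.copy()
    # Trim stray whitespace from text labels (assumes these columns hold strings)
    for col in ['Product', 'Category']:
        data[col] = data[col].str.strip()
    # Parse dates and numbers; entries that cannot be parsed become NaT/NaN
    data['Order_Date'] = pd.to_datetime(data['Order_Date'], errors='coerce')
    data['Price'] = pd.to_numeric(data['Price'], errors='coerce')
    data['Quantity'] = pd.to_numeric(data['Quantity'], errors='coerce')
    # Drop rows missing any field the analysis depends on
    required = ['Order_ID', 'Customer_ID', 'Order_Date', 'Price', 'Quantity']
    data = data.dropna(subset=required)
    # Keep only rows with a non-negative price and a positive quantity
    data = data[(data['Price'] >= 0) & (data['Quantity'] > 0)]
    # Remove exact duplicate rows and renumber the index
    return data.drop_duplicates().reset_index(drop=True)
```

It would be called right after loading, for example data = clean_sales(pd.read_csv('sales_data.csv')). Because Order_Date is already parsed inside the function, the later pd.to_datetime call becomes a harmless no-op.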
Additions
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataset
data = pd.read_csv('sales_data.csv')
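# Data Cleaning and Basic Analysis (same as before)
# Order_Date must be parsed again because the file is reloaded above
data['Order_Date'] = pd.to_datetime(data['Order_Date'])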
# Product Category Analysis
plt.figure(figsize=(10, 6))
sns.countplot(data=data, x='Category')
plt.title('Product Category Distribution')
plt.xticks(rotation=45)
plt.xlabel('Category')
plt.ylabel('Count')
plt.show()
avg_price_per_category = data.groupby('Category')['Price'].mean()
print('Average Price per Category:')
print(avg_price_per_category)
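# Seasonal Sales Patterns (using quarterly data)
data['Year'] = data['Order_Date'].dt.year
data['Quarter'] = data['Order_Date'].dt.to_period('Q')
# Revenue per quarter: unit price times quantity
quarterly_sales = (data['Price'] * data['Quantity']).groupby(data['Quarter']).sum()
plt.figure(figsize=(10, 6))
# Convert the Period index to strings so matplotlib can plot it directly
plt.plot(quarterly_sales.index.astype(str), quarterly_sales.values, marker='o')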
plt.title('Quarterly Sales Trend')
plt.xlabel('Quarter')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
# Customer Demographics (hypothetical example)
# The dataset described above has no 'Gender' column, so this runs only if yours does
if 'Gender' in data.columns:
    gender_sales = (data['Price'] * data['Quantity']).groupby(data['Gender']).sum()
    plt.figure(figsize=(8, 5))
    gender_sales.plot(kind='bar')
    plt.title('Sales by Gender')
    plt.xlabel('Gender')
    plt.ylabel('Total Sales')
    plt.xticks(rotation=0)
    plt.show()
# Product Performance Analysis (top products by revenue)
top_products = (data['Price'] * data['Quantity']).groupby(data['Product']).sum().sort_values(ascending=False).head(10)
print('Top Products by Revenue:')
print(top_products)
# Correlation Analysis (hypothetical example)
correlation_matrix = data[['Price', 'Quantity']].corr()
plt.figure(figsize=(6, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()