E-commerce Customer Order Behavior Dataset

A synthetic e-commerce dataset containing 10,000 orders with realistic customer behavior patterns, suitable for e-commerce analytics and machine learning tasks.

Dataset Card for E-commerce Orders

Dataset Summary

This dataset simulates customer order behavior on an e-commerce platform. It contains detailed information about orders, customers, products, and delivery patterns. The data is synthetically generated with realistic distributions and patterns.
Supported Tasks

- regression: predict order quantities or prices
- classification: predict delivery status or customer segments
- clustering: identify customer behavior patterns
- time-series-forecasting: analyze order patterns over time

Languages

Not applicable (tabular data).
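For the time-series-forecasting task, a natural first step is counting orders per month. Below is a minimal stdlib sketch. It assumes order_date values are strings in the format shown under Data Instances. The helper name `monthly_order_counts` and the extra toy timestamps are hypothetical. Months are keyed as YYYY-MM rather than by month number because a 12-month window can span two calendar years.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import datetime


def monthly_order_counts(order_dates: Iterable[str]) -> dict[str, int]:
    """Count orders per calendar month from order_date strings (hypothetical helper)."""
    # Assumes ISO-like strings such as '2024-04-20 14:59:58.897063'
    months = Counter(datetime.fromisoformat(d).strftime("%Y-%m") for d in order_dates)
    # Zero-padded "YYYY-MM" keys sort chronologically as plain strings
    return dict(sorted(months.items()))


# Toy timestamps; only the first comes from the sample instance
sample = [
    "2024-04-20 14:59:58.897063",
    "2024-04-03 09:12:00",
    "2024-05-01 00:00:00",
]
print(monthly_order_counts(sample))  # {'2024-04': 2, '2024-05': 1}
```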
Dataset Structure

Data Instances

Each instance represents a single e-commerce order with the following fields:

```python
{
    'order_id': '5ea92c47-c5b2-4bdd-8a50-d77efd77ec89',
    'customer_id': 2350,
    'product_id': 995,
    'category': 'Electronics',
    'price': 403.17,
    'quantity': 3,
    'order_date': '2024-04-20 14:59:58.897063',
    'shipping_date': '2024-04-22 14:59:58.897063',
    'delivery_status': 'Delivered',
    'payment_method': 'PayPal',
    'device_type': 'Mobile',
    'channel': 'Paid Search',
    'shipping_address': '72166 Cunningham Crescent East Nicholasside Mississippi 85568',
    'billing_address': '38199 Edwin Plain Johnborough Maine 81826',
    'customer_segment': 'Returning'
}
```
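The shipping_date rule (1-7 days after order_date) can be spot-checked on the sample instance. Python's `datetime.fromisoformat` accepts the space-separated format with microseconds shown above. This is a minimal sketch that assumes dates are stored as strings in exactly that format.

```python
from datetime import datetime

# The two date fields copied from the sample instance above
order = {
    "order_date": "2024-04-20 14:59:58.897063",
    "shipping_date": "2024-04-22 14:59:58.897063",
}

order_dt = datetime.fromisoformat(order["order_date"])
ship_dt = datetime.fromisoformat(order["shipping_date"])

# Subtracting datetimes gives a timedelta; .days is the whole-day part
lag_days = (ship_dt - order_dt).days
print(lag_days)  # 2
```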
Data Fields

| Field Name | Type | Description | Value Range |
|---|---|---|---|
| order_id | string | Unique order identifier (UUID4) | - |
| customer_id | int | Customer identifier | 1-3,000 |
| product_id | int | Product identifier | 1-1,000 |
| category | string | Product category | Electronics, Clothing, Home, Books, Beauty, Toys |
| price | float | Product price | $5.00-$500.00 |
| quantity | int | Order quantity | 1-10 |
| order_date | datetime | Order placement timestamp | Last 12 months |
| shipping_date | datetime | Shipping timestamp | 1-7 days after order_date |
| delivery_status | string | Delivery status | Pending, Shipped, Delivered, Returned |
| payment_method | string | Payment method used | Credit Card, PayPal, Debit Card, Apple Pay, Google Pay |
| device_type | string | Ordering device | Desktop, Mobile, Tablet |
| channel | string | Marketing channel | Organic, Paid Search, Email, Social |
| shipping_address | string | Delivery address | Street, City, State, ZIP |
| billing_address | string | Billing address | Street, City, State, ZIP |
| customer_segment | string | Customer type | New, Returning, VIP |

Data Splits

This dataset is provided as a single CSV file without predefined splits. When loaded with the Hugging Face `datasets` library, the whole file is exposed as a single `train` split, which is why the loading examples index `dataset['train']`.
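If the CSV is read directly with Python's csv module instead of through `datasets` or pandas, every value arrives as a string. The sketch below applies the Type column of the Data Fields table with a small converter. It assumes the CSV header uses the field names listed above. The names `parse_row`, `INT_FIELDS`, `FLOAT_FIELDS` and `DATETIME_FIELDS` are hypothetical. The in-memory CSV holds only a subset of the sample instance's columns, for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical type groups taken from the "Type" column of the Data Fields table
INT_FIELDS = ("customer_id", "product_id", "quantity")
FLOAT_FIELDS = ("price",)
DATETIME_FIELDS = ("order_date", "shipping_date")


def parse_row(raw: dict[str, str]) -> dict:
    """Convert one csv.DictReader row (all strings) into typed Python values."""
    row = dict(raw)  # string fields (order_id, category, ...) are kept unchanged
    for name in INT_FIELDS:
        row[name] = int(raw[name])
    for name in FLOAT_FIELDS:
        row[name] = float(raw[name])
    for name in DATETIME_FIELDS:
        row[name] = datetime.fromisoformat(raw[name])
    return row


# In-memory CSV built from a subset of the sample instance's fields
sample_csv = io.StringIO(
    "order_id,customer_id,product_id,category,price,quantity,order_date,shipping_date\n"
    "5ea92c47-c5b2-4bdd-8a50-d77efd77ec89,2350,995,Electronics,403.17,3,"
    "2024-04-20 14:59:58.897063,2024-04-22 14:59:58.897063\n"
)
rows = [parse_row(r) for r in csv.DictReader(sample_csv)]
print(type(rows[0]["price"]).__name__, rows[0]["order_date"].year)  # float 2024
```

With pandas, passing `parse_dates=["order_date", "shipping_date"]` to `pd.read_csv` gives a comparable result for the date columns.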
Dataset Creation

Source Data

This is a synthetic dataset generated in Python using the pandas, numpy, and Faker libraries. The data generation process ensures:

- Realistic customer behavior patterns
- Proper data distributions
- Valid relationships between fields
- Realistic address formatting

Annotations

No manual annotations (synthetic data).
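The generation script itself is not part of this card. The hypothetical stdlib-only sketch below shows one way the documented ranges and proportions could be sampled. It is not the authors' generator. It draws each field independently, treats "1-7 days" as a whole-day offset (inclusive), and uses the Distribution Statistics percentages as sampling weights. It omits payment_method, device_type, channel, and the Faker-generated addresses. A seeded `random.Random` is used instead of `uuid.uuid4()` so the output is reproducible.

```python
import random
import uuid
from datetime import datetime, timedelta

# Values and weights taken from the Data Fields table and Distribution Statistics
CATEGORIES = ["Electronics", "Clothing", "Home", "Books", "Beauty", "Toys"]
STATUSES = ["Delivered", "Shipped", "Pending", "Returned"]
STATUS_WEIGHTS = [0.70, 0.20, 0.05, 0.05]
SEGMENTS = ["New", "Returning", "VIP"]
SEGMENT_WEIGHTS = [0.50, 0.35, 0.15]


def generate_order(rng: random.Random, now: datetime) -> dict:
    """Generate one synthetic order within the documented ranges (hypothetical)."""
    # Order placed at a uniformly random moment in the 12 months before `now`
    order_date = now - timedelta(seconds=rng.uniform(0, 365 * 24 * 3600))
    return {
        # Seeded random 128 bits, with version/variant bits set to make a UUID4
        "order_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "customer_id": rng.randint(1, 3000),
        "product_id": rng.randint(1, 1000),
        "category": rng.choice(CATEGORIES),
        "price": round(rng.uniform(5.0, 500.0), 2),
        "quantity": rng.randint(1, 10),
        "order_date": order_date,
        # Whole-day offset, 1 to 7 days inclusive (assumed interpretation)
        "shipping_date": order_date + timedelta(days=rng.randint(1, 7)),
        "delivery_status": rng.choices(STATUSES, weights=STATUS_WEIGHTS)[0],
        "customer_segment": rng.choices(SEGMENTS, weights=SEGMENT_WEIGHTS)[0],
    }


rng = random.Random(42)
orders = [generate_order(rng, datetime(2024, 12, 31)) for _ in range(1000)]
```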
Considerations for Using the Data

Social Impact of Dataset

This dataset is designed for:

- Development of e-commerce analytics systems
- Testing of order processing systems
- Training of machine learning models for e-commerce
- Educational purposes in data science

Discussion of Biases

As a synthetic dataset, care has been taken to:

- Use realistic distributions for order patterns
- Maintain proper relationships between dates
- Create realistic customer segments
- Avoid demographic biases in address generation

However, users should note that:

- The data patterns are simplified compared to real e-commerce data
- The customer behavior patterns are based on general assumptions
- The geographic distribution might not reflect real-world patterns

Dataset Statistics

Total Records: 10,000
Distribution Statistics:

Delivery Status:

- Delivered: 70%
- Shipped: 20%
- Pending: 5%
- Returned: 5%

Customer Segments:

- VIP: ~15%
- Returning: ~35%
- New: ~50%

Loading and Usage

Using Hugging Face Datasets:
```python
from datasets import load_dataset

# "path/to/e-commerce-orders" is a placeholder for the dataset's Hub ID or local path
dataset = load_dataset("path/to/e-commerce-orders")

# Example: load as a pandas DataFrame
df = dataset["train"].to_pandas()

# Example: access specific columns
orders = dataset["train"]["order_id"]
prices = dataset["train"]["price"]
```
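Once a column is loaded, the delivery-status and customer-segment shares listed under Distribution Statistics can be recomputed. The stdlib sketch below mirrors what pandas' `value_counts(normalize=True)` returns. The helper name `proportions` is hypothetical, and the toy column stands in for `dataset["train"]["delivery_status"]`.

```python
from collections import Counter
from collections.abc import Iterable


def proportions(values: Iterable[str]) -> dict[str, float]:
    """Share of each distinct value, most frequent first (hypothetical helper)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}


# Toy column of 20 statuses standing in for dataset["train"]["delivery_status"]
statuses = ["Delivered"] * 14 + ["Shipped"] * 4 + ["Pending", "Returned"]
print(proportions(statuses))
# {'Delivered': 0.7, 'Shipped': 0.2, 'Pending': 0.05, 'Returned': 0.05}
```

With the DataFrame from the loading example, `df["delivery_status"].value_counts(normalize=True)` gives the same shares directly.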
Data Quality

The dataset has been validated to ensure:

- No missing values
- Proper value ranges
- Valid categorical values
- Proper date relationships
- Unique order IDs
- Valid address formats
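Users who want to re-verify some of these guarantees on their own copy can start from the hypothetical stdlib sketch below. It covers only a subset of the checks: missing values, unique and well-formed UUID4 order IDs, delivery_status values, the quantity range, and the 1-7 day shipping gap (treated as inclusive). Address formats and the other categorical fields are not covered. It assumes rows of raw string values, as produced by `csv.DictReader`. The function name `quality_issues` is illustrative.

```python
import uuid
from datetime import datetime, timedelta

# Delivery-status values listed in the Data Fields table
VALID_STATUSES = {"Pending", "Shipped", "Delivered", "Returned"}


def quality_issues(rows: list[dict[str, str]]) -> list[str]:
    """Re-check a few of the Data Quality claims on rows of raw string values."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # No missing values: skip further checks on incomplete rows
        if any(value is None or value == "" for value in row.values()):
            issues.append(f"row {i}: missing value")
            continue
        # Unique order IDs, each a well-formed UUID4
        order_id = row["order_id"]
        if order_id in seen_ids:
            issues.append(f"row {i}: duplicate order_id")
        seen_ids.add(order_id)
        try:
            if uuid.UUID(order_id).version != 4:
                issues.append(f"row {i}: order_id is not a UUID4")
        except ValueError:
            issues.append(f"row {i}: malformed order_id")
        # Valid categorical values and value ranges
        if row["delivery_status"] not in VALID_STATUSES:
            issues.append(f"row {i}: unexpected delivery_status")
        if not 1 <= int(row["quantity"]) <= 10:
            issues.append(f"row {i}: quantity out of range")
        # Date relationship: shipped 1-7 days after the order was placed
        lag = datetime.fromisoformat(row["shipping_date"]) - datetime.fromisoformat(
            row["order_date"]
        )
        if not timedelta(days=1) <= lag <= timedelta(days=7):
            issues.append(f"row {i}: shipping_date not 1-7 days after order_date")
    return issues


good = {
    "order_id": "5ea92c47-c5b2-4bdd-8a50-d77efd77ec89",
    "quantity": "3",
    "delivery_status": "Delivered",
    "order_date": "2024-04-20 14:59:58.897063",
    "shipping_date": "2024-04-22 14:59:58.897063",
}
bad = dict(good, delivery_status="Lost")  # same order_id, invalid status
print(quality_issues([good, bad]))
# ['row 1: duplicate order_id', 'row 1: unexpected delivery_status']
```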