Project Overview
In this project, I revisited data from "Xmas Everything," an e-commerce store I previously owned and operated. While running the business, I relied primarily on basic analytics tools provided by platforms like Shopify to make decisions. Now, with a more sophisticated data science toolkit, I've reanalyzed this historical data to extract deeper insights and demonstrate how my analytical capabilities have evolved.
The analysis focuses on several key business metrics: revenue trends over time, geographical distribution of orders, conversion rates compared to industry benchmarks, and payment method distribution. Each of these areas provides valuable business intelligence that would have been instrumental in making more informed strategic decisions.
Personal Note
This project was both nostalgic and revelatory. Looking back at the data from my e-commerce venture with new skills and perspectives allowed me to see missed opportunities and validate some of the intuitive decisions I made at the time. It was a powerful demonstration of how my analytical skills have grown over the years and reminded me of the real-world impact data science can have on business outcomes.
Revenue Analysis
The first part of this analysis focused on visualizing total revenue over time to understand sales trends and identify significant patterns in revenue generation.
Data Preparation
I loaded order data from a Shopify export file, ensuring the 'Paid at' column was in datetime format for accurate timeline plotting. After filtering out orders without payment dates, I used Pandas' groupby functionality to aggregate total revenue by date.
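import pandas as pd

# Load the Shopify order export and parse the payment timestamps
df_orders = pd.read_csv('orders_export_1.csv')
df_orders['Paid at'] = pd.to_datetime(df_orders['Paid at'])

# Drop rows without a payment date, then sum revenue per calendar day
df_orders = df_orders.dropna(subset=['Paid at'])
total_revenue = df_orders.groupby(df_orders['Paid at'].dt.date)['Total'].sum().reset_index()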
Visualization Approach
I created a time series plot with an orange line representing revenue trends. The visualization includes formatted axes (dates and currency values) and deliberate design choices like removing unnecessary spines to focus attention on the data.
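The plotting choices described above can be sketched roughly as follows. This is a minimal illustration with Matplotlib, not the exact code behind the figure: the function name `plot_revenue`, the figure size, and the date and currency format strings are assumptions, and the three-row `total_revenue` frame holds placeholder values rather than real store data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import pandas as pd

# Stand-in for the aggregated frame from the data preparation step.
# Placeholder values only, not real store data.
total_revenue = pd.DataFrame({
    "Paid at": pd.to_datetime(["2019-11-01", "2019-11-02", "2019-11-03"]),
    "Total": [120.0, 340.5, 275.25],
})


def plot_revenue(revenue: pd.DataFrame):
    """Draw daily revenue as an orange line with date and currency axes."""
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(revenue["Paid at"], revenue["Total"], color="orange", linewidth=2)
    ax.set_title("Total Revenue Over Time")
    ax.set_xlabel("Date")
    ax.set_ylabel("Revenue (USD)")
    # Format ticks as short dates and whole-dollar amounts
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
    ax.yaxis.set_major_formatter(mticker.StrMethodFormatter("${x:,.0f}"))
    # Remove the top and right spines to keep attention on the line
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    fig.autofmt_xdate()
    return fig, ax


fig, ax = plot_revenue(total_revenue)
```

Calling `fig.savefig(...)` on the returned figure would write the chart to disk; the file name and resolution are left to the reader.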

Total Revenue Over Time - Note the significant increase followed by a decline
Key Insights:
- Revenue showed a clear upward trend initially, consistent with effective marketing and product-market fit
- The sudden decline in revenue coincided with PayPal freezing the store's assets, which significantly impacted operations
- The total revenue during this period was substantial, demonstrating the business's potential
Geographical Analysis
To optimize inventory management and improve delivery times, I visualized the geographical distribution of orders across the United States. This was particularly relevant as the store had distribution centers in New Jersey and San Francisco.
Data Preparation
This analysis required combining order data with geographical data. I used GeoPandas and a US state shapefile from the Census Bureau to create a geographical representation of order distribution. The initial visualization included all states and territories, but I later filtered it down to the contiguous United States for better clarity.
import geopandas as gpd

# Count orders per shipping state
orders_by_state = df_orders['Shipping Province Name'].value_counts().reset_index()
orders_by_state.columns = ['State', 'Orders']

# Join the counts onto the Census state boundaries; states with no orders get 0
gdf_states = gpd.read_file('cb_2018_us_state_20m.shp')
gdf_merged = gdf_states.merge(orders_by_state, left_on='NAME', right_on='State', how='left')
gdf_merged['Orders'] = gdf_merged['Orders'].fillna(0)
Visualization Refinement
After creating an initial visualization, I made several refinements to improve clarity and focus:
- Excluded territories and non-contiguous states (Alaska, Hawaii) to focus on the main market
- Used a color gradient that effectively shows the distribution intensity
- Added clear borders between states and appropriate legends
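The refinements listed above might look something like the sketch below. It assumes the merged GeoDataFrame has the `NAME` and `Orders` columns from the data preparation step; the exclusion set, the `OrRd` colormap, the border width, and both function names are illustrative assumptions rather than the original settings. The filter is demonstrated on a plain DataFrame with placeholder counts so the example runs without the shapefile.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Names assumed to match the NAME column of the Census shapefile.
NON_CONTIGUOUS = {
    "Alaska", "Hawaii", "Puerto Rico", "Guam", "American Samoa",
    "United States Virgin Islands",
    "Commonwealth of the Northern Mariana Islands",
}


def contiguous_only(frame):
    """Keep only rows whose NAME is not in the exclusion set."""
    return frame[~frame["NAME"].isin(NON_CONTIGUOUS)]


def plot_orders_map(gdf):
    """Choropleth of the 'Orders' column; expects a GeoDataFrame."""
    fig, ax = plt.subplots(figsize=(12, 7))
    contiguous_only(gdf).plot(
        column="Orders",
        cmap="OrRd",            # sequential gradient: darker means more orders
        edgecolor="black",      # visible borders between states
        linewidth=0.5,
        legend=True,
        legend_kwds={"label": "Orders"},
        ax=ax,
    )
    ax.set_axis_off()
    ax.set_title("Orders by State (Contiguous US)")
    return fig, ax


# Filter demo on a plain DataFrame with placeholder counts (not real data).
demo = pd.DataFrame({
    "NAME": ["New Jersey", "Alaska", "California", "Puerto Rico"],
    "Orders": [40, 1, 25, 0],
})
kept = contiguous_only(demo)
```

With the real data, `plot_orders_map(gdf_merged)` would produce the map; the plain-DataFrame demo only exercises the filtering step.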

Choropleth map showing order distribution across the contiguous United States
Key Insights:
- Order concentration was significantly higher on the East Coast, explaining why the New Jersey inventory depleted faster
- California showed strong order volume despite being on the opposite coast from the majority of customers
- Several Midwestern states had surprisingly low order volumes, suggesting potential untapped markets
Conversion Rate Analysis
Understanding how effectively the store converted visitors into customers was crucial for evaluating marketing performance. This analysis compared Xmas Everything's conversion rate against industry benchmarks.
Data Collection
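I calculated the store's overall conversion rate as total orders placed divided by total sessions, using website traffic data for October through December 2019. For benchmarking, I sourced industry data from Littledata's 2022 survey of 3000+ Shopify stores and from IRP Commerce's January 2024 data specific to the clothing industry. Because the benchmarks come from later periods than the store data, the comparison is indicative rather than exact.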
# Overall conversion rate (%) = orders placed / sessions across the export window
df_traffic = pd.read_csv('visits_2019-10-01_2019-12-31.csv')
total_conversion_rate = round(
    df_traffic['total_orders_placed'].sum() / df_traffic['total_sessions'].sum() * 100, 2
)
Visualization Approach
I used a horizontal bar graph to facilitate easy comparison between the store's performance and industry benchmarks. The visualization includes:
- Color differentiation between benchmarks and the store's performance
- Clear labeling of exact conversion rate percentages
- Simplified design with unnecessary elements removed
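A horizontal bar chart with these properties could be sketched as below. The function name `plot_conversion_comparison`, the colors, and the figure size are assumptions, and the rates in `rates` are placeholders chosen only to make the example run; they are not the store's actual conversion rate or the published benchmark figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_conversion_comparison(rates: dict, highlight: str):
    """Horizontal bars, one per entry, with the highlighted entry in orange."""
    labels = list(rates)
    values = [rates[name] for name in labels]
    colors = ["orange" if name == highlight else "lightgray" for name in labels]

    fig, ax = plt.subplots(figsize=(8, 3))
    bars = ax.barh(labels, values, color=colors)
    # Print the exact percentage at the end of each bar
    ax.bar_label(bars, fmt="%.2f%%", padding=3)
    # Strip elements that the value labels make redundant
    ax.xaxis.set_visible(False)
    for side in ("top", "right", "bottom"):
        ax.spines[side].set_visible(False)
    return fig, ax


# Placeholder rates for illustration only, not the real figures.
rates = {
    "Benchmark A (placeholder)": 1.0,
    "Benchmark B (placeholder)": 2.0,
    "Xmas Everything (placeholder)": 3.0,
}
fig, ax = plot_conversion_comparison(rates, highlight="Xmas Everything (placeholder)")
```

In practice the dictionary would hold `total_conversion_rate` alongside the two benchmark values taken from their sources.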

Conversion rate comparison between Xmas Everything and industry benchmarks
Key Insights:
- Xmas Everything's conversion rate outperformed both the general Shopify average and the clothing industry average
- This suggests effective targeting, compelling product offerings, or a well-designed customer journey
- The strong conversion rate indicated that traffic quality was high, even if overall volume could be improved
Payment Methods Analysis
The final analysis examined the distribution of payment methods used by customers. This was particularly relevant given the impact of PayPal's asset freeze on the business's operations.
Data Preparation
I extracted payment method data from the orders dataset, focusing specifically on the two primary payment processors: PayPal Express Checkout and Shopify Payments.
# Order counts for the two primary payment processors
payment_methods = df_orders['Payment Method'].value_counts()[['PayPal Express Checkout', 'Shopify Payments']]
Visualization Approach
A pie chart was used to clearly show the proportion of orders processed through each payment method. I used distinct colors for each payment processor and included percentage labels for clarity.
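As a rough sketch of that chart, assuming `payment_methods` is a Series of order counts indexed by processor name: the counts below are placeholders rather than the store's real split, and the colors and start angle are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts for illustration only, not the real distribution.
payment_methods = pd.Series(
    {"PayPal Express Checkout": 60, "Shopify Payments": 40}
)

fig, ax = plt.subplots(figsize=(6, 6))
wedges, texts, autotexts = ax.pie(
    payment_methods.values,
    labels=payment_methods.index,
    autopct="%1.1f%%",                 # percentage label inside each wedge
    colors=["tab:blue", "tab:green"],  # one distinct color per processor
    startangle=90,
)
ax.set_aspect("equal")  # keep the pie circular
ax.set_title("Orders by Payment Method")
```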

Distribution of orders by payment method
Key Insights:
- PayPal processed a significant portion of orders, explaining the substantial impact when these funds were frozen
- Splitting orders across two payment processors cut both ways: it limited exposure to any single processor (diversification), yet each processor's share of revenue remained vulnerable to a freeze
- The distribution informed a key lesson: maintaining healthy relationships with payment processors is critical for e-commerce operations
Business Impact & Lessons Learned
Inventory Optimization
The geographical analysis clearly explained why the New Jersey warehouse depleted inventory faster than San Francisco. With this data, a more optimal inventory distribution could have been implemented, allocating approximately 70% to the East Coast facility and 30% to the West Coast.
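One way to turn per-state order counts into an allocation split is sketched below. The state-to-warehouse assignment is hypothetical (the `WEST_SERVED` set is an assumption, not the store's actual fulfillment rule), and the demo counts are placeholders, so the resulting percentages illustrate the method rather than reproduce the split quoted above.

```python
import pandas as pd

# Hypothetical rule: these states ship from San Francisco, all others from New Jersey.
WEST_SERVED = {
    "California", "Oregon", "Washington", "Nevada", "Arizona", "Utah", "Idaho",
}


def warehouse_shares(orders_by_state: pd.DataFrame) -> pd.Series:
    """Percentage of orders attributed to each warehouse."""
    warehouse = orders_by_state["State"].map(
        lambda state: "San Francisco" if state in WEST_SERVED else "New Jersey"
    )
    totals = orders_by_state.groupby(warehouse)["Orders"].sum()
    return (totals / totals.sum() * 100).round(1)


# Placeholder counts for illustration only, not real order data.
demo = pd.DataFrame({
    "State": ["New Jersey", "New York", "California", "Texas"],
    "Orders": [40, 20, 25, 15],
})
shares = warehouse_shares(demo)
```

Applied to the real `orders_by_state` frame, the same function would give the order share per warehouse, which could then serve as a starting point for inventory allocation.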
Payment Processing Resilience
The payment method analysis highlighted a critical business vulnerability. Having a substantial portion of orders processed through PayPal created a concentration risk that significantly impacted operations when those funds were frozen. A contingency plan for payment processing would have been valuable.
Marketing Effectiveness
The conversion rate analysis confirmed that once visitors reached the site, they converted at an above-average rate. This suggests that marketing efforts were effectively targeting the right audience, and the website was successfully converting interest into sales.
Revenue Pattern Identification
The time series analysis revealed clear patterns in revenue that could have informed marketing spend timing and inventory preparation. The identification of these patterns would have enabled more strategic decision-making for future seasons.
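Daily revenue tends to be noisy, so one common way to make such patterns easier to see is a rolling average. The sketch below is an assumption about how that could be done, not a step from the original analysis: `smooth_revenue` is a hypothetical helper, days without orders are treated as zero revenue, and the demo values are placeholders.

```python
import pandas as pd


def smooth_revenue(total_revenue: pd.DataFrame, window: int = 7) -> pd.Series:
    """Rolling mean of daily revenue, treating days with no orders as zero."""
    dates = pd.to_datetime(total_revenue["Paid at"])
    daily = total_revenue.set_index(dates)["Total"]
    # Insert missing calendar days so the window always spans real days
    daily = daily.asfreq("D", fill_value=0.0)
    return daily.rolling(window, min_periods=1).mean()


# Placeholder values with a gap on 2019-11-03 (not real store data).
demo = pd.DataFrame({
    "Paid at": ["2019-11-01", "2019-11-02", "2019-11-04"],
    "Total": [100.0, 200.0, 300.0],
})
smoothed = smooth_revenue(demo, window=2)
```

A 7-day window (the default here) evens out day-of-week effects; a longer window would emphasize the seasonal ramp instead.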
Technical Growth & Reflection
This project demonstrates significant growth in my data analysis capabilities from when I initially ran the business:
Then: Basic Analytics
- Relied on platform-provided dashboards
- Limited to predefined metrics
- Minimal geographical insights
- No comparative benchmarking
- Reactive decision-making
Now: Advanced Analytics
- Custom Python scripts for targeted analysis
- Data cleaning and transformation skills
- Geographic data visualization with GeoPandas
- Industry benchmarking and contextualization
- Proactive, data-driven recommendations
This reanalysis represents more than just a technical exercise; it demonstrates how enhanced analytical capabilities translate directly to business value. The insights gleaned would have enabled more strategic decision-making, potentially avoiding some of the challenges faced and capitalizing further on the store's strengths.
Future Work
If continuing this analysis, several additional avenues would provide valuable insights:
- Customer Segmentation: Apply clustering techniques to identify different customer groups and their purchasing behaviors
- Product Association Analysis: Implement market basket analysis to discover frequently co-purchased items
- Predictive Modeling: Develop models to forecast seasonal demand and optimize inventory levels
- Customer Lifetime Value: Calculate and analyze CLV to better understand the long-term value of customer acquisition
- Marketing Channel Attribution: Apply multi-touch attribution modeling to understand which channels drove the most valuable conversions