Retail Store Case Study
About Our Retail Dataset
This case study utilizes "The Retail Store Star Schema Dataset" which provides a realistic foundation for learning dimensional modeling concepts.
🏪 Business Scenario
You are working with data from a multi-location retail chain that sells various products across different stores. The business wants to analyze:
- •Sales Performance: Track revenue, profit, and units sold
- •Product Analysis: Understand which products perform best
- •Customer Insights: Analyze purchasing patterns by segment
- •Store Operations: Compare performance across locations
- •Temporal Trends: Identify seasonal patterns
Schema Diagram
Date Dimension
Product Dimension
Customer Dimension
Store Dimension
Sales Fact Table
🎯 End-to-End Kimball Methodology Application
Follow along as we apply each step of the Kimball methodology to create our retail dimensional model, based on the actual dataset structure used in this application.
Select the Business Process
Identify the core operational event to model.
🎯 Chosen Process: Retail Sales Transactions
Why this process?
- Revenue driver: Sales transactions directly impact business performance.
- High volume: Thousands of transactions daily provide rich analytical data.
- Stakeholder interest: Finance, Operations, and Marketing all depend on sales analytics.
- Operational core: The fundamental business event that drives everything else.
Other processes considered
Not chosen first — depends on sales data.
Lower priority — fewer immediate analytic questions.
Modeled later as attributes/dimensions on top of sales.
Declare the Grain
Define what each fact table row represents.
📐 Chosen Grain: One row per product per transaction
Each row represents one product purchased in one transaction.
- Maximum analytical flexibility — can aggregate any way needed.
- Supports basket analysis (what products are bought together).
- Enables detailed customer behavior analysis.
- Supports promotion effectiveness measurement.
Each row would represent total sales for one store on one day.
- Loses product-level detail needed for inventory and assortment analysis.
- Cannot analyze individual customer purchasing patterns.
- Cannot properly support promotion-level analysis by product.
- Pre-aggregates data, limiting future analytical possibilities.
Impact on fact table structure
Identify Dimensions
Determine the descriptive context for analysis.
🔍 Dimension Identification Process
Questions asked
Resulting dimensions
Identify Facts
Determine the measurable business metrics.
📊 Fact Identification Process
Criteria for fact selection
- Numeric: Must be quantitative measurements.
- Additive: Makes sense to sum across dimensions.
- Process-generated: Results from the sales transaction process.
- Business relevant: Supports important business decisions.
- Quantity: Units sold per line item.
- NetAmount: Revenue after discounts.
- GrossAmount: Revenue before discounts.
- DiscountAmount: Total discounts applied.
- TotalCost: Cost of goods sold.
- Profit: NetAmount - TotalCost.
- ProfitMargin: Profit / NetAmount × 100.
Validation against business questions
Data Explorer
| TransactionID | DateKey | ProductKey | CustomerKey | StoreKey | Quantity | NetAmount | Profit |
|---|---|---|---|---|---|---|---|
| 100001 | 20231016 | 1004 | 2239 | 3002 | 3 | $203.97 | $90.14 |
| 100002 | 20220718 | 1001 | 2149 | 3007 | 5 | $3,399.96 | $1,502.06 |
| 100003 | 20221124 | 1006 | 2479 | 3010 | 3 | $854.97 | $288.95 |
| 100004 | 20230305 | 1002 | 2156 | 3001 | 2 | $1,619.98 | $720.00 |
| 100005 | 20230812 | 1008 | 2301 | 3004 | 1 | $134.99 | $45.00 |
Showing 5 of 5 rows