Retail Store Case Study

Explore dimension modelling with a retail example

About Our Retail Dataset

This case study utilizes "The Retail Store Star Schema Dataset" which provides a realistic foundation for learning dimensional modeling concepts.

🏪 Business Scenario

You are working with data from a multi-location retail chain that sells various products across different stores. The business wants to analyze:

  • Sales Performance: Track revenue, profit, and units sold
  • Product Analysis: Understand which products perform best
  • Customer Insights: Analyze purchasing patterns by segment
  • Store Operations: Compare performance across locations
  • Temporal Trends: Identify seasonal patterns

Schema Diagram

Date Dimension

DateKey (PK)
Date, DayOfWeek, Month, Quarter, Year

Product Dimension

ProductKey (PK)
ProductName, Category, Brand, UnitPrice

Customer Dimension

CustomerKey (PK)
CustomerName, Email, Segment, City

Store Dimension

StoreKey (PK)
StoreName, StoreType, Region, Manager

Sales Fact Table

Foreign Keys:
DateKey (FK), ProductKey (FK), CustomerKey (FK), StoreKey (FK)
Measures:
Quantity, GrossAmount, NetAmount, Profit

🎯 End-to-End Kimball Methodology Application

Follow along as we apply each step of the Kimball methodology to create our retail dimensional model, based on the actual dataset structure used in this application.

1

Select the Business Process

Identify the core operational event to model.

🎯 Chosen Process: Retail Sales Transactions

Why this process?
  • Revenue driver: Sales transactions directly impact business performance.
  • High volume: Thousands of transactions daily provide rich analytical data.
  • Stakeholder interest: Finance, Operations, and Marketing all depend on sales analytics.
  • Operational core: The fundamental business event that drives everything else.
Other processes considered
🏪 Inventory Management

Not chosen first — depends on sales data.

👥 Customer Registration

Lower priority — fewer immediate analytic questions.

🎁 Promotional Events

Modeled later as attributes/dimensions on top of sales.

2

Declare the Grain

Define what each fact table row represents.

📐 Chosen Grain: One row per product per transaction

SelectedTransaction line item level

Each row represents one product purchased in one transaction.

Benefits:
  • Maximum analytical flexibility — can aggregate any way needed.
  • Supports basket analysis (what products are bought together).
  • Enables detailed customer behavior analysis.
  • Supports promotion effectiveness measurement.
Example: If a customer buys 2 laptops and 3 mice in a single transaction, that creates 2 fact table rows (one per product).
RejectedDaily sales by store

Each row would represent total sales for one store on one day.

Why rejected:
  • Loses product-level detail needed for inventory and assortment analysis.
  • Cannot analyze individual customer purchasing patterns.
  • Cannot properly support promotion-level analysis by product.
  • Pre-aggregates data, limiting future analytical possibilities.
Impact on fact table structure
TransactionIDProductKeyCustomerKeyStoreKeyDateKeyQuantityNetAmountProfit
100001100422393002202310163203.9790.14
3

Identify Dimensions

Determine the descriptive context for analysis.

🔍 Dimension Identification Process

Questions asked
🤔 Who is involved in sales transactions?
→ Customers (who buy) and potentially sales staff.
🤔 What products are being sold?
→ Products with categories, brands, specifications.
🤔 When do transactions occur?
→ Date and time, with calendar attributes for analysis.
🤔 Where do sales happen?
→ Stores with location, region, and type.
Resulting dimensions
Customer Dimension
Demographics: Name, Email, Segment
Geography: City, State, Region
Status: IsActive, RegistrationDate
Product Dimension
Identity: ProductName, Brand, SKU
Hierarchy: Category, SubCategory
Pricing: UnitPrice, Cost, Margin
Store Dimension
Location: StoreName, City, State, Region
Type: StoreType, Format
Management: Manager, OpenDate
Date Dimension
Calendar: Date, DayOfWeek, Month, Year
Business: Quarter, FiscalYear, IsHoliday
Periods: IsWeekend, MonthName
4

Identify Facts

Determine the measurable business metrics.

📊 Fact Identification Process

Criteria for fact selection
  • Numeric: Must be quantitative measurements.
  • Additive: Makes sense to sum across dimensions.
  • Process-generated: Results from the sales transaction process.
  • Business relevant: Supports important business decisions.
Core Sales FactsEssential
  • Quantity: Units sold per line item.
  • NetAmount: Revenue after discounts.
  • GrossAmount: Revenue before discounts.
  • DiscountAmount: Total discounts applied.
Calculated FactsDerived
  • TotalCost: Cost of goods sold.
  • Profit: NetAmount - TotalCost.
  • ProfitMargin: Profit / NetAmount × 100.
Validation against business questions
"What are our top-selling products by revenue?"
✅ Sum NetAmount by Product.
"Which stores are most profitable?"
✅ Sum Profit by Store.
"What's our discount impact by customer segment?"
✅ Sum DiscountAmount by Customer Segment.
"How do sales vary by day of week?"
✅ Sum NetAmount by DayOfWeek.

Data Explorer

TransactionIDDateKeyProductKeyCustomerKeyStoreKeyQuantityNetAmountProfit
100001202310161004223930023$203.97$90.14
100002202207181001214930075$3,399.96$1,502.06
100003202211241006247930103$854.97$288.95
100004202303051002215630012$1,619.98$720.00
100005202308121008230130041$134.99$45.00

Showing 5 of 5 rows

Final Dimensional Model Summary

SALES_FACT

Grain: One row per product per transaction
8 Facts: Quantity, Amounts, Costs, Profit
4 Dimension Keys: Customer, Product, Store, Date
Customer
Product
Store
Date