ITA BI Roundtable
Dimensional Modeling: Organizing your Data for Analytics

Jeff Block, Managing Consultant
Effective Data Modeling

A quick, practical guide to top issues, hot topics, and best practices in modeling your data for analytics.

  1. ITA BI Roundtable. Dimensional Modeling: Organizing your Data for Analytics
     Jeff Block, Managing Consultant
     Jeff.Block@neudesic.com, (847) 924-1317
     © 2001-2010 Neudesic, LLC. All rights reserved.
  2. Welcome to the ITA Business Intelligence Roundtable
  3. Who am I?
     Jeff Block, Neudesic
     BI Roundtable Chairman
  4. What are we talking about? Today’s Agenda
     • Brief Introduction
     • Who’s in the room?
     • Presentation: Organizing your Data for Analytics
     • Discussion / Networking
     • Coming up Next Month
  6. Why are we here? What kind of session is this?
     • 2nd Tuesday of every month; 8-10 AM
       – Here at the ITA TechNexus unless there’s a good reason to change venues
     • Sometimes a presentation
       – My ideas, your ideas, case studies, best practices, panel discussions, new developments, etc.
     • Sometimes an outside speaker
       – Love to have some of you step up to the plate
     • Always discussion
       – Collaboration is the whole point of this group
     • Always networking
       – Meet people who will be valuable connections
  7. Why are we here? Topics and Target Audience
     • Business and technology leaders
       – Not going to spend much time deep in the technical weeds
     • Those who want to
       – Learn from each other
       – Collaborate on solutions
       – Network in the BI space
     • ITA members and their friends and their friends and …
  8. Why are we here? In Scope
     • Business Intelligence
       – Vision and strategy
     • Planning and implementing BI initiatives
       – High-level architecture
       – Best practices / Anti-patterns
       – Case studies
       – Etc.
     • What about data warehousing?
       – It’s in! (part of BI, in my world)
  9. Introduction: What is Business Intelligence?
     Business Intelligence is the art and science of turning corporate data into practical, accessible, actionable knowledge assets, and leveraging them to make empirically-based strategic or operational decisions which increase an organization’s capacity to fulfill its mission.
     To this end, BI requires:
     • A disciplined, well-governed culture
     • A specialized, analytic engine
     • A well-designed data architecture
  10. Why are we here? Classic BI Architecture
      Our focus is the stuff in this picture and the practices and processes that get it there effectively.
      [Diagram components: Source Systems, ETL, Data Warehouse, ETL, Data Marts, OLAP Services, BI Presentation Components]
  11. Why are we here? Out of Scope
      • Other random stuff
        – No matter how cool Aunt Ruth’s cat is, she’s out of scope
      • Building the tech together
      • Arguing over low-level details
      • Generally, if we talk about
        – Project management / SDLC
        – Architecture and design
        – Business processes
        – Etc.
        then it will be in the context of BI / DW / EDM
  12. Why are we here? Some Quick Feedback
      How does this line up with your expectations?
  13. Why are we here? A Few Logistics
      • Grab on the way in...
        – A nametag
        – You too can have a spiffy nametag; just pre-register.
      • Let me know you’re here
        – Toss a card in the fish bowl
        – No spam policy
        – No card? No problem. Sign the list.
      • Join our LinkedIn group
        – http://www.linkedin.com/groups?gid=1801350
        – Don’t worry, we’ll send you an invite
      • Restrooms, etc…
  15. Who’s in the room? Brief Introductions
      Please share with the group…
      • Name
      • Company
      • Role
      • What do you want to get out of this session?
  17. Where are you?
      When you talk about dimensional modeling, I …
      (on a scale of 1 to 5, from low to high)
      • Think you’re talking about Star Trek
      • Know enough to be dangerous
      • Could model Aunt Ruth’s cat
  18. Why a different model? What is Dimensional Modeling?
      Dimensional modeling is the art and science of modeling data for the purposes of fast, efficient and intuitive retrieval (typically from a data warehouse) for use in online analytic processing.
  20. Why a different model? What is Dimensional Modeling?
      Dimensional modeling is the art and science of modeling data for the purposes of fast, efficient and intuitive retrieval (typically from a data warehouse) for use in online analytic processing.
      • Completely different data modeling approach
        – Than most of us are used to
      • Two strategic goals:
        – Fast, efficient data retrieval
        – Intuitive interface to the data
  21. Why a different model?
      • Different Goals
      • Storage of Historic Records
      • Predictability of Requirements
  22. Why a different model? … A Different Model: Different Goals
      • Transactional systems
        – An effective interface between a business process and a user
        – Effective execution of a single business transaction
      • OLAP systems
        – An effective interface between a corporate decision-maker and analytic data
        – Effective analysis of a set of business transactions
      • Note the absence of “efficient storage” goals. Why?
  23. Why a different model? … A Different Model: Storage of Historic Records
      • Transactional systems
        – No need to know history
        – Optimized for the current transaction
      • OLAP systems
        – Business should be able to arbitrarily define the longevity of data
        – Optimized for consistent historic and predictive analysis
      • Why no history in operational systems?
  24. Why a different model? … A Different Model: Predictability of Requirements
      • Transactional systems
        – Very predictable usage requirements
        – Every interaction follows the same transactional process
      • OLAP systems
        – Very unpredictable usage requirements
        – Ad-hoc / business-configured queries
        – Every interaction potentially follows a completely different pattern than the previous interaction
      • Why are OLAP queries so unpredictable?
  25. Why a different model? How Data is Modeled
      • The dimensional model stores data in “star schemas”
      • Two core elements: “facts” and “dimensions”
      • Facts
        – Core data of a business event
        – The “verb” in the sentence describing the event
        – Also called a “measure”
      • Dimensions
        – Context in which the event (measurement) occurred
        – The “nouns” in the sentence
  26. Why a different model? Two Kinds of “Facts”
      • Measuring a business event
        – A customer ordered a widget
        – A new book was published
        – A relationship was established
        – A lead was converted
      • Taking a snapshot of reality
        – Inventory looks like this at this time
        – Membership looks like this on this date
        – Current workflow is at this stage at this time
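The difference between the two kinds of facts can be sketched in a few lines of Python. This is only an illustration, not something from the deck: the record types `OrderEvent` and `InventorySnapshot`, their fields, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OrderEvent:
    """Event fact: one row per business event (hypothetical fields)."""
    order_date: date
    customer: str
    product: str
    quantity: int


@dataclass(frozen=True)
class InventorySnapshot:
    """Snapshot fact: one row per product per snapshot date (hypothetical fields)."""
    snapshot_date: date
    product: str
    units_on_hand: int


# Event facts accumulate as things happen.
orders = [
    OrderEvent(date(2010, 2, 8), "Sally", "widget", 2),
    OrderEvent(date(2010, 2, 9), "Bob", "widget", 1),
]

# Snapshot facts record what reality looked like at a point in time.
snapshots = [
    InventorySnapshot(date(2010, 2, 8), "widget", 10),
    InventorySnapshot(date(2010, 2, 9), "widget", 9),
]

# Event measures can be summed across time: total units ordered.
units_ordered = sum(o.quantity for o in orders)

# Summing on-hand counts across dates would count the same stock twice,
# so this sketch reads the most recent snapshot instead.
latest_on_hand = max(snapshots, key=lambda s: s.snapshot_date).units_on_hand
```

The practical consequence, under these assumptions, is that an event measure aggregates freely across every dimension, while a snapshot measure needs care when the aggregation crosses the time dimension.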
  27. Why a different model? Seeing the Model in the Data
      • An example of a business event
        – “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”
      • Implies a dimensional model with…
      You tell me: huddle up, and list the facts and dimensions in this event
  28. Why a different model? Seeing the Model in the Data
      “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”
      • Implies a dimensional model with
        – One fact
          › “customer purchased items”
          › Two lines written to fact table; one for each item purchased
        – Several dimensions
          › Customer “Sally”
          › Inventory items “milk” and “eggs” with specific SKUs
          › A particular “Wal-Mart” store with a specific identifier
          › A particular clerk, identified as “Clerk 12”
          › Date “Tuesday”
          › Time “3:28PM”
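As a rough sketch of how that one sentence turns into two fact rows, the Python below splits the event at the grain "one row per item purchased". The dictionary keys and the `quantity` measure are assumptions made for illustration; the slide does not say how many of each item Sally bought.

```python
# The business event from the slide, captured as plain data.
# All key names are hypothetical.
event = {
    "customer": "Sally",
    "items": ["milk", "eggs"],
    "clerk": "Clerk 12",
    "store": "Wal-Mart",
    "day": "Tuesday",
    "time": "3:28PM",
}

# One fact row per item: every row repeats the shared context (dimensions)
# and carries its own measure.
fact_rows = [
    {
        "customer": event["customer"],
        "item": item,
        "clerk": event["clerk"],
        "store": event["store"],
        "day": event["day"],
        "time": event["time"],
        "quantity": 1,  # assumed measure, not stated in the source
    }
    for item in event["items"]
]
```

In a real warehouse the context columns would hold keys pointing into dimension tables rather than the text itself.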
  29. Why a different model? How to Use the Model
      “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”
      • “Pivot” the context on the measurement taken
        – Offers various perspectives on the data
      • Aggregate many measures to produce an analytic report
      • If aggregated…
        – Hundreds, thousands, millions of times
      • What questions could you ask of these data?
  30. Why a different model? How to Use the Model
      “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”
      • A few questions I thought of…
        – In what regions of the country do we sell the most dairy products in the first quarter?
        – Which three clerks sold the most impulse items in each Super Wal-Mart in the Midwest this year?
        – What is the correlation between the sale of milk and eggs in summer vs. winter months?
        – Who are our most loyal customers?
        – At what time of day do we typically not sell any dairy products?
        – Does staying open later on the weekends result in more dairy product sales?
  31. An Example: How to Build the Model
      “Sally purchased milk and eggs from Clerk 12 at Wal-Mart on Tuesday at 3:28PM”
      [Diagram: the fact “Customer Purchased Item” in the center, surrounded by the dimensions Customers, Clerks, Items, Dates, Store and Times]
      See why they call it a “star schema”?
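The star in that picture can be sketched as tables using Python's built-in `sqlite3` module. Treat this as one possible layout under stated assumptions rather than the deck's own design: the table names (`dim_*`, `fact_purchase`), the surrogate keys, and the `quantity` measure are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_clerk    (clerk_key    INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, day_name TEXT);
    CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, time_of_day TEXT);

    -- The fact table is the center of the star: one key per dimension
    -- plus the numeric measure.
    CREATE TABLE fact_purchase (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        item_key     INTEGER REFERENCES dim_item(item_key),
        clerk_key    INTEGER REFERENCES dim_clerk(clerk_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        time_key     INTEGER REFERENCES dim_time(time_key),
        quantity     INTEGER
    );

    INSERT INTO dim_customer VALUES (1, 'Sally');
    INSERT INTO dim_item     VALUES (1, 'milk'), (2, 'eggs');
    INSERT INTO dim_clerk    VALUES (12, 'Clerk 12');
    INSERT INTO dim_store    VALUES (1, 'Wal-Mart');
    INSERT INTO dim_date     VALUES (1, 'Tuesday');
    INSERT INTO dim_time     VALUES (1, '3:28PM');

    -- One purchase event, two fact rows: one per item.
    INSERT INTO fact_purchase VALUES (1, 1, 12, 1, 1, 1, 1),
                                     (1, 2, 12, 1, 1, 1, 1);
    """
)

# A typical analytic query: join the fact to the dimensions of interest
# and aggregate the measure.
rows = conn.execute(
    """
    SELECT s.name, d.day_name, SUM(f.quantity)
    FROM fact_purchase AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    JOIN dim_date  AS d ON d.date_key  = f.date_key
    GROUP BY s.name, d.day_name
    """
).fetchall()
```

Every question on the earlier list follows the same shape: pick the dimensions that give the context, join them to the fact, and aggregate.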
  32. Why a different model? What’s an “Analytic Cube”?
      [Diagram: the Purchase measure connected to the Product, Store and Date dimensions]
      • Purchase measure is the pivot point
      • Joins 2 or more dimensions (context)
  33. Why a different model? What’s an “Analytic Cube”?
      [Diagram: the Purchase measure connected to the Product, Store and Date dimensions]
      • Extrapolate to a cube
        – Several measures sharing a set of dimensions
      • Pivot cube on any point to get different analytic views of the data
      • Really N-dimensional, but we mere mortals can’t visualize that
        – So it’s a cube
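Pivoting a cube amounts to choosing which dimensions to group by. The sketch below shows that idea with a tiny in-memory data set; the sample facts, the `DIMENSIONS` tuple, and the `pivot` helper are made up for illustration and do not represent any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical purchase facts: (product, store, date, quantity).
FACTS = [
    ("milk", "Store A", "2010-02-08", 3),
    ("eggs", "Store A", "2010-02-08", 2),
    ("milk", "Store B", "2010-02-08", 1),
    ("milk", "Store A", "2010-02-09", 4),
]
DIMENSIONS = ("product", "store", "date")


def pivot(facts, by):
    """Sum the quantity measure grouped by the dimensions named in `by`."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the measure is the last element of each row
    return dict(totals)


# Two different views of the same facts.
by_product = pivot(FACTS, ["product"])
by_store_date = pivot(FACTS, ["store", "date"])
```

Adding a fourth or fifth dimension changes nothing in `pivot` except the length of the tuples, which is the sense in which the cube is really N-dimensional.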
  34. Why a different model? Selecting Appropriate Grain for Facts
      • The “grain” of a fact table is the most granular level of information that can be retrieved from the table.
      • Shoot for “Atomic” grain facts
        – Irreducibly complex; cannot be subdivided
        – Dimensionally unconstrained
          › Rolls up in any and all possible ways
          › BI requires cutting through details in precise ways
        – Required for drilling into reports
          › One of the core strengths of BI
        – Required for ad-hoc querying
        – Can always create other fact tables or business views with aggregations
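A short Python sketch of why atomic grain matters: atomic rows roll up along any dimension, but once data is stored at a coarser grain the dropped dimensions are gone. The rows, field names, and the `roll_up` helper are assumptions for illustration.

```python
from collections import Counter

# Atomic grain (assumed): one row per item per purchase.
atomic = [
    {"date": "2010-02-09", "customer": "Sally", "item": "milk", "quantity": 1},
    {"date": "2010-02-09", "customer": "Sally", "item": "eggs", "quantity": 1},
    {"date": "2010-02-09", "customer": "Bob", "item": "milk", "quantity": 2},
]


def roll_up(rows, *dims):
    """Aggregate quantity by any combination of the dimensions present."""
    totals = Counter()
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["quantity"]
    return dict(totals)


# Atomic rows roll up in any direction.
daily_by_item = roll_up(atomic, "date", "item")
by_customer = roll_up(atomic, "customer")

# A table stored at the coarser "daily by item" grain no longer carries the
# customer dimension, so the per-customer question cannot be answered from it.
coarse = [
    {"date": d, "item": i, "quantity": q} for (d, i), q in daily_by_item.items()
]
can_answer_by_customer = all("customer" in row for row in coarse)
```

This is the reasoning behind the last bullet: aggregates can always be derived from atomic facts, never the other way around.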
  35. Why a different model? Kimball’s Dimensional Design Process
      • Step 1: Select business process to model
        – Natural business activity performed
        – Not a department or business function
      • Step 2: Declare grain of the business process
        – Level of detail associated with fact measurement
        – Define exactly what a fact table row represents
        – Atomic data is typically best
      • Step 3: Choose dimensions applying to each fact table row
        – Context in which we’re taking measurements
        – Answer: “How do businesspeople describe the data that results from the business process?”
        – List dimensions, then all attributes per dimension
  36. Why a different model? Kimball’s Dimensional Design Process
      • Step 4: Identify numeric measures to populate fact tables
        – Numeric fact info which will populate the rows of the fact table
        – Answer: “What are we measuring?”
        – Measure only in the determined grain
        – Different grain requires different fact table
  37. Why a different model? Dimensional Conformity
      • The power of the enterprise data warehouse is making a “single source of the truth” available to the business
        – Only possible with conformed dimensions
        – Kimball’s “enterprise bus” model favors this
      • Dimensions are nouns
        – “Product”, “Customer”, “Store”, “Person”, etc.
        – If more than one definition of a noun, sentences start to have conflicting meanings
      • Only one definition of a dimension means it’s “conformed”
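What a conformed dimension buys you can be sketched with two fact tables that share one product dimension. Everything here is hypothetical (the `dim_product` contents, the sales and returns figures, the `total_by_category` helper); the point is only that both business processes resolve "category" through the same definition.

```python
# One shared ("conformed") product dimension, keyed by a hypothetical product_key.
dim_product = {
    1: {"name": "milk", "category": "dairy"},
    2: {"name": "cheese", "category": "dairy"},
    3: {"name": "bread", "category": "bakery"},
}

# Two fact tables from different business processes, both using the same keys.
sales_facts = [(1, 10), (2, 4), (3, 7)]  # (product_key, units_sold)
returns_facts = [(1, 1), (3, 2)]         # (product_key, units_returned)


def total_by_category(facts):
    """Sum a measure by product category via the shared dimension."""
    totals = {}
    for product_key, amount in facts:
        category = dim_product[product_key]["category"]
        totals[category] = totals.get(category, 0) + amount
    return totals


sold = total_by_category(sales_facts)
returned = total_by_category(returns_facts)

# Because both facts use one definition of "category", the two results line up
# and can be compared directly.
return_rate = {c: returned.get(c, 0) / sold[c] for c in sold}
```

If sales and returns each kept their own product list with their own category names, the final comparison would first require reconciling the two definitions.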
  38. Why a different model? … Dimensional Conformity: Beautiful if you have it…
      • Cross-functional view of data
      • Whole organization working in concert
        – Trend analysis
        – Predictive analysis
        – Drilling down into the true root cause of problems
        – Accurate and complete financial pictures
  39. Why a different model? … Dimensional Conformity: Anarchy if you don’t…
      • Missed opportunities from siloed data
      • Nearly redundant departmental databases
        – Nearly redundant data development
        – Nearly redundant administration
        – Nearly redundant storage
        – Nearly redundant system development
        – A lot of wasted time, energy and money
      • Even more waste comes from trying to reconcile slightly different versions of the truth
  40. Why a different model? … Dimensional Conformity: But you can Restore Order
      • Three requirements
        1. Political clout
        2. Financial means
        3. Willingness / ability to challenge the status quo
      • Pick a silo where you can drive a stake in the ground
        – I call it “bedrock data”
      • Expand out from there
        – Analyze and graft other silos onto the bedrock
        – DO NOT start ANY initiative that creates a new center of data
  41. Why a different model? Other (Advanced?) Topics
      • Snowflakes
      • Slowly Changing Dimensions
      • Denormalized Dimensions
      • Factless Fact Tables
      • Degenerate Dimensions
      • Master Data Management
      • Much more
      Interested in a follow-up?
  43. Discussion Time
  45. Why a different model? Coming Up…
      • March 9, 2010; 8-10 AM at the ITA
        – Topic: What Thomas Edison would do with your data
        – Speaker: Sarah Miller Caldicott
          › Great grandniece of Thomas Edison
          › Co-author: “Innovate Like Edison”
          › Founder: The Power Patterns of Innovation
      • April 13, 2010; 8-10:30 AM at the ITA
        – Topic: Grudge Match II – Another Smackdown
        – Proposed featured BI product vendors:
          › Microsoft
          › Oracle
          › Infobright
          › MicroStrategy
