
Agile Data Warehouse Design for Big Data Presentation


Synopsis:
[Video link: http://www.youtube.com/watch?v=ZNrTxSU5IQ0 ]
Jim Stagnitto and John DiPietro of consulting firm a2c will discuss Agile Data Warehouse Design - a step-by-step method for data warehousing / business intelligence (DW/BI) professionals to better collect and translate business intelligence requirements into successful dimensional data warehouse designs.


The method utilizes BEAM✲ (Business Event Analysis and Modeling) - an agile approach to dimensional data modeling that can be used throughout analysis and design to improve productivity and communication between DW designers and BI stakeholders. BEAM✲ builds upon the body of mature "best practice" dimensional DW design techniques, and collects "just enough" non-technical business process information from BI stakeholders to allow the modeler to slot their business needs directly and simply into proven DW design patterns.


BEAM✲ encourages DW/BI designers to move away from the keyboard and their entity relationship modeling tools and begin "white board" modeling interactively with BI stakeholders. With the right guidance, BI stakeholders can and should model their own BI data requirements, so that they can fully understand and govern what they will be able to report on and analyze.


The BEAM✲ method is fully described in Agile Data Warehouse Design - a text co-written by Lawrence Corr and Jim Stagnitto.



About the speaker:

Jim Stagnitto, Director of a2c Data Services Practice



Data Warehouse Architect: specializing in powerful designs that extract the maximum business benefit from Intelligence and Insight investments.

Master Data Management (MDM) and Customer Data Integration (CDI) strategist and architect.

Data Warehousing, Data Quality, and Data Integration thought-leader: co-author with Lawrence Corr of "Agile Data Warehouse Design", guest author of Ralph Kimball’s “Data Warehouse Designer” column, and contributing author to Ralph and Joe Caserta's latest book: “The DW ETL Toolkit”.

John DiPietro, Chief Technology Officer at A2C IT Consulting



John DiPietro is the Chief Technology Officer for a2c. Mr. DiPietro is responsible
for setting the vision, strategy, delivery, and methodologies for a2c’s Solution
Practice Offerings for all national accounts. The a2c CTO brings with him an
expansive depth and breadth of specialized skills in his field.

Sponsor Note:

Thanks to:

Microsoft NERD for providing an awesome venue for the event.

http://A2C.com IT Consulting for providing the food/drinks.

http://Cognizeus.com for providing a book to give away in the raffle.

Published in: Education, Technology, Business

Transcript of "Agile Data Warehouse Design for Big Data Presentation"

  1. 1. Agile Data Warehouse Design with Big Data John DiPietro & Jim Stagnitto !1
  2. 2. Agenda • Introduction / a2c Overview • Modeling for End Users • Role of Dimensional Models in Big Data • Example: eCommerce • Structured Data: Sales • Semi-structured Data: Clickstream • Agile Dimensional Modeling Overview • Case Study Review • Q&A !2
  3. 3. Introduction • a2c • Boutique EDM (Enterprise Data Management) consultancy firm: • Data Warehousing • Master Data Management • Closed Loop Analytics and Visualization • Data & Application Architecture • John DiPietro • Principal, Chief Technology Officer • Jim Stagnitto • Data Warehouse & MDM Architect !3
  4. 4. a2c Corporate Overview & Industry Experience !4
  5. 5. Company Overview • Technology Solution Consultancy headquartered in Philadelphia with regional offices in New York and Boston • Servicing Healthcare, Life Science, Telecom and Financial Services industries, and recently awarded a GSA schedule to pursue Federal Government opportunities • Consultant base of over 2500 proven IT professionals throughout the Northeast region with a recruiting network which provides national coverage • Flexible approach to helping our clients with their initiatives • Project-based Solutions • Staff Augmentation • Managed Service Offerings – “On-Shore QA, Development & Application Support” • Executive & Professional Search !5
  6. 6. Competitive Advantage • Founders of a2c were part of the fastest growing privately held IT consulting and staff augmentation firm in the US from 1994-2002. Our Executive Management Team has over 100 years of collective experience and has been responsible for delivering over a half-billion dollars of IT Consulting and staff augmentation revenue from 1994 through to the present day. • a2c’s Recruiting Engine and Methodology is one of the best in the industry, capable of producing quality results, on-demand for our clients • Resource Managers continually “Silo” disciplines with available candidates who have proven their abilities with us over the last 10 years • Our solutions organization is instrumentally involved during the screening and selection process to ensure that candidates submitted to our clients are an ideal match • a2c’s Culture provides an ability to attract and retain the best talent in the industry and fosters creativity, integrity, growth and teamwork • a2c provides our clients with an alternative solution to a “Big 4” consultancy at substantial savings for projects that are between $500K and $5M due to our flexibility, agility and focus !6
  7. 7. Representative Clients 03/19/12 !7
  8. 8. a2c Solution Engagement Structures • Technology Strategy & Roadmap Formulation • Needs & Readiness Assessment • Package & Platform Selections • Proof of Concept Implementation • Requirements Discovery & Specifications • Program/Project Management • Full Life Cycle & Application Development • Infrastructure & Facilities Initiatives • Managed Services & Maintenance Support !8
  9. 9. a2c Solutions Capabilities • Enterprise Data Management Practice helps clients manage their complete Information Lifecycle from their On-line Transactional systems to their Data Warehousing, Enterprise Reporting, Data Migration, Back-Up and Recovery Strategies (See Slide 7) • Business Architecture & Optimization Practice utilizes “Six Sigma Lean” methodologies to analyze, re-engineer and automate our clients’ business processes, leveraging human workflow and business rules engine technologies to create efficiencies and provide business unit owners with the necessary metrics to continually improve performance • Program Management Office oversees all aspects of solutions planning and delivery across client engagement teams and provides the methodology and frameworks which are based on PMI® industry standards • Application Development & Managed Services Practice helps clients architect, implement and deploy the latest Microsoft and Enterprise Java based applications which are built on proven frameworks and architectures for the enterprise • a2c's SDLC Delivery Model comprises over 20 years of collective best practices and industry proven methodologies that allow our delivery teams to rapidly design, develop and implement solutions. Our SDLC model has been designed to complement our project management methodology, utilizing iterative development cycles that enable project teams to provide consistently high quality, on-time deliverables, regardless of technology platform !9
  10. 10. Agile DW Design Overview !10
  11. 11. Modeling for End Users • How to Design to Answer Business Questions? • Think about how questions are articulated • And how the answers should be delivered • Identify a common question framework • Design an architecture that embraces and leverages this common question framework • Utilize the best designs and technologies to: • (a) derive the answers • (b) present them in compelling ways that lead to the next interesting question! !11
  12. 12. How Do We Ask Questions? “How do this quarter’s (When) sales (What) by sales rep (Who) of electronic products (What) that we promoted (Why) to retail customers (Who) in the east (Where) compare with last year’s (When)?” !12
  13. 13. How Do We Ask Questions? • Events / Transactions • e.g. Sale • an immutable “fact” that occurs at a time and (typically) a place • Interrogatives: • Who, What, When, Where, Why • Descriptive context that fully describes the event • a set of “dimensions” that describe events !13
  14. 14. Dimensional Value Proposition • It makes sense to present answers to people using the same taxonomy of events and interrogatives (aka: facts and dimensions - dimensional structure) that they use when forming questions • Events are instances of processes: • It’s best to present information to people who will ask the system questions in dimensional form • This is true regardless of the type of information being interrogated, its source, or IT stuff (like database technologies utilized) • It’s best to model this presentation layer based on the events (aka: business processes) that underlie the questions !14
  15. 15. [Diagram labels: Who, What, When, Where, Why, How, How Many] !15
  16. 16. Scenarios • A brief discussion of how and where dimensional modeling and/or databases fit within common and emerging “big data” data warehousing architectures !16
  17. 17. Kimball Dimensional DW Dimensional BI Semantic Layer Dimensional Data Warehouse Data Movement / Integration Source Data (Structured) !17
  18. 18. Kimball with Big Data Dimensional BI Semantic Layer Dimensional Data Warehouse Big Data Capture Big Data Discovery (e.g. HDFS) (e.g. MR) Data Movement / Integration Tier Data Movement / Integration Tier Source Data Tier Source Data Tier (Un/Semi-Structured) (Structured) !18
  19. 19. Corporate Information Factory (CIF) Dimensional BI Semantic Layer Dimensional Tier (Virtual or Physical) Corporate Information Factory 3NF DW Data Movement / Integration Source Data (Structured) !19
  20. 20. CIF with Big Data Dimensional BI Semantic Layer Dimensional Tier (Virtual or Physical) Big Data Capture Big Data Discovery (e.g. HDFS) (e.g. MR) Corporate Information Factory 3NF DW Data Movement / Integration Tier Data Movement / Integration Tier Source Data Tier Source Data Tier (Un/Semi-Structured) (Structured) !20
  21. 21. Data Vault Dimensional BI Semantic Layer Dimensional Tier (Virtual or Physical) Data Vault Data Movement / Integration Source Data (Structured) !21
  22. 22. Data Vault with Big Data Dimensional BI Semantic Layer Dimensional Tier (Virtual or Physical) Big Data Capture Big Data Discovery (e.g. HDFS) (e.g. MR) Data Vault Data Movement / Integration Tier Data Movement / Integration Tier Source Data Tier Source Data Tier (Un/Semi-Structured) (Structured) !22
  23. 23. Etc. !23
  24. 24. Common Framework Dimensional BI Semantic Layer Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)] Persistent Un/Semi-Structured Staging Area Unstructured -> Structured Data Discovery Processing Persistent Structured Data Repository (not needed for Kimball) Un/Semi-Structured Data Movement Structured Data Movement Un/Semi-Structured Source Data Structured Source Data (Structured) !24 Insight Generation / Data Mining
  25. 25. Common Framework Dining Room Readily Accessible to End Users (and BI Developers) Safe, Hospitable Environment Data Assets “Ready for Primetime” Dimensionally Structured Dimensional BI Semantic Layer Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)] Persistent Un/Semi-Structured Staging Area Unstructured -> Structured Data Discovery Processing Persistent Structured Data Repository Kitchen (not needed for Kimball) Un/Semi-Structured Data Movement Structured Data Movement Un/Semi-Structured Source Data Structured Source Data (Structured) Clickstream Data Off Limits to End Users Data Professionals Only Please Dangerous / Inhospitable Environment Data Assets “Not Ready for Primetime” Structured Variably For Data Processing eCommerce Sale eCommerce Example !25
  26. 26. eCommerce Example: Clickstream Semi-Structured Recording of every page request made by a user Includes some structural elements – such as when the request was made and who the user is Requires significant prep work in order to fit into a traditional row-based relational database Apples and Oranges: Pre-Sessionized Page Visits, Detailed Product Views, Catalogue Requests, Shopping Cart Adds / Deletes / Abandons, etc. Needs to be converted into separate-but-relatable dimensional facts - with many shared (conformed) dimensions !26 Raw Clickstream Data! 25 52 164 240 274 328 368 448 538 561 630 687 730 775 825 834 39 120 124 205 401 581 704 814 825 834 35 249 674 712 733 759 854 950 39 422 449 704 825 857 895 937 954 964 15 229 262 283 294 352 381 708 738 766 853 883 966 978 26 104 143 320 569 620 798 7 185 214 350 529 658 682 782 809 849 883 947 970 979 227 390 71 192 208 272 279 280 300 333 496 529 530 597 618 674 675 720 855 914 932 183 193 217 256 276 277 374 474 483 496 512 529 626 653 706 878 939 161 175 177 424 490 571 597 623 766 795 853 910 960 125 130 327 698 699 839 392 461 569 801 862 27 78 104 177 733 775 781 845 900 921 938 101 147 229 350 411 461 572 579 657 675 778 803 842 903 71 208 217 266 279 290 458 478 523 614 766 853 888 944 969 43 70 176 204 227 334 369 480 513 703 708 835 874 895 25 52 278 730 151 432 504 830 890 71 73 118 274 310 327 388 419 449 469 484 706 722 795 810 844 846 918 130 274 432 528 967 188 307 326 381 403 523 526 722 774 788 789 834 950 975 89 116 198 201 333 395 653 720 846 70 171 227 289 462 538 541 623 674 701 805 946 964 143 192 317 471 487 631 638 640 678 735 780 865 888 935 17 242 471 758 763 837 956 52 145 161 283 375 385 676 721 731 790 792 885 182 229 276 529 43 522 565 617 859
  27. 27. Typical Clickstream “Page View” Dimensional Model What When What Who Why !27
  28. 28. eCommerce Example: Web Sales • Fully Structured • The Sale Transaction typically carries all fundamental dimensions: • Time • Customer • Product • Purchase and/or Shipment (Geo or URL) Locations • Referring URL / Search Phrase • Promotion / Campaign • Etc. • And “How Many” Measures • Unit and Price Quantities / Amounts • Discount Amounts • Etc. !28
  29. 29. eCommerce Dimensionality Facts (below) & Dimensions (right) Page Visit Detailed Product View Shopping Cart Activity Time! (When) View Start View End Session Start Session End View Start View End Session Start Session End Activity Start Activity End Customer! Web Page! (Who) (Where) Visitor Current
 Previous Next Prospect Current
 Previous Next Product! (What) Referring URL! (Where) Promotion / Campaign (Why) Activity Type (How) ✔ ✔ ✔ Prospect ✔ ✔ ✔ ✔ ✔ ✔ ✔ Sale (Checkout) Sale Start Sale End Customer ✔ Shipment / Delivery Shipment Delivery Customer Delivery Recipient ✔ !29
  30. 30. Agile DW Design Overview !30
  31. 31. The first dimensional modeler: Rudyard Kipling Ralph Kimball? R.K. !31
  32. 32. I keep six honest serving-men
 (They taught me all I knew);
 Their names are What and Why and When 
 And How and Where and Who… –Rudyard Kipling !32
  33. 33. Who !33
  34. 34. What !34
  35. 35. When !35
  36. 36. Where !36
  37. 37. Why !37
  38. 38. How !38
  39. 39. How Many !39
  40. 40. The 7Ws Framework
  41. 41. [Diagram labels: Who, What, When, Where, Why, How, How Many]
  42. 42. How did we get here?
  43. 43. DW Architectures: A Brief History • Corporate Information Factory -> Data-Driven Analysis • Undisciplined Dimensional -> Report-Driven Analysis • Dimensional Bus Architecture -> Process-Driven Analysis
  44. 44. 7Ws Dimensional Model • When: Time, Day, Month, Fiscal Period • Who: Customer, Employee, Third Party, Organization • How – Facts: Much, Many, Often, £$€ • Where: Location, Store, Ship To, Hospital, Geographic • What: Product, Service, Transactions • Why (??): Causal, Promotion, Reason, Weather, Competition
  45. 45. BEAM: Business Event Analysis & Modeling [Diagram labels: Who, What, When, Where, Why, How, How Many]
  46. 46. How do you design a data warehouse?
  47. 47. Tech Design Artifacts? • CALENDAR: Date Key, Date, Day, Day in Week, Day in Month, Day in Qtr, Day in Year, Month, Qtr, Year, Weekday Flag, Holiday Flag • PRODUCT: Product Key, Product Code, Product Description, Product Type, Brand, Subcategory, Category • SALES FACT: Date Key, Product Key, Store Key, Promotion Key, Quantity Sold, Revenue, Cost, Basket Count • STORE: Store Key, Store Code, Store Name, URL, Store Manager, Region, Country • PROMOTION: Promotion Key, Promotion Code, Promotion Name, Promotion Type, Discount Type, Ad Type
  48. 48. OK, Now Validate with
  49. 49. Why Agile Data Warehousing?
  50. 50. Waterfall BI/DW Limited Stakeholder interaction Analysis Design Development This Year BDUF Stakeholder Requirements Input Data Model Next Year Test Release ETL BI DATA VALUE?
  51. 51. Agile DW/BI Development Stakeholder interaction ? JEDUF BI Prototyping ETL Review Release This Year Next Year Iteration 1 VALUE? Iteration 2 ETL BI Iteration 3Rev ADM VALUE Iteration … VALUE! DATA Iteration n VALUE! VALUE!
  52. 52. State of The DW Field Solid: • Dimensional Data Warehouse Design is Mature • Proven Design Patterns Exist for Common Requirements Hit or Miss: • Collecting Unambiguous and Thorough Requirements • Slotting Requirements into Proven Design Patterns • End-User Ownership and Validation Too Often: • Snatching Defeat from the Jaws of Victory !52
  53. 53. Modelstorming Quick Inclusive Data Modeler Interactive BI Stakeholders Fun
  54. 54. BEAM✲ Methodology Structured, non-technical, collaborative working conversation directly with BI Users BEAM✲ BI User’s Business Process, Organizational, Hierarchical, and Data Knowledge • Focused Data Profiling • Data Modeler BI Stakeholders • Logical and Physical (Kimball-esque) Dimensional Data Models • Example data • Detailed and Testable ETL Specification • Instantiated DW Prototype
  55. 55. Requirements = Design
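
The star schema on slide 47 and the sample business question on slide 12 can be connected in a short sketch. The following is an illustration only, not material from the talk: it assumes an in-memory SQLite database, a trimmed subset of the slide's columns, and invented sample rows. The function names (`build_star_schema`, `load_sample_rows`, `revenue_by_year`) are hypothetical. It shows how a question phrased with When, What and Where filters becomes joins from the fact table to its dimensions.

```python
import sqlite3


def build_star_schema(conn):
    """Create a trimmed version of the slide 47 tables (assumed column subset)."""
    conn.executescript("""
        CREATE TABLE calendar  (date_key INTEGER PRIMARY KEY, date TEXT, qtr TEXT, year INTEGER);
        CREATE TABLE product   (product_key INTEGER PRIMARY KEY, product_code TEXT, category TEXT);
        CREATE TABLE store     (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
        CREATE TABLE promotion (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
        CREATE TABLE sales_fact (
            date_key      INTEGER REFERENCES calendar(date_key),
            product_key   INTEGER REFERENCES product(product_key),
            store_key     INTEGER REFERENCES store(store_key),
            promotion_key INTEGER REFERENCES promotion(promotion_key),
            quantity_sold INTEGER,
            revenue       REAL
        );
    """)


def load_sample_rows(conn):
    """Insert invented example data; none of these values come from the slides."""
    conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)", [
        (1, "2011-02-10", "Q1", 2011),
        (2, "2012-02-14", "Q1", 2012),
    ])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
        (1, "TV-01", "Electronics"),
        (2, "SOFA-01", "Furniture"),
    ])
    conn.executemany("INSERT INTO store VALUES (?, ?, ?)", [
        (1, "Boston Store", "East"),
        (2, "Denver Store", "West"),
    ])
    conn.executemany("INSERT INTO promotion VALUES (?, ?)", [(1, "Winter Sale")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 2, 1000.0),  # Electronics, East, Q1 2011
        (2, 1, 1, 1, 3, 1500.0),  # Electronics, East, Q1 2012
        (2, 1, 2, 1, 1, 500.0),   # Electronics, West: excluded by the region filter
        (2, 2, 1, 1, 1, 800.0),   # Furniture, East: excluded by the category filter
    ])


def revenue_by_year(conn, category, region, qtr):
    """Sum revenue per year for one product category (What), region (Where), quarter (When)."""
    rows = conn.execute("""
        SELECT c.year, SUM(f.revenue)
        FROM sales_fact f
        JOIN calendar c ON c.date_key = f.date_key
        JOIN product  p ON p.product_key = f.product_key
        JOIN store    s ON s.store_key = f.store_key
        WHERE p.category = ? AND s.region = ? AND c.qtr = ?
        GROUP BY c.year
        ORDER BY c.year
    """, (category, region, qtr)).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample_rows(conn)
result = revenue_by_year(conn, "Electronics", "East", "Q1")
print(result)
```

Each filter in the query corresponds to one interrogative from the question, and the year-over-year comparison falls out of grouping on a calendar attribute. A Who dimension (sales rep or customer) would be added the same way, as one more key on the fact table.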
