Data Warehouse Agility Array Conference2011
 


Hans Hultgren's Agile Data Warehousing presentation from the Array BI conference in the Netherlands, May 2011

  • Name Value Pair is actually Key=Value pair modeling, and has been around for a very long time. This technique is already applied by Hadoop, and a number of other data storage mechanisms out there.

    That is, the physical data layers are changing the way the models work; however, the logical data modeling still needs to be defined in these areas.

    Thanks for the great show!
    Dan Linstedt

    Data Warehouse Agility Array Conference2011 Presentation Transcript

    • Data Warehousing Agility • BI-Event May 17 • Hans Hultgren • Genesee Academy: Data Vault Modeling and Approach; DW2.0 and Unstructured Data; Master Data Management and Metadata • 25568 Genesee Trail Rd, Golden, Colorado 80401, (303) 526-0340 • © 2011 Genesee Academy, LLC
    • Welcome • Definition of agility • Types of agility • Discuss current approaches • Hyper-agility • Observations from the field – Also topics of operational data warehousing, operational bi, agile project management techniques, agility oriented tools, and operational integration
    • Data Warehouse Agility • Agility – The overall measure of adaptability in terms of speed & scope. – Overall performance in adapting to change. NOTE: Not warehouse machine throughput, near real time (NRT) processing, and operational DW performance… Ability of the data warehouse to adapt to change Versus Performance of an existing (steady state) warehouse
    • Data Warehouse Agility • Agility – Agile in IT • Agile Project Management • Agile Software Development – Agile Manifesto We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: Individuals and interactions over processes and tools Working software over comprehensive documentation Customer collaboration over contract negotiation Responding to change over following a plan That is, while there is value in the items on the right, we value the items on the left more. • Agile Modeling Driven Design (AMDD) • Test-Driven Design (TDD)
    • Data Warehouse Agility • Agility in the Data Warehouse – Agility in terms of Data Warehousing is related to the ability to build incrementally. – The approach today is more concerned with the development of a business intelligence, data warehousing program – the capability to increment (adapt and grow). – Since the business is always changing (new reporting needs, new business processes, new business units, new data sources, etc.) the EDW program is an ongoing initiative that needs to focus on adapting to these changes. – Note: distinguish between operational integration and data warehousing.
    • Types of Data Warehouse Agility (diagram labels: Data Warehouse; Change DW, New Source, New Mart, New Attribute, New Subject Area)
    • Types of Data Warehouse Agility – Presentation Layer Agility – ability to adapt to new business requirements based on existing data elements in the EDW. • Bottom Line: Ability to quickly and flexibly spin off new data marts – New Data Source Agility – ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts. • Bottom Line: Ability to quickly adapt to new data sources * using existing structures – New Attribute Agility – ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources and integrate new attributes in terms of business context. • Bottom Line: Ability to quickly incorporate new attributes in the EDW and apply business context to these attributes – EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart. • Bottom Line: EDW response time; a function of people, process & tools – Changes in the DW – ability to absorb other changes such as integration logic, mappings, and business rules.
    • Presentation Layer Agility – Presentation Layer Agility - ability to adapt to new business requirements based on existing data elements in the EDW. • Bottom Line: Ability to quickly and flexibly spin off new data marts – In this layer, agility is measured as a function of the time it takes to design, construct and deliver a new data mart. – Variables in this layer include: • Strength of the BI team to capture requirements and define data mart. • Ability of ETL integration team to understand EDW model and mart. • Strength and repeatability of ETL processes for sourcing the EDW. • Strength and repeatability of ETL development, testing and delivery. – Constraints: • Dependent upon the existence of the data in the EDW. • Dependent upon the level of business alignment of the data in the EDW.
    • New Data Source Agility – New Data Source Agility - ability to assimilate new data sources into the EDW architecture from stage to CDW+ and existing data marts. • Bottom Line: Ability to quickly adapt to new data sources * using existing structures – In this layer, agility is measured as a function of the time it takes to design, model, build and load data into the EDW from a new source. – Variables in this layer include: • Strength of the DW team to design the required model changes. • Strength and repeatability of EDW development, testing and delivery. • Ability of ETL integration team to understand new EDW model. • Strength and repeatability of ETL processes for mapping and loading new source into the EDW. – Constraints: • Level of alignment of the new source data with the existing model. • Dependent upon the level of business alignment with the data in the EDW.
    • New Attribute Technical Agility – New Attribute (Technical) Agility - ability to absorb new attributes into the EDW architecture such that they can be loaded from the sources. • Bottom Line: Ability to quickly incorporate new attributes in the EDW – In this layer, agility is measured as a function of the time it takes to design, map, add and load a new attribute from a source. – Variables in this layer include: • Strength of the DW team to design the required model changes. • Strength and repeatability of EDW development, testing and delivery. • Ability of ETL integration team to understand new EDW attribute(s). • Strength and repeatability of ETL processes for mapping and loading new source attributes into the EDW. – Constraints: • Level of alignment of the new attribute with the existing model. • Dependent upon business context being defined.
    • New Attribute Business Context – New Attribute (Business) Context Agility - ability to integrate new attributes in terms of business context. • Bottom Line: Ability to quickly apply business context to new attributes – In this layer, agility is measured as a function of the time it takes to align business context with a new attribute from a source. – Variables in this layer include: • Ability of the BI / DW team to accurately assess the business context of the new source attribute. – Constraints: • Level of alignment of the new attribute with the existing model. • Dependent upon the level of business alignment with the data in the EDW.
    • EDW Machine Agility – EDW Machine Agility – ability of the EDW machine (business and technical) to accommodate a new subject area from stage to mart. • Bottom Line: EDW response time; a function of people, process & tools – In this layer, agility is measured as an overall function of the EDW machine to integrate a new subject area from stage to mart. – Variables in this layer include: • Strength of the BI / DW development team. • Strength and repeatability of EDW development, testing and delivery. • Strength and ability of ETL integration team. • Strength and repeatability of all BI / DW processes. – Constraints: • Executive sponsorship of the EDW program. • Well defined organizational structure for BIW, BICC, Architecture and Governance.
    • CURRENT APPROACHES
    • DW Agility Current Approaches – Incremental Data Warehouse Development • Data Vault modeling, 2G, Anchor, etc. – Agile BI Programs (People, Process, Models & Data) • Methodologies (Centennium, Platon, etc.) • Templates, Tools & Automation (Wherescape, etc.) – Alternate & New Paradigms for the Agile DW
    • DW Agility Components – Absorb Changes • Capture the Change • Understand the Change – A major constraint on agility is the required data warehouse modeling changes... • So we can capture the data (create the buckets) • So we can understand the data (context, meaning) – Align to business keys, classify, describe (metadata)
    • Data Warehouse Agility • Why create a Data Model for the DW? • Model Data versus Meaning? – Separate the capture of data from the meaning? – The structure of a table versus the semantics – Business meaning versus data loading – As XML is to EDI
    • HYPER AGILITY AND THE NAME VALUE PAIR (NVP)
    • Concept of Name/Value Pair
      Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      121202  | Lundquist | Carl  | 22 Bird St | NYC     | NY    | 98291 | 10/9/1977
      123335  | Dahlgren  | Eva   | 7 Academy  | Madison | NJ    | 07940 | 2/12/1982
      139090  | Lundberg  | Scott | 444 7th St | Tuborg  | MN    | 70098 | 4/22/1988
      119944  | Hultquist | Darla | 17 South   | Randolf | PA    | 91121 | 9/22/1967
      120334  | Forsberg  | Sven  | 117 East A | NYC     | NY    | 98292 | 8/19/1976
      Each Value or "data item" (record value for each attribute) is provided in a list format, paired with the corresponding Name or "field name" (column header) from the normalized table structure. Moving to Name / Value Pair…
    • Concept of Name/Value Pair
      Name:  Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      Value: 121202  | Lundquist | Carl  | 22 Bird St | NYC     | NY    | 98291 | 10/9/1977
      Name:  Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      Value: 123335  | Dahlgren  | Eva   | 7 Academy  | Madison | NJ    | 07940 | 2/12/1982
      Name:  Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      Value: 139090  | Lundberg  | Scott | 444 7th St | Tuborg  | MN    | 70098 | 4/22/1988
      Name:  Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      Value: 119944  | Hultquist | Darla | 17 South   | Randolf | PA    | 91121 | 9/22/1967
      Name:  Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      Value: 120334  | Forsberg  | Sven  | 117 East A | NYC     | NY    | 98292 | 8/19/1976
    • Moving to Name/Value Pair
      Cust_ID | Lname     | Fname | Add        | City    | State | Zip   | Bdate
      121202  | Lundquist | Carl  | 22 Bird St | NYC     | NY    | 98291 | 10/9/1977
      123335  | Dahlgren  | Eva   | 7 Academy  | Madison | NJ    | 07940 | 2/12/1982
      139090  | Lundberg  | Scott | 444 7th St | Tuborg  | MN    | 70098 | 4/22/1988
      119944  | Hultquist | Darla | 17 South   | Randolf | PA    | 91121 | 9/22/1967
      120334  | Forsberg  | Sven  | 117 East A | NYC     | NY    | 98292 | 8/19/1976
      Transpose into NAME and VALUE …with column headings…
    • Name/Value Pair
      Name    | Value
      Cust_ID | 121202
      Lname   | Lundquist
      Fname   | Carl
      Add     | 22 Bird St
      City    | NYC
      State   | NY
      Zip     | 98291
      Bdate   | 10/9/1977
      Cust_ID | 123335
      Lname   | Dahlgren
      Fname   | Eva
      Add     | 7 Academy
      City    | Madison
      State   | NJ
      Zip     | 7940
      Bdate   | 2/12/1982
      Cust_ID | 139090
      Lname   | Lundberg
      Fname   | Scott
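The transposition shown on these slides can be sketched in a few lines of Python. This is only an illustration, not something from the presentation: the function name `to_name_value_pairs` is hypothetical, and only the first two sample customers are included.

```python
# Hypothetical sketch: flatten a table into a list of (name, value) pairs.
# Column names and sample rows are taken from the slide's customer table.
columns = ["Cust_ID", "Lname", "Fname", "Add", "City", "State", "Zip", "Bdate"]
rows = [
    ["121202", "Lundquist", "Carl", "22 Bird St", "NYC", "NY", "98291", "10/9/1977"],
    ["123335", "Dahlgren", "Eva", "7 Academy", "Madison", "NJ", "07940", "2/12/1982"],
]


def to_name_value_pairs(columns, rows):
    """Pair every value with its column header, row by row."""
    pairs = []
    for row in rows:
        pairs.extend(zip(columns, row))
    return pairs


pairs = to_name_value_pairs(columns, rows)
for name, value in pairs[:3]:
    print(name, value)
```

Values are kept as strings here so that a ZIP code such as 07940 keeps its leading zero.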
    • Name/Value Pair (same Name/Value list as the preceding slide, Cust_ID 121202 through Fname Scott)
      The concept of the "record" is effectively lost in this transformation. Now a RECORD is a set of Name/Value Pair instances…
      CON: Lose resolution on the record.
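To make the lost "record" concrete: in a flat name/value list, record boundaries are only implied by ordering. The sketch below rebuilds records under the assumption (not stated on the slide) that every record starts with a `Cust_ID` pair; the function name `group_into_records` is hypothetical.

```python
# Hypothetical sketch: recover records from a flat name/value list.
# Assumes each record begins with the key attribute (here "Cust_ID").
pairs = [
    ("Cust_ID", "121202"), ("Lname", "Lundquist"), ("Fname", "Carl"),
    ("Cust_ID", "123335"), ("Lname", "Dahlgren"), ("Fname", "Eva"),
    ("Cust_ID", "139090"), ("Lname", "Lundberg"), ("Fname", "Scott"),
]


def group_into_records(pairs, key_name="Cust_ID"):
    """Start a new record each time the key attribute appears."""
    records = []
    for name, value in pairs:
        if name == key_name or not records:
            records.append({})
        records[-1][name] = value
    return records


records = group_into_records(pairs)
print(len(records), records[1]["Lname"])
```

If the ordering assumption does not hold, each pair would need to carry an explicit record identifier instead.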
    • Name/Value Pair (same Name/Value list as the preceding slide, Cust_ID 121202 through Fname Scott)
      Also, the attributes are not defined in advance: we don't know what to expect and we can't check for attribute meaning, definitions, domain values or data types.
      CON: Attributes are not pre-defined.
    • Name/Value Pair
      New attributes that are introduced into the source feed are added instantly to the DW. There is no modeling delay, no code change, and no ETL impact…
      PRO: Absorb new attributes instantly.
      Name      | Value
      Cust_ID   | 121202
      Lname     | Lundquist
      Fname     | Carl
      Add       | 22 Bird St
      City      | NYC
      State     | NY
      Zip       | 98291
      Bdate     | 10/9/1977
      CustClass | Big
      Cust_ID   | 123335
      Lname     | Dahlgren
      Fname     | Eva
      Add       | 7 Academy
      City      | Madison
      State     | NJ
      Zip       | 7940
      Bdate     | 2/12/1982
      CustClass | Small
      Cust_ID   | 139090
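A minimal sketch of why a name/value store absorbs a new attribute without a model change: the loader below never looks at attribute names, so `CustClass` arrives like any other pair. The `store` list and `load` function are hypothetical stand-ins for the warehouse table and its load routine.

```python
# Hypothetical sketch: a schema-less loader that appends whatever pairs arrive.
store = []  # stands in for the persisted name/value table


def load(feed_record):
    """Append every (name, value) pair from one source record, unchanged."""
    store.extend(feed_record.items())


# The original feed, then a feed that has gained a CustClass attribute.
load({"Cust_ID": "121202", "Lname": "Lundquist", "Bdate": "10/9/1977"})
load({"Cust_ID": "123335", "Lname": "Dahlgren", "Bdate": "2/12/1982",
      "CustClass": "Small"})

attribute_names = sorted({name for name, _ in store})
print(attribute_names)
```

The same property is the source of the CON on the earlier slide: because nothing is declared in advance, nothing in the loader can check what `CustClass` means or which values are valid.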
    • Hyper Agility • The solution to deal with these issues requires a further level of abstraction, which in effect moves the persisted (historized, permanent, integrated) data store even further away from the business context that it is intended to represent. • The DW model – the data model itself – is then not readable (not understandable). In fact, ETL professionals will also find themselves further removed from this model. Where a model is meant to be intuitive, self-descriptive, and aligned with business meaning, this approach takes a step in the other direction. • Moving towards addressing these business-driven agility requirements causes the model itself to move much further away (an order of magnitude away) from the business, so far as to become effectively a technical solution utilizing only abstract representations.
    • Hyper Agility • The context – the meaning of the data – will in these cases need to be managed in a different way. • This can include a form of persisted and historized metadata concerning the mappings and business rules. In effect a form of EAI within the DW. • Or it might include a more traditional secondary DW layer.
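One way to picture "persisted and historized metadata" is a small catalog that records, with an effective date, what each attribute name means. The sketch below is an assumption made for illustration only: the catalog layout, the dates, the descriptions and the `describe` function are all hypothetical and do not come from the presentation.

```python
from datetime import date

# Hypothetical historized metadata: (attribute name, valid from, description, data type).
metadata = [
    ("Bdate", date(2011, 1, 1), "Customer birth date", "date"),
    ("CustClass", date(2011, 5, 1), "Customer size classification", "text"),
]


def describe(name, as_of):
    """Return the latest metadata entry for `name` in effect on `as_of`, or None."""
    candidates = [m for m in metadata if m[0] == name and m[1] <= as_of]
    return max(candidates, key=lambda m: m[1]) if candidates else None


# Before its effective date an attribute has no business context yet.
print(describe("CustClass", date(2011, 4, 1)))
print(describe("CustClass", date(2011, 6, 1)))
```

Keeping the meaning in a separate, dated structure is what lets the name/value store itself stay unchanged while the business context evolves.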
    • DW AGILITY SUMMARY • Consider specific Agility Requirements • Classify Agility Types and consider Alternatives • Distinguish between operational integration and DW • Look to modeling techniques optimized for Data Warehouse • Look at entire picture – people, process, models and data • Consider specific methodologies, templates and tools • Determine if hyper agility is a requirement
    • Questions? www.GeneseeAcademy.com • CDVDM Certification Seminar: June 23-24, October 27-28 • Hans@GeneseeAcademy.com • 25568 Genesee Trail Rd, Golden, Colorado 80401 • USA +1 303.526.0340 • Sweden 070 250 2102