The document introduces Data Vault modeling as an agile approach to data warehousing. It discusses how Data Vault addresses some limitations of traditional dimensional modeling by allowing for more flexible, adaptable designs. The Data Vault model consists of three simple structures - hubs, links, and satellites. Hubs contain unique business keys, links represent relationships between keys, and satellites hold descriptive attributes. This structure supports incremental development and rapid changes to meet evolving business needs in an agile manner.
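To make the hub/link/satellite split concrete, here is a minimal sketch of what the three structures can look like as tables. It is a hypothetical illustration only: the table and column names (hub_customer, lnk_customer_order, sat_customer_details, and so on) are invented for this example and are not taken from any of the presentations described below.

    # Hypothetical illustration of the three Data Vault structures.
    # All table and column names below are invented for this sketch.

    HUB_CUSTOMER = """
    CREATE TABLE hub_customer (
        hub_customer_key INTEGER PRIMARY KEY,   -- surrogate key
        customer_bk      VARCHAR(50) NOT NULL,  -- unique business key
        load_dts         TIMESTAMP NOT NULL,    -- when the key first arrived
        record_source    VARCHAR(50) NOT NULL   -- originating system
    );"""

    LINK_CUSTOMER_ORDER = """
    CREATE TABLE lnk_customer_order (
        lnk_customer_order_key INTEGER PRIMARY KEY,
        hub_customer_key       INTEGER NOT NULL,  -- relationship between two
        hub_order_key          INTEGER NOT NULL,  -- business keys (hub rows)
        load_dts               TIMESTAMP NOT NULL,
        record_source          VARCHAR(50) NOT NULL
    );"""

    SAT_CUSTOMER_DETAILS = """
    CREATE TABLE sat_customer_details (
        hub_customer_key INTEGER NOT NULL,        -- parent hub row
        load_dts         TIMESTAMP NOT NULL,      -- version timestamp
        customer_name    VARCHAR(100),            -- descriptive attributes live
        customer_segment VARCHAR(30),             -- only in satellites
        record_source    VARCHAR(50) NOT NULL,
        PRIMARY KEY (hub_customer_key, load_dts)  -- history kept per load
    );"""

    if __name__ == "__main__":
        for ddl in (HUB_CUSTOMER, LINK_CUSTOMER_ORDER, SAT_CUSTOMER_DETAILS):
            print(ddl)

Because new descriptive attributes land only in satellites and new relationships only in links, existing hubs (and the loads that feed them) stay untouched, which is what makes the incremental, agile changes described above possible.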
This is a presentation I gave in 2006 for Bill Inmon. The presentation covers Data Vault and how it integrates with Bill Inmon's DW2.0 vision. This is focused on the business intelligence side of the house.
If you want to use these slides, please include the notice: (C) Dan Linstedt, all rights reserved, http://LearnDataVault.com
Given at Oracle Open World 2011: Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years now but is not widely known. The purpose of this presentation is to provide an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics will include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where it fits in a typical Oracle EDW architecture. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
Agile Data Engineering - Intro to Data Vault Modeling (2016) - Kent Graziano
(Updated deck) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build and design structures incrementally, without constant refactoring
DAMA, Oregon Chapter, 2012 presentation - an introduction to Data Vault modeling. I cover parts of the methodology and compare and contrast general issues in the EDW space, followed by a brief technical introduction to the Data Vault modeling method.
After the presentation I will be providing a live demonstration of the ETL loading layers!
You can find more on-line training at: http://LearnDataVault.com/training
Data Vault Modeling and Methodology introduction that I provided to a Montreal event in September 2011. It covers an introduction and overview of the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of Data Vault Modeling and methodology.
If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me).
Thank you kindly,
Daniel Linstedt
I gave this presentation at the Advanced Architecture Conference, Bill Inmon, 2011 in Evergreen, Colorado. This presentation covers a new breed of data warehousing called Operational Data Warehousing. These are the next steps in business intelligence towards self-service BI, enabling users to do more with their enterprise data warehouse solution. Specifically, it talks about how the Data Vault model fits into this picture.
If you would like to use the slides, please e-mail me first, I'd be happy to discuss it with you.
Not to be confused with Oracle Database Vault (a commercial db security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the technical components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics for how to build, and design structures when using the Data Vault modeling technique. The target audience is anyone wishing to explore implementing a Data Vault style data model for an Enterprise Data Warehouse, Operational Data Warehouse, or Dynamic Data Integration Store. See more content like this by following my blog http://kentgraziano.com or follow me on twitter @kentgraziano.
This presentation explains the basics of the ETL (Extract-Transform-Load) concept in relation to data solutions such as data warehousing, data migration, and data integration. CloverETL is presented in detail as an example of an enterprise ETL tool. It also covers typical phases of data integration projects.
This is my presentation at SQLBits 8, Brighton, 9th April 2011. This session is about advanced dimensional modelling topics such as Fact Table Primary Key, Vertical Fact Tables, Aggregate Fact Tables, SCD Type 6, Snapshotting Transaction Fact Tables, 1 or 2 Dimensions, Dealing with Currency Rates, When to Snowflake, Dimensions with Multi Valued Attributes, Transaction-Level Dimensions, Very Large Dimensions, A Dimension With Only 1 Attribute, Rapidly Changing Dimensions, Banding Dimension Rows, Stamping Dimension Rows and Real Time Fact Table. Prerequisites: You need to have a basic knowledge of dimensional modelling and relational database design.
My name is Vincent Rainardi. I am a data warehouse & BI architect. I wrote a book on SQL Server data warehousing & BI, as well as many articles on my blog, www.datawarehouse.org.uk. I welcome questions and discussions on data warehousing at vrainardi@gmail.com. Enjoy the presentation.
Enabling a Data Mesh Architecture with Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slowness of centralized data infrastructures in provisioning data and responding to change.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Data Warehouse:
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
Reconciled data: detailed, current data intended to be the single, authoritative source for all decision support.
Extraction:
The Extract step covers the data extraction from the source system and makes it accessible for further processing. The main objective of the extract step is to retrieve all the required data from the source system using as few resources as possible.
Data Transformation:
Data transformation is the component of data reconciliation that converts data from the format of the source operational systems to the format of the enterprise data warehouse.
Data Loading:
During the load step, it is necessary to ensure that the load is performed correctly and using as few resources as possible. The target of the load process is often a database. To make the load process efficient, it is helpful to disable any constraints and indexes before the load and re-enable them only after the load completes. Referential integrity then needs to be maintained by the ETL tool to ensure consistency.
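Putting the three steps together, here is a minimal, hypothetical sketch in Python using the built-in sqlite3 module. The source and warehouse tables are invented for illustration, and the load step mirrors the tip above by dropping an index before the bulk insert and rebuilding it afterwards.

    import sqlite3

    # Hypothetical in-memory source system and warehouse target, for illustration only.
    source = sqlite3.connect(":memory:")
    dw = sqlite3.connect(":memory:")

    source.executescript("""
        CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER, country TEXT);
        INSERT INTO orders VALUES (1, 1999, 'us'), (2, 2500, 'ca');
    """)
    dw.executescript("""
        CREATE TABLE dw_orders (order_id INTEGER, amount_usd REAL, country_code TEXT);
        CREATE INDEX idx_dw_orders_country ON dw_orders (country_code);
    """)

    # Extract: retrieve only the required columns from the source system.
    rows = source.execute("SELECT order_id, amount_cents, country FROM orders").fetchall()

    # Transform: convert source formats to the warehouse's standardized format.
    transformed = [(oid, cents / 100.0, country.upper()) for oid, cents, country in rows]

    # Load: drop the index before the bulk insert and rebuild it afterwards,
    # mirroring the disable-then-re-enable tip described above.
    dw.execute("DROP INDEX idx_dw_orders_country")
    dw.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", transformed)
    dw.execute("CREATE INDEX idx_dw_orders_country ON dw_orders (country_code)")
    dw.commit()

    print(dw.execute("SELECT * FROM dw_orders").fetchall())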
(OTW13) Agile Data Warehousing: Introduction to Data Vault ModelingKent Graziano
This is the presentation I gave at OakTable World 2013 in San Francisco. #OTW13 was held at the Children's Creativity Museum next to the Moscone Convention Center and was in parallel with Oracle OpenWorld 2013.
The session discussed our attempts to be more agile in designing enterprise data warehouses and how the Data Vault Data Modeling technique helps in that approach.
Data Warehouse Design and Best Practices - Ivo Andreev
A data warehouse is a database designed for query and analysis rather than for transaction processing. An appropriate design leads to a scalable, balanced and flexible architecture that is capable of meeting both present and long-term future needs. This session covers a comparison of the main data warehouse architectures together with best practices for the logical and physical design that support staging, load and querying.
Data Lakehouse, Data Mesh, and Data Fabric (r1) - James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Agile Data Engineering: Introduction to Data Vault 2.0 (2018) - Kent Graziano
(updated slides used for North Texas DAMA meetup Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build and design structures incrementally, without constant refactoring
Metadata management is critical for organizations looking to understand the context, definition and lineage of key data assets. Data models play a key role in metadata management, as many of the key structural and business definitions are stored within the models themselves. Can data models replace traditional metadata solutions? Or should they integrate with larger metadata management tools & initiatives?
Join this webinar to discuss opportunities and challenges around:
How data modeling fits within a larger metadata management landscape
When can data modeling provide “just enough” metadata management
Key data modeling artifacts for metadata
Organization, Roles & Implementation Considerations
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a shift left towards a modern distributed architecture that allows domain-specific data, views "data-as-a-product," and enables each domain to handle its own data pipelines.
The process of data warehousing is undergoing rapid transformation, giving rise to various new terminologies, especially due to the shift from the traditional ETL to the new ELT. For someone new to the process, these additional terminologies and abbreviations might seem overwhelming; some may even ask, "Why does it matter if the L comes before the T?"
The answer lies in the infrastructure and the setup. Here is what the fuss is all about: the sequencing of the words and, more importantly, why you should be shifting from ETL to ELT.
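As a rough, hypothetical sketch of what that sequencing means in practice (using Python and SQLite purely for illustration): in ETL the transformation happens in the integration layer before the warehouse ever sees the data, while in ELT the raw data is loaded first and the transformation is pushed down to the warehouse engine as SQL.

    import sqlite3

    raw_rows = [(1, "1999", "us"), (2, "2500", "ca")]  # hypothetical source extract

    # --- ETL: transform in the integration layer, then load the finished result.
    etl_dw = sqlite3.connect(":memory:")
    etl_dw.execute("CREATE TABLE orders (order_id INTEGER, amount_usd REAL, country TEXT)")
    transformed = [(i, int(c) / 100.0, cc.upper()) for i, c, cc in raw_rows]
    etl_dw.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)

    # --- ELT: load the raw data as-is, then transform inside the warehouse with SQL.
    elt_dw = sqlite3.connect(":memory:")
    elt_dw.execute("CREATE TABLE raw_orders (order_id INTEGER, amount_cents TEXT, country TEXT)")
    elt_dw.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)
    elt_dw.execute("""
        CREATE TABLE orders AS
        SELECT order_id,
               CAST(amount_cents AS REAL) / 100.0 AS amount_usd,
               UPPER(country)                     AS country
        FROM raw_orders
    """)

    print(etl_dw.execute("SELECT * FROM orders").fetchall())
    print(elt_dw.execute("SELECT * FROM orders").fetchall())

The practical difference is where the transformation logic lives and scales: in the integration tool (ETL) or in the warehouse engine itself (ELT).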
Modernizing to a Cloud Data Architecture - Databricks
Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the increasing burden on DevOps to manage legacy architectures. Costs and resource utilization continue to go up while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off of the architecture in large numbers. You will also learn how the benefits of elastic compute models helped one customer scale their analytics and AI workloads, along with best practices from their experience of a successful migration of their data and workloads to the cloud.
Logical Data Fabric and Data Mesh – Driving Business Outcomes - Denodo
Watch full webinar here: https://buff.ly/3qgGjtA
Presented at TDWI VIRTUAL SUMMIT - Modernizing Data Management
While the technological advances of the past decade have addressed the scale of data processing and data storage, they have failed to address scale in other dimensions: the proliferation of data sources, the diversity of data types and user personas, and the speed of response to change. The essence of the data mesh and data fabric approaches is that they put the customer first and focus on outcomes instead of outputs.
In this session, Saptarshi Sengupta, Senior Director of Product Marketing at Denodo, will address key considerations and provide his insights on why some companies are succeeding with these approaches while others are not.
Watch On-Demand and Learn:
- Why a logical approach is necessary and how it aligns with data fabric and data mesh
- How some of the large enterprises are using logical data fabric and data mesh for their data and analytics needs
- Tips to create a good data management modernization roadmap for your organization
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach - Kent Graziano
First we interview the users, then we design a reporting model based on those interviews. We follow that up with mounds of ETL development to load the new model, basically keeping the user community in the dark during all that development. Does this sound familiar?
This presentation will demonstrate an alternative approach using the Data Vault Data Modeling technique to build a flexible, easily-extensible “Foundation” layer in our data warehouse with an Agile, iterative methodology. Relying on the Business Model and Mapping (BMM) functionality of OBIEE, we can rapidly virtualize a dimensional reporting model using the pattern-based Data Vault Foundation layer to decrease the time, and money, it takes to get BI content in front of end users. Attendees will see a sample Data Vault model designed iteratively and deployed to the semantic model of OBIEE.
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o... - Daniel Zivkovic
Two #ModernDataStack talks and one DevOps talk: https://youtu.be/4R--iLnjCmU
1. "From Data-driven Business to Business-driven Data: Hands-on #DataModelling exercise" by Jacob Frackson of Montreal Analytics
2. "Trends in the #DataEngineering Consulting Landscape" by Nadji Bessa of Infostrux Solutions
3. "Building Secure #Serverless Delivery Pipelines on #GCP" by Ugo Udokporo of Google Cloud Canada
We ran out of time for the 4th presenter, so the event will CONTINUE in March... stay tuned! Compliments of #ServerlessTO.
Watch full webinar here: https://buff.ly/2mHGaLA
What started to evolve as the most agile and real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is
• How it differs from other enterprise data integration technologies
• Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations
How a Time Series Database Contributes to a Decentralized Cloud Object Storag... - InfluxData
In this presentation, you'll learn how InfluxDB is a component of Storj’s Tardigrade service and workflows. John Gleeson and Ben Sirb of Storj Labs will cover Storj’s redefinition of a cloud object storage network, how InfluxData fits into Storj’s Open Source Partner Program, and how to collect and manage high-volume, real-time telemetry data from a distributed network.
Original: Lean Data Model Storming for the Agile Enterprise - Daniel Upton
This original publication, aimed at data project leaders, describes a set of methods for agile modeling and delivery of an enterprise data warehouse, which together make it quicker to deliver, faster to load, and more easily adaptable to unexpected changes in source data, business rules or reporting/analytic requirements.
With this set of methods, the parts of data warehouse development that used to be the most resistant to sprint-sized / agile work breakdown -- data modeling and ETL -- are now completely agile, so that this tasking, too, can now be sized purely based on customer requirements, rather than the dictates of a traditional data warehouse architecture.
Data Science Operationalization: The Journey of Enterprise AI - Denodo
Watch full webinar here: https://bit.ly/3kVmYJl
As we move into a world driven by AI initiatives, we find ourselves facing new and diverse challenges when it comes to operationalization. Creating a solution and putting it into practice are certainly not the same thing. The challenges span various organizational and data facets. In many instances, the data scientists may be working in silos, and connecting to the live data may not always be possible. But how does one guarantee that a model developed in a silo is still relevant to live data? How can we manage the data flow and data access across the entire AI operationalization cycle?
Watch on-demand to explore:
- The journey and challenges of the Data Scientist
- How Denodo data virtualization with data movement streamlines operationalization
- The best practices and techniques when dealing with siloed data
- How customers have used data virtualization in their data science initiatives
Government GraphSummit: And Then There Were 15 Standards - Neo4j
Todd Pihl, PhD, Technical Project Mgr. & Mark Jensen, Director of Data Management and Interoperability, National Institutes of Health, Frederick National Laboratory for Cancer Research
Data repositories such as NCI’s Cancer Research Data Commons receive data that use a variety of data models and vocabularies. This presents a significant obstacle to finding and using the data outside of their original purpose. In this talk we’ll show how using Neo4j allows different data models to be represented and mapped to each other, giving data managers a new way to provide harmonized data to their users.
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat... - Denodo
Watch full webinar here: https://bit.ly/3xj6fnm
Presented at Chief Data Officer Live 2021 A/NZ
The world is changing faster than ever. For companies to compete and succeed, they need to be agile in order to respond quickly to market changes and emerging opportunities. Data plays an integral role in achieving this business agility. However, given the complex nature of the enterprise data architecture, finding and analysing data is an increasingly challenging task. Data virtualization is a modern data integration technique that integrates data in real time, without having to physically replicate it.
Watch on-demand this session to understand what data virtualization is and how it:
- Delivers data in real-time, and without replication
- Creates a logical architecture to provide a single view of truth
- Centralises the data governance and security framework
- Democratises data for faster decision making and business agility
Oracle OpenWorld London - session for Stream Analysis, time series analytics, streaming ETL, streaming pipelines, big data, kafka, apache spark, complex event processing
Speeding Time to Insight with a Modern ELT Approach - Databricks
The availability of new tools in the modern data stack is changing the way data teams operate. Specifically, the modern data stack supports an “ELT” approach for managing data, rather than the traditional “ETL” approach. In an ELT approach, data sources are automatically loaded in a normalized state into Delta Lake and opinionated transformations happen in the data destination using dbt. This workflow allows data analysts to move more quickly from raw data to insight, while creating repeatable data pipelines robust to changes in the source datasets. In this presentation, we’ll illustrate how easy it is for even a data analytics team of one to develop an end-to-end data pipeline. We’ll load data from GitHub into Delta Lake, then use pre-built dbt models to feed a daily Redash dashboard on sales performance by manager, and use the same transformed models to power the data science team’s predictions of future sales by segment.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
How you can gain rapid insights and create more flexibility by capturing and storing data from a variety of sources and structures into a NoSQL database.
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler - DATAVERSITY
The increasing focus on data in today’s organization has increased demand for critical roles such as data architect, data engineer, and data modeler. But there is often confusion and ambiguity around what these roles entail, and what overlap exists between them. This webinar will discuss these data-centric roles and their place in the data-driven organization.
"We can all agree that streaming is super cool. And for a while now, the adoption conversation has been largely led with an all-in mentality. But that’s silly. The only concerns end users have are:
-The freshness of their data
-Latency they require to meet their SLAs from source to consumption
-All while maintaining data quality and governance.
Luckily, the industry has realized this and we have seen a shift of streaming capabilities surfacing as an in-database technology, via objects as trivial to analytics engineers as views - materialized that is. With this convergence of streaming capabilities and batch level accessibility, this is when ELT tools like dbt can join in and expand out the adoption story.
dbt is the T in ELT, Extract Load and Transform. In dbt, analytics engineers design models - SQL (and occasional python) statements that encapsulate business logic. At runtime, dbt will wrap that logic in a DDL statement and send it over to the data platform to execute.
In this session, we’ll discuss how we see streaming at dbt Labs. We will dive into how we are extending dbt to support low-latency scenarios and the recent additions we have made to make batch and streaming allies in a DAG rather than archenemies."
Building an Effective Data Warehouse Architecture - James Serra
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
Similar to Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Balance agility and governance with #TrueDataOps and The Data Cloud - Kent Graziano
DataOps is the application of DevOps concepts to data. The DataOps Manifesto outlines WHAT that means, similar to how the Agile Manifesto outlines the goals of the Agile Software movement. But, as the demand for data governance has increased, and the demand to do “more with less” and be more agile has put more pressure on data teams, we all need more guidance on HOW to manage all this. Seeing that need, a small group of industry thought leaders and practitioners got together and created the #TrueDataOps philosophy to describe the best way to deliver DataOps by defining the core pillars that must underpin a successful approach. Combining this approach with an agile and governed platform like Snowflake’s Data Cloud allows organizations to indeed balance these seemingly competing goals while still delivering value at scale.
Given in Montreal on 14-Dec-2021
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
HOW TO SAVE PILES of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc... - Kent Graziano
A good data model, done right the first time, can save you time and money. We have all seen the charts on the increasing cost of finding a mistake/bug/error late in a software development cycle. Would you like to reduce, or even eliminate, your risk of finding one of those errors late in the game? Of course you would! Who wouldn't? Nobody plans to miss a requirement or make a bad design decision (well nobody sane anyway). No data modeler or database designer worth their salt wants to leave a model incomplete or incorrect. So what can you do to minimize the risk?
In this talk I will show you a best practice approach to developing your data models and database designs that I have been using for over 15 years. It is a simple, repeatable process for reviewing your data models. It is one that even a non-modeler could follow. I will share my checklist of what to look for and what to ask the data modeler (or yourself) to make sure you get the best possible data model. As a bonus I will share how I use SQL Developer Data Modeler (a no-cost data modeling tool) to collect the information and report it.
This talk will introduce you to the Data Cloud, how it works, and the problems it solves for companies across the globe and across industries. The Data Cloud is a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Wherever data or users live, Snowflake delivers a single and seamless experience across multiple public clouds. Snowflake’s platform is the engine that powers and provides access to the Data Cloud
Delivering Data Democratization in the Cloud with Snowflake - Kent Graziano
This is a brief introduction to Snowflake Cloud Data Platform and our revolutionary architecture. It contains a discussion of some of our unique features along with some real world metrics from our global customer base.
Demystifying Data Warehousing as a Service (GLOC 2019) - Kent Graziano
Extended deck from the 2019 GLOC event in Cleveland. Discusses what a DWaaS is, the top 10 features of Snowflake that represent that, and a check list for what questions to ask when choosing a cloud based data warehouse.
[Given at DAMA WI, Nov 2018] With the increasing prevalence of semi-structured data from IoT devices, web logs, and other sources, data architects and modelers have to learn how to interpret and project data from things like JSON. While the concept of loading data without upfront modeling is appealing to many, ultimately, in order to make sense of the data and use it to drive business value, we have to turn that schema-on-read data into a real schema! That means data modeling! In this session I will walk through both simple and complex JSON documents, decompose them, then turn them into a representative data model using Oracle SQL Developer Data Modeler. I will show you how they might look using both traditional 3NF and data vault styles of modeling. In this session you will:
1. See what a JSON document looks like
2. Understand how to read it
3. Learn how to convert it to a standard data model (a rough flattening sketch follows this list)
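As a rough illustration of that decomposition step (the JSON shape and the resulting table names are invented for this sketch, not taken from the session): a nested document is split into a parent row for the scalar attributes and child rows for each element of a nested array, which then map naturally onto 3NF-style tables or onto a hub/satellite pair in a data vault model.

    import json

    # Hypothetical JSON document of the kind discussed above.
    doc = json.loads("""
    {
      "customer_id": "C-100",
      "name": "Acme Corp",
      "addresses": [
        {"type": "billing",  "city": "Houston"},
        {"type": "shipping", "city": "Denver"}
      ]
    }
    """)

    # Decompose: scalar attributes become a parent row, and each nested array
    # element becomes a child row keyed back to the parent.
    customer_row = {k: v for k, v in doc.items() if not isinstance(v, (list, dict))}
    address_rows = [
        {"customer_id": doc["customer_id"], **addr} for addr in doc["addresses"]
    ]

    print("CUSTOMER:", customer_row)
    for row in address_rows:
        print("CUSTOMER_ADDRESS:", row)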
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions - Kent Graziano
From a talk I gave at WWDVC and ECO in 2015 about how we built virtual dimensions (views) on a data vault-style data warehouse (see Data Warehousing in the Real World for full details on that architecture)
Demystifying Data Warehouse as a Service (DWaaS) - Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS - Kent Graziano
(This is the talk I gave at Houston DAMA and Agile Denver BI meetups)
At a past client, in order to meet timelines to fulfill urgent, unmet reporting needs, I found it necessary to build a virtualized Operational Data Store as the first phase of a new Data Vault 2.0 project. This allowed me to deliver new objects, quickly and incrementally to the report developer so we could quickly show the business users their data. In order to limit the need for refactoring in later stages of the data warehouse development, I chose to build this virtualization layer on top of a Type 2 persistent staging layer. All of this was done using Oracle SQL Developer Data Modeler (SDDM) against (gasp!) a MS SQL Server Database. In this talk I will show you the architecture for this approach, the rationale, and then the tricks I used in SDDM to build all the stage tables and views very quickly. In the end you will see actual SQL code for a virtual ODS that can easily be translated to an Oracle database.
Agile Methods and Data Warehousing (2016 update) - Kent Graziano
This presentation takes a look at the Agile Manifesto and the 12 Principles of Agile Development and discusses how these apply to Data Warehousing and Business Intelligence projects. Several examples and details from my past experience are included. Includes more details on using Data Vault as well. (I gave this presentation at OUGF14 in Helsinki, Finland and again in 2016 for TDWI Nashville.)
These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.
Worst Practices in Data Warehouse Design - Kent Graziano
This presentation was given at OakTable World 2014 (#OTW14) in San Francisco. After many years of designing data warehouses and consulting on data warehouse architectures, I have seen a lot of bad design choices by supposedly experienced professionals. A sense of professionalism, confidentiality agreements, and some sense of common decency have prevented me from calling people out on some of this. No more! In this session I will walk you through a typical bad design like many I have seen. I will show you what I see when I reverse engineer a supposedly complete design, walk through what is wrong with it, and discuss options to correct it. This will be a test of your knowledge of data warehouse best practices by seeing if you can recognize these worst practices.
Data Vault 2.0: Using MD5 Hashes for Change Data Capture - Kent Graziano
This presentation was given at OakTable World 2014 (#OTW14) in San Francisco as a short Ted-style 10 minute talk. In it I introduce Data Vault 2.0 and its innovative approach to doing change data capture in a data warehouse by using MD5 Hash columns.
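A rough sketch of the hash-based change detection idea follows; the column values, delimiter, and normalization shown are illustrative choices, not the exact Data Vault 2.0 specification. The point is simply that comparing one stored hash against the hash of the incoming row replaces column-by-column comparison.

    import hashlib

    def row_hash(attributes):
        """MD5 over the concatenated descriptive attributes (delimiter is illustrative)."""
        joined = "||".join("" if v is None else str(v).strip().upper() for v in attributes)
        return hashlib.md5(joined.encode("utf-8")).hexdigest()

    # Hash previously stored with the current satellite row for this business key (hypothetical values).
    stored_hash = row_hash(["Acme Corp", "Enterprise", "Houston"])

    # Newly arrived source row for the same business key.
    incoming = ["Acme Corp", "Enterprise", "Denver"]

    if row_hash(incoming) != stored_hash:
        print("Change detected: insert a new satellite row for this key.")
    else:
        print("No change: skip the row.")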
I gave this presentation at OUGF14 in Helsinki, Finland and again for TDWI Nashville. This presentation takes a look at the Agile Manifesto and the 12 Principles of Agile Development and discusses how these apply to Data Warehousing and Business Intelligence projects. Several examples and details from my past experience are included.
Top Five Cool Features in Oracle SQL Developer Data Modeler - Kent Graziano
This is the presentation I gave at OUGF14 in Helsinki, Finland in June 2014.
Oracle SQL Developer Data Modeler (SDDM) has been around for a few years now and is up to version 4.x. It really is an industrial-strength data modeling tool that can be used for any data modeling task you need to tackle. Over the years I have found quite a few features and utilities in the tool that I rely on to make me more efficient (and agile) in developing my models. This presentation will demonstrate at least five of these features, tips, and tricks for you. I will walk through things like modifying the delivered reporting templates, how to create and apply object naming templates, how to use a table template and transformation script to add audit columns to every table, and how to use the new metadata export tool and several other cool things you might not know are there. Since there will likely be patches and new releases before the conference, there is a good chance there will be some new things for me to show you as well. This might be a bit of a whirlwind demo, so get SDDM installed on your device and bring it to the session so you can follow along.
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
1. Agile Data Warehouse Modeling:
Introduction to Data Vault Modeling
Kent Graziano
Data Warrior LLC
Twitter @KentGraziano
2. Agenda
Bio
What do we mean by Agile?
What is a Data Vault?
Where does it fit in a DW/BI architecture
How to design a Data Vault model
Being “agile”
#OUGF14
3. My Bio
Oracle ACE Director
Certified Data Vault Master and DV 2.0 Architect
Member: Boulder BI Brain Trust
Data Architecture and Data Warehouse Specialist
● 30+ years in IT
● 25+ years of Oracle-related work
● 20+ years of data warehousing experience
Co-Author of
● The Business of Data Vault Modeling
● The Data Model Resource Book (1st Edition)
Past-President of ODTUG and Rocky Mountain Oracle
User Group
#OUGF14
4. Manifesto for Agile Software Development
“We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.”
http://agilemanifesto.org/
#OUGF14
5. Applying the Agile Manifesto to DW
User Stories instead of
requirements documents
Time-boxed iterations
● Iteration has a standard length
● Choose one or more user stories to fit in that
iteration
Rework is part of the game
● There are no “missed requirements”... only
those that haven’t been delivered or
discovered yet.
(C) Kent Graziano
#OUGF14
6. Data Vault Definition
The Data Vault is a detail oriented, historical tracking
and uniquely linked set of normalized tables that
support one or more functional areas of business.
It is a hybrid approach encompassing the best of
breed between 3rd normal form (3NF) and star
schema. The design is flexible, scalable, consistent
and adaptable to the needs of the enterprise.
Dan Linstedt: Defining the Data Vault
TDAN.com Article
Architected specifically to meet the needs
of today’s enterprise data warehouses
#OUGF14
7. What is Data Vault Trying to Solve?
What are our other Enterprise
Data Warehouse options?
● Third-Normal Form (3NF): Complex
primary keys (PK’s) with cascading
snapshot dates
● Star Schema (Dimensional): Difficult to
reengineer fact tables for granularity
changes
Difficult to get it right the first
time
Not adaptable to rapid
business change
NOT AGILE!
(C) Kent Graziano
#OUGF14
8. Data Vault Time Line
● Mid 60’s – Dimension & Fact modeling presented by General Mills and Dartmouth University
● E.F. Codd invented relational modeling
● Early 70’s – Bill Inmon began discussing Data Warehousing
● Mid 70’s – AC Nielsen popularized Dimension & Fact terms
● 1976 – Dr Peter Chen created E-R Diagramming
● Chris Date and Hugh Darwen maintained and refined relational modeling
● Mid 80’s – Bill Inmon popularizes Data Warehousing
● Mid – Late 80’s – Dr Kimball popularizes Star Schema
● Late 80’s – Barry Devlin and Dr Kimball release “Business Data Warehouse”
● 1990 – Dan Linstedt begins R&D on Data Vault Modeling
● 2000 – Dan Linstedt releases first 5 articles on Data Vault Modeling
#OUGF14
9. Data Vault Evolution
The work on the Data Vault approach began in the
early 1990s and was completed around 1999.
Throughout 1999, 2000, and 2001, the Data Vault
design was tested, refined, and deployed into specific
customer sites.
In 2002, industry thought leaders were asked to
review the architecture.
● This is when I attended my first DV seminar in Denver and met
Dan!
In 2003, Dan began teaching the modeling techniques
to the general public.
Now, in 2014, Dan has introduced DV 2.0!
(C) Kent Graziano
#OUGF14
12. How to be Agile using DV
Model iteratively
● Use Data Vault data modeling technique
● Create basic components, then add over time
Virtualize the Access Layer
● Don’t waste time building facts and dimensions up front
● ETL and testing take too long
● “Project” objects using pattern-based DV model with
database views (or BI meta layer)
Users see real reports with real data
Can always build out for performance in
another iteration
(C) Kent Graziano
#OUGF14
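As a self-contained sketch of what “projecting” a dimension with views can look like (SQLite via Python purely for illustration; all table, column, and view names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A tiny Data Vault fragment: one hub and one satellite.
CREATE TABLE hub_customer (
    hub_cust_seq_id INTEGER PRIMARY KEY,
    hub_cust_num    TEXT,
    load_dts        TEXT,
    rec_src         TEXT);
CREATE TABLE sat_customer_name (
    hub_cust_seq_id INTEGER REFERENCES hub_customer,
    load_dts        TEXT,
    load_end_dts    TEXT,
    cust_name       TEXT);

-- The 'dimension' is just a view over the current satellite rows, so users
-- see real data without waiting for fact/dimension ETL to be built first.
CREATE VIEW dim_customer_v AS
SELECT h.hub_cust_num AS customer_number,
       s.cust_name    AS customer_name
FROM   hub_customer h
JOIN   sat_customer_name s ON s.hub_cust_seq_id = h.hub_cust_seq_id
WHERE  s.load_end_dts IS NULL;
""")
conn.execute("INSERT INTO hub_customer VALUES (1, 'C-1001', '2014-06-01', 'CRM')")
conn.execute("INSERT INTO sat_customer_name VALUES (1, '2014-06-01', NULL, 'Acme Corp')")
print(conn.execute("SELECT * FROM dim_customer_v").fetchall())
```

If the views later become a performance bottleneck, the same definitions can be materialized as physical dimensions in another iteration.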
16. 1. Hub = Business Keys
Hubs = Unique Lists of Business Keys
Business Keys are used to
TRACK and IDENTIFY key information
New: DV 2.0 includes MD5 of the BK to
link to Hadoop/NoSQL
(C) Kent Graziano #OUGF14
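A minimal sketch of that idea (the upper-case/trim normalization here is an illustrative convention, not a mandated one): because the hash depends only on the business key itself, the same value can be computed independently in the relational EDW and on a Hadoop/NoSQL platform, giving both sides a common join key.

```python
import hashlib

def hub_hash_key(business_key: str) -> str:
    """Deterministic MD5 of a normalized business key."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

print(hub_hash_key("c-1001"))    # value loaded into the hub
print(hub_hash_key(" C-1001 "))  # same digest, wherever and whenever it is computed
```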
17. 2. Links = Associations
Links =
Transactions and
Associations
They are used to
hook together
multiple sets of
information
In DV 2.0 the BK
attributes migrate
to the Links for
faster querying
(C) Kent Graziano
#OUGF14
18. Modeling Links - 1:1 or 1:M?
Today:
● Relationship is a 1:1 so why model a Link?
Tomorrow:
● The business rule can change to a 1:M.
● You discover new data later.
With a Link in the Data Vault:
● No need to change the EDW structure.
● Existing data is fine.
● New data is added.
(C) Kent Graziano
#OUGF14
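A small sketch of why the Link absorbs that change (SQLite via Python, hypothetical names): the Link is its own table keyed by the two hub keys, so a change from 1:1 to 1:M is just more rows, not a new structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (cust_hkey TEXT PRIMARY KEY);
CREATE TABLE hub_account  (acct_hkey TEXT PRIMARY KEY);
-- The Link carries only the two hub keys (plus load metadata in a full model).
CREATE TABLE lnk_customer_account (
    cust_hkey TEXT REFERENCES hub_customer,
    acct_hkey TEXT REFERENCES hub_account);
""")
conn.execute("INSERT INTO hub_customer VALUES ('C1')")
conn.executemany("INSERT INTO hub_account VALUES (?)", [("A1",), ("A2",)])

# Today the business rule is 1:1 -- one link row.
conn.execute("INSERT INTO lnk_customer_account VALUES ('C1', 'A1')")
# Tomorrow the rule becomes 1:M -- just add rows; no ALTER, no reload of history.
conn.execute("INSERT INTO lnk_customer_account VALUES ('C1', 'A2')")
print(conn.execute("SELECT * FROM lnk_customer_account").fetchall())
```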
19. 3. Satellites = Descriptors
• Satellites provide context for the Hubs and the Links
• Track changes over time
• Like SCD 2
(C) Kent Graziano
#OUGF14
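A rough illustration of that change tracking (hypothetical names, SQLite for brevity): when a descriptive value changes, the current satellite row is end-dated and a new row is inserted, so every prior value stays queryable, much like an SCD Type 2 dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sat_customer_name (
    cust_hkey TEXT, load_dts TEXT, load_end_dts TEXT, cust_name TEXT)""")
conn.execute("INSERT INTO sat_customer_name VALUES ('C1', '2014-01-01', NULL, 'Acme Corp')")

# A changed name arrives: end-date the current version, then insert the new one.
new_dts = "2014-06-01"
conn.execute("UPDATE sat_customer_name SET load_end_dts = ? "
             "WHERE cust_hkey = 'C1' AND load_end_dts IS NULL", (new_dts,))
conn.execute("INSERT INTO sat_customer_name VALUES ('C1', ?, NULL, 'Acme Corporation')",
             (new_dts,))
for row in conn.execute("SELECT * FROM sat_customer_name ORDER BY load_dts"):
    print(row)
```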
21. Data Vault Model Flexibility (Agility)
Goes beyond standard 3NF
• Hyper normalized
● Hubs and Links only hold keys and meta data
● Satellites split by rate of change and/or source
• Enables Agile data modeling
● Easy to add to model without having to change existing
structures and load routines
• Relationships (links) can be dropped and created on-demand.
● No more reloading history because of a missed requirement
Based on natural business keys
• Not system surrogate keys
• Allows for integrating data across functions and source
systems more easily
● All data relationships are key driven.
#OUGF14
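A tiny sketch of what "adding without refactoring" can mean in practice (hypothetical names, SQLite for brevity): a new source system, or a group of fast-changing attributes, simply becomes another satellite on the same hub.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Structures already in production, already being loaded.
conn.executescript("""
CREATE TABLE hub_customer (cust_hkey TEXT PRIMARY KEY, cust_num TEXT);
CREATE TABLE sat_customer_name (cust_hkey TEXT, load_dts TEXT, cust_name TEXT);
""")

# Later, credit attributes arrive from a new source: they get their own satellite.
# No existing table is ALTERed and the existing load routines keep running unchanged.
conn.execute("""CREATE TABLE sat_customer_credit (
    cust_hkey TEXT, load_dts TEXT, credit_limit REAL, rec_src TEXT)""")

print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```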
22. Data Vault Extensibility
Adding new components to
the EDW has NEAR ZERO
impact on:
• Existing Loading
Processes
• Existing Data Model
• Existing Reporting & BI
Functions
• Existing Source Systems
• Existing Star Schemas
and Data Marts
(C) LearnDataVault.com #OUGF14
23. Data Vault Productivity
Standardized modeling rules
• Highly repeatable and learnable modeling technique
• Can standardize load routines
● Delta Driven process
● Re-startable, consistent loading patterns.
• Can standardize extract routines
● Rapid build of new or revised Data Marts
• Can be automated
‣ Can use a BI-meta layer to virtualize the reporting
structures
‣ Example: OBIEE Business Model and Mapping tool
‣ Example: BOBJ Universe Business Layer
‣ Can put views on the DV structures as well
‣ Simulate ODS/3NF or Star Schemas
(C) Kent Graziano
#OUGF14
24. Data Vault Adaptability
• The Data Vault holds granular historical
relationships.
• Holds all history for all time, allowing any
source system feed to be reconstructed on-demand
• Easy generation of Audit Trails for data lineage
and compliance.
• Data Mining can discover new relationships
between elements
• Patterns of change emerge from the historical
pictures and linkages.
• The Data Vault can be accessed by power-users
(C) Kent Graziano
#OUGF14
25. Other Benefits of a Data Vault
Modeling it as a DV forces integration of the Business Keys
upfront.
• Good for organizational alignment.
An integrated data set with raw data extends its value beyond BI:
• Source for data quality projects
• Source for master data
• Source for data mining
• Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).
Upfront Hub integration simplifies the data integration routines
required to load data marts.
• Helps divide the work a bit.
It is much easier to implement security on these granular pieces.
Granular, re-startable processes enable pin-point failure
correction.
It is designed and optimized for real-time loading in its core
architecture (without any tweaks or mods).
#OUGF14
27. World’s Smallest Data Vault
The Data Vault doesn’t have to be
“BIG”.
A Data Vault can be built
incrementally.
Reverse engineering one component
of the existing models is not
uncommon.
Building one part of the Data Vault,
then changing the marts to feed from
that vault is a best practice.
The smallest Enterprise Data
Warehouse consists of two tables:
● One Hub,
● One Satellite
Hub Customer: Hub_Cust_Seq_ID, Hub_Cust_Num, Hub_Cust_Load_DTS, Hub_Cust_Rec_Src
Satellite Customer Name: Hub_Cust_Seq_ID, Sat_Cust_Load_DTS, Sat_Cust_Load_End_DTS, Sat_Cust_Name, Sat_Cust_Rec_Src
#OUGF14
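Using the column names from the slide, here is a minimal sketch of those two tables (SQLite via Python for illustration; the data types and constraints are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    hub_cust_seq_id   INTEGER PRIMARY KEY,
    hub_cust_num      TEXT NOT NULL UNIQUE,   -- the business key
    hub_cust_load_dts TEXT,
    hub_cust_rec_src  TEXT);

CREATE TABLE sat_customer_name (
    hub_cust_seq_id       INTEGER REFERENCES hub_customer,
    sat_cust_load_dts     TEXT,
    sat_cust_load_end_dts TEXT,
    sat_cust_name         TEXT,
    sat_cust_rec_src      TEXT,
    PRIMARY KEY (hub_cust_seq_id, sat_cust_load_dts));
""")
print("Smallest possible EDW: one hub, one satellite")
```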
28. Notably…
In 2008 Bill Inmon stated that the “Data Vault
is the optimal approach for modeling the EDW
in the DW2.0 framework.” (DW2.0)
The number of Data Vault users in the US
surpassed 500 in 2010 and is growing rapidly
(http://danlinstedt.com/about/dv-customers/)
#OUGF14
29. Organizations using Data Vault
WebMD Health Services
Anthem Blue-Cross Blue Shield
MD Anderson Cancer Center
Denver Public Schools
Independent Purchasing Cooperative (IPC, Miami)
• Owner of Subway
Kaplan
US Defense Department
Colorado Springs Utilities
State Court of Wyoming
Federal Express
US Dept. Of Agriculture
#OUGF14
32. Summary
• Data Vault provides a data
modeling technique that
allows:
‣ Model Agility
‣ Enabling rapid changes and additions
‣ Productivity
‣ Enabling low complexity systems with high
value output at a rapid pace
‣ Easy projections of dimensional models
‣ So? Agile Data Warehousing?
#OUGF14
33. Super Charge Your Data Warehouse
Available on Amazon.com
Soft Cover or Kindle Format
Now also available in PDF at
LearnDataVault.com
Hint: Kent is the Technical
Editor
#OUGF14