The document provides an introduction to a lecture on data warehousing and data warehouse architecture given by Andreas Buckenhofer from Daimler TSS, including information about the lecturer, the structure and topics to be covered in the lecture, as well as employment opportunities in data warehousing. The lecture aims to help participants understand data warehousing concepts like architectures, data modeling, ETL processes, and trends in the industry.
3. DAIMLER TSS. IT EXCELLENCE: COMPREHENSIVE, INNOVATIVE, CLOSE.
We're a specialist and strategic business partner for innovative IT solutions within Daimler, not just another supplier!
As a 100% subsidiary of Daimler, we live the culture of excellence and aspire to take an innovative and technological lead.
With our outstanding technological and methodical competence we are a competent provider of services that help those who benefit from them to stand out from the competition. When it comes to demanding IT questions we create impetus, especially in the core fields car IT and mobility, information security, analytics, shared services and digital customer experience.
TSS 2020: ALWAYS ON THE MOVE.
Data Warehouse / DHBW, Daimler TSS GmbH
4. LOCATIONS
• Daimler TSS Germany: 6 locations, more than 1000 employees
  • Ulm (Headquarters)
  • Stuttgart Area: Böblingen, Echterdingen, Leinfelden, Möhringen
  • Berlin
• Daimler TSS India: Hub Bangalore, 16 employees
• Daimler TSS China: Hub Beijing, 6 employees
• Daimler TSS Malaysia: Hub Kuala Lumpur, 38 employees
5. DWH, BIG DATA, DATA MINING
This lecture is about the classical DWH
• 6 sessions
Mr. Bollinger’s lecture is about Big Data and Data Mining
6. DWH LECTURE - LEARNING TARGETS
• Describe different DWH architectures
• Explain DWH data modeling methods and design logical models
• Name DB techniques that are well-suited for DWHs
• Explain ETL processes
• Specify reporting & project management & meta data requirements
• Name current DWH trends
7. OVERVIEW OF THE LECTURE
• 16.02.2017: Introduction to Data Warehouse
• 23.02.2017: DWH Architectures, Data Modeling
• 02.03.2017: Data Modeling, OLAP
• 09.03.2017: OLAP, ETL
• 16.03.2017: ETL, Metadata, DWH Projects
• 23.03.2017: DWH Projects, Advanced Topics
8. ABOUT THIS LECTURE
Structure of the lecture
• Review of the preceding lecture
• Presentation of content
• Group tasks, exercises
15:30 – 17:45
• One 15-minute break
9. WHAT YOU WILL LEARN TODAY
Data Warehousing is a major topic of computer science
After the end of this lecture you will be able to
• Understand the basic business and technology drivers for data warehousing
• Describe the characteristics of a data warehouse
• Describe the differences between production and data warehouse systems
• Understand logical standard DWH architecture
• Describe different layers and their meaning
• Describe advantages and disadvantages of further DWH architectures
10. MANY EMPLOYMENT OPPORTUNITIES
DWH department in every (bigger) end user company, also in many medium-sized or small-sized companies
DWH department in every (bigger) consulting company
DWH-only specialized consulting companies
DWH tool vendors
11. MANY EMPLOYMENT OPPORTUNITIES – CHALLENGING JOB REQUIREMENTS
DWHs are complex, much more complex compared to most OLTP systems
Challenging job profiles with comprehensive requirements
• Data Architecture
• Data Integration / ETL
• Data Modeling (not only 3NF)
• Data Visualization
• Data Quality
• Data Security
• Requirements Engineering
• Project Management
15. DATA WAREHOUSE (DWH) OR BUSINESS INTELLIGENCE (BI)?
Often used as synonyms
DWH: more technical focus
BI: more business / process focus
• “Business intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision making.” (Boris Evelson, Forrester Research, 2008)
16. INFORMATION TECHNOLOGY (1960s – 1980s)
Many systems throughout the enterprises for dedicated purposes
• Support daily transactions / day-to-day business
• Target: replace manual and time consuming activities
Data embedded in process-specific application
• Process-orientation + dedicated purpose
Customer data, order data, etc. spread over many systems in many variations and with contradictions
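The last point, customer data held in several systems with contradictions, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two dictionaries stand in for hypothetical extracts from two systems, and both are assumed to share a customer identifier, which real systems often do not.

```python
# Hypothetical customer records as two separate systems might hold them.
# The shared customer IDs ("C1", "C2") are an assumption for this sketch.
RESERVATION = {
    "C1": {"name": "Anna Schmidt", "city": "Ulm"},
    "C2": {"name": "Ben Maier", "city": "Berlin"},
}
FREQUENT_FLYER = {
    "C1": {"name": "Anna Schmidt", "city": "Stuttgart"},
    "C2": {"name": "Ben Maier", "city": "Berlin"},
}


def find_contradictions(system_a, system_b):
    """Return (customer, attribute, value_a, value_b) for each differing attribute."""
    conflicts = []
    # Only customers and attributes known to both systems can contradict each other.
    for customer in sorted(system_a.keys() & system_b.keys()):
        rec_a, rec_b = system_a[customer], system_b[customer]
        for attribute in sorted(rec_a.keys() & rec_b.keys()):
            if rec_a[attribute] != rec_b[attribute]:
                conflicts.append((customer, attribute, rec_a[attribute], rec_b[attribute]))
    return conflicts


if __name__ == "__main__":
    for conflict in find_contradictions(RESERVATION, FREQUENT_FLYER):
        print(conflict)
```

With the sample data, the only reported conflict is the differing city for customer C1. Neither system can tell which value is correct, which is one reason a consolidated view needs explicit rules.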
17. SAMPLE APPLICATIONS FOR AN AIRLINE
• Flight Reservation System
• Airline Frequent Flyer System
• Internal Human Resources System
• Inventory Purchasing Systems
• Operational Planning
• Maintenance Tracking
• Billing System
• CRM System, e.g. campaigns
[Diagram: the same kinds of data (customer data, planes, crews, seats, food / drinks) appear separately in several of these systems]
18. NEED FOR DECISION SUPPORT SYSTEM / MANAGEMENT INFORMATION SYSTEM
[Diagram: the airline applications (Flight Reservation System, Airline Frequent Flyer System, Internal Human Resources System, Inventory Purchasing Systems, Operational Planning, Maintenance Tracking, Billing System, CRM System) with their separately held data (customer data, planes, crews, seats, food / drinks), plus an added DSS / MIS]
19. EARLY DECISION SUPPORT SYSTEMS (1960s – 1980s)
Can be characterized as “Unplanned decision support” or “Unplanned Management Information Systems (MIS)”
• Management needs reports / combined data from different systems to make decisions for company
• Reports are manually written by IT people
• Extract, combine, accumulate data
• Can take several days to write report and to get the data
• Error prone and labour-intensive
• Relevant information may be forgotten or combined in a wrong way
Did not really work
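The "extract, combine, accumulate" steps of such a hand-written report might look roughly like the following Python sketch. It is not taken from the lecture: the two in-memory SQLite databases stand in for hypothetical reservation and billing systems, and all table names, column names and values are invented for illustration.

```python
import sqlite3


def make_db(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one operational system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical reservation system: one row per booked flight.
reservations = make_db(
    "CREATE TABLE booking (customer TEXT, flight TEXT)",
    "INSERT INTO booking VALUES (?, ?)",
    [("C1", "LH100"), ("C1", "LH200"), ("C2", "LH100")],
)

# Hypothetical billing system: one row per invoice.
billing = make_db(
    "CREATE TABLE invoice (customer TEXT, amount REAL)",
    "INSERT INTO invoice VALUES (?, ?)",
    [("C1", 300.0), ("C1", 150.0), ("C2", 200.0)],
)


def revenue_per_booking(res_conn, bill_conn):
    """Extract from both systems, combine by customer, accumulate an average."""
    # Extract: each system is queried separately because they share no database.
    bookings = dict(res_conn.execute(
        "SELECT customer, COUNT(*) FROM booking GROUP BY customer"))
    revenue = dict(bill_conn.execute(
        "SELECT customer, SUM(amount) FROM invoice GROUP BY customer"))
    # Combine and accumulate: the join happens in report code, not in a database.
    return {c: revenue.get(c, 0.0) / n for c, n in bookings.items()}


if __name__ == "__main__":
    print(revenue_per_booking(reservations, billing))
```

The join lives in report code that someone has to write and maintain for every new question, which is where the effort and the risk of combining data in a wrong way come from.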
20. Data still spread across many applications, but additional requirements
Data as Asset, getting more and more important also in production
industries
• Not only classical data-intensive companies like Google or Facebook
• Increasing interest e.g. in insurance, health care, automotive, …
• Connected cars, Smart Home, Tailor-made insurances, etc.
Hype technologies
• New database technologies like NoSQL and Big Data
DWH still booming with additional stimuli coming from Big Data, Digitization,
Internet Of Things IOT, Industry 4.0, Real Time, Time To Market, etc.
INFORMATION TECHNOLOGY TODAY
FURTHER REQUIREMENTS
21. Outline at least 5 operational systems for a vehicle manufacturer
• which data is stored by these systems
• characterize which operations are performed by them
• which questions can be answered by these systems (and which questions
cannot be answered = major problems for decision support)
EXERCISE – OLTP SYSTEMS
22. SAMPLE OLTP SYSTEMS
• Vehicle production: Vehicle, Plant, Worker, Robot
• Car rentals: Driver, Booking, Vehicle, Route
• Parts Logistics: Part, Plant, Supplier, Route
• Financial Services: Credit, Customer, Bank account
• Workshop: Repair data, Parts, Vehicle, Diagnostic data
• Vehicle Sales: Customer, Seller, Vehicle, Production date
23. SAMPLE OLTP SYSTEMS
• Truck fleet management: Truck, Route, Driver
• Engineering, Research and development: Engineer, Prototype, Vehicle, Tests
• Website and Car configurator: Vehicle, CRM Lead, Interior, etc.
• …
24. How to get an overall view
across OLTP applications / functions that works?
CHALLENGE
25. Distributed data
Different data structures
Historic data
System workload
Inadequate technology
MAJOR PROBLEMS FOR EFFECTIVE DECISION SUPPORT
26. Problem: Data resides on
• different systems / storages
• different applications
• different technologies
Solution: Data has to be accumulated on one system for further analysis
• Data is inhomogeneous, e.g. each system has its own customer number or order
number, etc.
• How to combine the data?
• Data must be ingested regularly, e.g. daily and not ad-hoc
DISTRIBUTED DATA
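One common way to answer "How to combine the data?" is a key-mapping table that translates each system's own customer number into one shared warehouse key. The sketch below is only an illustration under that assumption; the system names, field names and the `key_map` content are hypothetical and not taken from the lecture.

```python
# Hypothetical extracts from two source systems; each uses its own customer number.
crm_customers = [
    {"crm_id": "C-1001", "name": "John Brown"},
    {"crm_id": "C-1002", "name": "Jane Smith"},
]
billing_customers = [
    {"billing_no": 77, "open_amount": 120.50},
    {"billing_no": 78, "open_amount": 0.0},
]

# Assumed mapping table kept in the warehouse: (system, source key) -> shared key.
key_map = {
    ("crm", "C-1001"): 1,
    ("crm", "C-1002"): 2,
    ("billing", 77): 1,
    ("billing", 78): 2,
}


def combine(crm_rows, billing_rows, mapping):
    """Merge rows from both systems under the shared warehouse key."""
    combined = {}
    for row in crm_rows:
        dwh_key = mapping[("crm", row["crm_id"])]
        combined.setdefault(dwh_key, {})["name"] = row["name"]
    for row in billing_rows:
        dwh_key = mapping[("billing", row["billing_no"])]
        combined.setdefault(dwh_key, {})["open_amount"] = row["open_amount"]
    return combined


customers = combine(crm_customers, billing_customers, key_map)
```

In practice the hard part is building and maintaining the mapping itself; the merge step is simple once the keys line up.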
27. Problem: Systems developed independently from each other
• Different data types
• E.g.: zip-code as integer or character string
• Different encodings
• E.g.: kilometers or miles
• Different data modeling
• E.g.: last name / first name in different fields vs last name / first name (badly
modelled) in one single field
Solution: Dedicated system required that harmonizes / standardizes the data
DIFFERENT DATA STRUCTURES
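The three kinds of differences named on this slide (data types, units, data modeling) can be sketched as one small harmonization function. This is a minimal illustration, not the lecture's method: the record layout, the 5-digit zip-code padding and the "Last, First" convention for the single name field are assumptions.

```python
MILES_TO_KM = 1.609344


def harmonize(record):
    """Convert a source record into one assumed target format."""
    # Data types: zip code may arrive as integer or string; store it as text.
    # Padding to 5 digits is an assumption (e.g. German zip codes).
    zip_code = str(record["zip"]).zfill(5)

    # Units: store all distances in kilometers.
    distance = record["distance"]
    if record["unit"] == "miles":
        distance = distance * MILES_TO_KM

    # Data modeling: split a combined name field (assumed "Last, First").
    if "full_name" in record:
        last, first = [part.strip() for part in record["full_name"].split(",", 1)]
    else:
        last, first = record["last_name"], record["first_name"]

    return {
        "zip": zip_code,
        "distance_km": round(distance, 1),
        "last_name": last,
        "first_name": first,
    }


a = harmonize({"zip": 1067, "distance": 100, "unit": "miles", "full_name": "Brown, John"})
b = harmonize({"zip": "70173", "distance": 42.0, "unit": "km",
               "last_name": "Smith", "first_name": "Jane"})
```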
28. Problem: Data is updated and deleted or archived after max. 3 months
• daily transactions produce lots of data
• limited storage size: large amounts of data fill up systems
Historic data is required for decision support
• e.g. how did sales figures develop compared to last month / last year /
etc.
Solution: All data (changes) have to be stored in a system capable of dealing
with huge amounts of data
ISSUES WITH HISTORIC DATA
29. Problem: Performance not optimized for new workloads
• Systems stressed by additional load (due to reports)
• Not optimized for this kind of workload
• Performance of daily transaction business jeopardized
• May possibly lead to system failure!
• Imagine what happens if a system like Amazon gets slow
Solution: Dedicated system that handles complex (arithmetic) queries on
huge amounts of data. A system that is optimized for that kind of workload.
ISSUES WITH SYSTEM WORKLOAD
30. Problem: Tooling and technology different from OLTP
Inadequate tools for data integration and analysis
Infrastructure configured for OLTP transactions and not for DWH load
Storage systems and processors too weak to fulfill the requirements
Solution: Standard Tools and technology that help to increase productivity
and solve such problems, e.g. Reporting Tools for Data Analysis or ETL tools
for Data ingestion/load
INADEQUATE TECHNOLOGY
33. How to get an overall view
across OLTP applications / functions that works?
CHALLENGE
34. Operative systems not suitable for analytical evaluations
Need for a new, separated system
• fast answers, ad-hoc questions possible
• no interference with daily transaction business
Data Warehouse
CONCLUSION
35. List possible (functional and non-functional) requirements for a data
warehouse end-user. Think of deficiencies of transactional systems like
• Distributed data
• Different data structures
• Problem with historic data
• Problem with system workload
• Inadequate technology
What are requirements from a Data Warehouse user perspective? (List at
least 5 requirements)
EXERCISE
36. Wants to access and analyze all data in a single database and not across
applications
Wants to get a complete analysis including history, e.g. where did the
customer live 5 years ago or how did bookings develop the last 10 days?
Wants fast data access for his queries
Wants to understand the data model = one single and easy data model and
not many different applications
Wants to browse through combined data sets to identify correlations or new
insights
DATA WAREHOUSE USER
37. Contains data from different systems
Imports data from different systems on a regular basis
• detailed data and summarized data
• provide historic data
• generate metadata
OLTP applications remain, DWH is a completely new system
Overcomes difficulties when using existing transaction systems for those
tasks
DATA WAREHOUSE
38. Not a product, but an overall concept
“Applications come, applications go. The data, however, lives forever. It is not
about building applications; it really is about the data underneath these
applications” (Tom Kyte)
DATA WAREHOUSE
39. HIGH-LEVEL DATA WAREHOUSE ARCHITECTURE
Diagram: several OLTP systems as sources; the Data Warehouse consists of Staging, Core Warehouse and Mart
40. DATA WAREHOUSE DEFINITIONS
BY TWO “FATHERS” OF THE DWH
Ralph Kimball: “A data warehouse is a copy of transaction data specifically structured for querying and reporting”
William Harvey “Bill” Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process”
41. A data warehouse is organized around the major subjects (business keys) of
the enterprise like
• Customer
• Vendor
• Car
• Transaction or activity
In contrast to the process/functional orientation of applications such as
• Offer
• Booking
• Delivery
SUBJECT-ORIENTED
42. SUBJECT-ORIENTED - EXAMPLE
OLTP (process-oriented applications):
• Flight Reservation System: Passengers, Bookings, Planes
• Flight Operation System: Crews, Planes
• Airline Frequent Flyer System: Customer, Points
DWH (subject areas): Customer, Planes
Questions the DWH can answer:
• Marketing: Which destinations are popular, e.g. Paris? Make the customer an exclusive offer.
• Planning: How many flight kilometers and flight hours do planes have? When does a plane need maintenance?
• Capacity planning: What is the forecasted passenger demand for flights to London? Is a larger plane required on the route?
43. Data contained in the warehouse are integrated.
Aspects of integration
• consistent naming conventions
• consistent measurement of variables
• consistent encoding structures
• consistent physical attributes of data
INTEGRATED
44. INTEGRATED - EXAMPLE
OLTP | DWH
System1: m,w; System2: male, female; System3: 1, 0 | m,w
System1: John Brown; System2: Brown, J.; System3: Brown, Jo | John Brown
System1: Varchar(5); System2: Number(8); System3: Char(12) | Varchar(12)
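The first row of this example (consistent encoding) can be sketched as a lookup per source system. The codes are the ones shown on the slide; which value `1` and `0` stand for in System3 is an assumption made here only for illustration.

```python
# Per-system encodings from the slide, mapped to the DWH encoding "m" / "w".
# Assumption: in System3, 1 means male and 0 means female.
GENDER_MAPPINGS = {
    "System1": {"m": "m", "w": "w"},
    "System2": {"male": "m", "female": "w"},
    "System3": {"1": "m", "0": "w"},
}


def to_dwh_gender(system, value):
    """Translate a source-specific gender code into the single DWH code."""
    return GENDER_MAPPINGS[system][str(value).strip().lower()]
```

An unknown code raises a `KeyError` here, which is deliberate: silently passing unmapped values through would undo the integration.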
45. Operations in operational environment
• Insert
• Delete
• Update
• Select
Operations in a data warehouse
• Insert: the initial and additional loading of data by (batch) processes
• Select: the access of data
• (almost) no updates and deletes (technical updates / deletes only)
NONVOLATILE
46. NONVOLATILE - EXAMPLE
Flight Reservation System
• Passenger John flies from Stuttgart to London on 15.02. at 06:00
  OLTP: Insert into DB: Passenger John, From Stuttgart to London, 15.02. 06:00
  DWH: Insert into DB: Passenger John, From Stuttgart to London, 15.02. 06:00
• Passenger John changes his mind and flies at 10:00
  OLTP: Update in DB: Passenger John, 15.02. 10:00
  DWH: Insert into DB: Passenger John, From Stuttgart to London, 15.02. 10:00
47. NONVOLATILE - EXAMPLE
What happens in the OLTP system if the customer cancels his booking?
• Delete operation in OLTP
• Seat becomes available again and can be sold to another passenger
What happens in the DWH?
• Insert operation in DWH with e.g. a flag indicating that the customer
cancelled/deleted his booking
• Business can analyze cancelled bookings: why might the customer have
cancelled? How can this customer or other customers be kept from
cancelling next time?
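The difference between the two systems can be sketched with two in-memory structures: the OLTP side is changed in place, the DWH side only ever receives new rows. This is a simplified illustration; the function names and the `cancelled` field are hypothetical, and a real DWH would load these rows in batches rather than per event.

```python
# OLTP: one current row per passenger, updated and deleted in place.
oltp_bookings = {}
# DWH: append-only list of rows; nothing is updated or deleted.
dwh_bookings = []


def book(passenger, route, departure):
    oltp_bookings[passenger] = {"route": route, "departure": departure}
    dwh_bookings.append({"passenger": passenger, "route": route,
                         "departure": departure, "cancelled": False})


def rebook(passenger, departure):
    oltp_bookings[passenger]["departure"] = departure  # OLTP: update
    dwh_bookings.append({"passenger": passenger,       # DWH: another insert
                         "route": oltp_bookings[passenger]["route"],
                         "departure": departure, "cancelled": False})


def cancel(passenger):
    old = oltp_bookings.pop(passenger)                 # OLTP: delete
    dwh_bookings.append({"passenger": passenger,       # DWH: insert with flag
                         "route": old["route"],
                         "departure": old["departure"], "cancelled": True})


book("John", "Stuttgart-London", "15.02. 06:00")
rebook("John", "15.02. 10:00")
cancel("John")
```

After these three calls the OLTP structure is empty, while the DWH list still holds all three rows, including the cancellation.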
48. All data in the data warehouse is accurate as of some moment in time
• Has to be associated with a time stamp
Once data is correctly recorded in the data warehouse, it cannot be updated
or deleted
• Data warehouse data is, for all practical purposes, a long series of snapshots
In the operational environment data is accurate as of the moment of access
Operational data, being accurate as of the moment of access, can be
updated as the need arises
TIME-VARIANT
49. TIME-VARIANT - EXAMPLE
DWH: every insert carries a DB insert timestamp
• Insert into DB: Passenger John, From Stuttgart to London, 15.02. 06:00 (DB insert timestamp: 02.02. 15:03:21)
• Insert into DB: Passenger John, From Stuttgart to London, 15.02. 10:00 (DB insert timestamp: 02.02. 15:04:29)
• Insert into DB: Passenger Jim, From Hamburg to Munich, 18.02. 15:00 (DB insert timestamp: 05.02. 12:15:03)
• Insert into DB: Passenger Mike, From Hamburg to Munich, 15.02. 10:00 (DB insert timestamp: 05.02. 12:15:11)
• Insert into DB: Passenger John, From Stuttgart to London, 15.02. 10:00, Cancel Flag (DB insert timestamp: 08.02. 09:52:33)
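Because every row carries its insert timestamp, the state of the data "as of" any moment can be reconstructed. The sketch below uses the rows from this example; the year 2019 is an assumption (the slide shows only day and month), and the function assumes at most one booking per passenger.

```python
from datetime import datetime

# (passenger, route, departure, cancel flag, DB insert timestamp)
rows = [
    ("John", "Stuttgart-London", "15.02. 06:00", False, datetime(2019, 2, 2, 15, 3, 21)),
    ("John", "Stuttgart-London", "15.02. 10:00", False, datetime(2019, 2, 2, 15, 4, 29)),
    ("Jim", "Hamburg-Munich", "18.02. 15:00", False, datetime(2019, 2, 5, 12, 15, 3)),
    ("Mike", "Hamburg-Munich", "15.02. 10:00", False, datetime(2019, 2, 5, 12, 15, 11)),
    ("John", "Stuttgart-London", "15.02. 10:00", True, datetime(2019, 2, 8, 9, 52, 33)),
]


def state_as_of(all_rows, moment):
    """Latest known row per passenger, using only rows inserted up to `moment`."""
    latest = {}
    for passenger, route, departure, cancelled, inserted_at in sorted(
        all_rows, key=lambda r: r[4]
    ):
        if inserted_at <= moment:
            latest[passenger] = (route, departure, cancelled)
    return latest


before_cancel = state_as_of(rows, datetime(2019, 2, 6))
after_cancel = state_as_of(rows, datetime(2019, 2, 9))
```

Here `before_cancel` still shows John's 10:00 booking as active, while `after_cancel` shows it as cancelled; neither answer required changing an existing row.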
50. You outlined OLTP systems for a vehicle manufacturer in an earlier exercise.
Now start designing a Data Warehouse:
• Describe what data can be stored in it. Define at least 5 subject-areas!
• Which questions can/should be answered with this information
EXERCISE - DWH
51. DWH – SUBJECT AREAS
• Customer: Driver, Bank account, CRM Lead, Individual or company?
• Part: Supplier, Color, Partnumber, Description
• Vehicle: Truck, Prototype, Car, Formula-1 car
• Car Rental: GPS data, Rental start time, Rental end time, Bill
• Plant: Robots, Cars built, Location
52. Which customers own a car and use car rental regularly?
Which parts have the most defects? Can diagnostic data be used to predict
potential defects and warn customers?
Which areas and times are popular for car rentals? Does it make sense to
relocate cars to these areas? (e.g. cinema in the evening/night)
EXERCISE – SAMPLE QUESTIONS
53. OLTP VS OLAP
Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP)
Transaction-oriented system | Query-oriented system
Optimized for insert and update consistency | Optimized for complex queries with short response times; ad-hoc queries
Many users change data | Only ETL process writes data
Selective queries on the data | Evaluations of all data including history (complex queries)
Avoid redundancy | Redundant data storage
Normalized data management (3NF) | De-normalized data management
Relational Data Modeling | Several layers with different data models, one model usually Dimensional Data Modeling
54. OPERATIVE VS INTEGRATED DATA
Aspect | Operative data | Integrated data
Handling | Structured, parallel processes with short and isolated ("atomic") transactions | Information for management (decision support)
Modeling | Process- and function-oriented, individual for each application | Different data models in one DWH; historic, stable and summarized data
# users | Many | Few(er) but increasing user base
System return time | Milliseconds | Seconds to minutes (even hours)
55. OPERATIVE VS ANALYTICAL DATABASES
Aspect | Operative DBs | Analytical DBs
Purpose | Processing of daily business transactions | Information for management (decision support)
Content | Detailed, complete, most recent data | Historic, stable and summarized data
Data amount | Small amount of data per transaction. Nested Loop Joins | Large amount of data for load, and often per query. Hash Joins common
Data structure | Suitable for operational transactions | Several models; suitable for long term storage and business analyses
Transactions | ACID; very short read/write transactions | Long load operations, longer read transactions
56. Which challenges could not be solved by OLTP? Why is a DWH necessary?
• Integrated view, distributed data, historic data, technological challenges,
system workload, different data structures
Name two “fathers” of the DWH
• Bill Inmon and Ralph Kimball
Which characteristics does a DWH have according to Bill Inmon?
• Subject-oriented, integrated, non-volatile, time-variant
SUMMARY
58. • Specific implementation can follow an architecture
• Architecture describes an ideal type. Therefore an implementation may not use
all components or can combine components
• Better understanding, overview and complexity reduction by
decomposing a DWH into its components
• Can be used in many projects: repeatable, standardizable
• Map DWH tools into the different components and compare functionality
• Function-oriented, as it describes data and control flow
PURPOSE: WHY ARE DWH ARCHITECTURES USEFUL?
59. Apple: multiple Petabytes
• Customer insights: who’s who and what are the customers up to
Walmart: 300TB (2003), several PB today
• It tells suppliers, “You have three feet of shelf space. Optimize it.”
eBay: >10PB, 100s of production DBs fed in
• Get better understanding of customers
Most DWHs are much smaller though. For huge and small DWHs alike, it is highly
challenging to architect, develop, maintain and run such complex systems
https://gigaom.com/2013/03/27/why-apple-ebay-and-walmart-have-some-of-the-biggest-data-warehouses-youve-ever-seen/ and http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/
EXAMPLES OF DATA WAREHOUSES IN THE INDUSTRY
60. LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
Diagram: internal and external data sources (OLTP) feed the Data Warehouse, whose layers run from backend to frontend:
• Staging Layer (Input Layer)
• Integration Layer (Cleansing Layer)
• Core Warehouse Layer (Storage Layer)
• Aggregation Layer
• Mart Layer (Output Layer, Reporting Layer)
Cross-layer components: Metadata Management, Security, DWH Manager incl. Monitor
61. • Providing internal and external data out of the source systems
• Enabling data through Push (source is generating extracts) or Pull (BI Data
Backend is requesting or directly accessing data)
• Example for Push practice (deliver csv or text data through file interface; Change
Data Capture (CDC))
• Example for Pull practice (direct access to the source system via ODBC, JDBC, API
and so on)
DATA SOURCES
62. • “Landing Zone” for data coming into a DWH
• Purpose is to speed up loading into the DWH and to decouple source and target
system (repeating an extraction run, additional delivery)
• Granular data (no pre-aggregation or filtering in the Data Source Layer, i.e.
the source system)
• Usually not persistent, therefore regular housekeeping is necessary (for
instance, delete data in this layer that is a few days/weeks old or, more
commonly, once a correct upload to the Core Warehouse Layer is ensured)
• Tables have no referential integrity constraints, columns often varchar
only
STAGING LAYER
63. • Business Rules, harmonization and standardization of data
• Classical Layer for transformations: ETL = Extract – TRANSFORM – Load
• Fixing data quality issues
• Usually not persistent, therefore regular housekeeping is necessary (for
instance after a few days or weeks or at the latest once a correct upload
to Core Warehouse Layer is ensured)
• The component is often not required or often not a physical part of a DB
INTEGRATION LAYER
64. • Data storage in an integrated, consolidated, consistent and non-
redundant (normalized) data model
• Contains enterprise-wide data organized around multiple subject-areas
• Application / Reporting neutral data storage on the most detailed level of
granularity (incl. historic data)
• Size of database can be several TB and can grow rapidly due to data
historization
CORE WAREHOUSE LAYER
65. • Preparing data for the Data Mart Layer to the required granularity
• E.g. Aggregating daily data to monthly summaries
• E.g. Filtering data (just last 2 years or just data for a specific region)
• Harmonize computation of key performance indicators (measures) and
additional Business Rules
• The component is often not required or often not a physical part of a DB
AGGREGATION LAYER
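The two examples on this slide, aggregating daily data to monthly summaries and filtering to a recent period, can be sketched in a few lines. The data and the `since` parameter are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical daily sales as (ISO date, amount) at Core Warehouse granularity.
daily_sales = [
    ("2023-01-30", 120.0),
    ("2023-01-31", 80.0),
    ("2023-02-01", 50.0),
    ("2023-02-02", 25.0),
]


def monthly_totals(daily_rows, since="0000-00"):
    """Sum daily amounts per month, keeping only months >= `since` (YYYY-MM)."""
    totals = defaultdict(float)
    for day, amount in daily_rows:
        month = day[:7]  # "YYYY-MM" prefix of the ISO date
        if month >= since:
            totals[month] += amount
    return dict(totals)


monthly = monthly_totals(daily_sales)
recent = monthly_totals(daily_sales, since="2023-02")
```

Computing such totals once in this layer, rather than separately in each mart, is one way to keep key performance indicators consistent.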
66. • Data is stored in a denormalized data model for performance reasons and
better end user usability/understanding
• The Data Mart Layer is providing typically aggregated data or data with
less history (e.g. latest years only) in a denormalized data model
• Created through filtering or aggregating the Core Warehouse Layer
• One Mart ideally represents one subject area
• Technically the Data Mart Layer can also be a part of an Analytical
Frontend product (such as Qlik, Tableau, or IBM Cognos TM1) and need
not be stored in a relational database
DATA MART LAYER
67. • Metadata Management
• “Data about Data”, separate lecture
• Security
• Not all users are allowed to see all data
• Data security classification (e.g. restricted, confidential, secret)
• DWH Manager incl. Monitor
• DWH Manager initiates, controls, and checks job execution
• Monitor identifies changes/new data from source systems, separate lecture
METADATA MANAGEMENT, SECURITY, MONITOR
68. The article
http://www.kimballgroup.com/2004/03/differences-of-opinion/
compares THE two classic DWH architectures.
Read the paper and complete the table / questions on the next slide.
(Caution: The paper is biased / favors one approach; you may want to read
other/more papers for a neutral view.)
EXERCISE: CLASSICAL DWH ARCHITECTURES
69. EXERCISE: CLASSICAL DWH ARCHITECTURES
• What are the approaches called?
• Who “invented” the approach?
• How many layers are used and what are the layers called?
• Which data modeling approaches are used in which layer?
• In which layer are atomic detail data stored?
• In which layer are aggregated / summary data stored?
• List at least 2 advantages
• List at least 2 disadvantages
70. EXERCISE: CLASSICAL DWH ARCHITECTURES
What are the approaches called? | Kimball Bus Architecture | Corporate Information Factory
Who “invented” the approach? | Ralph Kimball | Bill Inmon
How many layers are used and what are the layers called? | Data Staging; Dimensional Data Warehouse | Data Acquisition; Normalized Data Warehouse; Data Delivery / Dimensional Mart
Which data modeling approaches are used in which layer? | Data Staging: variable, corresponds to source system; Dimensional Data Warehouse: Dimensional Model | Data Acquisition: variable, corresponds to source system; Normalized Data Warehouse: 3NF; Data Delivery: Dimensional Model
In which layer are atomic detail data stored? | Dimensional Data Warehouse | Normalized Data Warehouse
In which layer are aggregated / summary data stored? | Dimensional Data Warehouse | Data Delivery / Dimensional Mart
71. EXERCISE: CLASSICAL DWH ARCHITECTURES
Advantages, Kimball Bus Architecture:
• Two layers only mean faster development and less work
• Rather simple approach to make data fast and easily accessible
• Lower startup costs (but higher subsequent development costs)
Advantages, Corporate Information Factory:
• Separation of concerns: long-term enterprise data storage separated from data presentation
• Changes in requirements and scope are easier to manage
• Lower subsequent development costs (but higher startup costs)
Disadvantages, Kimball Bus Architecture:
• If table structures change (unstable source systems), high effort to implement the changes and reload data, especially conformed dimensions (“Dimensionitis” disease)
• Non-metric data not optimal for dimensional model
• Dimensional model (esp. Star Schema) contains data redundancy
Disadvantages, Corporate Information Factory:
• Data model transformations from 3NF to Dimensional model required
• More complex as two different data models are required
• Larger team(s) of specialists required
72. • Kimball Bus Architecture (Central data warehouse based on data marts)
• Inmon Corporate Information Factory
• Data Vault 2.0 Architecture (Dan Linstedt)
• DW 2.0: The Architecture for the Next Generation of Data Warehousing
• Virtual Data Warehouse
• Operational Data Store (ODS)
OTHER ARCHITECTURES
73. KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSE
BASED ON DATA MARTS)
Source: http://www.kimballgroup.com/2004/03/differences-of-opinion/
74. KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSE
BASED ON DATA MARTS)
Diagram: internal and external data sources (OLTP) feed the Data Warehouse, whose layers run from backend to frontend:
• Staging Layer (Input Layer)
• Core Warehouse Layer = Mart Layer: Data Mart 1, Data Mart 2, Data Mart 3
Cross-layer components: Metadata Management, Security, DWH Manager incl. Monitor
Annotation: more business-process oriented than subject-oriented, integrated, time-variant, non-volatile
75. • Bottom-up approach
• Dimensional model with denormalized data
• The sum of the data marts constitutes the Enterprise DWH
• Enterprise bus / conformed dimensions for integration purposes
• (don’t confuse with ESB as middleware/communication system between applications)
• Kimball describes that agreeing on conformed dimensions is a hard job
and it’s expected that the team will get stuck from time to time trying to
align the incompatible original vocabularies of different groups
• Data marts need to be redesigned if incompatibilities exist
KIMBALL BUS ARCHITECTURE (CENTRAL DATA WAREHOUSE
BASED ON DATA MARTS)
76. INMON CORPORATE INFORMATION FACTORY
Source: http://www.kimballgroup.com/2004/03/differences-of-opinion/
77. INMON CORPORATE INFORMATION FACTORY
Diagram: internal and external data sources (OLTP) feed the Data Warehouse, whose layers run from backend to frontend:
• Staging Layer (Input Layer)
• Core Warehouse Layer (Storage Layer)
• Mart Layer (Output Layer, Reporting Layer)
Cross-layer components: Metadata Management, Security, DWH Manager incl. Monitor
Annotation: subject-oriented, integrated, time-variant, non-volatile
78. • Top-down approach
• (Normalized) Core Warehouse is essential for subject-oriented,
integrated, time-variant and nonvolatile data storage
• Create (departmental) Data Marts as subsets of Core Enterprise DWH as
needed
• Data Marts can be designed with Dimensional model
• The logical standard architecture is more general compared to CIF, but
was mainly influenced by CIF
INMON CORPORATE INFORMATION FACTORY
79. DATA VAULT 2.0 ARCHITECTURE – TODAY’S WORLD (DAN
LINSTEDT)
80. DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)
Michael Olschimke, Dan Linstedt: Building a Scalable Data Warehouse with Data Vault 2.0, Morgan Kaufmann, 2015, Chapter 2.2
81. DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)
Diagram: internal and external data sources (OLTP) feed the Data Warehouse, whose layers run from backend to frontend:
• Staging Layer (Input Layer)
• Raw Data Vault
• Business Data Vault
• Mart Layer (Output Layer, Reporting Layer)
Cross-layer components: Metadata Management, Security, DWH Manager incl. Monitor
Annotations: Hard Rules only; Soft Rules; time-variant, non-volatile, integrated by BK (integrated); subject-oriented, integrated
82. • Core Warehouse Layer is modeled with Data Vault and integrates data by
BK (business key) “only” (Data Vault modeling is a separate lecture)
• Business rules (Soft Rules) are applied from Raw Data Vault Layer to Mart
Layer and not earlier
• Alternatively from Raw Data Vault to additional layer called Business Data Vault
• Hard Rules don’t change data
• Data is fully auditable
• Real-time capable architecture
• Architecture got very popular recently; also applicable to BigData, NoSQL
DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)
83. • In the classical DWHs, the Core Warehouse Layer is regarded as “single
version of the truth”
• Integrates + cleanses data from different sources and eliminates contradiction
• Produces consistent results/reports across Data Marts
• But: cleansing is (still) subjective, enterprises change regularly, and the paradigm does not
scale as more and more systems exist
• Data in Raw Data Vault Layer is regarded as “Single version of the facts”
• 100% of the data is loaded 100% of the time
• Data is not cleansed and bad data is not removed in the Core Layer (Raw Vault)
DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)
84. • Data Vault is optimized for the following requirements:
• Flexibility
• Agility
• Data historization
• Data integration
• Auditability
• Bill Inmon wrote in 2008: “Data Vault is the optimal approach for
modeling the EDW in the DW2.0 framework.” (DW2.0)
DATA VAULT 2.0 ARCHITECTURE (DAN LINSTEDT)
85. DW 2.0: THE ARCHITECTURE FOR THE NEXT GENERATION
OF DATA WAREHOUSING
Source: W.H. Inmon, Dan Linstedt: Data Architecture: A Primer for the Data Scientist, Morgan Kaufmann, 2014, chapter 3.1
Diagram: data life cycle across storage areas and their data models: operational application data model, integrated corporate data model, integrated corporate data model, archival data model
86. Main characteristics:
• Structured and “unstructured” data, not just metrics
• Life Cycle of data with different storage areas
• Hot data: High speed, expensive storage (RAM, SSDs) for most
recent data
• …
• Cold data: Low speed, inexpensive storage (e.g. hard disks) for old data; archival
data model with high compression
• Metadata is an integral part of the DWH and not an afterthought
DW 2.0: THE ARCHITECTURE FOR THE NEXT GENERATION
OF DATA WAREHOUSING
87. VIRTUAL DATA WAREHOUSE
Diagram: internal and external data sources (OLTP) are accessed directly by a Query Management component; there is no separate data storage between backend and frontend
Annotation: weakly and partly subject-oriented, weakly and partly integrated, not time-variant, not non-volatile
88. • Data not extracted from operational systems and stored separately
• Standardized interface for all operational data sources
• One "GUI" for all existing data
• Generates combined queries
• Query Processor joins query result data from different sources
• Can also access data in Hadoop (Polybase, Big SQL, BigData SQL, etc)
VIRTUAL DATA WAREHOUSE
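The idea of a query processor that joins result data from different sources at request time can be sketched with two independent in-memory SQLite databases standing in for operational systems. Table names, column names and the shared `customer_id` are assumptions for illustration; real sources often lack such a common key.

```python
import sqlite3

# Two independent databases stand in for two operational systems.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales_db.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 30.0)])

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "John Brown"), (2, "Jane Smith")])


def revenue_per_customer():
    """Query each source when asked and join the partial results in the query layer."""
    totals = dict(sales_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    names = dict(crm_db.execute("SELECT customer_id, name FROM customers"))
    return {names[cid]: total for cid, total in totals.items()}


result = revenue_per_customer()
```

Nothing is stored between calls: every call hits both source systems again, which is exactly where the extra load on the operational systems comes from.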
89. • Query Management manages metadata about all operational systems
• (physical) location of data and algorithms for extracting data from OLTP system
• Implementation easier
• Low cost: can use existing hardware infrastructure
• Queries cause significant performance problems in operational systems
• Known problems when analyzing operational data directly
• Same query is processed multiple times (if queried multiple times)
• Same query delivers different results when processed at different times
VIRTUAL DATA WAREHOUSE
90. OPERATIONAL DATA STORE (ODS)
Diagram: internal and external data sources (OLTP) feed the Data Warehouse, whose layers run from backend to frontend:
• Staging Layer (Input Layer)
• Core Warehouse Layer (Storage Layer)
• Mart Layer (Output Layer, Reporting Layer)
• Operational Data Store as an additional component
Cross-layer components: Metadata Management, Security, DWH Manager incl. Monitor
Annotation: subject-oriented, integrated, time-variant, non-volatile
91. • ODS: Real-time/Right-time layer
• Replication techniques used to transport data from source database to
ODS layer with minimal impact on source system
• Data in the ODS has no history and is stored without any cleansing and
without any integration (1:1 copy from single source)
• DWH performance not optimal as data model is suited for OLTP and not
for reporting requirements
• The ODS normally exists in addition to the Staging / Core DWH / Mart Layers but can
exist alone without other layers
OPERATIONAL DATA STORE (ODS)
92. EXAMPLE DWH FOR STATE OF CONSTRUCTION DOCU
93. ARCHITECTURE FROM AN ACTUAL PROJECT
Diagram, components shown:
• Source: OLTP DB, Logs, IIDR ReplEngine Source, Datastore Source
• Mirror: Mirror DB (Operational Data Store), IIDR ReplEngine Mirror, Datastore Mirror
• DWH: DWH DB (Staging Layer, Raw + Business Data Vault, Mart Layer), IIDR ReplEngine DWH, Datastore DWH
• ETL Engine
• Frontend: Standard Reports, AdHoc Reports
• TSM
94. END USER SAMPLE QUESTIONS
Which vehicles or aggregates are documented incompletely? (Data quality)
Which vehicles / which control units require SW updates?
Which interiors are most common by region?
Supply data for external simulations, customs clearance, spare part planning, etc.
95. Review the presented data warehouse architectures.
Which architecture would you recommend for
• A holding of 3 telecommunication companies
• An online store with real/right-time data integration needs
• Marketing department of a bank
List advantages and drawbacks of your proposal.
EXERCISE: RECOMMEND AN ARCHITECTURE
96. A holding of 3 telecommunication companies
• Architecture: Virtual Data Warehouse
• + Companies may not want to provide their data to a new storage
• + Can easily be extended if new companies join the holding or reduced if a company
leaves the holding
• - Bad performance
• - Not really data integration achieved, low Data Quality
• - Firewalls have to be opened
EXERCISE: RECOMMEND AN ARCHITECTURE
97. An online store with real-time/right-time data integration needs
• Architecture: Data Vault 2.0
• + Integration of many internal and external source systems (e.g. integrate social media
data about the online store)
• + Fast data delivery in Raw Vault Layer (Real-time/Right-time data integration).
Complex data cleansing / transformation / soft rules are delayed downstream towards
Mart Layer
• - Transformation overhead (Source system data model to Data Vault data model to
Dimensional data model)
EXERCISE: RECOMMEND AN ARCHITECTURE
98. Marketing department of a bank
• Architecture: Kimball Bus architecture
• + Start small for a department. If other departments are interested, new data and new
Marts can be added on demand
• - High risk of losing the Enterprise view, with several DWHs being built
That’s still quite a common scenario nowadays. A single Enterprise DWH is
often not achieved (e.g. Mergers & Acquisitions, inflexibility due to a single
centralized DWH, rapidly changing conditions, etc.) and therefore very often
several DWHs with different architectures exist in parallel within a company.
EXERCISE: RECOMMEND AN ARCHITECTURE
99. Which layers does the logical standard architecture have?
• Staging (Input), Integration (Cleansing), Core Warehouse (Storage), Aggregation, Mart
(Reporting, Output) and additionally Metadata, Security, DWH Manager, Monitor
Which other architectures exist?
• Kimball Bus Architecture (Central data warehouse based on data marts)
• Inmon Corporate Information Factory
• Data Vault 2.0 Architecture (Dan Linstedt)
• DW 2.0: The Architecture for the Next Generation of Data Warehousing
• Virtual Data Warehouse
SUMMARY
100. SUPPLEMENT: POLYGLOT DATA ARCHITECTURE
ARCHITECTURES AROUND BUZZWORDS LIKE
BIG DATA, STREAMING & DATA LAKES
101. • There exist well-known reference architectures for Data Warehouses
• Many tools and schema-on-read came with the Hadoop ecosystem
• The Data Lake was a “black box” at the beginning
• It is gaining more and more structure through distinct layers instead of remaining a “black box”
• Structure, modeling, organization, governance instead of tool-only focus
• The slides provide some architectures with links to more information
BIG DATA / DATA LAKE ARCHITECTURES
INTRODUCTION
102. • Architecture by Nathan Marz
• Realtime and batch processing
• Batch layer stores and historizes raw data
• Serving layer has to union batch and realtime layer
• Rather complex
• Author recommends graph data model and advises against schema-on-read
LAMBDA ARCHITECTURE
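The interplay of the three layers can be sketched as follows. This is a deliberately simplified model, not Marz's implementation: the page-view events and the names `ingest`, `run_batch`, and `query` are hypothetical, and the batch run is assumed to be instantaneous, so the real-time view can simply be cleared afterwards.

```python
from collections import Counter

master_dataset = []         # batch layer: append-only raw history
batch_view = Counter()      # precomputed by the batch engine
realtime_view = Counter()   # incrementally maintained by the real-time engine

def ingest(event):
    # Every incoming event is sent to both the batch and the real-time layer.
    master_dataset.append(event)
    realtime_view[event["page"]] += 1

def run_batch():
    # Recompute the batch view from all raw data, then drop the real-time
    # results that the new batch view now covers.
    batch_view.clear()
    batch_view.update(e["page"] for e in master_dataset)
    realtime_view.clear()

def query(page):
    # Serving layer: union of batch view and real-time view.
    return batch_view[page] + realtime_view[page]

ingest({"page": "home"})
ingest({"page": "home"})
run_batch()
ingest({"page": "home"})    # arrives after the last batch run
print(query("home"))
```

The same counting logic exists twice, once in `run_batch` and once in `ingest`, which hints at why the architecture is described as rather complex.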
103. LAMBDA ARCHITECTURE
[Figure: Lambda architecture. Components: Batch Layer (Batch Engine, raw history data), Real-Time Layer (Real-Time Engine), Serving Layer (Serving Backend, result data), Queries.]
104. • Architecture by Jay Kreps
• Log-centric, write-ahead logging
• Each event is an immutable log entry and is added to the end of the log
• Read and write operations are separated
• Materialized views can be recomputed consistently from data in the log
KAPPA ARCHITECTURE
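The log-centric idea can be sketched as follows. This is a minimal in-memory model under stated assumptions: the log is a Python list of immutable tuples, and the account events and the functions `append`, `replay`, `apply_balance`, and `apply_deposit_count` are hypothetical names chosen for the example.

```python
log = []   # append-only log; each entry is an immutable tuple

def append(account, amount):
    # Write path: events are only ever added to the end of the log.
    log.append((account, amount))
    return len(log) - 1   # offset of the new entry

def replay(apply):
    # Read path: build a materialized view by replaying the whole log.
    view = {}
    for event in log:
        apply(view, event)
    return view

def apply_balance(view, event):
    account, amount = event
    view[account] = view.get(account, 0) + amount

def apply_deposit_count(view, event):
    # A view added later; it is derived from the same log.
    account, amount = event
    if amount > 0:
        view[account] = view.get(account, 0) + 1

append("acc-1", 100)
append("acc-2", 40)
append("acc-1", -30)

balances = replay(apply_balance)
deposits = replay(apply_deposit_count)
print(balances, deposits)
```

Because the log is never modified, replaying it always yields the same view, and a new or corrected view definition only needs another replay rather than a separate batch layer.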
105. KAPPA ARCHITECTURE
[Figure: Kappa architecture. Components: Data, Real-Time Layer (Real-Time Engine, raw history data), Serving Layer (Serving Backend, result data), Queries.]
106. • Architecture by Rogier Werschkull
• Store incoming data in Data Library Layer (Persistent staging = PS)
• Prepare data in a 3C layer for “Concept – Context – Connector”-model
• Concept + Connector can be virtualized on data in Data Library Layer
PS-3C ARCHITECTURE
107. PS-3C ARCHITECTURE
Source: TDWI 2016
[Figure: PS-3C architecture. Components: Data, Data Library (Storage Engine, raw history data), 3C Layer (Preparation Engine, integrated subjects), Serving Layer (Delivery Engine, result data), Queries.]
108. • Architecture by Joe Caserta
• Big Data Warehouse may live in one or more platforms on premise or in
the cloud
• Hadoop only
• Hadoop + MPP or RDBMS
• Additionally NoSQL or Search
POLYGLOT WAREHOUSE
110. • Architecture by Claudia Imhoff
• Combines the stability and reliability of the BI architectures while
embracing new and innovative technologies and techniques
• 3 components that extend the EDW environment
• Investigative computing platform
• Data refinery
• Real-time (RT) analysis platform
THE EXTENDED DATA WAREHOUSE ARCHITECTURE (XDW)
THE ENTERPRISE ANALYTICS ARCHITECTURE
111. THE EXTENDED DATA WAREHOUSE ARCHITECTURE (XDW)
THE ENTERPRISE ANALYTICS ARCHITECTURE
Source: https://upside.tdwi.org/articles/2016/03/15/extending-traditional-data-warehouse.aspx
112. SUMMARY
[Figure: Summary architecture. Components: Data, Landing Area (Storage Engine, raw history data), Data Lake (Integration Engine, lightly integrated data), Data Presentation (Delivery Engine, result data), Queries.]
113. Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Telefon +49 731 505-06 / Fax +49 731 505-65 99
tss@daimler.com / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
THANK YOU
Editor's Notes
Mission: We are a specialist and strategic business partner for innovative end-to-end IT solutions within the Daimler Group: not just another supplier, more than another supplier!
Who already has experience with DWHs?
With DBs?
Inmon: Subject Orientation
The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model. Typical subject areas include the following:
Customer
Product
Transaction or activity
Policy
Claim
Account
Not deletable / permanent
Time-variant: showing the same behavior for the same input at any point in time
Subject-oriented, integrated, non-volatile, time-variant
Ralph Kimball, Bill Inmon