1© Cloudera, Inc. All rights reserved.
More Data in Less Time
Deploying an Operational Data Store with Cloudera
Trends in the Market
Trends Driving Change
Internet of Things: 16 billion connected devices generating more data (Source: Forbes)
Data Storage Costs: “It will soon be technically feasible & affordable to record & store everything…” (Source: New York Times)
Resource Intensive ELT: ELT drives up to 80% of database capacity (Source: Syncsort)
Customers are augmenting their
traditional architectures for
modern business needs.
Operational Data Store (ODS):
Ingesting, storing, and preparing data for
both operational and analytical use.
(AKA: Operational Data Warehouse, RDBMS, Storage)
ODS Use Cases
ETL Offload: Offload resource-intensive ETL workloads from systems
EDW Optimization: Migrate old data and ELT workloads off of the EDW
Active Archive: Store old data online so analysts can access historic data
Goals of an Operational Data Store
Ingest Data, Store Data, Prepare Data
[Diagram: Traditional Architecture. Labels: Data Sources (Structured, Unstructured), Ingest, Operational Data Store (Storage #1, Storage #2, Storage N), Ingest, Process, Load, ETL, Enterprise Data Warehouse, ELT, Serve, Archive, Applications (BI System, Modeling, Reporting)]
Challenges with a Traditional Architecture
1) Limited Data Ingest
[Diagram: the Traditional Architecture diagram from "Goals of an Operational Data Store", with callout 1 marked]
Challenges with a Traditional Architecture
1) Limited Data Ingest
2) Inefficient Data Processing
[Diagram: the Traditional Architecture diagram from "Goals of an Operational Data Store", with callouts 1 and 2 marked]
Challenges with a Traditional Architecture
1) Limited Data Ingest
2) Inefficient Data Processing
3) Data Archived
[Diagram: the Traditional Architecture diagram from "Goals of an Operational Data Store", with callouts 1, 2, and 3 marked]
A New Way Forward
1) Ingest More Data
[Diagram: Modern Architecture. Labels: Data Sources (Structured, Unstructured), Operational Data Store, EDH, Enterprise Data Warehouse, Active Structured Data, Ingest, ETL, ELT, Load, Serve, Archive, Applications (BI System, Modeling, Reporting). Callout 1 marked]
A New Way Forward
1) Ingest More Data
2) Optimize Data Processing
[Diagram: the same Modern Architecture diagram, with callouts 1 and 2 marked]
A New Way Forward
1) Ingest More Data
2) Optimize Data Processing
3) Automated Secure Archive
[Diagram: the same Modern Architecture diagram, with callouts 1, 2, and 3 marked]
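A minimal sketch of the active-archive idea behind step 3, in Python and for illustration only. It is not Cloudera code: the 90-day window, the record layout, the dates, and the function names `split_by_age` and `query` are all assumptions. It shows aged records moving to an archive tier that stays online, so one query still reaches historic data.

```python
from datetime import date, timedelta


def split_by_age(records, today, active_days=90):
    """Route records into an active tier and an online archive tier.

    The 90-day default is a hypothetical retention window.
    """
    cutoff = today - timedelta(days=active_days)
    active = [r for r in records if r["day"] >= cutoff]
    archive = [r for r in records if r["day"] < cutoff]
    return active, archive


def query(active, archive, predicate):
    """Archived rows stay online, so a single query spans both tiers."""
    return [r for r in active + archive if predicate(r)]


# Hypothetical sample data.
today = date(2015, 6, 1)
records = [
    {"id": 1, "day": date(2015, 5, 20)},
    {"id": 2, "day": date(2014, 1, 15)},
    {"id": 3, "day": date(2015, 1, 2)},
]

active, archive = split_by_age(records, today)
from_2015 = query(active, archive, lambda r: r["day"].year == 2015)
print([r["id"] for r in active], [r["id"] for r in archive])
```

Keeping the archive tier queryable is the point of contrast with the traditional architecture, where archived data goes offline.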
RelayHealth Customer Story
About RelayHealth (A McKesson Business)
What does RelayHealth do?
RelayHealth is a financial solution of McKesson used to automate 2.4 billion financial transactions per year
200K Physicians, 2K Hospitals, 1.9K Payers/Health Plans
Who is McKesson?
Largest healthcare solution company in the world with $103+ billion in revenue
Headquarters in San Francisco and established in 1833
32K employees
RelayHealth’s Objectives
ETL Offload: Offload resource-intensive ETL workloads from systems
EDW Optimization: Migrate old data and ELT workloads off of the EDW
Active Archive: Store old data online so analysts can access historic data
The Pre-Hadoop Environment
Challenges
1) Deleted & archived information
[Diagram: RelayHealth Transaction Processing System (BATCH). Labels: Claim Submitters, Various Applications, OLTP, RDBMS, EDW, Archive, Reports. Callout 1 marked]
The Pre-Hadoop Environment
Challenges
1) Deleted & archived information
2) Batch wasn’t cutting it
[Diagram: the same RelayHealth batch processing diagram, with callouts 1 and 2 marked]
The Pre-Hadoop Environment
Challenges
1) Deleted & archived information
2) Batch wasn’t cutting it
3) Application & report latency
[Diagram: the same RelayHealth batch processing diagram, with callouts 1, 2, and 3 marked]
RelayHealth’s Modern Hadoop Architecture
Improvements
1) Active archive on Hadoop
[Diagram: RelayHealth Transaction Processing System, from Traditional BATCH Processing to Hadoop STREAM Processing. Stages: Ingest, Process, Store, Access. Components: Kafka, Spark Streaming, HBase, Search, Spark. Other labels: Claim Submitters, Payer, Application, Reports, Modeling. Callout 1 marked]
RelayHealth’s Modern Hadoop Architecture
Improvements
1) Active archive on Hadoop
2) Stream & batch processing
[Diagram: the same RelayHealth stream processing diagram (Kafka, Spark Streaming, HBase, Search, Spark), with callouts 1 and 2 marked]
RelayHealth’s Modern Hadoop Architecture
Improvements
1) Active archive on Hadoop
2) Stream & batch processing
3) Prepared for future use cases
[Diagram: the same RelayHealth stream processing diagram (Kafka, Spark Streaming, HBase, Search, Spark), with callouts 1, 2, and 3 marked]
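The ingest, process, store flow on this slide can be sketched in plain Python. This is a toy model only, not RelayHealth's implementation, and it uses no Kafka, Spark, or HBase APIs. It assumes the slide's labels map to Kafka for ingest, Spark Streaming for processing, and HBase for storage. The record fields, the validation rule, and all function names are hypothetical.

```python
from collections import deque


def ingest(raw_claims):
    """Stand-in for the ingest stage (assumed: Kafka): buffer incoming records."""
    return deque(raw_claims)


def micro_batches(buffer, batch_size):
    """Drain the buffer in small batches, as a streaming job would see its input."""
    while buffer:
        n = min(batch_size, len(buffer))
        yield [buffer.popleft() for _ in range(n)]


def process(batch):
    """Stand-in for the process stage (assumed: Spark Streaming).

    Hypothetical rule: drop claims with a non-positive amount, tag the rest.
    """
    return [{**c, "status": "accepted"} for c in batch if c["amount"] > 0]


def store(table, records):
    """Stand-in for the store stage (assumed: HBase): keyed writes by claim id."""
    for r in records:
        table[r["claim_id"]] = r


def run_pipeline(raw_claims, batch_size=2):
    table = {}
    for batch in micro_batches(ingest(raw_claims), batch_size):
        store(table, process(batch))
    return table


# Hypothetical sample claims.
claims = [
    {"claim_id": "c1", "payer": "p1", "amount": 120.0},
    {"claim_id": "c2", "payer": "p2", "amount": 0.0},
    {"claim_id": "c3", "payer": "p1", "amount": 75.5},
]
table = run_pipeline(claims)
print(sorted(table))  # ['c1', 'c3']
```

Processing small batches as they arrive, rather than one end-of-day load, is what moves ingest toward near real-time.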
Business and Technical ROI
Technology ROI
Business ROI
1) Active archive and Navigator for HIPAA compliance
2) Prepared for future use cases
3) Data ingest goes from end of day to near real-time
1) Transaction processed in 20 ms vs. 1 hour previously
2) $250k in licensing and hardware savings per year
3) Greater flexibility with data ingest
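As a quick check of scale, the "20 ms vs. 1 hour" figure can be turned into a ratio. The snippet below is plain arithmetic on the two numbers quoted on the slide. The helper name `speedup` is hypothetical, and the result assumes both figures describe the same per-transaction path.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 ms in one hour


def speedup(before_ms, after_ms):
    """Ratio of the old latency to the new latency."""
    return before_ms / after_ms


ratio = speedup(MS_PER_HOUR, 20)
print(f"{ratio:,.0f}x faster")  # 180,000x faster
```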
Key Learnings
Crawl, walk, run
It takes time, start now
Lean on experts in the community
Thank you

Breakout: Hadoop and the Operational Data Store


Editor's Notes

  • #3 Data storage costs: http://thecaucus.blogs.nytimes.com/2012/08/14/advances-in-data-storage-have-implications-for-government-surveillance/
    IoT: http://www.forbes.com/sites/gilpress/2014/08/22/internet-of-things-by-the-numbers-market-estimates-and-forecasts/
    Resource Intensive ELT: http://www.syncsort.com/getattachment/45696aa9-1e40-43cb-8905-b9fc7e2519f7/Syncsort-Data-Warehouse-Offload-Solution.aspx
  • #7 An Operational Data Store provides a staging environment to ingest, store, and process data in preparation for operational and analytical use. Depending on whether the data is structured or unstructured, different systems can be used to optimize data pipelines. The challenge is that as the organization asks for larger volumes of more diverse data, traditional systems struggle to keep up.
  • #8 These challenges arise around data storage and processing.
    1) Limited data ingest. Collecting and ingesting a wide variety of data is not simple and usually means adding systems or capacity to the architecture. As the business keeps asking for more data, the strain on IT grows. To cope, only the data judged most valuable is brought in, which limits the business's access to data that could prove extremely valuable.
    2) Inefficient data processing. Organizations that have already collected and operationalized large volumes of data need to process it efficiently to meet SLAs. If data does not reach employees in time, they carry on without the most recent information.
    3) Data archived. When systems reach capacity as larger volumes of diverse data are used across the organization, IT has to archive or delete data deemed to be of low value. Moving data offline to an archive significantly reduces the return on that data and can hurt the business: analysts looking for patterns in historic data cannot reach it.
    However, as the external and internal data environment has changed over the years, so has the data management space.
  • #11 We have been working closely with leading organizations to create a platform that complements their current architecture and avoids these common challenges, which in turn prepares them for future data growth.
    1) Ingest more data. Cloudera lets you collect and ingest any type or volume of data, in full fidelity, giving your current systems and end users complete data access. Organizations have been able to collect and access more diverse data, opening up what data can do for the business, without compromising system performance or straining existing resources.
    2) Efficiently process and store data volumes. By offloading heavy processing workloads to Cloudera, organizations can use parallel processing to significantly reduce processing time on large volumes of data. Because the platform scales, it continues to perform well no matter how much data is stored.
    3) Automated secure archive. Using Cloudera as an ODS and as a centralized staging environment for new data automatically creates a secure archive. Because the platform scales, there is never a reason to move data to an offline archive: historic data stays on the platform, giving analysts complete access without degrading system performance. Smaller volumes of already defined active data can flow directly into the right systems, while outdated data is offloaded to Cloudera.
    Leading data organizations have already seen these benefits.
  • #22 Arrow from batch to stream processing
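
The notes on offloading heavy processing describe using parallel processing to cut processing time. A rough Python sketch of that pattern (partition the rows, transform each partition concurrently, merge the partial results) follows. It is an illustration only and does not represent how Cloudera or Hadoop schedules work. The payer and amount rows, the thread pool, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts slices of roughly equal size."""
    return [rows[i::n_parts] for i in range(n_parts)]


def transform(part):
    """Per-partition work: total amount per payer within this slice."""
    totals = {}
    for payer, amount in part:
        totals[payer] = totals.get(payer, 0) + amount
    return totals


def merge(partials):
    """Combine the per-partition totals into one result."""
    merged = {}
    for totals in partials:
        for payer, amount in totals.items():
            merged[payer] = merged.get(payer, 0) + amount
    return merged


def offload(rows, n_parts=4):
    """Run transform on each partition concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        return merge(pool.map(transform, partition(rows, n_parts)))


# Hypothetical sample rows: (payer, amount).
rows = [("p1", 10), ("p2", 5), ("p1", 7), ("p3", 1), ("p2", 3)]
totals = offload(rows)
print(totals)
```

Because each partition is transformed independently and only small partial totals are merged, the same shape of job can be spread across more workers as data volumes grow.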