HADOOP AND
THE DATA WAREHOUSE:
WHEN TO USE WHICH
2 Copyright Teradata
• Data warehouse strengths
> What is a Data Warehouse?
• Hadoop strengths
• When to use which
> Hadoop
> Data warehouse
Agenda
3 Copyright Teradata
Three Primary Workloads
Data Hub/Lake
• Fast raw data ingest
• Archival
• ETL refinery
• Search
• Relaxed SLAs
• Millions of files
Data Warehouse
• Data models
• Data integration
• Trusted data
• Concurrent users
• Workload management
• Response time
Discovery
• Easy to use
• Many tools
• Algorithm collections
• Data wrangling
• Business user access
• Semi-production
4 Copyright Teradata
Best Fit: Primary Strengths and Overlaps
[Venn diagram: Data Warehouse, Discovery, Data Lake]
WHY HADOOP IS NOT A
DATA WAREHOUSE
6 Copyright Teradata
• A data design pattern, an architecture
> Not necessarily a database
• Definition: Gartner (2005) / Inmon (1992)
> Subject oriented
– Detailed data + modeling of sales, inventory, finance, etc.
> Integrated logical model
– Merged data
– Consistent, standardized data formats and values
> Nonvolatile
– Data stored unmodified for long periods of time
> Time variant
– Record versioning or temporal services
> Persistent storage, not virtual, not federated
What is a Data Warehouse?
Source: Gartner, Of Data Warehouses, Operational Data Stores, Data Marts and Data 'Outhouses', Dec 2005; Inmon, Building the Data Warehouse, Wiley and Sons, 1992
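The "nonvolatile" and "time variant" properties in the definition above can be sketched in a few lines. This is a minimal illustration, not a warehouse implementation: the `CustomerVersion` record, its fields, and the `apply_change` and `as_of` helpers are hypothetical names invented for the example. A change never overwrites a row; it closes the current version and appends a new one, so any past state can still be queried.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    """One version of a customer record (hypothetical schema for illustration)."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: int,
                 new_city: str, effective: date) -> List[CustomerVersion]:
    """Return a new history: close the current version and append a new one."""
    updated = []
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            # Close the old version instead of overwriting it (nonvolatile).
            updated.append(replace(row, valid_to=effective))
        else:
            updated.append(row)
    updated.append(CustomerVersion(customer_id, new_city, effective))
    return updated


def as_of(history: List[CustomerVersion], customer_id: int,
          when: date) -> Optional[CustomerVersion]:
    """Look up the version that was valid on a given date (time variant)."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row
    return None


# Invented sample data.
history = [CustomerVersion(1, "Dayton", date(2012, 1, 1))]
history = apply_change(history, 1, "San Diego", date(2014, 6, 1))
```

Calling `as_of(history, 1, date(2013, 1, 1))` returns the earlier version, while a 2015 date returns the current one. A warehouse would typically get the same effect from temporal tables or versioned rows in SQL.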
7 Copyright Teradata
Data Warehouse Design Pattern: By Definition
                                          Data Warehouse   Hadoop
Subject oriented                                 5            0
  Detailed data                                  5            5
  Modeled by business subject                    5            0
Integrated                                       5            0
  Merged, deduplicated data                      5            0
  Standardized data formats and values           5            0
Nonvolatile storage                              5            5
Time variant: record versions, temporal          5            0
Persistent storage                               5            5
Scale: 0 = none, 1 = poor, 2 = limited, 3 = average, 4 = robust, 5 = outstanding
8 Copyright Teradata
NoSchema, Schema-on-Read, Complex Schemas
Single file (Schema-on-read)
• No schema, no joins
• One source
• Raw data
• 3-5 uses
Data Marts (Schema-on-read)
• Star and snowflake schemas
• 2-4 fact table joins
• Multiple sources
• Raw data, unknown data
• Key value stores
Data Warehouse (Schema-on-write)
• 5K-10K tables
• 20-50 way joins
• Cross-organization
• Pre-integrated, cleansed
• Referential integrity
• Many applications
Diagram labels: Events, Locations, Finance Transaction, Session, Orders, Inventory, Call Center, POS
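A small sketch of what schema-on-write with referential integrity means in practice: the database checks every row against the declared schema when it is written, so bad data is rejected at load time rather than discovered at query time. The example uses Python's built-in `sqlite3` module only as a stand-in for a warehouse RDBMS; the `store` and `orders` tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        store_id INTEGER NOT NULL REFERENCES store(store_id),
        amount   REAL NOT NULL CHECK (amount >= 0)
    )
""")
conn.execute("INSERT INTO store VALUES (1, 'Austin')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")  # valid: store 1 exists

rejected = False
try:
    # Store 99 does not exist, so the write is refused by the schema.
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

With a raw file in a data lake, the orphaned order would simply be stored, and each reader would have to decide what to do with it.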
9 Copyright Teradata
• Not a database
> No schema, indexes, optimizer
> No separation of code and data structure
> Hadoop uses objects and files
– Not rows and columns
• Hive helps a little
> Limited SQL
> Limited metadata
• Not high performance
• Not fully interactive queries
What Hadoop is Not
See also http://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
http://blogs.gartner.com/donald-feinberg/2014/12/22/a-database-by-any-other-name/
10 Copyright Teradata
• Guarantees database actions are processed reliably
• Ensures query result accuracy
• Supports updates and deletes
• Needed for applications that require 100% consistency
> Banks, finance, inventory, etc.
> Maybe not for Facebook, Twitter, etc.
• Data you can trust
ACID Advantages of an RDBMS
Atomicity: apply all changes or none
Consistency: rollback on errors
Isolation: one update at a time
Durability: transactions survive crashes
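Atomicity and rollback on errors can be demonstrated with a short sketch. It uses Python's built-in `sqlite3` module as a stand-in for any ACID-compliant RDBMS; the `account` table and the `transfer` helper are hypothetical names made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it rolls back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to bob is applied first and then undone when the debit violates the constraint, so no money is created. This is the guarantee that banking and inventory applications rely on.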
11 Copyright Teradata
Integration and Analytics
Hadoop’s Biggest Differentiators
Capture and ETL
Long-term archive
Cheap, commodity hardware
Data Warehouse
12 Copyright Teradata
Data Hub Refinery: Parallel ETL
Data sources: social networks, mobile, web logs, sensors
13 Copyright Teradata
When We’re Too Small for Hadoop ETL
Avoid hand coded transforms
2 ETL servers do the job
Prefer tool based ETL
ETL is working well
14 Copyright Teradata
When We Need Massive Data Integration
Dozens of ETL servers
High velocity real time data
10s-100s of TB/day
The risk is worth the reward
15 Copyright Teradata
When In-database ELT Works Well
Reference data look-ups
Joins for derived data
Lots of derived data
Service-level goals to meet
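The pattern on this slide, loading raw rows first and then transforming them inside the database with set-based SQL, can be sketched as follows. Python's built-in `sqlite3` module stands in for the warehouse, and the `product_ref`, `stg_sales`, and `sales_by_category` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reference data already managed in the warehouse (hypothetical tables).
    CREATE TABLE product_ref (sku TEXT PRIMARY KEY, category TEXT, unit_cost REAL);
    INSERT INTO product_ref VALUES ('A1', 'toys', 4.0), ('B2', 'books', 6.0);

    -- Extract + Load: raw sales land in a staging table untransformed.
    CREATE TABLE stg_sales (sku TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO stg_sales VALUES ('A1', 3, 10.0), ('B2', 1, 15.0), ('A1', 2, 10.0);

    -- Transform: one SQL statement does the reference look-up and derives margin.
    CREATE TABLE sales_by_category AS
    SELECT r.category,
           SUM(s.qty * s.unit_price)                 AS revenue,
           SUM(s.qty * (s.unit_price - r.unit_cost)) AS margin
    FROM stg_sales s
    JOIN product_ref r ON r.sku = s.sku
    GROUP BY r.category;
""")

result = conn.execute(
    "SELECT category, revenue, margin FROM sales_by_category ORDER BY category"
).fetchall()
```

Because the reference data already lives next to the staging table, the look-up is a join rather than a data movement step, which is why this style suits workloads dominated by joins and derived data.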
16 Copyright Teradata
When to Use Which: It Depends
(In-database ELT vs. Hadoop)
Reference data
> In-database ELT: lookups, joins
Transformations
> In-database ELT: structured data, ELT modules, SQL can do it
> Hadoop: unstructured data, some ETL modules, do it yourself
Service-level goals
> In-database ELT: predictable, system management
Data security
> In-database ELT: robust
Costs
> Hadoop: commodity hardware
Data quality
> In-database ELT: governance, MDM
> Hadoop: low quality/trust OK
Data volume
> In-database ELT: high volume
> Hadoop: extreme volume
Offload ELT
> Migration costs
Agility
> Hadoop: no governance
WHERE HADOOP EXCELS
18 Copyright Teradata
• Commodity low cost hardware
• Many programming languages
> But mostly it’s Java
• Free open source
• Any data structure
• Scale-out to petabytes + parallelism
Hadoop Strengths
19 Copyright Teradata
• ETL on steroids
• Economically “keep files forever”
> Queryable
• File based reporting and analytics
• Backup and archival storage
> Databases, files, development
Hadoop: the Data Hub
20 Copyright Teradata
• Temporary data, data exhaust
• Data mining/exploration
> 1000s of continuous variables
> Linear algebra
> Graph mining
> Machine learning
> Random forest, decision trees
> Markov chains
• Not all data mining suits MapReduce
> Many things work better in MPP RDBMS
> In-database SAS, R, Fuzzy Logix
> It depends
Where MapReduce Excels
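For readers new to the programming model, here is a single-process sketch of the map, shuffle, and reduce steps. It is plain Python, not the Hadoop API; the function names are hypothetical, the assumption that the client IP is the fourth whitespace-separated field is ours, and the second and third log lines are invented sample data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (ip, 1) per log line; the field position is an assumption."""
    fields = line.split()
    if len(fields) >= 4:
        yield fields[3], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


log_lines = [
    "41521390 2013-01-01 00:25:42 2.111.94.18",
    "41521391 2013-01-01 00:25:43 10.0.0.7",     # invented sample line
    "41521392 2013-01-01 00:25:44 2.111.94.18",  # invented sample line
]
mapped = [pair for line in log_lines for pair in map_phase(line)]
hits_per_ip = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

On a cluster, many mappers and reducers run in parallel over file blocks. That helps most when each record can be processed independently, as here.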
21 Copyright Teradata
• Easy to work on non-relational data
> Java data types
> JSON, objects
• Hadoop is written in Java
> Compatible APIs, skills, concepts, frameworks, scripts
• Huge open source factories
> Apache, GitHub, Eclipse, SourceForge, etc.
> Assorted compression algorithms
• People
> 9M-10M Java programmers
> Web tutorials – extensive “how to” topics
> University student research
Developer Advantages with Hadoop
22 Copyright Teradata
• Raw data format provides complete flexibility
• Non-traditional data types easily supported
> Graph, text, weblog, etc.
• No upfront ETL required
• No data loading required
• Flexible: late binding lets the data scientist choose
NoSchema Advantages
Weblog sample:
41521390 2013-01-01 00:25:42 2.111.94.18
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4
"http://www.cokstate.edu/welcome/"
"https://www.google.com/#sclient=psyab&hl=en&source=hp&q=oklahoma+state&pbx=1&oq"
Note: schema-on-read has many pitfalls and is not always a good solution
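Late binding can be sketched with a weblog record like the one on this slide: the raw text is stored untouched, and each reader applies its own schema when it reads. In this minimal illustration the field names (`id`, `date`, `time`, `ip`, `browser`, `version`) are assumptions, since the slide does not label the columns; the user agent string is shortened, and `read_with` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Raw record kept as ingested; no schema was applied at load time.
# Leading fields mirror the slide's sample; the user agent is shortened.
RAW = ('41521390 2013-01-01 00:25:42 2.111.94.18 '
       'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) Safari/533.19.4 '
       '"http://www.cokstate.edu/welcome/"')

# Two different "schemas" over the same bytes (field names are assumptions).
CLICK_VIEW = re.compile(r'^(?P<id>\d+) (?P<date>\S+) (?P<time>\S+) (?P<ip>\S+) ')
BROWSER_VIEW = re.compile(r'(?P<browser>Safari|Chrome|Firefox)/(?P<version>[\d.]+)')


def read_with(pattern: re.Pattern, raw: str) -> Optional[Dict[str, str]]:
    """Apply a schema (a regex with named groups) at read time."""
    match = pattern.search(raw)
    return match.groupdict() if match else None


click = read_with(CLICK_VIEW, RAW)      # one analyst's view of the record
browser = read_with(BROWSER_VIEW, RAW)  # another view of the same raw text
```

The flexibility comes at a price: every reader must carry its own parsing logic, and a pattern that does not match yields `None`, so nothing guarantees that two readers interpret the data consistently.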
23 Copyright Teradata
Attributes Favoring Hadoop
Reason                        Description
Cost                          Low cost, low value data before refinement
Multi-structured data ingest  Raw weblogs, Twitter, Facebook, mobile, PST files, etc.
Data depth                    High data volume, few users, high signal-to-noise ratio
Non-SQL analytics             Complex processes, pipeline transforms, random forests, Markov chains, enormous arrays, etc.
Flexibility, autonomy         Exploratory analysis with little governance; fast, short-term turnaround
Ugly data                     Videos, satellite images, format conversions (PDF to text)
24 Copyright Teradata
Key Considerations
MPP RDBMS                   Hadoop
Stable schema               Evolving schema
Structured data             Structure agnostic
Full ANSI SQL               Flexible programming
Iterative analysis          Batch analysis
Fine grain security         N/A
Cleansed data               Raw data
Seeks                       Scans
Updates/deletes             Ingest
Service level agreements    Flexibility
Core data                   Source files
Complex joins               Complex processing
Efficient CPU and IO        Low cost storage
25 Copyright Teradata
• YARN and Tez
• Queries on flat files!
• Parallel scanning engine
• Developer community
• Complex parallel processing
• Fast ingest of raw data
• Long term archives at full fidelity
• Good scalability
What I Like About Hadoop
26 Copyright Teradata
• Start with workload requirements
> Map the tool capabilities to the requirement
• Hadoop is a Data Hub, a Data Lake
> Not a database or data warehouse
> Exploit Hadoop’s strengths
• Combine the data warehouse and Hadoop
> Two tool sets solve more objectives
> Better together
Summary
27 Copyright Teradata
The End

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

Hadoop and IDW - When_to_use_which

  • 1. HADOOP AND THE DATA WAREHOUSE: WHEN TO USE WHICH
  • 2. Agenda
      • Data warehouse strengths
        > What is a Data Warehouse?
      • Hadoop strengths
      • When to use which
        > Hadoop
        > Data warehouse
  • 3. Three Primary Workloads
      • Data Hub/Lake
        > Fast raw data ingest
        > Archival
        > ETL refinery
        > Search
        > Relaxed SLAs
        > Millions of files
      • Data Warehouse
        > Data models
        > Data integration
        > Trusted data
        > Concurrent users
        > Workload mgmt
        > Response time
      • Discovery
        > Easy to use
        > Many tools
        > Algorithm collections
        > Data wrangling
        > Business user access
        > Semi-production
  • 4. Best Fit Primary Strengths and Overlaps (diagram: Data Warehouse, Discovery, Data Lake)
  • 5. WHY HADOOP IS NOT A DATA WAREHOUSE
  • 6. What is a Data Warehouse?
      • A data design pattern, an architecture
        > Not necessarily a database
      • Definition: Gartner (2005) / Inmon (1992)
        > Subject oriented
          – Detailed data + modeling of sales, inventory, finance, etc.
        > Integrated logical model
          – Merged data
          – Consistent, standardized data formats and values
        > Nonvolatile
          – Data stored unmodified for long periods of time
        > Time variant
          – Record versioning or temporal services
        > Persistent storage, not virtual, not federated
      Source: Gartner: Of Data Warehouses, Operational Data Stores, Data Marts and Data 'Outhouses', Dec 2005; Inmon, Building the Data Warehouse, 1992, Wiley and Sons
  • 7. By Definition: Data Warehouse Design Pattern
      Criterion                                   Data Warehouse   Hadoop
      Subject oriented                            5                0
        – Detailed data                           5                5
        – Modeled by business subject             5                0
      Integrated                                  5                0
        – Merged, deduplicated data               5                0
        – Standardized data formats and values    5                0
      Nonvolatile storage                         5                5
      Time variant: record versions, temporal     5                0
      Persistent storage                          5                5
      Scale: 0 = none, 1 = poor, 2 = limited, 3 = average, 4 = robust, 5 = outstanding
  • 8. NoSchema, Schema-on-Read, Complex Schemas
      Column headings: Single file (Schema-on-read); Data Marts (Schema-on-read); Data Warehouse (Schema-on-write)
      Attributes, in slide order:
      • No schema, no joins
      • One source
      • Raw data
      • 3-5 uses
      • Star and snowflake schemas
      • 2-4 fact table joins
      • Multiple sources
      • Raw data, unknown data
      • Key value stores
      • 5K-10K tables
      • 20-50 way joins
      • Cross-organization
      • Pre-integrated, cleansed
      • Referential integrity
      • Many applications
      Subject areas shown: Events, Locations, Finance, Transaction, Session, Orders, Inventory, Call Center, POS
  • 9. What Hadoop is Not
      • Not a database
        > No schema, indexes, optimizer
        > No separation of code and data structure
        > Hadoop uses objects and files
          – Not rows and columns
      • Hive helps a little
        > Limited SQL
        > Limited metadata
      • Not high performance
      • Not fully interactive queries
      See also:
      http://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
      http://blogs.gartner.com/donald-feinberg/2014/12/22/a-database-by-any-other-name/
  • 10. ACID Advantages of an RDBMS
      • Guarantees database actions are processed reliably
      • Ensures query result accuracy
      • Supports updates and deletes
      • Needed for applications that require 100% consistency
        > Banks, finance, inventory, etc.
        > Maybe not for Facebook, Twitter, etc.
      • Data you can trust
      ACID:
      Atomicity     apply all changes or none
      Consistency   rollback on errors
      Isolation     one update at a time
      Durability    transactions survive crashes
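The "apply all changes or none" and "rollback on errors" properties on the ACID slide can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for an RDBMS; the `accounts` table, the names, and the `transfer` helper are hypothetical and not part of the deck.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit leaves a partial change to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to bob succeeds before the debit from alice violates the constraint, yet neither change survives. That is the kind of guarantee the slide has in mind for banks, finance, and inventory.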
  • 11. Hadoop's Biggest Differentiators
      • Capture and ETL
      • Long term archive
      • Cheap, commodity hardware
      Other diagram labels: Integration and Analytics; Data Warehouse
  • 12. Data Hub Refinery: Parallel ETL
      Sources shown: Social networks, Mobile, Web Logs, Sensors
  • 13. When We're Too Small for Hadoop ETL
      • Avoid hand coded transforms
      • 2 ETL servers do the job
      • Prefer tool based ETL
      • ETL is working well
  • 14. When We Need Massive Data Integration
      • Dozens of ETL servers
      • High velocity real time data
      • 10s-100s of TB/day
      • The risk is worth the reward
  • 15. When In-database ELT Works Well
      • Reference data look-ups
      • Joins for derived data
      • Lots of derived data
      • Service-level goals to meet
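A minimal sketch of the in-database ELT pattern from this slide: raw rows are loaded unchanged, then SQL inside the database does the reference-data look-up and builds the derived table. SQLite stands in for the warehouse here, and the table names (`stg_sales`, `ref_store`, `sales_by_region`) and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (store_id INTEGER, amount REAL);               -- raw rows, loaded as-is
    CREATE TABLE ref_store (store_id INTEGER PRIMARY KEY, region TEXT);   -- reference data
""")

# Extract + Load: no transformation on the way in.
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.0)])
conn.executemany("INSERT INTO ref_store VALUES (?, ?)",
                 [(1, "East"), (2, "West")])

# Transform: look-up join plus aggregation, executed inside the database.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT COALESCE(r.region, 'Unknown') AS region,
           SUM(s.amount)                 AS total
    FROM stg_sales AS s
    LEFT JOIN ref_store AS r ON r.store_id = s.store_id
    GROUP BY COALESCE(r.region, 'Unknown')
""")
conn.commit()

totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(totals)
```

The LEFT JOIN keeps sales for stores that are missing from the reference table and labels them 'Unknown' rather than silently dropping them. That is one reasonable design choice, not the only one.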
  • 16. When to Use Which: It Depends (In-database ELT vs. Hadoop)
      • Reference data
        > In-database ELT: lookups, joins
      • Transformations
        > In-database ELT: structured data, ELT modules, SQL can do it
        > Hadoop: unstructured, some ETL modules, do it yourself
      • Service level goals
        > In-database ELT: predictable, system management
      • Data security
        > In-database ELT: robust
      • Costs
        > Hadoop: commodity hardware
      • Data quality
        > In-database ELT: governance, MDM
        > Hadoop: low quality/trust OK
      • Data volume
        > In-database ELT: high volume
        > Hadoop: extreme volume
      • Offload ELT
        > Migration costs
      • Agility
        > Hadoop: no governance
  • 18. Hadoop Strengths
      • Commodity low cost hardware
      • Many programming languages
        > But mostly it's Java
      • Free open source
      • Any data structure
      • Scale-out to petabytes + parallelism
  • 19. Hadoop: the Data Hub
      • ETL on steroids
      • Economically "keep files forever"
        > Queryable
      • File based reporting and analytics
      • Backup and archival storage
        > Databases, files, development
  • 20. Where MapReduce Excels
      • Temporary data, data exhaust
      • Data mining/exploration
        > 1000s of continuous variables
        > Linear algebra
        > Graph mining
        > Machine learning
        > Random forest, decision trees
        > Markov chains
      • Not all data mining suits MapReduce
        > Many things work better in MPP RDBMS
        > In-database SAS, R, Fuzzy Logix
        > It depends
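For readers unfamiliar with the programming model behind this slide, here is a single-process sketch of the map, shuffle, and reduce steps. It is a toy illustration of the pattern only, not Hadoop's API; the function names and the sensor-reading job are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}

def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)

# Hypothetical job: mean reading per sensor from "sensor_id,value" lines.
def sensor_mapper(line):
    sensor_id, value = line.split(",")
    yield sensor_id, float(value)

def mean_reducer(_key, values):
    return sum(values) / len(values)

READINGS = ["a,1.0", "b,4.0", "a,3.0", "b,6.0", "a,2.0"]
means = run_job(READINGS, sensor_mapper, mean_reducer)
print(means)
```

On a real cluster the map and reduce calls run in parallel across many nodes and the shuffle moves data over the network. Those costs are part of why the slide says some mining workloads work better in an MPP RDBMS.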
  • 21. Developer Advantages with Hadoop
      • Easy to work on non-relational data
        > Java data types
        > JSON, objects
      • Hadoop is written in Java
        > Compatible APIs, skills, concepts, frameworks, scripts
      • Huge open source factories
        > Apache, GitHub, Eclipse, SourceForge, etc.
        > Assorted compression algorithms
      • People
        > 9M-10M Java programmers
        > Web tutorials
          – Extensive "how to" topics
        > University student research
  • 22. NoSchema Advantages
      • Raw data format provides complete flexibility
      • Non-traditional data types easily supported
        > Graph, text, weblog, etc.
      • No upfront ETL required
      • No data loading required
      • Flexible: late binding lets the data scientist choose
      Weblog example:
      41521390 2013-01-01 00:25:42 2.111.94.18 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4 "http://www.cokstate.edu/welcome/" "https://www.google.com/#sclient=psyab&hl=en&source=hp&q=oklahoma+state&pbx=1&oq"
      Note: schema-on-read has many pitfalls and is not always a good solution
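Late binding can be made concrete with the weblog record from the NoSchema Advantages slide. In the sketch below the file stays raw and a schema is applied only when the line is read. The field layout (id, date, time, IP, user agent, two quoted URLs) and the field names are assumptions inferred from that one sample, not a documented log format.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample record: numeric id, date, time,
# client IP, free-text user agent, then two double-quoted URLs.
LINE_RE = re.compile(
    r'^(?P<id>\d+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<ip>\S+) (?P<agent>.*?) "(?P<url>[^"]*)" "(?P<referrer>[^"]*)"$'
)

def read_with_schema(raw_line):
    """Bind a schema to one raw line at read time; return None if it does not fit."""
    match = LINE_RE.match(raw_line.strip())
    if match is None:
        return None  # the raw data is kept as-is; this reader simply skips it
    record = match.groupdict()
    record["id"] = int(record["id"])
    record["timestamp"] = datetime.strptime(
        record.pop("date") + " " + record.pop("time"), "%Y-%m-%d %H:%M:%S")
    return record

RAW = (
    '41521390 2013-01-01 00:25:42 2.111.94.18 Mozilla/5.0 (Macintosh; U; '
    'Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) '
    'Version/5.0.3 Safari/533.19.4 "http://www.cokstate.edu/welcome/" '
    '"https://www.google.com/#sclient=psyab&hl=en&source=hp&q=oklahoma+state&pbx=1&oq"'
)

record = read_with_schema(RAW)
print(record["ip"], record["timestamp"], record["url"])
```

A different analyst could read the same file with a different pattern, which is the flexibility the slide describes. The `None` branch hints at the pitfall: every reader has to decide for itself what to do with lines that do not fit.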
  • 23. Attributes Favoring Hadoop
      Reason: Description
      • Cost: Low cost, low value data before refinement
      • Multi structured data ingest: Raw weblogs, Twitter, Facebook, mobile, PST files, etc.
      • Data depth: High data volume, few users, high signal-to-noise ratio
      • Non-SQL analytics: Complex processes, pipeline transforms, random forests, Markov chains, enormous arrays, etc.
      • Flexibility, autonomy: Exploratory analysis with little governance; fast, short-term turnaround
      • Ugly data: Videos, satellite images, format conversions (PDF to text)
  • 24. Key Considerations
      MPP RDBMS                   Hadoop
      Stable schema               Evolving schema
      Structured data             Structure agnostic
      Full ANSI SQL               Flexible programming
      Iterative analysis          Batch analysis
      Fine grain security         N/A
      Cleansed data               Raw data
      Seeks                       Scans
      Updates/deletes             Ingest
      Service level agreements    Flexibility
      Core data                   Source files
      Complex joins               Complex processing
      Efficient CPU and IO        Low cost storage
  • 25. What I Like About Hadoop
      • YARN and Tez
      • Queries on flat files!
      • Parallel scanning engine
      • Developer community
      • Complex parallel processing
      • Fast ingest of raw data
      • Long term archives at full fidelity
      • Good scalability
  • 26. Summary
      • Start with workload requirements
        > Map the tool capabilities to the requirement
      • Hadoop is a DataHub, a Data Lake
        > Not a database or data warehouse
        > Exploit Hadoop's strengths
      • Combine the data warehouse and Hadoop
        > Two tool sets solve more objectives
        > Better together