Data Warehouse Offload
John Berns
Presented at BigData.SG, October 2013
1.
© MapR Technologies - Confidential
Data Warehouse Offload (ETL and ELT and Preprocessing, Oh My!)
2.
Introduce Myself
John Berns, Solutions Architect, APAC, for MapR
I’ve been involved in Big Data for three years, using Hadoop for two. (I go waaaaay back!)
I’m also co-founder of BigData.SG and Hadoop.SG: http://bigdata.sg http://hadoop.sg
I’m a Hadoop nerd, and proud of it.
3.
Traditional Data Warehouse
4.
Arrival of Big Data Impacts DW
BIG DATA
• Volume: prohibitively expensive storage costs
• Variety: inability to process unstructured formats
• Velocity: faster arrival and processing needs
DW needs to accommodate Big Data
5.
Scaling the Data Warehouse: MPP Databases
6.
But There Are Some Problems
• Scaling cost: data warehouse costs $$$,000’s per terabyte
• Works only on relational data; doesn’t like unstructured data
• Fixed schema: you can only query the data in ways that are predefined by the existing schema
7.
Accommodating Big Data
Data sources: RDBMS, Sensor Data, Web Logs
RDBMS
• Only structured data
• $50K-$100K per TB
• Limited analytics
Hadoop
• Both structured and unstructured data
• 50x-100x cost savings: $1K per TB
• Expanded analytics with MapReduce, NoSQL etc.
FROM: DW (ETL + Long Term Storage, Query + Present)
TO: Hadoop (ETL + Long Term Storage) and DW (Query + Present)
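The cost figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the per-terabyte prices quoted above; the 200 TB volume and the helper name `storage_cost` are hypothetical and chosen purely for illustration.

```python
# Per-terabyte storage prices quoted on the slide (USD).
RDBMS_COST_PER_TB_LOW = 50_000
RDBMS_COST_PER_TB_HIGH = 100_000
HADOOP_COST_PER_TB = 1_000


def storage_cost(terabytes, cost_per_tb):
    """Total storage cost in USD for a given volume (hypothetical helper)."""
    return terabytes * cost_per_tb


# Ratio of the warehouse price to the Hadoop price per terabyte.
savings_low = RDBMS_COST_PER_TB_LOW // HADOOP_COST_PER_TB
savings_high = RDBMS_COST_PER_TB_HIGH // HADOOP_COST_PER_TB

# Hypothetical data volume, not a figure from the talk.
volume_tb = 200
warehouse_low = storage_cost(volume_tb, RDBMS_COST_PER_TB_LOW)
hadoop_total = storage_cost(volume_tb, HADOOP_COST_PER_TB)

print(f"savings factor: {savings_low}x-{savings_high}x")
print(f"{volume_tb} TB in the warehouse: at least ${warehouse_low:,}")
print(f"{volume_tb} TB in Hadoop: ${hadoop_total:,}")
```

The two ratios reproduce the 50x-100x range stated on the slide; the totals only scale those prices to the assumed volume.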
8.
Data Warehouse Meets Big Data
• Use ELT to handle semi-structured (or even unstructured) data
• ELT applies structure after the data is loaded
• Use compute power to do the transformation
• Can be done in parallel, which is what Hadoop is good for!
• ELT for ETL: process semi-structured data and save structured data
• Connect via ODBC or JDBC and execute queries on the fly
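The ELT idea on this slide (load first, apply structure afterwards, and do the transformation in parallel) can be sketched in a few lines. This is only an illustration under stated assumptions: a thread pool stands in for the parallelism a Hadoop cluster would provide, and the record layout and the names `raw_store`, `apply_structure`, and `transform` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Extract + Load: raw lines are stored untouched, with no schema applied yet.
# The three-field layout here is a made-up example.
raw_store = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]


def apply_structure(line):
    """Transform step: turn one raw line into a structured record."""
    method, path, status = line.split(" ")
    return {"method": method, "path": path, "status": int(status)}


def transform(raw_lines, workers=2):
    """Apply structure to every line in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(apply_structure, raw_lines))


structured = transform(raw_store)
for record in structured:
    print(record)
```

Because each line is transformed independently, the work splits cleanly across workers, which is the property that makes this kind of job a good fit for Hadoop.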
9.
ELT: Applying Schema on Load
-- Each capture group in input.regex maps to one column, in column order.
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
)
STORED AS TEXTFILE;
10.
Read Semi-Structured Data & Create Structure
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
host      127.0.0.1
identity  user-identifier
user      frank
time      10/Oct/2000:13:55:36 -0700
request   GET /apache_pb.gif HTTP/1.0
status    200
size      2326
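To see how a regex of this kind turns the sample log line into columns, here is a minimal sketch in Python. It assumes the intended pattern is the usual seven-group Apache access log expression (one group per column of the `apachelog` table); the names `LOG_PATTERN`, `COLUMNS`, and `parse_line` are hypothetical. Stripping the brackets and quotes is an extra step added here for display, since the pattern itself keeps them.

```python
import re

# Seven capture groups, one per column, written as a Python raw string.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)
COLUMNS = ["host", "identity", "user", "time", "request", "status", "size"]


def parse_line(line):
    """Return a dict of column name -> value, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    row = dict(zip(COLUMNS, match.groups()))
    # Assumption: drop the surrounding brackets and quotes for readability.
    row["time"] = row["time"].strip("[]")
    row["request"] = row["request"].strip('"')
    return row


sample = (
    '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
row = parse_line(sample)
for name in COLUMNS:
    print(f"{name:<9}{row[name]}")
```

Lines that do not fit the pattern return `None` here, which makes malformed records easy to count or set aside before loading.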
11.
Accommodating Big Data
Data sources: RDBMS, Sensor Data, Web Logs
RDBMS
• Only structured data
• $50K-$100K per TB
• Limited analytics
Hadoop
• Both structured and unstructured data
• 50x-100x cost savings: $1K per TB
• Expanded analytics with MapReduce, NoSQL etc.
FROM: DW (ETL + Long Term Storage, Query + Present)
TO: Hadoop (ETL + Long Term Storage) and DW (Query + Present)
12.
MapR Strengths for DW Offload
Best ROI
• 2x Performance
• No custom connectors
• Unlimited scale
Easiest Integration
• Works with existing tools
• Streaming ingestion and extraction
Enterprise Grade Platform
• 99.999% HA
• Full data protection
• Disaster recovery
13.
MapR Customer Case Study
Large Telecom Company
• Deployed billing applications using Teradata
• Hundreds of users and applications across the enterprise
OLD (Teradata)
• All ETL steps done in Teradata
• Cost-prohibitive scaling
• Data warehouse team not able to handle new data formats
NEW (Hadoop + Teradata)
• Replaced 5 out of 7 ETL steps
• Only hot data is stored in EDW
• Existing applications not affected
• Extensively leverage NFS to directly ingest data into Teradata
14.
Data Shape: Telecom
• Lots of data
• Lots of scans across large sets
• Throughput important
15.
Original Flow – ELTL
[Diagram labels: CDR billing records, ETL, Data Warehouse, Billing reports, Customer bills]
16.
With ETL Offload
[Diagram labels: CDR billing records, ETL, Data Warehouse, Billing reports, Customer billing]
17.
Price Performance
EDW strategy
– 1.5x performance
– $30 million
MapR strategy
– 3x performance
– $3 million
20x cost/performance advantage for MapR strategy
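The 20x figure follows from dividing each strategy's performance gain by its cost. A minimal sketch of that arithmetic, using only the numbers on this slide (the function name `performance_per_million` is hypothetical):

```python
def performance_per_million(performance_multiple, cost_millions):
    """Performance gain obtained per million dollars spent."""
    return performance_multiple / cost_millions


# Figures quoted on the slide.
edw = performance_per_million(1.5, 30)   # EDW strategy: 1.5x for $30 million
mapr = performance_per_million(3, 3)     # MapR strategy: 3x for $3 million

# How much more performance per dollar the MapR strategy delivers.
advantage = mapr / edw
print(f"cost/performance advantage: {advantage:.0f}x")
```

Put another way, the MapR strategy delivers twice the performance at one tenth of the cost, and 2 × 10 gives the 20x advantage.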
18.
MapR Customer Case Study, continued
Business Impact:
• Saved $30M in 5-year TCO
• Able to store all data and have a scalable architecture for the future
• Do not have to maintain any special connectors
• A happy Ops team enhancing services for its internal customers with MapReduce
• Implemented the change without impacting internal users
19.
Wrapping It Up…
My contact info: jberns@maprtech.com http://www.linkedin.com/in/jfxberns
Find the slides at: http://www.slideshare.net
Whitepaper with more details on Data Warehouse Offload: http://www.mapr.com/solutions/data-warehouse-offload
Editor's Notes
----- Meeting Notes (3/22/13 11:57) -----
Add a before and after broader data sources…. data