Data Warehouse Offload

Presented at BigData.SG, October 2013

Published in: Technology, Business
  • Meeting Notes (3/22/13 11:57): Add a before and after broader data sources…. data
  • Data Warehouse Offload

    1. Data Warehouse Offload (ETL and ELT and Preprocessing, Oh My!) ©MapR Technologies - Confidential
    2. Introduce Myself: John Berns, Solutions Architect, APAC for MapR
        • I’ve been involved in Big Data for three years, using Hadoop for two. (I go waaaaay back!)
        • I’m also co-founder of BigData.SG and Hadoop.SG
            • http://bigdata.sg
            • http://hadoop.sg
        • I’m a Hadoop nerd, and proud of it.
    3. Traditional Data Warehouse
    4. Arrival of Big Data impacts DW
        • Big Data: Volume, Variety, Velocity
        • Volume: prohibitively expensive storage costs
        • Variety: inability to process unstructured formats
        • Velocity: faster arrival and processing needs
        • DW needs to accommodate Big Data
    5. Scaling the Data Warehouse: MPP Databases
    6. But There Are Some Problems Scaling
        • Cost: a data warehouse costs $$$,000’s per terabyte
        • Works only on relational data; doesn’t like unstructured data
        • Fixed schema: you can only query the data in ways that are predefined by the existing schema
    7. Accommodating Big Data
        • Data sources: RDBMS, Sensor Data, Web Logs
        • RDBMS: only structured data; $50K – 100K per TB; limited analytics
        • Hadoop: both structured and unstructured data; 50x-100x cost savings: $1K per TB; expanded analytics with MapReduce, NoSQL etc.
        • FROM: DW (ETL + Long Term Storage)
        • TO: DW (Query + Present) plus Hadoop (ETL + Long Term Storage)
    8. Data Warehouse Meets Big Data
        • Use ELT to handle semi-structured (or even unstructured) data
        • ELT applies structure after the data is loaded
        • Use compute power to do the transformation
        • Can be done in parallel, which is what Hadoop is good for!
        • ELT for ETL: process semi-structured data & save structured data
        • Connect via ODBC or JDBC and execute queries on the fly
    9. ELT: Applying Schema on Load
        CREATE TABLE apachelog (
          host STRING,
          identity STRING,
          user STRING,
          time STRING,
          request STRING,
          status STRING,
          size STRING)
        ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
        WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
          "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s"
        )
        STORED AS TEXTFILE;
    10. Read Semi-Structured Data & Create Structure
        Input line:
            127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
        Resulting structure:
            host      127.0.0.1
            identity  1001
            user      frank
            time      10/Oct/2000:13:55:36 -0700
            request   GET /apache_pb.gif HTTP/1.0
            status    200
            size      2326
    11. Accommodating Big Data
        • Data sources: RDBMS, Sensor Data, Web Logs
        • RDBMS: only structured data; $50K – 100K per TB; limited analytics
        • Hadoop: both structured and unstructured data; 50x-100x cost savings: $1K per TB; expanded analytics with MapReduce, NoSQL etc.
        • FROM: DW (ETL + Long Term Storage)
        • TO: DW (Query + Present) plus Hadoop (ETL + Long Term Storage)
    12. MapR Strengths for DW Offload
        • Best ROI: 2x performance; no custom connectors; unlimited scale
        • Easiest Integration: works with existing tools; streaming ingestion and extraction
        • Enterprise Grade Platform: 99.999% HA; full data protection; disaster recovery
    13. MapR Customer Case Study
        • Large Telecom Company: deployed billing applications using Teradata; hundreds of users and applications across the enterprise
        • OLD (Teradata): all ETL steps done in Teradata; cost-prohibitive scaling; data warehouse team not able to handle new data formats
        • NEW (Teradata plus Hadoop): replaced 5 out of 7 ETL steps; only hot data is stored in EDW; existing applications not affected; extensively leverage NFS to directly ingest data into Teradata
    14. Data Shape: Telecom
        • Lots of Data
        • Lots of Scans Across Large Sets
        • Throughput Important
    15. Original Flow – ELTL (diagram labels: CDR billing records, ETL, Data Warehouse, Billing reports, Customer bills)
    16. With ETL Offload (diagram labels: CDR billing records, ETL, Data Warehouse, Billing reports, Customer billing)
    17. Price Performance
        • EDW strategy
            – 1.5x performance
            – $30 million
        • MapR strategy
            – 3x performance
            – $3 million
        • 20x cost/performance advantage for MapR strategy
    18. Business Impact (MapR Customer Case Study, continued)
        • Saved $30M in 5 year TCO
        • Able to store all data and have a scalable architecture for future
        • Do not have to maintain any special connectors
        • A happy Ops team enhancing services for its internal customers with MapReduce
        • Implemented the change without impacting internal users
    19. Wrapping It Up…
        • My contact info: jberns@maprtech.com, http://www.linkedin.com/in/jfxberns
        • Find the slides at: http://www.slideshare.net
        • Whitepaper with more details on Data Warehouse Offload: http://www.mapr.com/solutions/data-warehouse-offload
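
Slides 9 and 10 describe applying a regular expression to raw Apache log lines so that each line becomes a structured row. The same idea can be sketched outside Hive in a few lines of Python. This is an illustration only, not the deck's code: the pattern below is an assumed seven-group common-log regex with its spaces and escapes written out, and `parse_line` is a made-up helper name.

```python
import re

# Column names taken from the apachelog table on slide 9.
FIELDS = ("host", "identity", "user", "time", "request", "status", "size")

# Assumption: a seven-group Apache common-log pattern, one group per column.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)


def parse_line(line):
    """Return a dict with the seven fields, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


# Sample line from slide 10.
sample = (
    '127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_line(sample)
for name in FIELDS:
    print(name, record[name])
```

With this pattern the `time` group keeps its square brackets and the `request` group keeps its double quotes, and every value comes back as a string, which matches the all-STRING columns in the table definition. Stripping the delimiters or casting `status` and `size` to integers would be a separate transformation step.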

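
The 20x figure on slide 17 appears to come from dividing the performance multiple by the cost for each strategy and comparing the two results. A small sketch of that arithmetic, assuming both multiples are measured against the same baseline and both dollar figures are total costs (the function name is made up for illustration):

```python
def performance_per_million(performance_multiple, cost_millions):
    """Performance multiple delivered per million dollars spent."""
    return performance_multiple / cost_millions


# Figures from slide 17.
edw = performance_per_million(1.5, 30.0)   # EDW strategy: 1.5x for $30 million
mapr = performance_per_million(3.0, 3.0)   # MapR strategy: 3x for $3 million

# Ratio of the two price/performance values.
advantage = mapr / edw
print(f"MapR cost/performance advantage: {advantage:.0f}x")
```

Put another way, the MapR strategy delivers twice the performance at one tenth of the cost, and 2 times 10 gives the 20x on the slide.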