Data Warehouse Offload

Presented at BigData.SG, October 2013


Published in: Technology, Business
  • ----- Meeting Notes (3/22/13 11:57) ----- Add a before and after broader data sources…. data
  • Transcript

    • 1. ©MapR Technologies - Confidential. Data Warehouse Offload (ETL and ELT and Preprocessing, Oh My!)
    • 2. Introduce Myself: John Berns, Solutions Architect, APAC for MapR. I’ve been involved in Big Data for three years, using Hadoop for two. (I go waaaaay back!) I’m also co-founder of BigData.SG and Hadoop.SG (http://bigdata.sg, http://hadoop.sg). I’m a Hadoop nerd, and proud of it.
    • 3. Traditional Data Warehouse
    • 4. Arrival of Big Data impacts DW. BIG DATA • Volume: prohibitively expensive storage costs • Variety: inability to process unstructured formats • Velocity: faster arrival and processing needs. DW needs to accommodate Big Data.
    • 5. Scaling the Data Warehouse: MPP Databases
    • 6. But There Are Some Problems Scaling • Cost: Data Warehouse costs $$$,000’s per terabyte • Works only on relational data; doesn’t like unstructured data • Fixed schema: you can only query the data in ways that are predefined by the existing schema
    • 7. Accommodating Big Data. Sources: RDBMS, Sensor Data, Web Logs. RDBMS • Only structured data • $50K-100K per TB • Limited analytics. Hadoop • Both structured and unstructured data • 50x-100x cost savings: $1K per TB • Expanded analytics with MapReduce, NoSQL etc. FROM: DW (ETL + Long Term Storage). TO: DW (Query + Present) plus Hadoop (ETL + Long Term Storage).
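The per-terabyte figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the two cost figures come from the slide, while the 200 TB archive size is a made-up value for illustration.

```python
# Per-terabyte costs quoted on the slide (USD).
RDBMS_COST_PER_TB = (50_000, 100_000)  # low and high end of the quoted range
HADOOP_COST_PER_TB = 1_000


def savings_factor(rdbms_cost: int, hadoop_cost: int = HADOOP_COST_PER_TB) -> float:
    """How many times cheaper Hadoop storage is for the same capacity."""
    return rdbms_cost / hadoop_cost


low, high = (savings_factor(cost) for cost in RDBMS_COST_PER_TB)
print(f"Savings factor: {low:.0f}x to {high:.0f}x")

# Hypothetical archive size, not taken from the slides.
ARCHIVE_TB = 200
rdbms_low = RDBMS_COST_PER_TB[0] * ARCHIVE_TB
rdbms_high = RDBMS_COST_PER_TB[1] * ARCHIVE_TB
hadoop_total = HADOOP_COST_PER_TB * ARCHIVE_TB
print(f"RDBMS:  ${rdbms_low:,} to ${rdbms_high:,}")
print(f"Hadoop: ${hadoop_total:,}")
```

The ratio of the quoted prices reproduces the 50x-100x range stated on the slide.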
    • 8. Data Warehouse Meets Big Data • Use ELT to handle semi-structured (or even unstructured) data • ELT applies structure after the data is loaded • Use compute power to do the transformation • Can be done in parallel: that’s what Hadoop is good for! • ELT for ETL: process semi-structured data & save structured data • Connect via ODBC or JDBC and execute queries on the fly
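The load-first, structure-later idea on this slide can be sketched in a few lines. Everything here is an assumption made for illustration: the sample sensor lines, the `apply_structure` and `transform_all` helpers, and the thread pool, which merely stands in for the parallel tasks a Hadoop cluster would run.

```python
from concurrent.futures import ThreadPoolExecutor

# "Load": raw records are stored exactly as they arrived, with no schema applied.
# These sample lines are made up for illustration.
raw_store = [
    "2013-10-01T08:00:00 sensor=a7 temp=21.5",
    "2013-10-01T08:00:05 sensor=b2 temp=19.0 humidity=40",
    "2013-10-01T08:00:09 sensor=a7 temp=21.7",
]


def apply_structure(line: str) -> dict:
    """Transform step: turn one raw line into a structured record."""
    timestamp, *pairs = line.split()
    record = {"timestamp": timestamp}
    for pair in pairs:
        key, value = pair.split("=", 1)
        record[key] = value
    return record


def transform_all(lines: list) -> list:
    # Each line is independent, so the work can be spread across workers.
    # A thread pool stands in here for a cluster of Hadoop tasks.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(apply_structure, lines))


structured = transform_all(raw_store)
for row in structured:
    print(row)
```

Because the raw lines are kept untouched, a record with an extra field (the `humidity` value above) needs no schema change before it is loaded; the structure is decided only when the transformation runs.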
    • 9. ELT: Applying Schema on Load CREATE TABLE apachelog ( host STRING, identity STRING, user STRING, time STRING, request STRING, status STRING, size STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)", "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s" ) STORED AS TEXTFILE;
    • 10. Read Semi-Structured Data & Create Structure. Raw line: 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326. Parsed fields: host 127.0.0.1; identity user-identifier; user frank; time 10/Oct/2000:13:55:36 -0700; request GET /apache_pb.gif HTTP/1.0; status 200; size 2326
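To see how the raw line becomes seven columns, here is a minimal Python sketch. It uses the pattern from the Hive `apachelog` table definition, written with the spaces and escapes a working regex needs. The `parse_line` helper and the `COLUMNS` list are names introduced here for illustration, and Python's `re` module stands in for Hive's RegexSerDe.

```python
import re

# The apachelog pattern, written as a Python raw string (no double escaping).
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") (-|[0-9]*) (-|[0-9]*)'
)
COLUMNS = ["host", "identity", "user", "time", "request", "status", "size"]


def parse_line(line: str):
    """Return a column -> value dict, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


sample = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
parsed = parse_line(sample)
for column, value in parsed.items():
    print(f"{column:10}{value}")
```

Note that the capture groups keep the square brackets around the timestamp and the double quotes around the request, so a further cleanup step would be needed to get the bare values shown on the slide.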
    • 11. Accommodating Big Data. Sources: RDBMS, Sensor Data, Web Logs. RDBMS • Only structured data • $50K-100K per TB • Limited analytics. Hadoop • Both structured and unstructured data • 50x-100x cost savings: $1K per TB • Expanded analytics with MapReduce, NoSQL etc. FROM: DW (ETL + Long Term Storage). TO: DW (Query + Present) plus Hadoop (ETL + Long Term Storage).
    • 12. MapR Strengths for DW Offload. Best ROI • 2x Performance • No custom connectors • Unlimited scale. Easiest Integration • Works with existing tools • Streaming ingestion and extraction. Enterprise Grade Platform • 99.999% HA • Full data protection • Disaster recovery
    • 13. MapR Customer Case Study. Large Telecom Company • Deployed billing applications using Teradata • Hundreds of users and applications across the enterprise. OLD (Teradata) • All ETL steps done in Teradata • Cost prohibitive scaling • Data warehouse team not able to handle new data formats. NEW (Hadoop and Teradata) • Replaced 5 out of 7 ETL steps • Only hot data is stored in EDW • Existing applications not affected • Extensively leverage NFS to directly ingest data into Teradata
    • 14. Data Shape: Telecom • Lots of Data • Lots of Scans Across Large Sets • Throughput Important
    • 15. Original Flow – ELTL. Diagram labels: ETL, CDR billing records, Billing reports, Data Warehouse, Customer bills
    • 16. With ETL Offload. Diagram labels: ETL, CDR billing records, Billing reports, Data Warehouse, Customer billing
    • 17. Price Performance • EDW strategy: 1.5x performance, $30 million • MapR strategy: 3x performance, $3 million • 20x cost/performance advantage for MapR strategy
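The 20x figure follows from dividing each strategy's cost by its performance gain. A short sketch of that arithmetic, using only the numbers on the slide (the function name `cost_per_performance` is introduced here for illustration):

```python
def cost_per_performance(cost_musd: float, speedup: float) -> float:
    """Cost in millions of USD paid per 1x of performance."""
    return cost_musd / speedup


# Figures quoted on the slide.
edw = cost_per_performance(cost_musd=30.0, speedup=1.5)
mapr = cost_per_performance(cost_musd=3.0, speedup=3.0)
advantage = edw / mapr

print(f"EDW strategy:  ${edw:.0f}M per 1x of performance")
print(f"MapR strategy: ${mapr:.0f}M per 1x of performance")
print(f"Advantage:     {advantage:.0f}x")
```

In other words, the EDW strategy pays $20 million per unit of performance and the MapR strategy pays $1 million, which gives the quoted ratio.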
    • 18. MapR Customer Case Study continued. Business Impact: • Saved $30M in 5-year TCO • Able to store all data and have a scalable architecture for the future • Do not have to maintain any special connectors • A happy Ops team enhancing services for its internal customers with MapReduce • Implemented the change without impacting internal users
    • 19. Wrapping It Up… My contact info: jberns@maprtech.com, http://www.linkedin.com/in/jfxberns. Find the slides at: http://www.slideshare.net. Whitepaper with more details on Data Warehouse Offload: http://www.mapr.com/solutions/data-warehouse-offload
