Big Data redefines Enterprise Data Warehouse @Bangalore


Big Data to redefine traditional Data warehouse.

Published in: Technology


  1. Big Data redefines Enterprise Data Warehouse. Big Data Innovation, Unicom - Bangalore, February 2013. Raghu Kashyap
  2. About Raghu Kashyap. Personal: Director, Data Insights Group @ Orbitz Worldwide; eMail: raghu.kashyap@orbitz.com; Twitter: @ragskashyap; Blog: http://kashyaps.com; LinkedIn: http://www.linkedin.com/in/raghukashyap/. Areas of Responsibility: Orbitz Services Bangalore Center Head; Lead Big Data team that builds out Global Data Infrastructure for Orbitz Worldwide and provides business insights; US, Europe, Australia (APAC)
  3. Orbitz in a nutshell
  4. Orbitz Worldwide
  5. Back to the future.
  6. Vendor evaluation: KARMAsphere; Datameer; Aster Data
  7. Traditional Data Warehouse (Greenplum): Raw logs, ETL, Staging table, Temp tables, ETL, Data Mart
  8. Hadoop Infrastructure
  9. Redefine Enterprise Data Warehouse. ETL only approach: 2:12 seconds. Run map reduce job: 1m 14.298s. Port flat file to Greenplum using GP connector: Time: 5.077 s
  10. Approach with Hadoop and ETL: Raw logs, Event Model, Map Reduce, ETL, Flat files, GP Connector, External Tables, Greenplum
  11. Resolving database keys (ETL). tag_value_dim (id, tag, value): 1, pos, ORB; 2, pos, ORBC; 3, pos, ORB. Greenplum tag_value_dim (id, tag, value): 200, pos, ORB; 157, pos, ORBC. fact (tag value id, value): 1, $5600; 3, $7500. fact (tag value id, value): 200, $5600; 200, $7500
  12. Hadoop Configuration: 74 Nodes; >1PB; Hive; Flume; HBase; R; Cloudera Distribution; Greenplum Connector
  13. Hadoop Applications: Site Analytics, Machine Learning, Multi Variate Testing Analysis, Production Logs, Hotel Rate Cache TTL
  14. Hadoop Usage
  15. Business Performance Monitoring: EFX; Marketing channels; Shopper patterns; Recommendation Module
  16. Multi channel attribution
  17. MVT: Analyze behavioral and test data from our MVT testing
  18. Lessons Learnt: Analytics using Big Data comes with a price; Data Governance; Senior Leadership buy in; "I can't tell you the key to success, but the key to failure is trying to please everyone." -Ed Sheeran
  19. Thank you
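
The "Resolving database keys" slide appears to show dimension ids that differ between the two sides of the ETL (1, 2, 3 on one side, 200 and 157 in Greenplum) while the (tag, value) pairs stay the same. A minimal Python sketch of one way such keys could be resolved is below. It assumes the match is made on the (tag, value) pair; that reading of the slide, the function names, and the in-memory lists are all illustrative assumptions, not the talk's actual implementation.

```python
from typing import Dict, List, Tuple

DimRow = Tuple[int, str, str]   # (id, tag, value)
FactRow = Tuple[int, int]       # (tag value id, value in dollars)


def build_id_map(source_dim: List[DimRow], target_dim: List[DimRow]) -> Dict[int, int]:
    """Map each source dimension id to the target id sharing its (tag, value)."""
    # Assumption: (tag, value) is unique in the target dimension table.
    target_by_key = {(tag, value): dim_id for dim_id, tag, value in target_dim}
    return {dim_id: target_by_key[(tag, value)] for dim_id, tag, value in source_dim}


def resolve_facts(facts: List[FactRow], id_map: Dict[int, int]) -> List[FactRow]:
    """Rewrite each fact row so its tag value id points at the target dimension."""
    return [(id_map[tag_value_id], amount) for tag_value_id, amount in facts]


# Rows as they appear on the slide (which side each table belongs to is inferred).
source_dim = [(1, "pos", "ORB"), (2, "pos", "ORBC"), (3, "pos", "ORB")]
greenplum_dim = [(200, "pos", "ORB"), (157, "pos", "ORBC")]
source_facts = [(1, 5600), (3, 7500)]

id_map = build_id_map(source_dim, greenplum_dim)
resolved = resolve_facts(source_facts, id_map)
print(resolved)
```

With the slide's rows, source ids 1 and 3 both carry (pos, ORB), so both fact rows end up pointing at Greenplum id 200, which matches the second fact table on the slide.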
