Extending the EDW with Hadoop - Chicago Data Summit 2011

Slides from talk at the Chicago Data Summit on 4/26/11: "Extending the Enterprise Data Warehouse with Hadoop".


1. Extending the Enterprise Data Warehouse with Hadoop
   Robert Lancaster and Jonathan Seidman
   Chicago Data Summit, April 26, 2011
2. Who We Are
   • Robert Lancaster
     – Solutions Architect, Hotel Supply Team
     – rlancaster@orbitz.com
     – @rob1lancaster
   • Jonathan Seidman
     – Lead Engineer, Business Intelligence/Big Data Team
     – Co-founder/organizer of Chicago Hadoop User Group (http://www.meetup.com/Chicago-area-Hadoop-User-Group-CHUG)
     – jseidman@orbitz.com
     – @jseidman
3. Launched: 2001, Chicago, IL
4. Why are we using Hadoop? Stop me if you’ve heard this before…
5. On Orbitz alone we do millions of searches and transactions daily, which leads to hundreds of gigabytes of log data every day.
6. Hadoop provides us with efficient, economical, scalable, and reliable storage and processing of these large amounts of data.
   [Chart: $ per TB]
7. And… Hadoop places no constraints on how data is processed.
8. Before Hadoop
9. With Hadoop
10. Access to this non-transactional data enables a number of applications…
11. Optimizing Hotel Search
12. Recommendations
13. Page Performance Tracking
14. Cache Analysis
    [Chart: queries vs. searches, with reverse running totals]
    • 72% of queries are singletons and make up nearly a third of total search volume.
    • A small number of queries (3%) make up more than a third of search volume.
15. User Segmentation
16. All of this is great, but… Most of these efforts are driven by development teams. The challenge now is to unlock the value in this data by making it more available to the rest of the organization.
17. “Given the ubiquity of data in modern organizations, a data warehouse can keep pace today only by being “magnetic”: attracting all the data sources that crop up within an organization regardless of data quality niceties.”*
    *MAD Skills: New Analysis Practices for Big Data
18. In a better world…
19. Integrating Hadoop with the Enterprise Data Warehouse
    Robert Lancaster and Jonathan Seidman
    Chicago Data Summit, April 26, 2011
20. The goal is a unified view of the data, allowing us to use the power of our existing tools for reporting and analysis.
21. BI vendors are working on integration with Hadoop…
22. And one more reporting tool…
23. Example Processing Pipeline for Web Analytics Data
24. Aggregating data for import into Data Warehouse
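To make the aggregation step named on slide 24 concrete, here is a minimal MapReduce sketch: it rolls cleansed click records up to page views per URL per day and writes tab-delimited rows that are easy to load into the warehouse. The class names, the assumed date/session/URL record layout, and the views-per-page metric are illustrative assumptions, not the actual Orbitz job.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical aggregation job: record layout and the "views per page per day"
// metric are assumptions for illustration, not the actual Orbitz pipeline.
public class DailyPageViewAggregator {

  public static class ViewMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Assumed cleansed layout: date<TAB>sessionId<TAB>url
      String[] fields = value.toString().split("\t");
      if (fields.length < 3) {
        return; // skip malformed records
      }
      outKey.set(fields[0] + "\t" + fields[2]); // key on (date, url)
      context.write(outKey, ONE);
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : values) {
        total += v.get();
      }
      // Emits tab-delimited (date, url, views) rows.
      context.write(key, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "daily-page-views");
    job.setJarByClass(DailyPageViewAggregator.class);
    job.setMapperClass(ViewMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The flat (date, url, views) output could then be bulk-loaded into the data warehouse, for example with a tool such as Sqoop.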
25. Example Use Case: Beta Data Processing
26. Example Use Case – Beta Data Processing
27. Example Use Case – Beta Data Processing Output
28. Example Use Case: RCDC Processing
29. Example Use Case – RCDC Processing
30. Example Use Case: Click Data Processing
31. Click Data Processing – Current DW Processing
    [Diagram: Web Servers → Web Server Logs → ETL → Data Cleansing (stored procedure) on the DW servers → DW; the two stages take 3 hours and 2 hours, and the cleansed data is ~20% of the original size]
32. Click Data Processing – New Hadoop Processing
    [Diagram: Web Servers → Web Server Logs → HDFS → Data Cleansing (MapReduce) → DW]
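As a minimal sketch of the cleansing step slide 32 describes, a map-only MapReduce job can read raw log lines from HDFS, drop malformed and bot records, and keep only the columns needed downstream. The class name, field layout, and filtering rules below are illustrative assumptions rather than the actual Orbitz implementation.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical cleansing job: the record layout and bot filter are assumed
// for illustration, not taken from the actual Orbitz stored-procedure logic.
public class ClickLogCleanser {

  public static class CleanseMapper
      extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Assumed raw layout: timestamp<TAB>sessionId<TAB>url<TAB>userAgent
      String[] fields = value.toString().split("\t");
      if (fields.length < 4) {
        return; // drop malformed records
      }
      if (fields[3].toLowerCase().contains("bot")) {
        return; // drop obvious bot traffic; real cleansing rules go here
      }
      // Keep only the columns the warehouse load needs.
      String cleansed = fields[0] + "\t" + fields[1] + "\t" + fields[2];
      context.write(NullWritable.get(), new Text(cleansed));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "click-log-cleansing");
    job.setJarByClass(ClickLogCleanser.class);
    job.setMapperClass(CleanseMapper.class);
    job.setNumReduceTasks(0); // map-only: cleansed records go straight to HDFS
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Running the cleansing with zero reducers avoids a shuffle entirely, so throughput scales with the number of map slots on the cluster rather than with a single stored procedure on the warehouse servers.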
33. Conclusions
    • The market is still immature, but Hadoop has already become a valuable business intelligence tool, and it will become an increasingly important part of a BI infrastructure.
    • Hadoop won’t replace your EDW, but any organization with a large EDW should at least be exploring Hadoop as a complement to its BI infrastructure.
    • Use Hadoop to offload the time- and resource-intensive processing of large data sets so you can free up your data warehouse to serve user needs.
    • The challenge now is making Hadoop more accessible to non-developers. Vendors are addressing this, so expect rapid advancements in Hadoop accessibility.
34. Oh, and also…
    • Orbitz is looking for a Lead Engineer for the BI/Big Data team.
    • Go to http://careers.orbitz.com/ and search for IRC19035.
35. References
    • MAD Skills: New Analysis Practices for Big Data, Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph Hellerstein, and Caleb Welton, 2009.