Extending the Data Warehouse with Hadoop - Hadoop world 2011

Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge now is in bridging the gap between the data warehouse and Hadoop. In this talk we’ll discuss some steps that Orbitz has taken to bridge this gap, including examples of how Hadoop and Hive are used to aggregate data from large data sets, and how that data can be combined with relational data to create new reports that provide actionable intelligence to business users.
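
As a rough illustration of the last point, the sketch below aggregates raw click events (the kind of result a Hive GROUP BY might produce) and joins that summary with relational booking data to build a small report. It is a minimal single-process sketch, not the approach used at Orbitz: the event data, the table names (`click_summary`, `bookings`), and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from collections import Counter

# Hypothetical raw click events, standing in for web analytics logs in Hadoop.
# Each tuple is (hotel_id, event_type).
RAW_EVENTS = [
    (101, "view"), (101, "view"), (101, "click"),
    (102, "view"), (102, "click"), (102, "click"),
    (103, "view"),
]


def aggregate_events(events):
    """Count views and clicks per hotel, much as a GROUP BY query would."""
    counts = Counter(events)
    hotels = sorted({hotel_id for hotel_id, _ in events})
    return [(h, counts[(h, "view")], counts[(h, "click")]) for h in hotels]


def build_report(aggregates, bookings):
    """Join the aggregates with relational booking data (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE click_summary (hotel_id INTEGER, views INTEGER, clicks INTEGER)"
    )
    conn.execute("CREATE TABLE bookings (hotel_id INTEGER, booking_count INTEGER)")
    conn.executemany("INSERT INTO click_summary VALUES (?, ?, ?)", aggregates)
    conn.executemany("INSERT INTO bookings VALUES (?, ?)", bookings)
    rows = conn.execute(
        """
        SELECT c.hotel_id, c.views, c.clicks, COALESCE(b.booking_count, 0)
        FROM click_summary AS c
        LEFT JOIN bookings AS b ON b.hotel_id = c.hotel_id
        ORDER BY c.hotel_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical booking counts per hotel from the relational side.
    for row in build_report(aggregate_events(RAW_EVENTS), [(101, 1), (102, 2)]):
        print(row)
```

The LEFT JOIN keeps hotels that were viewed but never booked, which is often the interesting case for this kind of report.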

Published in: Technology, Business

  1. Extending the Enterprise Data Warehouse with Hadoop
     Robert Lancaster and Jonathan Seidman
     Hadoop World 2011, November 8, 2011
  2. Who We Are
     •  Robert Lancaster
        –  Solutions Architect, Hotel Supply Team
        –  rlancaster@orbitz.com
        –  @rob1lancaster
        –  Co-organizer of Chicago Big Data and Chicago Machine Learning Study Group
     •  Jonathan Seidman
        –  Lead Engineer, Business Intelligence/Big Data Team
        –  Co-founder/organizer of Chicago Hadoop User Group and Chicago Big Data
        –  jseidman@orbitz.com
        –  @jseidman
  3. Launched in 2001. Over 160 million bookings.
  4. Some History…
  5. In 2009…
     •  The Machine Learning team is formed to improve site performance. For example, improving hotel search results.
     •  This required access to large volumes of behavioral data for analysis.
        –  Fortunately, the required data was collected in session data stored in web analytics logs.
  6. The Problem…
     •  The only archive of the required data went back about two weeks.
     [Diagram labels: non-transactional data (e.g. searches); transactional data (e.g. bookings) and aggregated non-transactional data; Data Warehouse.]
  7. Hadoop Provided a Solution…
     [Diagram labels: detailed non-transactional data (what every user sees, clicks, etc.); transactional data (e.g. bookings) and aggregated non-transactional data; Data Warehouse; Hadoop.]
  8. Deploying Hadoop Enabled Multiple Applications…
     [Chart: "Queries" and "Searches" series, percentage axis from 0.00% to 100.00%, horizontal axis from 1 to 20; labeled values 71.67%, 34.30%, 31.87%, and 2.78%.]
  9. And Useful Analyses…
  10. But Brought New Challenges…
     •  Most of these efforts are driven by development teams.
     •  The challenge now is unlocking the value of this data for non-technical users.
  11. In Early 2011…
     •  Big Data team is formed under Business Intelligence team at Orbitz Worldwide.
     •  Allows the Big Data team to work more closely with the data warehouse and BI teams.
     •  Reflects the importance of big data to the future of the company.
  12. A View Shared Beyond Orbitz…
     “We strongly believe that Hadoop is the nucleus of the next-generation cloud EDW…” “…but that promise is still three to five years from fruition.”*
     *James Kobielus, Forrester Research, “Hadoop, Is It Soup Yet?”
  13. Two Primary Ways We Use Hadoop to Complement the EDW
     •  Extraction and transformation of data for loading into the data warehouse (“ETL”).
     •  Off-loading of analysis from the data warehouse.
  14. ETL Example: Proposed Dimensional Model
     [Diagram labels: raw logs; Hadoop; dimensional model.]
  15. ETL Example: Click Data Processing
     [Diagram, “Current Processing in Data Warehouse”. Labels: web servers; web server logs; ETL; DW; data cleansing (stored procedure); DW. Annotations: several hours of processing; ~20% original data size.]
  16. ETL Example: Click Data Processing
     •  Moving to Hadoop will facilitate:
        –  Removing load from the data warehouse.
        –  Adding additional attributes for processing.
        –  Allowing processing to be run more frequently.
     [Diagram, “Proposed Processing in Hadoop”. Labels: web servers; web server logs; Hadoop; data cleansing (MapReduce); DW.]
  17. Analysis Example: Geo-Targeting Ads
     •  Facilitated analysis that allows for more personalized ad content.
     •  Allowed the marketing team to analyze over a year's worth of search data.
     •  Provided analysis that was difficult to perform in the data warehouse.
  18. BI Vendors Are Working on Hadoop Integration
     Both big (relatively)…
  19. And small…
  20. Example Processing Pipeline for Web Analytics Data
  21. Example Use Case: Selection Errors
  22. Use Case – Selection Errors: Introduction
     •  Multiple points of entry.
     •  Multiple paths through site.
     •  Goal: tie events together to form picture of customer behavior.
  23. Use Case – Selection Errors: Processing
  24. Use Case – Selection Errors: Visualization
  25. Example Use Case: Beta Data
  26. Use Case – Beta Data: Introduction
     •  Hotel Sort Optimization
     •  Compare A vs. B
     •  Web Analytics Data
        –  What user saw.
        –  How user behaved.
     •  Server Log Data
        –  Sorting behavior used.
  27. Use Case – Beta Data: Processing
  28. Use Case – Beta Data: Visualization
  29. Example Use Case: RCDC
  30. Use Case – RCDC: Introduction
     •  Understand and improve cache behavior.
     •  Improve “coverage”
        –  Traditionally search 1 page of hotels at a time.
        –  Get “just enough” information to present to consumers.
        –  Increase amount of availability information we have when consumer performs a search.
     •  Data needed to support needs beyond reporting.
  31. Use Case – RCDC: Processing
  32. Use Case – RCDC: Visualization
  33. Conclusions
     •  Hadoop market is still immature, but growing quickly. Better tools are on the way.
        –  Look beyond the usual (enterprise) suspects. Many of the most interesting companies in the big data space are small startups.
     •  Hadoop will not replace your data warehouse, but any organization with a large data warehouse should at least be exploring Hadoop as a complement to their BI infrastructure.
  34. Conclusions
     •  Work closely with your existing data management teams.
        –  Your idea of what constitutes big data might quickly diverge from theirs.
     •  The flip side to this is that Hadoop can be an excellent tool to off-load resource-consuming jobs from your data warehouse.
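
The click-data cleansing proposed for MapReduce in slide 16, together with the goal in slide 22 of tying events together per customer, can be sketched roughly as follows. This is a single-process simulation of the map, shuffle, and reduce phases, not the job described in the talk. The tab-separated log layout (timestamp, session ID, event, user agent), the bot filter, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical log lines: timestamp, session ID, event, user agent (tab-separated).
SAMPLE_LOG = [
    "2011-11-08T10:00:05\ts1\tselect_hotel\tMozilla/5.0",
    "2011-11-08T10:00:01\ts1\tsearch\tMozilla/5.0",
    "2011-11-08T10:00:02\ts2\tsearch\tExampleBot/1.0",
    "corrupted line",
    "2011-11-08T10:00:03\ts3\tsearch\tMozilla/5.0",
]


def map_line(line):
    """Mapper: parse one log line and yield (session_id, (timestamp, event)).

    Malformed lines and bot traffic are dropped; this is the cleansing step.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return  # malformed record
    timestamp, session_id, event, user_agent = fields
    if not session_id or "bot" in user_agent.lower():
        return  # missing session or assumed bot traffic
    yield session_id, (timestamp, event)


def reduce_session(session_id, values):
    """Reducer: order one session's events by timestamp."""
    ordered = sorted(values)
    return session_id, [event for _, event in ordered]


def run_job(lines):
    """Simulate the map, shuffle, and reduce phases in one process."""
    shuffled = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            shuffled[key].append(value)
    return dict(reduce_session(k, v) for k, v in sorted(shuffled.items()))


if __name__ == "__main__":
    for session_id, events in run_job(SAMPLE_LOG).items():
        print(session_id, events)
```

Keying the mapper output by session ID means all of a session's events arrive at the same reducer, which is what lets each session be ordered and inspected as a whole.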