Big Data Analytics from a Practitioner's View


Published in: Technology
  • A website is just like a store. You have millions of people visiting you every day and shopping on your site. So where does analytics fit in? Web analytics is the invisible shopper who goes around the store and watches everyone's behavior. It takes those behavioral attributes and gives the business insights. Know the travel details: how many travelers, what kind of travelers, any preferred carriers or hotels? Understand the shopping patterns: does the visitor shop only on weekends, or only on Thursdays? Focus on visit patterns: how many times does the visitor come to the site before buying anything? Learn the page navigation: does the visitor view 100 pages on every visit, or know exactly what to look at?
  • Big Old Elephant:
  • 1. Last year at eMetrics I had interesting tweet exchanges with a few folks. In essence we were talking about the importance of visitor-level granularity of data and how it will impact personalization. Here I have 3 use cases that are being enabled through Big Data at OWW. 2. Our CEO affectionately calls this EFX. We use Hadoop to analyze the attributes from Site Analytics, internal logs (multiple application logs, NOT just web logs), and MVT logs. All of these in essence feed our regression models. One of the key wins from the Machine Learning team was to analyze, build, and implement the recommendation engine for our hotel search. The data came from our Site Analytics and some other internal application logs, and was analyzed using Hadoop MapReduce jobs. The results we saw were astonishing: a 7% interaction rate, 37% were likely to continue deeper into the funnel, and a 2.6% increase in booking path engagement. The beauty of this is that the Big Data analytics is fed into a machine, which learns and changes as time progresses. 3. PPC bidding based on Site Analytics data. EX in turn funneling our PPC channels. The results are very encouraging. Helps us with regression analysis. 4. The final use case is where we learned that Mac users in general tend to spend more money on our sites.
  • We perform hundreds of tests on our site. We need to analyze consumer patterns and behaviors so that we can make the consumer experience very good. One of the tests we perform is around the search results. Based on the visitor, we usually alter the order of the search results to make it more personal to them. This enables the user to find the right hotel results very easily. Imagine that you take a yearly vacation in Honolulu, Hawaii and always stay at the same hotel. If we know this about you, we can serve you better by showing that hotel at the top. We also do a lot of testing around layout, colors, and button placement. Eventually a lot of this data flows through our Hadoop infrastructure, which enables us to perform modeling exercises and analysis of our control and test groups.
  • 1. We faced organizational resistance to deploying Hadoop. Not from management, but from other technical teams. It required persistence to convince them that we needed to introduce a new hardware spec to support Hadoop. Our CTO was a big believer and provided technical guidance on Big Data. This helped a lot in making this a success at the organization. It's not every day you have the CMO and GVP asking their team members to get data out of Hadoop.
  • 1. Here are some key learnings from our experience and some thoughts for you to consider. 2. If you have the technology strength, go for it. 3. This needs heavy investment in time and resources. 4. Like I have mentioned many times, data without analysis is worthless. 5. Senior leadership buy-in is a must. We had huge support from our CTO, CEO, and CMO. 6. Data governance is a must. 7. Lastly, if you want to succeed, you need to fight the tough battles and make the tough choices.
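The notes above mention analyzing control and test groups from MVT data. As a minimal sketch of what such a comparison could look like, the snippet below runs a two-proportion z-test on booking conversion. The function name and the visitor and booking counts are hypothetical, and the talk does not say which statistical test was actually used.

```python
import math


def two_proportion_z(conv_control, n_control, conv_test, n_test):
    """Compare conversion rates of a control and a test group.

    Returns the z statistic and a two-sided p-value under the normal
    approximation. All inputs are plain counts.
    """
    p_control = conv_control / n_control
    p_test = conv_test / n_test
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_test) / (n_control + n_test)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10,000 visitors per group, 200 vs. 260 bookings.
z, p_value = two_proportion_z(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

In practice the counts would come from aggregated logs rather than literals, and a site running hundreds of tests would also need to account for multiple comparisons.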

    1. Big Data Analytics from a Practitioner's View. Sep 2013. Raghu Kashyap
    2. About Raghu Kashyap. Areas of Responsibility: Data Insights Group (Site Analytics, Competitive Intelligence, Big Data); Orbitz India, supporting Analytics and BI teams; US, Europe, Australia (APAC). Personal: Director, Data Insights Group; strong background in technology (13 years), with passion and experience in analytics (4 years) and big data (3.5 years); Masters in Computer Science; golf, traveling, helping non-profit organizations, spending time with my wife and 2 boys. Twitter: @ragskashyap Blog: Email:
    3. Orbitz Worldwide
    4. Challenges: Lack of multi-dimensional capabilities. Heavy investment in the tools. Precision vs. accuracy. Data governance.
    5. Challenges, continued: No data unification or uniform platform across organizations and business units. No easy data extraction capabilities.
    6. Hadoop history at OWW
    7. Web Analytics & Big Data: OWW generates a couple of million air and hotel searches every day. Massive amounts of data: over a hundred GB of log data per day. Expensive and difficult to store and process this data using existing data infrastructure.
    8. Love Thy Hadoop: Long-term storage for very large data sets. Open access for developers and analysts. Allows for ad-hoc querying of data and rapid deployment of reporting applications.
    9. Hadoop Growth
    10. Hadoop Cluster
    11. Treemap of HDFS storage
    12. Approach with Hadoop and ETL: Raw logs; Flat files; Event Model; MapReduce ETL; External Tables; Data Warehouse (Greenplum); GP Connector
    13. Opportunities: Machine Learning; Site Analytics Data; PPC bidding efficiencies; Internal log analysis (Hgrep); MVT testing; Advanced Analytics
    14. Show me the money: EFX (Every Friggin X); PPC bidding efficiencies; Mac vs. PC
    15. Marketing Channel optimization: Orbitz.com; Direct; Paid - Brand; Paid - Non Brand; SEO - Brand; SEO - Non Brand; Email; Meta; Travel Research; Affiliates; Display Ads
    16. Hotel Rate Cache optimization: Data is collected as part of RCDC. Includes every live rate search (aka burst) performed by our hotel stack. Raw data: ~200 GB compressed, 10^8 records. Extraction: <40 GB compressed, 10^9 records.
    17. 17. MVT Analyze behavioral and Test data from our MVT testing page 16
    18. 18. DWH Log analysis page 17 • Analysis of Greenplum DB logs within Hadoop to analyze the data usage patterns. • Impact analysis • Hadoop usage for the last 30 days of DB log analysis.
    19. 19. HIPPO is your best friend • Expect organizational resistance from unanticipated directions • You can do wonders in the analytics area if you get buy in.
    20. Lessons Learnt: Analytics using Big Data comes with a price. Data governance. Senior leadership buy-in. "I can't tell you the key to success, but the key to failure is trying to please everyone." -Ed Sheeran
    21. How to capitalize on Big Data? Learn from people who have already done this. DO NOT reinvent the wheel. Balance buy vs. build. Build once and leverage in multiple places. Go where clients don't want to go or can't go in terms of execution.
    22. What matters to Practitioners? Things change dramatically in the world of analytics. Being agile is very important. Dashboards and reports can take you only to a certain level. Buy-in from key groups is important. Grow the business and impress the boss.
    23. Thank you
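Slide 12 lists a pipeline that runs from raw logs through a MapReduce ETL step to flat output the data warehouse can read as external tables. The snippet below is a rough, in-memory sketch of that map and reduce pattern in plain Python, not the Orbitz implementation. The tab-separated log layout (timestamp, visitor ID, event type, product) and all function names are assumptions made for illustration. A real job would run on Hadoop rather than in a single process.

```python
from collections import defaultdict


def map_line(line):
    """Map step: turn one raw log line into ((day, product, event), 1).

    Assumes a hypothetical tab-separated layout:
    timestamp, visitor_id, event_type, product.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return  # skip malformed lines instead of failing the whole job
    timestamp, _visitor_id, event_type, product = parts
    day = timestamp[:10]  # keep only the YYYY-MM-DD prefix
    yield (day, product, event_type), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    """Run map then reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


def to_flat_rows(totals):
    """Format totals as tab-separated rows, e.g. for an external table."""
    return ["\t".join([*key, str(count)]) for key, count in sorted(totals.items())]


sample = [
    "2013-09-01T10:00:00\tv1\tsearch\thotel",
    "2013-09-01T11:00:00\tv2\tsearch\thotel",
    "2013-09-02T09:00:00\tv1\tsearch\tair",
    "malformed line",
]
for row in to_flat_rows(run_job(sample)):
    print(row)
```

Keeping the map and reduce steps as separate pure functions mirrors how such a job is split on a cluster, where the framework handles the grouping of keys between the two phases.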