Web Analytics and Big Data, TechWeek 2011


Published in: Technology
  • Welcome, everyone. This presentation is about using Hadoop and Big Data to drive web analytics. The goal is to explain how we are shaping web analytics and Big Data to optimize data-driven decisions at Orbitz Worldwide. I will also talk about the process model: how we effectively use the brains and manpower across the organization toward a common goal. Between Wetta and me, we promise to give you some thought-provoking details about analytics and Big Data :-)
  • I met someone at the train station who asked me what I do. I said I work in the web analytics field, help shape the strategy and vision at Orbitz Worldwide, and enable our business teams to get insights into the performance of our site and act on them. So he said, "Ah, you do reporting" :-) I started thinking about why web analytics is hard for people to get, and started evangelizing both within and outside Orbitz. I manage the web analytics team at Orbitz Worldwide. I also try to help out non-profit organizations when I am not busy with my wife and two sons.
  • Here is the agenda for what we will be talking about. We will break it into sections: web analytics, the challenges with web analytics, how Big Data is helping us overcome these challenges, and the process. Michael will provide the business side of the story. Finally, we will open up for questions.
  • So what is web analytics? (Read the definition.) It tells you exactly why someone came to your site and what kind of impact they had on your bottom line. (Read the definition.) You need to immerse yourself in the data to understand the story it is telling. (Read the definition.) Focus on the customer. The customer is king. You need to listen to their feedback and act on it. (Read the definition.) Test, test, and test. If you want to prove or disprove a HiPPO's opinion, you need to run tests on your site. By the way, HiPPO is common terminology in the industry. It stands for Highest Paid Person's Opinion :-)
  • A website is just like a store. You have millions of people visiting you every day and shopping on your site. So where does web analytics fit here? Web analytics is the invisible shopper who goes around the store and watches everyone's behavior. It takes those behavioral attributes and helps the business with insights!
  • So how do we fit the puzzle together? By learning the behavior of the customer and focusing on key attributes. Know the travel details, such as how many travelers, what kind of travelers, and any preferred carriers or hotels. Understand the shopping patterns: does he want to shop only on weekends, or only on Thursdays? Focus on visit patterns: how many times does he come to the site before he buys anything? Learn the page navigation, i.e., does he see 100 pages every time he comes, or does he know exactly what to look at? Master the demand source. Anyone who has worked on the marketing side knows that attribution is a holy war. Deciding which demand source gets the credit for a conversion is something people will argue to death, just like the IDE war between Vim, Emacs, IntelliJ, and Eclipse :-) So now that I think you understand what web analytics is and what you can do with it, let's focus a bit on its history.
  • So who remembers the glory days of hit counters? In the early 1990s, if you had more hits to the site, then that was a wonderful thing. People would measure traffic using hit counters. As Michael says, nothing is worse than bringing cheap and crappy traffic to the site.
  • People who are familiar with Webtrends know that they were one of the first companies to parse server-side logs to provide web analytics tools. This was kind of a gateway to the wonderful world of site analytics. Technology folks know that nothing is easy with log file parsing :-)
  • Web analytics became popular among business and marketing teams because it was easy to implement JavaScript tags and use one of the SaaS tools to analyze the data. There have been numerous articles, discussions, and debates on which approach is better: server-side or client-side tagging. I think this will continue in the near future.
  • Google Analytics made web analytics sexy, easy, and cheap. Prior to the arrival of Google Analytics, the big vendors in this area were Omniture, Webtrends, and Coremetrics. The cost of these tools ranged anywhere from a quarter of a million dollars to over a million dollars. Google changed the map of web analytics with the introduction of GA.
  • So this finally brings us to the new era of web analytics with Big Data. In the early days of 2009 there were a lot of acquisitions in this area, and now we see most of the vendors consolidating their businesses to support the Big Data market. The early adopters of Big Data were the likes of IBM, Facebook, and Orbitz. We will be seeing more of the data warehouse vendors moving in this direction.
  • What exactly is web analytics today, and what type of data do we collect and funnel into our Big Data infrastructure? The four pillars of web analytics are Site Analytics, Voice of Customer (VOC), Multi Variate Testing (MVT for short), and finally Competitive Intelligence (CI). Let's now focus on what these four key areas are about and understand their importance and usage within Orbitz.
  • Site Analytics provides the "What" of web analytics. This really helps us understand the visitor behavior pattern. It also helps us measure conversion and track the demand source.
  • MVT helps us answer why users drop off from a search results page. Do people like a round button or a square button? Things that you never imagined would have an impact on a customer will be surfaced by doing MVT. If you want to have some fun and test your customer-understanding skills, check out www.whichtestwon.com. There are tests every day on which you can vote.
  • You don't know why a customer behaved the way he did on your site. The only way to understand certain things about customer behavior is by asking them why. VOC helps you understand customer needs and behavior by listening to them through surveys and feedback mechanisms.
  • So let's say your business is seeing an upward trend and revenue has been soaring. Do you know if this is because of the changes that you made on your site? Or is there an upward trend in the market, and everyone including your competitors is growing? If your competitor is growing at a 25% rate and you are at 5%, then you are not really growing. CI will help you understand all these aspects of the business so that you can make educated, data-driven decisions.
  • This brings us to Orbitz and how we are using all the aspects of WA along with Big Data. OWW operates multiple brands across the globe. In the US we have Orbitz, CheapTickets, The Away Network, and Orbitz for Business. Internationally we have ebookers and HotelClub (which includes RatesToGo and Asia Hotels). We went public in 2007 and are listed as OWW on the NYSE.
  • So with so many brands and so much data, we had quite a few challenges. For starters, we couldn't easily do multi-dimensional analysis with the tools. With data spread across multiple tools, it was hard to picture the whole nine yards. Obviously, tools cost money. It was also harder for people to understand where to look for data. With analytics you need direction rather than precision to take action and get insights.
  • On the Big Data front, we didn't have a good infrastructure where we could house all this data in a cost-effective way. Data extraction was NOT an easy task. We also had to focus on the key differences between when you need testing and when you need reporting. Earlier I mentioned that you need to do rigorous outcome analysis. However, with all the challenges we faced, it was not an easy task.
  • We realized that with all the challenges we had, we had to innovate and experiment with new ways to enable successful web analytics at OWW. We generate hundreds of GB of log data per day. How can we effectively store this massive data, and how can we mine it and make sense of it? Our existing DW was not intended to support such large data sets and, more importantly, to process this data. We also needed to make sure that we don't spend huge amounts of money to store this data set. Big Data infrastructure with Hadoop has been a huge success at Orbitz and at other organizations.
  • So what does this buy us? We can now store data for a long period of time without worrying too much about space. Analysts and developers have access to this data set. Developers can run ad hoc queries to support our business needs, while the core web analytics team focuses on the company standards and metrics.
  • Here is an example of how we process our site analytics data today. We FTP the log files into our Hadoop infrastructure daily. The files are LZO compressed for better storage utilization. Developers then write MapReduce jobs against these raw log files to output data into Hive tables. Hive is the data warehouse equivalent for Hadoop. Most of the MR jobs are written in Java and scripting languages such as Python, Ruby, and Bash. Business teams, however, have the skill set to run queries against Hive tables.
  • Since the Big Data market is not that mature, there are no good ways to build visualization on top of Hive. Due to this and for other reasons, we need to bring a subset of this data into our warehouse. So in essence, data that is in Hive makes its way into the warehouse. There are companies such as Karmasphere and Datameer who are in the initial stages of bridging the gap between business needs and Hadoop access. However, it's too early to say if this will be the norm.
  • We focused on some key areas of our business, such as demand source and campaigns, as our pilot and worked with our business partners to enable analytics on Big Data. We have developers writing MapReduce jobs which run every day and populate Hive tables. We generate more than 25 million records per month for the pilot use case that we worked on. This only showcases the sheer magnitude and power of analytics within the Big Data framework.
  • Here are some of the areas where we are using the infrastructure we have built to extract data and provide additional analysis. Traffic acquisition helps us understand the demand and flow into our websites. We provide a platform for better marketing optimization, better understand user engagement, provide a better ad optimization framework, and finally understand user behavior. So all this is pretty cool stuff from a technology and analytics standpoint. Let's now turn our focus to Michael Wetta and learn more from the expert on how the business is leveraging Big Data and web analytics to drive business decisions.
  • So how do you organizationally structure yourself and Big Data so that you can be effective, both in terms of resource utilization and in setting the platform for success? This is what we call Centralized Decentralization. With this approach, the core web analytics team controls and supports the individual teams when it comes to data extraction and modeling. This prevents one team from being the bottleneck for data extraction and analytics. If you have ever worked on the data warehouse side of the world, you will know the challenges and delays in getting the data.
  • With the core process of centralized decentralization and being agile, how do you succeed? You can't manage if you can't measure. But once you measure, make sure you fail fast. Every team needs to be thinking of analytics with every feature they work on. Dimensional modeling is great, but like someone wise said, "All models are wrong, but some are useful" :-) My point here is that data without analysis is like a Ferrari without gas. If you make it a point to extract smaller chunks of data and tie this effort to your business objectives, you are sure to succeed.
  • Here are some key learnings from our experience and some thoughts for you to consider. If you have the strength of technology, go for it. This needs heavy investment from a time and resource perspective. Like I have mentioned many times, data without analysis is worthless.
  • We at Orbitz use Big Data and Hadoop for numerous other projects, some of them being machine learning, page load performance analysis, and data cache analysis.
  • I couldn't end this session without telling you who else is doing something similar. (Read slides.)
  • In conclusion, I would like to say: invest in people and tools, empower individual teams in your organization to manage their own analytics on Big Data, and focus on analysis and not just data extraction.
  • Here are some good references
  • Thanks again for listening to our story. We will be available for any further questions you may have. Also, if you are interested in applying for a job at Orbitz, please check out the career site.
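The notes above describe developers writing MapReduce jobs against raw log files to populate Hive tables with demand source data. The map-then-reduce pattern behind such a job can be sketched in plain Python. This is only an illustration, not the Orbitz job code: the tab-separated record layout (day, traffic source, page) and the names `map_record`, `reduce_counts`, and `SAMPLE_RECORDS` are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_record(line):
    """Map phase: emit ((day, source), 1) for one tab-separated record.

    The three-field layout (day, source, page) is a hypothetical format.
    """
    day, source, _page = line.rstrip("\n").split("\t")
    return (day, source), 1


def reduce_counts(pairs):
    """Reduce phase: sort by key (standing in for the shuffle), then sum per key."""
    totals = {}
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        totals[key] = sum(value for _key, value in group)
    return totals


# Hypothetical sample records, invented for this sketch.
SAMPLE_RECORDS = [
    "2011-07-01\tsearch\t/hotels",
    "2011-07-01\temail\t/flights",
    "2011-07-01\tsearch\t/flights",
    "2011-07-02\tsearch\t/hotels",
]

daily_visits = reduce_counts(map_record(line) for line in SAMPLE_RECORDS)

# One output row per (day, source), similar in shape to a row loaded into a table.
for (day, source), visits in sorted(daily_visits.items()):
    print(f"{day}\t{source}\t{visits}")
```

In a real Hadoop job the framework does the sorting and grouping between the two phases across many machines; here a single `sorted` call plays that role so the sketch stays self-contained.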

    1. 1. Hadoop and Big Data to Drive Web Analytics Raghu Kashyap & Michael Wetta @ Orbitz Worldwide
    2. 2. About Us Raghu Kashyap - Director Web Analytics Twitter: @ragskashyap Blog: http://kashyaps.com Email: [email_address] Michael Wetta - Marketing Strategy & Analytics Email: [email_address]
    3. 3. Overview
        • Web Analytics journey
        • Orbitz Worldwide
        • What challenges exist?
        • Big Data Analysis
        • Business Testimonial
        • Centralized Decentralization
        • Dos and Don’ts
        • What Hadoop is being used for beyond Web Analytics at Orbitz
        • Where else?
        • Conclusion
    4. 4. What is Web Analytics?
        • Understand the impact and economic value of the website
        • Rigorous outcome analysis
        • Passion for customer centricity by embracing voice-of-customer initiatives
        • Fail faster by leveraging the power of experimentation (MVT)
    5. 6. Behavioral attributes
        • Travel details
        • Shopping patterns
        • Visit patterns
        • Page navigation
        • Demand source
    6. 7. Web Analytics History Early 1990s – Hit counters Reference - http://www.theedifier.com
    7. 8. Web Analytics History 1993 – Web server logs (Webtrends)
        - - [25/May/2004:00:17:09 +1200] "GET /internet/index.html HTTP/1.1" 200 6792 "http://www.mediacollege.com/video/streaming/http.html" "Mozilla/5.0 (X11; U; Linux i686; es-ES; rv:1.6) Gecko/20040413 Debian/1.6-5"
        - - [25/May/2004:00:17:20 +1200] "GET /cgi-bin/forum/commentary.pl /noframes/read/209 HTTP/1.1" 200 6863 "http://search.virgilio.it/search/cgi/search.cgi ?qs=download+video+illegal+Berg&lr=&dom=s&offset=0&hits=10&switch=0&f=us" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar"
    8. 9. Web Analytics History
        • 1997 – JavaScript tag collection
        • Server side or Client side tagging?
    9. 10. Web Analytics History 2005 – Google Analytics Reference - http://www.theedifier.com
    10. 11. Web Analytics History
        • 2009/2010 – Major acquisitions (Adobe, IBM, Comscore)
        • 2009/2010 – Big Data (IBM, Facebook, Orbitz, Informatica, Greenplum)
    11. 12. Web Analytics today
        • Site Analytics
        • Multi Variate Testing (MVT)
        • Voice of Customer (VOC)
        • Competitive intelligence
    12. 13. Site Analytics
        • The “What” of Web Analytics
        • Helps measure:
            • Visits/Visitors
            • Page views
            • Conversion
            • SEO activities
            • Traffic Source
    13. 14. Multi Variate Testing (MVT)
        • The “Why” of Web Analytics
        • Fail faster
        • Experiment or Die
    14. 15. Voice of Customer (VOC)
        • The “Why” of Web Analytics
        • Surveys
        • Lab usability tests
    15. 16. Competitive Intelligence
        • The “What else” of Web Analytics
        • Data Collection: Toolbar, Panel, ISP
    16. 17. About Orbitz Worldwide
    17. 18. Challenges
        • Site Analytics
            • Lack of multi-dimensional capabilities
            • Hard to find the right insight
            • Heavy investment on the tools
            • Precision vs Direction
    18. 19. Challenges (continued)
        • Big Data
            • No data unification or uniform platform across organizations and business units
            • No easy data extraction capabilities
        • Business
            • Distinction between reporting and testing (MVT)
            • Minimal measurement of outcomes
    19. 20. Web Analytics & Big Data
        • OWW generates a couple of million air and hotel searches every day.
        • Massive amounts of data: over a hundred GB of log data per day.
        • Expensive and difficult to store and process this data using existing data infrastructure.
    20. 21. Big Data Infrastructure
        • Infrastructure provides:
            • Long term storage for very large data sets.
            • Open access to developers and analysts.
            • Allows for ad-hoc querying of data and rapid deployment of reporting applications.
    21. 22. Processing of Web Analytics Data
    22. 23. Aggregating data into Data Warehouse
    23. 24. Data Analysis Jobs
        • Traffic Source and Campaign activities
        • Daily jobs, Weekly analysis
        • MapReduce job
            • ~ 20 minutes for one day of raw logs
            • ~ 3 minutes to load to Hive tables
            • Generates more than 25 million records for a month
    24. 25. Data Categories
        • Traffic acquisition
        • Marketing optimization
        • User engagement
        • Ad optimization
        • User behaviour
    25. 26. Shifting from Innovation to Mainstream Consumption
    26. 27. Crossing the Chasm: Shifting from Innovation to Mainstream Consumption Adapted from Geoffrey A. Moore – Technology Adoption Lifecycle
        • Background on Analytics at Orbitz
        • Crossing the Chasm Framework
        • Application
    27. 28. Crossing the Chasm: Shifting from Innovation to Mainstream Consumption Adapted from Geoffrey A. Moore – Technology Adoption Lifecycle
    28. 29. Crossing the Chasm: Shifting from Innovation to Mainstream Consumption Adapted from Geoffrey A. Moore – Technology Adoption Lifecycle
        • Innovators
        • Visionaries
        • Mainstream
    29. 30. Crossing the Chasm: Shifting from Innovation to Mainstream Consumption Adapted from Geoffrey A. Moore – Technology Adoption Lifecycle
        Key Components Adoption:
        • Consistent Message of Capabilities
        • Understanding and Handling Reservations
        • Inclusion in the development cycle
        • Storage and Accessibility
    30. 31. Centralized Decentralization Web Analytics team + SEO team + Hotel optimization team
    31. 32. Model for success
        • Measure the performance of your feature and fail fast
        • Experimentation and testing should be ingrained into every key feature.
        • Break down into smaller chunks of data extraction
    32. 33. Should everyone do this?
        • Do you have the Technology strength to invest and use Big Data?
        • Analytics using Big Data comes with a price (resource, time)
        • Big Data mining != analysis
        • Key Data warehouse challenges still exist (time, data validity)
    33. 34. Other Key Projects
        • Machine Learning team
        • Measuring page download performance using site analytics logs
        • Storing and processing production application logs
        • Data cache analysis
    34. 35. Where else?
        • Amazon – Was Amazon's recommendation engine crucial to the company's success?
        • Facebook – A Petabyte Scale Data Warehouse using Hadoop
        • eBay – The power of the Elephant
        • Apple – iAds, UX and Data analytics
    35. 36. Conclusion
        • Invest in the 10/90 rule ($10 on tools and $90 on people) – Avinash Kaushik
        • Analytical thinking engineers/analysts
        • Empower individual feature teams to manage their own analytics (Centralized Decentralization)
        • Focus on Analysis more than reporting
    36. 37. Reference
        • Web Analytics Association http://www.webanalyticsassociation.org/
        • Avinash Kaushik http://kaushik.net
        • Twitter #measure
        • Analysis Exchange http://www.webanalyticsdemystified.com
    37. 38. Questions?
        • http://careers.orbitz.com
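As a follow-up to the web server log slide in the transcript, here is a rough sketch of what parsing such a log line can look like in Python. It assumes the common Apache "combined" log layout. The slide's sample lines omit the client host, so the example puts the documentation address `192.0.2.1` in front as a placeholder. The names `LOG_PATTERN`, `parse_line`, and `SAMPLE` are illustrative only and do not come from the talk.

```python
import re
from datetime import datetime

# Regex for the Apache "combined" log layout (assumed format for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # %b parses English month names under the default C locale.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# First sample line from the slide, with a placeholder host prepended.
SAMPLE = (
    '192.0.2.1 - - [25/May/2004:00:17:09 +1200] '
    '"GET /internet/index.html HTTP/1.1" 200 6792 '
    '"http://www.mediacollege.com/video/streaming/http.html" '
    '"Mozilla/5.0 (X11; U; Linux i686; es-ES; rv:1.6) Gecko/20040413 Debian/1.6-5"'
)

entry = parse_line(SAMPLE)
print(entry["path"], entry["status"], entry["referrer"])
```

Lines that do not fit the pattern, such as truncated user agents or request paths containing spaces, come back as `None`, which is one small reason why, as the speaker notes put it, nothing is easy with log file parsing.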