Hadoop World 2011 Keynote: eBay - Hugh Williams



Hugh Williams will discuss building Cassini, a new search engine at eBay, which processes over 250 million search queries and serves more than 2 billion page views each day. Hugh will trace the genesis and building of Cassini as well as highlight and demonstrate the key features of this new search platform. He will discuss some of the challenges in scaling arguably the world’s largest real-time search problem, including the unique considerations associated with e-commerce and eBay’s domain, and how Hadoop and HBase are used to solve these problems.

  1. Project Cassini: eBay’s New Search Engine. Hugh Williams, Vice President of Search, Experience, and Platforms, eBay Marketplaces
  2. $2.63 million for a lunch with Warren Buffett
  3. $40,668 for Justin Bieber’s just-cut hair
  4. $130K for Princess Beatrice’s hat
  5. $62 billion in merchandise sold in 2010
  6. 97 million active buyers and sellers worldwide; 250 million queries each day to our search engine; 200+ million items live in more than 50,000 categories
  7. 9 petabytes of data in our Hadoop and Teradata clusters; 2 billion page views each day; 75 billion database calls each day
  8. Huge Opportunity: Taking the “e” out of ecommerce [Chart: online vs. offline share of commerce across three columns, Yesterday / Today / Tomorrow. Recoverable labels: Online 4%, Online 6%, Web-influenced offline 37%, Offline 96%, Online + Offline; 2008 = $325B, 2013 = $10T. Sources: Forrester, Euromonitor, and Economist Intelligence Unit]
  9. Voyager: our current search engine
  10. Voyager: our current search engine ► Reliable, critical, proven workhorse
  11. Voyager: our current search engine ► Circa-2002 textbook design ► Basic ranking functionality ► Title-only match by default ► Very literal search
  12. Voyager: our current search engine ► Inflexible and manual ► The next wave of innovation requires a new search platform…
  13. Project Cassini at eBay: our new search engine
  14. Project Cassini at eBay: our most ambitious core engineering project
  15. Project Cassini at eBay: our most ambitious core engineering project ► Entirely new codebase ► World-class, from a world-class team ► Platform for ranking innovation ► Uses all data by default ► Flexible ► Automated ► Four major tracks, 100+ engineers ► Complete in less than 18 months
  16. Project Cassini at eBay: beginning tests, likely launch in 2012
  17. A Short Primer on Indexing: When a user types a query, it isn’t practical to exhaustively scan 200+ million items. Instead, we create an inverted index and use it to rank the items and find the best matches. An inverted index is similar to the index in the back of a book: ► A set of searchable terms ► For each term, a list of locations
  18. An Inverted Index: the term “cat” has 3 postings, in documents 1, 2, and 7 [Figure: eight numbered documents, 1 to 8, including “cat on the mat”, “fat cat”, and “wild cat”]
  19. Distributed Index Construction
  20. ► Larger index than Voyager ► Descriptions, seller data, other metadata, … ► Much more history in our indexes ► More computationally expensive work at index time (and less at query time) ► Ability to rescore or reclassify entire site inventory
  21. Hadoop: ► Distributed indexing – platform for hourly index refreshes ► Fault tolerance through HDFS replication ► Better utilization of hardware – can generate different index types with one cluster
  22. HBase: ► Column-oriented data store on top of HDFS ► Used to store eBay’s items ► Bulk and incremental item writes ► Fast item reads for index construction ► Fast item reads and writes for item annotation
  23. ► Everyone is still learning ► Some issues only appear at scale ► Production cluster configuration is challenging ► Hardware issues ► Tuning cluster configuration to our workloads ► HBase stability ► Monitoring health of HBase ► Managing workflows – many-step map/reduce jobs
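
As a rough illustration of the inverted index from the indexing primer (slides 17 and 18) and the map/reduce style of index construction mentioned on the Hadoop slide, here is a minimal single-process sketch in plain Python. It is not Cassini's implementation and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`) are hypothetical, and the assignment of the three example titles to documents 1, 2, and 7 is an assumption chosen to match the posting list for "cat" shown on slide 18.

```python
from collections import defaultdict

# Assumed toy corpus: the slide shows "cat" in documents 1, 2, and 7;
# which title belongs to which id is a guess made for illustration.
DOCS = {1: "cat on the mat", 2: "fat cat", 7: "wild cat"}


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term."""
    for term in sorted(set(text.lower().split())):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted pairs by term, as a map/reduce framework would."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: turn the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in-process to build the inverted index."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


index = build_index(DOCS)
print(index["cat"])              # posting list for "cat"
print(search(index, "fat cat"))  # documents containing both terms
```

Looking up a term touches only its posting list rather than every document, which is why the primer contrasts the index with an exhaustive scan. In a real cluster the map and reduce steps would run on separate machines over partitions of the item data; this sketch only mirrors the data flow.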