
Apache Hadoop India Summit 2011 talk "Hadoop Avatar at eBay" by Srinivasan Rengarajan and Mohit Soni



  1. Avatar at eBay
     Srinivasan Rengarajan
     Mohit Soni
     Courtesy: Anil Madan
  2. 2007: Research team builds a 4-node cluster
       • Subset of click stream and EDW data
       • Innovation with Mobius Query Language
       • Visualization and click path analysis
     September 2009: Search clusters
       • Machine learning ranking cluster of 28 nodes
       • Search relevance cluster of 10 nodes
       • Subset of click stream and EDW data
     May 2010: Athena* exploratory cluster of 532 nodes
       • Platform teams join hands with Search/Research to build a larger cluster
       • Build it as a core competency for advanced insights into complex data
       • Rapid build-out, with timelines pulled in by a couple of months
     * Athena is the goddess of civilization, wisdom, strength, strategy, craft, justice and skill in Greek mythology. MIT's Athena ushered the world into a new era of distributed systems when it started in the mid-80s.
  3. Infrastructure
       • Enterprise Nodes
           Sun 64-bit, Red Hat Linux
           2 quad-core Nehalem, 72 GB RAM, 4 TB
           Servers: NameNode(s), JobTracker, ZooKeeper, HBase Master, Ganglia server, eBay (Cloudera) HUE
       • Data Nodes
           SGI-Rackables, CentOS, 1U, 5.3 PB
           2 quad-core Nehalem, 36 GB RAM, 10 TB
           HBase on 20 nodes
       • Network
           TOR 1 Gbps
           Core switch uplink 40 Gbps
  4. Ecosystem
       • Monitoring & Alerting
           Ganglia, Nagios
       • Tools
           HUE/Mobius – lifecycle of user jobs
           UC4 – scheduling
           Oozie – user workflow and data pipelines
           Mahout – data mining
       • Data Access Frameworks
           HBase – for EDW data
           Pig – data pipelines
           Hive – ad hoc queries
           MQL – Mobius Query Language
       • MapReduce
           Sourcing data: primarily Java
           Applications using Perl, Scala, Python…
     Stack diagram layers: Monitoring & Alerting (Ganglia, Nagios); Tools & Libraries (HUE, UC4, Oozie, Mobius, Mahout); Data Access (HBase, Pig, Hive); MapReduce (Java, Streaming, Pipes, Scala); Hadoop Core (HDFS, Common)
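The MapReduce layer on the Ecosystem slide lists Streaming alongside Java, with applications written in languages such as Python. As a rough illustration of what a streaming job over click stream data could look like, here is a minimal sketch. The input layout (tab-separated `user_id` and `page`) and all function names are assumptions made for this example and are not taken from the talk.

```python
from itertools import groupby


def map_clicks(lines):
    """Mapper: emit 'page<TAB>1' for every well-formed click record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1]:
            continue  # skip malformed records
        yield f"{fields[1]}\t1"


def reduce_counts(sorted_lines):
    """Reducer: sum the counts per page.

    Assumes the input is sorted by key, which is what Hadoop Streaming
    delivers to a reducer after the shuffle phase.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for page, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{page}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process for quick testing."""
    return list(reduce_counts(sorted(map_clicks(lines))))
```

In an actual streaming job the mapper and reducer would typically be two small scripts that read `sys.stdin` and print these lines; `run_locally` only exists so the logic can be checked without a cluster.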
  5. Administration
       • Groups
           Built to support multiple groups
           Job invocation uses the group name
       • Fair Scheduler
           Allocations based on investment
           Weights
           Minimum share of mappers and reducers
           poolMaxJobsDefault
           userMaxJobsDefault
           defaultMinSharePreemptionTimeout
           fairSharePreemptionTimeout
       • Authentication & Authorization
           HUE – custom module to use corporate credentials
           CLI* – custom PAM module
           Security* – implement token interface to replace Kerberos with SAML
     * Work in progress
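The Administration slide names the Fair Scheduler settings but not how they fit together. The sketch below generates an allocation file in which each group's weight and minimum share are proportional to its investment. The element names follow the Hadoop MapReduce (MR1) Fair Scheduler allocation file format as commonly documented; the group names, investment figures, slot counts, the `guaranteed_fraction` parameter, and all default values are hypothetical and are not eBay's actual configuration.

```python
import xml.etree.ElementTree as ET


def build_allocations(investments, total_map_slots, total_reduce_slots,
                      guaranteed_fraction=0.5):
    """Return Fair Scheduler allocation XML as a string.

    investments: mapping of group (pool) name -> relative investment.
    guaranteed_fraction: hypothetical knob; the share of a group's
    proportional slots that is reserved as its minimum share.
    """
    total = sum(investments.values())
    root = ET.Element("allocations")
    for group, invested in sorted(investments.items()):
        share = invested / total
        pool = ET.SubElement(root, "pool", name=group)
        # Normalise weights so that the average pool weight is 1.0.
        ET.SubElement(pool, "weight").text = f"{share * len(investments):.2f}"
        ET.SubElement(pool, "minMaps").text = str(
            int(share * total_map_slots * guaranteed_fraction))
        ET.SubElement(pool, "minReduces").text = str(
            int(share * total_reduce_slots * guaranteed_fraction))
    # Cluster-wide defaults named on the slide; the values are placeholders.
    ET.SubElement(root, "poolMaxJobsDefault").text = "20"
    ET.SubElement(root, "userMaxJobsDefault").text = "5"
    ET.SubElement(root, "defaultMinSharePreemptionTimeout").text = "600"
    ET.SubElement(root, "fairSharePreemptionTimeout").text = "1200"
    return ET.tostring(root, encoding="unicode")
```

For example, `build_allocations({"search": 3, "research": 1}, 400, 200)` would give the hypothetical `search` pool three times the weight and minimum share of `research`.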
  6. Data Sourcing Patterns
     Diagram labels: Click Stream, Search Indices, EDW, Analytics Reporting, Description, Acquisition, Algorithmic Models, Images
  7. Search Use Case – Machine Learned Ranking
     Diagram labels: ClickStream, Items, Users, Feedback, Classifiers, Ranking Function, Great Search Results
       • Goal
           Enhance search relevance for eBay’s items
       • Hadoop Usage
           Build a ranking function that takes multiple factors into account, such as price, listing format, seller track record, and relevance
           Ability to add new factors to validate hypotheses
  8. Research Use Case – Description Data Mining
     Example listing description:
       BARBIE
       1999 "PREMIERE NIGHT"
       Home Shopping Special Edition
       Gorgeous Doll With Beautiful Blond Hair / In A Gown Of Purple And Silver
       New / Never Removed From Box / Doll Is In Mint Condition / Remember This Beauty Is 11 Years Old
       Free Shipping To US Only / Will Ship International / Please E-mail For Cost
       Feel Free To Ask Me Any Questions Or Concerns
       Smoke-Free Environment
       Free Shipping
     Extracted name-value pairs:
       Year: 1999
       Model: premiere night
       Edition: home shopping special
       Hair: blond
       Gown: purple and silver
       Condition: new / never removed from box / mint
       • Goal
           Extend catalog coverage
       • Hadoop Usage
           Leverage data mining and machine learning techniques to convert inventory into name-value pairs in a completely unsupervised way
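Returning to the Search Use Case – Machine Learned Ranking slide: it describes a ranking function that combines several factors and can take on new ones to test a hypothesis. The sketch below shows that idea in its simplest form, a weighted sum over named factors. It is not the ranking function from the talk: the weights here are set by hand rather than learned, and the factor names and values are hypothetical.

```python
def make_scorer(weights):
    """Return a function that scores an item as a weighted sum of its factors.

    weights: mapping of factor name -> weight. A missing factor on an item
    contributes 0, so new factors can be introduced gradually.
    """
    def score(item):
        return sum(w * item.get(factor, 0.0) for factor, w in weights.items())
    return score


def rank(items, weights):
    """Order items from highest to lowest score under the given weights."""
    return sorted(items, key=make_scorer(weights), reverse=True)


# Hypothetical items with factor values already normalised to [0, 1].
ITEMS = [
    {"id": "a", "relevance": 0.9, "seller_score": 0.2},
    {"id": "b", "relevance": 0.6, "seller_score": 0.9},
]

baseline = rank(ITEMS, {"relevance": 1.0})
# Hypothesis: seller track record should influence ranking, so add it as a factor.
with_seller = rank(ITEMS, {"relevance": 1.0, "seller_score": 0.5})
```

Comparing `baseline` with `with_seller` shows how adding a single factor can reorder results, which is the kind of experiment the slide alludes to.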
  9. (no text content)
  10. Acknowledgments
       • Athena Team
       • Cloudera Inc.
       • Community