Apache Hadoop India Summit 2011 talk "Hadoop Avatar at eBay" by Srinivasan Rengarajan and Mohit Soni

Transcript

  • 1. Avatar at eBay
    Srinivasan Rengarajan (srengarajan@ebay.com)
    Mohit Soni (mosoni@ebay.com)
    Courtesy
    Anil Madan (amadan@ebay.com)
  • 2. 2007: Research team builds a 4-node cluster
    Subset of click stream and EDW data
    Innovation with Mobius Query Language
    Visualization and click path analysis
    September 2009: Search clusters
    Machine learning ranking cluster of 28 nodes
    Search relevance cluster of 10 nodes
    Subset of click stream and EDW data
    May 2010: Athena* exploratory cluster of 532 nodes
    Platform teams join hands with Search/Research to build a larger cluster.
    Build it as a core competency for advanced insights into complex data
    Rapid build-out, with timelines pulled in by a couple of months
    * Athena is the goddess of civilization, wisdom, strength, strategy, craft, justice and skill in Greek mythology.
    MIT's Athena ushered the world into a new era of distributed systems when it started in the mid-1980s.
  • 3. Infrastructure
    • Enterprise Nodes
    Sun 64-bit, Red Hat Linux
    2 quad-core Nehalem, 72GB RAM, 4TB
    • Servers
    SGI-Rackables, CentOS, 1U, 5.3PB
    2 quad-core Nehalem, 36GB RAM, 10TB
    HBase on 20 nodes
    • Network
    ToR (top-of-rack) 1Gbps
    Core switches uplink 40Gbps
  • 10. Ecosystem
    • Monitoring & Alerting
    Ganglia, Nagios
    • Tools
    HUE/Mobius – lifecycle of user jobs
    UC4 – scheduling
    Oozie – user workflow and data pipelines
    Mahout – data mining
    • Data Access Frameworks
    HBase – for EDW data
    Pig – data pipelines
    Hive – ad hoc queries
    MQL – Mobius Query Language
    • Stack diagram (top to bottom)
    Monitoring & Alerting (Ganglia, Nagios)
    Tools & Libraries (HUE, UC4, Oozie, Mobius, Mahout)
    Data Access (HBase, Pig, Hive)
    MapReduce (Java, Streaming, Pipes, Scala)
    Hadoop Core (HDFS, Common)
    • MapReduce
    Sourcing data primarily Java Applications using Perl, Scala, Python…
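
The slide lists Streaming as one of the MapReduce interfaces and Python as one of the languages in use. As a rough illustration only, a Streaming-style job that counts clicks per page might look like the sketch below. The record layout (tab-separated user id and page) and the function names are assumptions made for this example; the talk does not describe eBay's actual formats or jobs.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'page<TAB>1' for each click record.

    Assumes a hypothetical tab-separated layout: user_id, page.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed records
        yield f"{fields[1]}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Input must be sorted by key, as Hadoop Streaming provides
    between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(records):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reduce_lines(sorted(map_lines(records))))
```

Under Hadoop Streaming, the mapper and reducer would each read `sys.stdin` and print their output lines; `run_local` only mimics the sort that the framework performs between the two phases.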
  • 11. Administration
    • Groups
    Built to support multiple groups
    Job invocation uses the group name
    • Fair Scheduler
    Allocations based on investment
    Weights
    Minimum share of mappers and reducers
    poolMaxJobsDefault
    userMaxJobsDefault
    defaultMinSharePreemptionTimeout
    fairSharePreemptionTimeout
    • Authentication & Authorization
    HUE – custom module to use corporate credentials
    CLI* – custom PAM module
    Security* – implement token interface to replace Kerberos with SAML
    * Work in progress
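
To make the idea of weights and minimum shares concrete, here is a deliberately simplified sketch of how slots could be divided among groups: each pool first receives its minimum share, and the remaining slots are split in proportion to weight. This is not the Hadoop Fair Scheduler's actual algorithm (which also accounts for demand and preemption), and the pool names and numbers are hypothetical.

```python
def fair_shares(total_slots, pools):
    """Divide slots among pools (simplified model, not Hadoop's code).

    pools maps a pool name to (weight, min_share).
    Each pool gets its minimum share first; spare slots are then
    split in proportion to the pool weights.
    """
    reserved = sum(min_share for _, min_share in pools.values())
    if reserved > total_slots:
        raise ValueError("minimum shares exceed cluster capacity")
    total_weight = sum(weight for weight, _ in pools.values())
    if total_weight <= 0:
        raise ValueError("at least one pool needs a positive weight")
    spare = total_slots - reserved
    return {
        name: min_share + spare * weight / total_weight
        for name, (weight, min_share) in pools.items()
    }


# Hypothetical pools: weight reflects each group's investment.
example = fair_shares(
    100,
    {"search": (2.0, 10), "research": (1.0, 10), "platform": (1.0, 0)},
)
```

In this example the `search` pool, with twice the weight of the others, ends up with half of the 100 slots.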
  • 12. Data Sourcing Patterns
    Click Stream
    Search Indices
    EDW
    Analytics Reporting
    Description
    Acquisition
    Algorithmic Models
    Images
  • 13. Search Use Case – Machine Learned Ranking
    ClickStream
    Items
    Users
    Feedback
    Classifiers
    Ranking Function
    Great Search Results
    • Goal
    Enhance search relevance for eBay's items.
    • Hadoop Usage
    Build a ranking function that takes multiple factors into account, such as price, listing format, seller track record, and relevance.
    Ability to add new factors to validate hypotheses
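
As a minimal sketch of a multi-factor ranking function, the example below scores each item as a weighted sum of normalized factor values, so that a new factor can be added by introducing one more weight. The linear form, the factor names, and the weights are all assumptions for illustration; in a machine-learned setup the weights (or a richer model) would come from training on feedback data, which the slide does not detail.

```python
def score(item, weights):
    """Weighted sum of the factors named in `weights`.

    Factors missing from an item contribute 0.
    """
    return sum(w * item.get(name, 0.0) for name, w in weights.items())


def rank(items, weights):
    """Return items ordered from highest to lowest score."""
    return sorted(items, key=lambda it: score(it, weights), reverse=True)


# Hypothetical items with factor values normalized to [0, 1].
items = [
    {"id": "a", "relevance": 0.9, "seller_track_record": 0.2, "price": 0.5},
    {"id": "b", "relevance": 0.7, "seller_track_record": 0.9, "price": 0.5},
]
weights = {"relevance": 0.6, "seller_track_record": 0.3, "price": 0.1}
baseline = [it["id"] for it in rank(items, weights)]

# Testing a hypothesis: add a new factor and see how the ordering changes.
items[0]["free_shipping"] = 1.0
with_new_factor = [
    it["id"] for it in rank(items, {**weights, "free_shipping": 0.5})
]
```

Keeping the weights in a dictionary means a new hypothesis can be tested by adding one entry, without touching the scoring code.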
  • Research Use Case – Description Data Mining
    BARBIE
    1999 "PREMIERE NIGHT"
    Home Shopping Special Edition
    Gorgeous Doll With Beautiful Blond Hair /  In A Gown Of Purple And Silver
    New / Never Removed From Box / Doll Is In Mint Condition / Remember This Beauty Is 11 Years Old
    Free Shipping To US Only / Will Ship International / Please E-mail For Cost
    Feel Free To Ask Me Any Questions Or Concerns
    Smoke - Free Environment
    Free Shipping
    Year: 1999
    Model: premiere night
    Edition: home shopping special
    Hair: blond
    Gown: purple and silver
    Condition: new / never removed from box / mint
    • Goal
    Extend catalog coverage
    • Hadoop Usage
    Leverage data mining/machine learning techniques to turn inventory descriptions into name-value pairs in a completely unsupervised way
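
To show the shape of the output only, the sketch below pulls a few name-value pairs out of a description like the one above using hand-written rules. This is a rule-based stand-in, not the unsupervised technique the slide refers to (whose details are not given); the patterns and attribute names are assumptions chosen to fit this one example.

```python
import re


def extract_pairs(description):
    """Extract a few name-value pairs with hand-written patterns.

    Illustrative only: a real unsupervised system would discover
    attribute names and values from the data itself.
    """
    pairs = {}
    year = re.search(r"\b(?:19|20)\d{2}\b", description)
    if year:
        pairs["Year"] = year.group(0)
    hair = re.search(r"\b(\w+)\s+hair\b", description, re.IGNORECASE)
    if hair:
        pairs["Hair"] = hair.group(1).lower()
    gown = re.search(r"\bgown of ([\w ]+)", description, re.IGNORECASE)
    if gown:
        pairs["Gown"] = gown.group(1).strip().lower()
    return pairs


text = (
    '1999 "PREMIERE NIGHT" / Gorgeous Doll With Beautiful Blond Hair'
    " / In A Gown Of Purple And Silver"
)
pairs = extract_pairs(text)
```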
  • 20. Acknowledgments