Apache Hadoop India Summit 2011 talk "Hadoop Avatar at eBay" by Srinivasan Rengarajan and Mohit Soni
1. Avatar at eBay. Srinivasan Rengarajan (srengarajan@ebay.com), Mohit Soni (mosoni@ebay.com). Courtesy: Anil Madan (amadan@ebay.com)
2. 2007: Research team builds a 4-node cluster. Subset of Click Stream and EDW data; innovation with Mobius Query Language; visualization and click-path analysis. Sept 2009: Search clusters. Machine-learning ranking cluster of 28 nodes; search relevance cluster of 10 nodes; subset of Click Stream and EDW data. May 2010: Athena*, an exploratory cluster of 532 nodes. Platform teams join hands with Search/Research to build a larger cluster; build it as a core competency for advanced insights into complex data; rapid build-out with timelines pulled in by a couple of months. * Athena is the goddess of civilization, wisdom, strength, strategy, craft, justice and skill in Greek mythology. MIT's Athena ushered the world into a new era of distributed systems when it started in the mid-80s.
11. Administration. Groups: built to support multiple groups; job invocation uses the group name. Fair Scheduler: allocations based on investment; weights; minimum share of mappers and reducers; poolMaxJobsDefault, userMaxJobsDefault, defaultMinSharePreemptionTimeout, fairSharePreemptionTimeout. Auth & Auth: HUE (custom module to use corp. credentials); CLI* (custom PAM module); Security* (implement token interface to replace Kerberos with SAML). * Work in progress.
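The Fair Scheduler settings named on slide 11 can be sketched as an allocations file. The Python snippet below is a minimal illustration, not eBay's actual configuration: the group names, weights, minimum shares and default values are all hypothetical placeholders. The element names (pool, weight, minMaps, minReduces and the four defaults from the slide) follow the allocation file format of the Hadoop 1.x Fair Scheduler.

```python
import xml.etree.ElementTree as ET

# Hypothetical groups: (weight, minimum map slots, minimum reduce slots).
# Weights stand in for each group's relative investment in the cluster.
GROUPS = {
    "search": (3.0, 60, 30),
    "research": (1.0, 20, 10),
}


def build_allocations(groups):
    """Build a Fair Scheduler allocations document, one pool per group."""
    root = ET.Element("allocations")
    for name, (weight, min_maps, min_reduces) in groups.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minMaps").text = str(min_maps)
        ET.SubElement(pool, "minReduces").text = str(min_reduces)
    # Cluster-wide defaults named on the slide; the values are placeholders.
    defaults = {
        "poolMaxJobsDefault": 20,
        "userMaxJobsDefault": 5,
        "defaultMinSharePreemptionTimeout": 600,  # seconds
        "fairSharePreemptionTimeout": 1200,       # seconds
    }
    for tag, value in defaults.items():
        ET.SubElement(root, tag).text = str(value)
    return root


allocations = build_allocations(GROUPS)
xml_text = ET.tostring(allocations, encoding="unicode")
print(xml_text)
```

For jobs to land in a pool named after the submitting group, one option in Hadoop 1.x is to set mapred.fairscheduler.poolnameproperty to group.name. The slides do not say which mechanism was actually used, so treat that as an assumption.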
12. Data Sourcing Patterns: Click Stream, Search Indices, EDW, Analytics, Reporting, Description, Acquisition, Algorithmic Models, Images