  • 1. Hadoop Avatar at eBay
    Srinivasan Rengarajan (srengarajan@ebay.com)
    Mohit Soni (mosoni@ebay.com)
    Courtesy: Anil Madan (amadan@ebay.com)
  • 2. 2007: Research team builds a 4-node cluster
      - Subset of click-stream and EDW data
      - Innovation with Mobius Query Language
      - Visualization and click-path analysis
    September 2009: Search clusters
      - Machine-learning ranking cluster of 28 nodes
      - Search relevance cluster of 10 nodes
      - Subset of click-stream and EDW data
    May 2010: Athena*
      - Exploratory cluster of 532 nodes
      - Platform teams join hands with Search/Research to build a larger cluster
      - Build it as a core competency for advanced insights into complex data
      - Rapid build-out, with timelines pulled in by a couple of months
    * Athena is the goddess of civilization, wisdom, strength, strategy, craft, justice and skill in Greek mythology. MIT's Project Athena ushered the world into a new era of distributed systems when it started in the mid-1980s.
  • 3. Infrastructure
    Sun 64-bit servers, Red Hat Linux: 2 quad-core Nehalem CPUs, 72 GB RAM, 4 TB
      - Job Tracker
      - ZooKeeper
      - HBase Master
      - Ganglia server
      - eBay (Cloudera) HUE
    Data nodes: SGI Rackable, CentOS, 1U; 2 quad-core Nehalem CPUs, 36 GB RAM, 10 TB per node; 5.3 PB in total
      - HBase on 20 nodes
    Top-of-rack (TOR) switches: 1 Gbps; core switch uplinks: 40 Gbps
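The data-node figures above can be cross-checked with a little arithmetic: 5.3 PB of raw disk at 10 TB per node implies about 530 data nodes, consistent with the 532-node Athena cluster. Raw disk is not the same as usable space, because HDFS stores every block more than once. The sketch below assumes decimal units and the HDFS default replication factor of 3; the slide does not state the replication factor eBay actually used, and reserved or non-HDFS space is ignored.

```python
# Back-of-the-envelope HDFS capacity arithmetic for the data-node figures.
# Assumptions (not stated on the slide): decimal units (1 PB = 1000 TB),
# replication factor 3 (the HDFS default), no reserved or non-HDFS space.

TB_PER_PB = 1000


def implied_node_count(total_raw_pb: float, per_node_tb: float) -> int:
    """Data nodes implied by total raw capacity and per-node disk."""
    return round(total_raw_pb * TB_PER_PB / per_node_tb)


def usable_capacity_pb(total_raw_pb: float, replication: int = 3) -> float:
    """Logical data that fits when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return total_raw_pb / replication


if __name__ == "__main__":
    print(f"implied data nodes: {implied_node_count(5.3, 10)}")
    print(f"usable at 3x replication: {usable_capacity_pb(5.3):.2f} PB")
```

Run as a script, it prints 530 data nodes and about 1.77 PB usable: under the assumed 3x replication, the cluster holds roughly a third of its raw capacity as unique data.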
  • 4. Ecosystem
    Monitoring & Alerting (Ganglia, Nagios)
    Tools & Libraries (HUE, UC4, Oozie, Mobius, Mahout)
      - HUE/Mobius: lifecycle of user jobs
      - UC4: scheduling
      - Oozie: user workflows and data pipelines
      - Mahout: data mining
    Data Access (HBase, Pig, Hive)
      - HBase: for EDW data
      - Pig: data pipelines
      - Hive: ad hoc queries
      - MQL: Mobius Query Language
    MapReduce (Java, Streaming, Pipes, Scala)
      - Data sourcing primarily in Java
      - Applications using Perl, Scala, Python…
    Hadoop Core (HDFS, Common)
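The MapReduce layer above lists Streaming next to Java, Pipes and Scala, with applications written in Perl, Scala and Python. As an illustration of the Streaming model (not eBay's code), the sketch below counts page views per page from click-stream records. The tab-separated record layout `session_id<TAB>timestamp<TAB>page`, the function names and the file name `clicks.py` are hypothetical. Streaming feeds each script lines on stdin and reads `key<TAB>value` lines from stdout, and the framework sorts mapper output by key before the reducer sees it; the `run_local` helper imitates that sort with `sorted`.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'page<TAB>1' for every well-formed click-stream record.

    Hypothetical input layout: session_id<TAB>timestamp<TAB>page
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records instead of failing the task
        yield f"{fields[2]}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per page; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for page, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{page}\t{sum(int(count) for _, count in group)}"


def run_local(records: Iterable[str]) -> list[str]:
    """Imitate map -> shuffle/sort -> reduce in one process, for testing."""
    return list(reducer(sorted(mapper(records))))


def main(stage: str) -> None:
    """Streaming entry point: read stdin, write key<TAB>value lines to stdout."""
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Locally, `python clicks.py map < clicks.tsv | sort | python clicks.py reduce` mimics a job. Under Hadoop the same file would be shipped with the job and passed to the streaming jar's `-mapper` and `-reducer` options as `python clicks.py map` and `python clicks.py reduce`; the jar's path depends on the installation.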

Apache Hadoop India Summit 2011 talk "Hadoop Avatar at eBay" by Srinivasan Rengarajan and Mohit Soni

  • 11. Administration
    Groups
      - Built to support multiple groups
      - Job invocation uses the group name
    Fair Scheduler
      - Allocations based on investment
      - Weights
      - Minimum share of mappers and reducers
      - poolMaxJobsDefault, userMaxJobsDefault, defaultMinSharePreemptionTimeout, fairSharePreemptionTimeout
    Authentication & Authorization
      - HUE: custom module to use corporate credentials
      - CLI*: custom PAM module
      - Security*: implement a token interface to replace Kerberos with SAML
    * Work in progress
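The Fair Scheduler bullets above combine per-pool weights (allocations based on investment) with guaranteed minimum shares of mappers and reducers. The sketch below is a simplified model of how such shares can be computed for one slot type, similar in spirit to the Fair Scheduler's weighted sharing but not its actual code: every pool gets `min(demand, max(min_share, weight * r))` for a common ratio `r`, and `r` is found by binary search so the shares add up to the available slots. Pool names and numbers are hypothetical; preemption timeouts and the per-pool and per-user job limits (`poolMaxJobsDefault`, `userMaxJobsDefault`) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    weight: float   # relative share, e.g. proportional to investment
    min_share: int  # guaranteed slots (minimum mappers or reducers)
    demand: int     # slots the pool's runnable tasks could use right now


def share_at(pool: Pool, ratio: float) -> float:
    """Slots for a pool at a given ratio: weight * ratio, clamped to [min_share, demand]."""
    return min(pool.demand, max(pool.min_share, pool.weight * ratio))


def fair_shares(pools: list[Pool], total_slots: int, iterations: int = 100) -> dict[str, float]:
    """Binary-search the common ratio so clamped shares sum to total_slots.

    Assumes positive weights and that the minimum shares fit in total_slots.
    """
    if any(p.weight <= 0 for p in pools):
        raise ValueError("weights must be positive")
    if sum(p.demand for p in pools) <= total_slots:
        # Enough capacity for everyone: each pool gets its full demand.
        return {p.name: float(p.demand) for p in pools}
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the shares at `hi` cover the cluster.
    while sum(share_at(p, hi) for p in pools) < total_slots:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(share_at(p, mid) for p in pools) < total_slots:
            lo = mid
        else:
            hi = mid
    return {p.name: share_at(p, hi) for p in pools}
```

For example, with 100 map slots and pools A (weight 2, minimum 10, demand 100), B (weight 1, minimum 40, demand 100) and C (weight 1, minimum 0, demand 5), C is capped by its demand at 5, B is held at its minimum of 40, and A receives the remaining 55.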
  • 12. Data Sourcing Patterns
    Click stream, search indices, EDW, analytics, reporting, description acquisition, algorithmic models, images
  • 14. Enhance search relevance for eBay’s items.
  • 16. Build a ranking function that takes multiple factors into account, such as price, listing format, seller track record and relevance.
  • 17. Ability to add new factors to validate hypotheses.
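A ranking function that combines several factors and accepts new ones can be sketched as a weighted sum over a registry of named factor functions, so testing a hypothesis means registering one more factor and comparing rankings. This is a hypothetical illustration, not eBay's ranking model: the `Item` fields, factor functions, normalisations and weights are all invented for the example, and a production system would typically learn the weights rather than hand-set them.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    # Hypothetical listing attributes, one per factor named on the slide.
    title_relevance: float  # text-match score in [0, 1], computed elsewhere
    price: float            # listing price
    is_fixed_price: bool    # listing format: fixed price vs. auction
    seller_rating: float    # seller track record in [0, 1]


Factor = Callable[[Item], float]


class Ranker:
    """Linear ranking function over a registry of named, weighted factors."""

    def __init__(self) -> None:
        self._factors: dict[str, tuple[Factor, float]] = {}

    def add_factor(self, name: str, fn: Factor, weight: float) -> "Ranker":
        """Register (or replace) a factor; returns self so calls can be chained."""
        self._factors[name] = (fn, weight)
        return self

    def score(self, item: Item) -> float:
        """Weighted sum of all registered factor scores."""
        return sum(weight * fn(item) for fn, weight in self._factors.values())

    def rank(self, items: list[Item]) -> list[Item]:
        """Items ordered from highest to lowest score."""
        return sorted(items, key=self.score, reverse=True)


def default_ranker() -> Ranker:
    # Illustrative weights only; cheaper listings score higher (an assumption).
    return (
        Ranker()
        .add_factor("relevance", lambda it: it.title_relevance, 0.6)
        .add_factor("seller", lambda it: it.seller_rating, 0.2)
        .add_factor("format", lambda it: 1.0 if it.is_fixed_price else 0.0, 0.1)
        .add_factor("price", lambda it: 1.0 / (1.0 + math.log1p(it.price)), 0.1)
    )
```

Keeping factors in a registry rather than hard-coding the formula means a new hypothesis (say, a shipping-cost factor) is one `add_factor` call on a copy of the ranker, whose ordering can then be compared against the baseline.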