Hadoop @ eBay: Past, Present and Future
Ryan Hennig
Hadoop Platform Team
ABOUT ME
RYAN HENNIG
Born and raised in Seattle, WA
Studied Computer Science at University of Washington in Seattle
Worked on Microsoft SQL Server 2006 – 2012
- Shipped SQL Server 2008, 2008 R2, 2012
Joined eBay Hadoop team in early 2012
- Based in Bellevue, suburb of Seattle

COMPUTE AND DATA INFRASTRUCTURE

AGENDA

Past: Growth of Hadoop at eBay
Present: Hadoop Use Cases, Operations Tools
Future: Hadoop 2.0
HADOOP AT EBAY:
PAST
Growth of Hadoop at eBay
Adventures in Forking
Partnership with Hortonworks
HADOOP EVOLUTION @ eBay
[Timeline figure; years marked: 2007, 2009, 2010, 2011, 2012, 2013. Stages in chronological order:]
• Single-digit nodes
• 10s of nodes
• Shared cluster: 100s of nodes, 1000s+ cores, PB, CDH2
• Shared clusters: 1000s of nodes, 10,000+ cores, 10s of PB, Wilma (0.20)
• Shared clusters: 1000s of nodes, 10,000+ cores, 10s of PB, Argon (0.22)
• Shared clusters: 4k+ nodes, 40,000+ cores, 50s of PB, HDP 1.x
[Additional figure label near the 2007 and 2009 markers: Search]


ADVENTURES IN FORKING
• 2007-2010: eBay runs shared clusters on Cloudera Distribution of Hadoop
• 2010-2012: eBay runs shared clusters on custom Hadoop versions
  – 2010: Wilma (based on 0.20)
  – 2011: Argon (based on 0.22)
  – 2012: Custom branch abandoned
• Lessons Learned
  – Forking a fast-changing open source project is difficult and risky
    • Balancing development and operations needs
    • Development team size
      – Facebook had 100
      – eBay had 15
    • Coordination with open source community = lots of overhead
    • Divergence from open source: push changes early and often

EBAY AND HORTONWORKS
• 2012: eBay enters partnership with Hortonworks
  – Goals
    • Focus on eBay-specific development internally
    • Leverage Hortonworks expertise for general Hadoop development
    • Avoid source code divergence by making open source contribution a priority
  – Benefits to Hortonworks
    • Credibility enhanced by having a well-known customer
    • Ability to test at large scale

HADOOP AT EBAY:
PRESENT
Shared and Dedicated Clusters
Job Distribution
Use Case Examples
eBay Data Platform Overview
SHARED AND DEDICATED CLUSTERS
Shared clusters
– 10s of PB and 10s of thousands of slots per cluster
– Used primarily for analytics of user behavior and inventory
– Mix of production and ad-hoc jobs
– Mix of MR, Hive, Pig, Cascading, etc.
– Hadoop and HBase security enabled

Dedicated clusters
– Very specific use cases like index building
– Tight SLAs for jobs (on the order of minutes)
– Immediate revenue impact
– Usually smaller than our shared clusters, but still big (100s of nodes…)

JOB DISTRIBUTION BY TYPE

USE CASE EXAMPLES
• Cassini, eBay’s new search engine:
– Uses MR to build full and incremental near-real-time indexes
– Raw data is stored in HBase for efficient updates and random reads
– Strong SLAs: < 10 minutes
– Runs on dedicated clusters

• Related and similar item recommendations:
– Use transactional data, click stream data, search index, etc.
– Production MR jobs on a shared cluster

• Analytics dashboard:
– Run Mobius MR jobs to join click stream data and transactional data
– Store summary data in HBase
– Web application to query HBase
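
The click stream / transactional join in the dashboard use case follows the general shape of a reduce-side join. The sketch below shows that shape in plain Python. It is not Mobius and does not use any Hadoop API; the record fields (`item_id`, `clicks`, `price`) and all function names are invented for illustration.

```python
from collections import defaultdict

def map_phase(clicks, transactions):
    """Tag each record with its source and emit (join_key, tagged_record)."""
    for rec in clicks:
        yield rec["item_id"], ("click", rec)
    for rec in transactions:
        yield rec["item_id"], ("txn", rec)

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, pair every click record with every transaction record."""
    for key in sorted(groups):
        values = groups[key]
        click_recs = [r for tag, r in values if tag == "click"]
        txn_recs = [r for tag, r in values if tag == "txn"]
        for c in click_recs:
            for t in txn_recs:
                yield {"item_id": key, "clicks": c["clicks"], "price": t["price"]}

def join(clicks, transactions):
    """Inner join of the two inputs on item_id (hypothetical field name)."""
    return list(reduce_phase(shuffle(map_phase(clicks, transactions))))
```

Keys that appear in only one input produce no output, so this behaves as an inner join; the joined rows would then be summarized and written to HBase in the pipeline described above.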

HADOOP OPERATIONS
LDAP Integration
- All users stored in Active Directory, accessed via LDAP
- Access to MapReduce queues granted via LDAP groups
- Batch users: shared by a group of users
Security
- Kerberos as implemented by Microsoft Active Directory
- One domain for users, another for service/server principals
- Batch users authenticated via keytabs, not passwords
Misc
- 10s of slave nodes are broken at any given time
- Often need to add several racks of machines at a time

HADOOP OPERATIONS
Team has Development and Operations Responsibilities
- 2 huge shared clusters
- 1800+ users, exponential growth
- About 10 Hadoop developers
- Recently: operations work moved to dedicated team
Developed several tools to manage operations
- Hadoop Management Console: user-facing web app
- ldap-admin: Swiss-army-knife-style tool for Hadoop admins
- Puppet: for adding machines to the clusters, many racks at a time
- Decom/Recom scripts: automatic detection, repair, decommission, and recommission of slave nodes
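
One way to picture the detect / repair / decommission / recommission cycle is as a small state machine run against each slave node. The sketch below is an assumption about the general shape only, not eBay's scripts: the `Node` class, the state names, and the caller-supplied `repair` function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    state: str = "in_service"   # hypothetical states: in_service | decommissioned
    healthy: bool = True

def reconcile(node, repair):
    """Advance one node through detect -> decommission -> repair -> recommission.

    `repair` is a caller-supplied function that returns True if the fix worked.
    Returns the list of actions taken, in order.
    """
    actions = []
    # Detection: an unhealthy node that is still serving gets pulled out.
    if node.state == "in_service" and not node.healthy:
        node.state = "decommissioned"
        actions.append("decommission")
    if node.state == "decommissioned":
        if not node.healthy:
            actions.append("repair")
            node.healthy = repair(node)
        # Only a node that passes repair goes back into the cluster.
        if node.healthy:
            node.state = "in_service"
            actions.append("recommission")
    return actions
```

A node whose repair fails stays decommissioned and is retried on the next pass, which is one plausible way to handle the "10s of broken nodes at any given time" without manual intervention.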

HADOOP MANAGEMENT CONSOLE
• Custom web application built on Ruby on Rails
• Self-service tools are continually added to reduce support load
  – User Management
    • Access requests
    • Group membership
  – Batch User Management
    • New requests
    • Sudoer management
  – Dataset Management
    • Explore datasets
    • Request new dataset transfer between Teradata and Hadoop
  – Metadata tools
    • Each dataset is stored in custom XML format
    • Code generation: Hive tables, Java POJOs
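
As a rough illustration of generating Hive tables from dataset metadata, the sketch below renders a CREATE TABLE statement from an XML description. The slides do not show eBay's custom XML format, so the `<dataset>` / `<field>` schema here is invented, as are the function and variable names.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset description; the real format is not shown in the slides.
SAMPLE = """
<dataset name="clickstream">
  <field name="user_id" type="bigint"/>
  <field name="url" type="string"/>
</dataset>
"""

def hive_ddl(xml_text):
    """Render a Hive CREATE TABLE statement from a dataset description."""
    root = ET.fromstring(xml_text.strip())
    columns = ", ".join(
        f"`{field.get('name')}` {field.get('type').upper()}"
        for field in root.findall("field")
    )
    return f"CREATE TABLE `{root.get('name')}` ({columns})"
```

Calling `hive_ddl(SAMPLE)` yields a single-line DDL string; a Java POJO generator could walk the same parsed tree and emit one field per `<field>` element.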
ldap-admin
• Command-line tool written in Ruby
• Swiss-army knife tool, features added on demand for support issues
• Often used features:
– Add a user to a group
– View key details for LDAP users and groups
– List all users, batch users, Hadoop groups
– Reset batch user passwords and keytabs
– Show/add/remove sudoers for a batch account
– Run user diagnostics: check permissions, keytabs, etc.

HADOOP AT EBAY:
FUTURE
HDFS Federation
YARN
New Scenarios
Storage and Operational Efficiency
HDFS HA and Federation
• HDFS High-Availability for Reliability
  – NameNode in Hadoop 1.0 is a Single Point of Failure
  – Automated failover to hot standby
  – Depends on ZooKeeper
• HDFS Federation for Scalability and Isolation
  – Hadoop 1.0: Single NameNode service
    • “Secondary NameNode” is not for failover
    • Storage scales horizontally, but Namespace scales vertically
    • No isolation for different tenants or applications
  – Hadoop 2.0: HDFS Federation
    • Partition the HDFS Namespace
    • Many independent NameNodes
    • Allows direct access to Block Storage w/o going through HDFS interface

HDFS HA

HDFS Federation
Horizontal scalability of the HDFS namespace
Multiple independent NameNodes, each serving a subtree of the namespace

Example: NN1 provides /users, NN2 provides /reports
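
The NN1 / NN2 example amounts to a mount table that maps path prefixes to NameNodes. Hadoop 2 offers a client-side mount table (ViewFs) for this purpose; the sketch below is not that API, only a minimal illustration of longest-prefix resolution, with `MOUNT_TABLE` and `resolve` as made-up names.

```python
# Mount points taken from the example above; names are illustrative only.
MOUNT_TABLE = {"/users": "NN1", "/reports": "NN2"}

def resolve(path, mounts=MOUNT_TABLE):
    """Return the NameNode that owns `path` by longest-prefix match, or None."""
    best = None
    for prefix in mounts:
        # Match the mount point itself or anything beneath it, but not
        # sibling paths that merely share leading characters.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return mounts[best] if best is not None else None
```

Because each subtree is owned by exactly one NameNode, a lookup never has to consult more than one of them, which is what lets the namespace scale horizontally.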

YARN
Hadoop 1.0: MapReduce
– JobTracker and TaskTracker services
– Handle both resource management and job execution

Hadoop 2.0: YARN
– Refactors the responsibilities of JobTracker and TaskTracker into a more general platform
– Global ResourceManager
  • Cluster-wide resource management
– Per-application ApplicationMaster
  • Application-specific job control
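
The split between a global ResourceManager and per-application ApplicationMasters can be sketched as two small classes. This is a toy model under simplifying assumptions (memory is the only resource, there are no NodeManagers, no queues, and no scheduler policy); the class and method names echo the YARN terms but are not the YARN API.

```python
class ResourceManager:
    """Toy cluster-wide allocator: tracks free memory and grants containers."""

    def __init__(self, total_mb):
        self.free_mb = total_mb

    def allocate(self, app_id, mb):
        """Grant a container if it fits, otherwise return None (caller waits)."""
        if mb > self.free_mb:
            return None
        self.free_mb -= mb
        return (app_id, mb)

    def release(self, container):
        """Return a finished container's memory to the pool."""
        self.free_mb += container[1]


class ApplicationMaster:
    """Toy per-application controller: decides what to request and tracks tasks."""

    def __init__(self, app_id, task_mb, tasks):
        self.app_id = app_id
        self.task_mb = task_mb
        self.pending = tasks
        self.running = []

    def heartbeat(self, rm):
        """Ask the ResourceManager for containers until none are granted."""
        while self.pending > 0:
            container = rm.allocate(self.app_id, self.task_mb)
            if container is None:
                break
            self.running.append(container)
            self.pending -= 1
```

The ResourceManager here knows nothing about what a task is; all application logic lives in the ApplicationMaster, which is the property that lets non-MapReduce workloads share the same cluster.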

New Scenarios
• Iterative Query
– Stinger (Hive), Impala, etc
– Rapid Data exploration and analysis
• Graph Databases
– TitanDB, Giraph
– Billions of vertices and edges
– Complex Graph Traversals
– Applications: PayPal fraud detection, Social Graph Analysis
• Real-Time Processing
– Storm (Twitter), Apache S4
– Reinforcement Learning, Monitoring

Efficiency and Reliability
• Storage Efficiency
  – HDFS introduces a 3x storage cost for its replicas
  – HDFS-RAID: more reliability for 1.5x storage cost
    • Reed-Solomon
    • Locally Repairable Codes (Project Xorbas)
  – Tradeoff: the cost of repairing lost data is much higher
• Operational Efficiency
  – More automation
  – More self-service tools
  – Better Monitoring
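
The 3x and 1.5x storage figures, and the repair tradeoff, can be checked with a little arithmetic. The sketch below assumes a Reed-Solomon stripe of 10 data blocks and 5 parity blocks; those parameters are chosen only because they reproduce the 1.5x figure and are not taken from the slides.

```python
def replication(copies):
    """Storage overhead and blocks read to rebuild one lost block."""
    # Any surviving replica can be copied directly, so one read suffices.
    return {"overhead": float(copies), "repair_reads": 1}

def reed_solomon(data_blocks, parity_blocks):
    """A (k, m) Reed-Solomon stripe stores k + m blocks for k blocks of data.

    Rebuilding a single lost block requires reading k surviving blocks.
    """
    return {
        "overhead": (data_blocks + parity_blocks) / data_blocks,
        "repair_reads": data_blocks,
    }
```

With these assumed parameters, `replication(3)` costs 3x storage but repairs from a single read, while `reed_solomon(10, 5)` costs 1.5x storage but reads 10 blocks per repair. Locally Repairable Codes add local parity blocks to cut that repair read count, at the price of some extra storage.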

Open Source
• HMC Metadata
– Long term goal: standardize on open source technologies (HCatalog)
– Short term: explore what should be open sourced
• Hadoop Management Console
– Hadoop Access Request Automation
– Batch user creation and management
– Metadata management
– Code generation of dataset to Hive tables and Java POJOs
• ldap-admin tools
– Very useful but tightly coupled to eBay’s LDAP configuration
– Willing to open source if there is interest

THANK YOU
Questions?

Industrial Tech SW: Category Renewal and Creation
 
The 10 Most Influential Leaders Guiding Corporate Evolution, 2024.pdf
The 10 Most Influential Leaders Guiding Corporate Evolution, 2024.pdfThe 10 Most Influential Leaders Guiding Corporate Evolution, 2024.pdf
The 10 Most Influential Leaders Guiding Corporate Evolution, 2024.pdf
 
Satta Matka Dpboss Matka Guessing Kalyan Chart Indian Matka Kalyan panel Chart
Satta Matka Dpboss Matka Guessing Kalyan Chart Indian Matka Kalyan panel ChartSatta Matka Dpboss Matka Guessing Kalyan Chart Indian Matka Kalyan panel Chart
Satta Matka Dpboss Matka Guessing Kalyan Chart Indian Matka Kalyan panel Chart
 
Authentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto RicoAuthentically Social by Corey Perlman - EO Puerto Rico
Authentically Social by Corey Perlman - EO Puerto Rico
 
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
Call 8867766396 Satta Matka Dpboss Matka Guessing Satta batta Matka 420 Satta...
 
HOW TO START UP A COMPANY A STEP-BY-STEP GUIDE.pdf
HOW TO START UP A COMPANY A STEP-BY-STEP GUIDE.pdfHOW TO START UP A COMPANY A STEP-BY-STEP GUIDE.pdf
HOW TO START UP A COMPANY A STEP-BY-STEP GUIDE.pdf
 
Best practices for project execution and delivery
Best practices for project execution and deliveryBest practices for project execution and delivery
Best practices for project execution and delivery
 
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
 
BeMetals Investor Presentation_June 1, 2024.pdf
BeMetals Investor Presentation_June 1, 2024.pdfBeMetals Investor Presentation_June 1, 2024.pdf
BeMetals Investor Presentation_June 1, 2024.pdf
 
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challengesEvent Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
Event Report - SAP Sapphire 2024 Orlando - lots of innovation and old challenges
 
Building Your Employer Brand with Social Media
Building Your Employer Brand with Social MediaBuilding Your Employer Brand with Social Media
Building Your Employer Brand with Social Media
 

Hadoop @ eBay: Past, Present, and Future

  • 1. Hadoop @ eBay: Past, Present and Future
      Ryan Hennig, Hadoop Platform Team
  • 3. RYAN HENNIG
      Born and raised in Seattle, WA
      Studied Computer Science at University of Washington in Seattle
      Worked on Microsoft SQL Server 2006 – 2012
          - Shipped SQL Server 2008, 2008 R2, 2012
      Joined eBay Hadoop team in early 2012
          - Based in Bellevue, suburb of Seattle
      COMPUTE AND DATA INFRASTRUCTURE
  • 4. AGENDA
      Past: Growth of Hadoop at eBay
      Present: Hadoop Use Cases, Operations Tools
      Future: Hadoop 2.0
  • 5. HADOOP AT EBAY: PAST
      Growth of Hadoop at eBay
      Adventures in Forking
      Partnership with Hortonworks
  • 6. HADOOP EVOLUTION @ eBay
      Timeline years: 2007, 2009, 2010, 2011, 2012, 2013 (the label "Search" appears beside 2007 and 2009)
      Stages, earliest first:
          • Single-digit nodes
          • 10s of nodes
          • Shared cluster: 100s of nodes, 1000s+ cores, PB, CDH2
          • Shared clusters: 1000s of nodes, 10,000+ cores, 10s of PB, Wilma (0.20)
          • Shared clusters: 1000s of nodes, 10,000+ cores, 10s of PB, Argon (0.22)
          • Shared clusters: 4k+ nodes, 40,000+ cores, 50s of PB, HDP 1.x
  • 7. ADVENTURES IN FORKING
      • 2007-2010: eBay runs shared clusters on Cloudera Distribution of Hadoop
      • 2010-2012: eBay runs shared clusters on custom Hadoop versions
          – 2010: Wilma (based on 0.20)
          – 2011: Argon (based on 0.22)
          – 2012: Custom branch abandoned
      • Lessons Learned
          – Forking a fast-changing open source project is difficult and risky
              • Balancing development and operations needs
              • Development team size
                  – Facebook had 100
                  – eBay had 15
              • Coordination with open source community = lots of overhead
              • Divergence from open source: push changes early and often
  • 9. EBAY AND HORTONWORKS
      • 2012: eBay enters partnership with Hortonworks
          – Goals
              • Focus on eBay-specific development internally
              • Leverage Hortonworks expertise for general Hadoop development
              • Avoid source code divergence by making open source contribution a priority
          – Benefits to Hortonworks
              • Credibility enhanced by having a well-known customer
              • Ability to test at large scale
  • 10. HADOOP AT EBAY: PRESENT
      Shared and Dedicated Clusters
      Job Distribution
      Use Case Examples
      eBay Data Platform Overview
  • 11. SHARED AND DEDICATED CLUSTERS
      Shared clusters
          – 10s of PB and 10s of thousands of slots per cluster
          – Used primarily for analytics of user behavior and inventory
          – Mix of production and ad-hoc jobs
          – Mix of MR, Hive, Pig, Cascading, etc.
          – Hadoop and HBase security enabled
      Dedicated clusters
          – Very specific use cases like index building
          – Tight SLAs for jobs (on the order of minutes)
          – Immediate revenue impact
          – Usually smaller than our shared clusters, but still big (100s of nodes…)
  • 12. JOB DISTRIBUTION BY TYPE
  • 13. USE CASE EXAMPLES
      • Cassini, eBay’s new search engine:
          – Use MR to build full and incremental near-real-time indexes
          – Raw data is stored in HBase for efficient updates and random reads
          – Strong SLAs: < 10 minutes
          – Run on dedicated clusters
      • Related and similar items recommendations:
          – Use transactional data, click stream data, search index, etc.
          – Production MR jobs on a shared cluster
      • Analytics dashboard:
          – Run Mobius MR jobs to join click stream data and transactional data
          – Store summary data in HBase
          – Web application to query HBase
  • 14. HADOOP OPERATIONS
      LDAP Integration
          - All users stored in Active Directory, accessed via LDAP
          - Access to MapReduce Queues granted via MapReduce queues
          - Batch users: shared by a group of users
      Security
          - Kerberos as implemented by Microsoft Active Directory
          - One domain for users, another for service/server principals
          - Batch users authenticated via keytabs, not passwords
      Misc
          - 10s of slave nodes are broken at any given time
          - Often need to add several racks of machines at a time
  • 15. HADOOP OPERATIONS
      Team has Development and Operations Responsibilities
          - 2 huge shared clusters
          - 1800+ users, exponential growth
          - About 10 Hadoop developers
          - Recently: operations work moved to dedicated team
      Developed several tools to manage operations
          - Hadoop Management Console: user-facing web app
          - ldap-admin: Swiss-army-knife-style tool for Hadoop admins
          - Puppet: for adding machines to the clusters, many racks at a time
          - Decom/Recom scripts: automatic detection, repair, decommission, and recommission of slave nodes
  • 16. HADOOP MANAGEMENT CONSOLE
      • Custom Web application built on Ruby on Rails
      • Self-service tools are continually added to reduce support load
          – User Management
              • Access Requests
              • Group Membership
          – Batch User Management
              • New Requests
              • Sudoer management
          – Dataset Management
              • Explore Datasets
              • Request new dataset transfer between Teradata and Hadoop
          – Metadata tools
              • Each dataset is stored in custom XML format
              • Code Generation: Hive Tables, Java POJOs
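The metadata code-generation idea on this slide can be illustrated with a small sketch. The slides do not show eBay's custom XML format, so the `<dataset>`/`<field>` layout, the sample dataset, and the function name `hive_ddl_from_xml` below are all hypothetical, and Python stands in for the console's actual Ruby on Rails code.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset description; the real HMC XML schema is not shown in the slides.
SAMPLE_XML = """
<dataset name="click_events">
  <field name="user_id" type="bigint"/>
  <field name="item_id" type="bigint"/>
  <field name="event_time" type="string"/>
</dataset>
"""

def hive_ddl_from_xml(xml_text):
    """Turn one dataset description into a Hive CREATE TABLE statement."""
    root = ET.fromstring(xml_text.strip())
    # One "name TYPE" column definition per <field> element, in document order.
    columns = [
        "  {} {}".format(field.get("name"), field.get("type").upper())
        for field in root.findall("field")
    ]
    return "CREATE TABLE {} (\n{}\n);".format(root.get("name"), ",\n".join(columns))

if __name__ == "__main__":
    print(hive_ddl_from_xml(SAMPLE_XML))
```

Keeping one metadata file per dataset and generating both the Hive table and the Java POJO from it would keep the two definitions from drifting apart; the POJO generator would follow the same pattern with a different output template.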
  • 24. ldap-admin
      • Command-line tool written in Ruby
      • Swiss-army knife tool, features added on demand for support issues
      • Often used features:
          – Add a user to a group
          – View key details for LDAP users and groups
          – List all users, batch users, Hadoop groups
          – Reset batch user passwords and keytabs
          – Show/add/remove sudoers for a batch account
          – Run user diagnostics: check permissions, keytabs, etc.
  • 25. HADOOP AT EBAY: FUTURE
      HDFS Federation
      YARN
      New Scenarios
      Storage and Operational Efficiency
  • 26. HDFS HA and Federation
      • HDFS High-Availability for Reliability
          – NameNode in Hadoop 1.0 is a Single Point of Failure
          – Automated failover to hot standby
          – Depends on ZooKeeper
      • HDFS Federation for Scalability and Isolation
          – Hadoop 1.0: Single NameNode service
              • “Secondary NameNode” is not for failover
              • Storage scales horizontally, but Namespace scales vertically
              • No isolation for different tenants or applications
          – Hadoop 2.0: HDFS Federation
              • Partition the HDFS Namespace
              • Many independent NameNodes
              • Allows direct access to Block Storage w/o going through HDFS interface
  • 27-29. HDFS HA
  • 30. HDFS Federation
      Horizontal scalability of the HDFS namespace
      Multiple independent NameNodes, each serving a subtree of the namespace
      Example: NN1 provides /users, NN2 provides /reports
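A minimal sketch of the namespace partitioning in the example above: a path is routed to the NameNode whose subtree contains it. NN1 and NN2 come from the slide; the host names, the `MOUNT_TABLE` dictionary, and the `resolve_namenode` function are hypothetical illustrations, not Hadoop APIs. In a real Hadoop 2 deployment this routing is typically configured through a client-side mount table rather than written by hand.

```python
# Hypothetical mount table: namespace subtree -> NameNode that serves it.
# The subtrees follow the slide's example; the host names are made up.
MOUNT_TABLE = {
    "/users": "nn1.example.com",
    "/reports": "nn2.example.com",
}

def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode responsible for `path` (longest matching subtree wins)."""
    best = None
    for prefix in mount_table:
        # Match the subtree root itself or anything below it, but not
        # sibling names that merely share a prefix (e.g. /usersx).
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError("no NameNode serves {}".format(path))
    return mount_table[best]

if __name__ == "__main__":
    print(resolve_namenode("/users/alice/logs"))
    print(resolve_namenode("/reports/2013/q1"))
```

Because each NameNode holds only its own subtree, adding a NameNode adds namespace capacity, and a heavy tenant under one subtree does not load the NameNode serving another.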
  • 31. YARN
      Hadoop 1.0: MapReduce
          – JobTracker and TaskTracker services
          – Handles Resource Management, Job Execution
      Hadoop 2.0: YARN
          - Refactoring responsibilities of JobTracker and TaskTracker into a more general platform
          - Global ResourceManager
              - Cluster-wide resource management
          - Per-application ApplicationMaster
              - Application-specific job control
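The split of responsibilities can be pictured with a toy model. This is not the YARN API: the class and method names below are hypothetical, containers are reduced to a simple counter, and every task is assumed to take exactly one scheduling round. The point is only that the resource manager knows about capacity while each application master knows about its own work.

```python
class ResourceManager:
    """Cluster-wide bookkeeping: tracks free containers, knows nothing about jobs."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        """Grant as many containers as are free, up to the request."""
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


class ApplicationMaster:
    """Per-application control: tracks its own pending and finished tasks."""

    def __init__(self, name, tasks):
        self.name = name
        self.pending = tasks
        self.done = 0

    def complete(self, count):
        self.pending -= count
        self.done += count


def simulate(rm, masters):
    """Run scheduling rounds until every application has finished its tasks."""
    rounds = 0
    while any(am.pending for am in masters):
        # Each master asks the shared resource manager for containers.
        grants = [(am, rm.allocate(am.pending)) for am in masters]
        # Tasks finish and their containers return to the shared pool.
        for am, granted in grants:
            am.complete(granted)
            rm.release(granted)
        rounds += 1
    return rounds


if __name__ == "__main__":
    rm = ResourceManager(total_containers=4)
    apps = [ApplicationMaster("job-a", 3), ApplicationMaster("job-b", 5)]
    print(simulate(rm, apps))
```

In this model a non-MapReduce framework would only need its own application master; the resource manager would not change, which is the generality the slide refers to.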
  • 32-35. YARN
  • 36. New Scenarios
      • Iterative Query
          – Stinger (Hive), Impala, etc.
          – Rapid data exploration and analysis
      • Graph Databases
          – TitanDB, Giraph
          – Billions of vertices and edges
          – Complex graph traversals
          – Applications: PayPal fraud detection, Social Graph Analysis
      • Real-Time Processing
          – Storm (Twitter), Apache S4
          – Reinforcement Learning, Monitoring
  • 37. Efficiency and Reliability
      • Storage Efficiency
          – HDFS introduces a 3x storage cost for its replicas
          – HDFS-RAID: more reliability for 1.5x storage cost
              • Reed-Solomon
              • Locally Repairable Codes (Project Xorbas)
          – Tradeoff: the cost of repairing lost data is much higher
      • Operational Efficiency
          – More automation
          – More self-service tools
          – Better monitoring
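The storage tradeoff on this slide can be made concrete with a little arithmetic. The sketch below assumes a Reed-Solomon layout of 10 data blocks plus 5 parity blocks, chosen only because it reproduces the slide's 1.5x figure; the slides do not state the parameters eBay had in mind, and the function names are illustrative.

```python
def replication_overhead(replicas):
    """Raw bytes stored per logical byte under plain replication."""
    return float(replicas)

def replication_failures_tolerated(replicas):
    """Copies that can be lost before the data is gone."""
    return replicas - 1

def rs_overhead(data_blocks, parity_blocks):
    """Raw bytes stored per logical byte under Reed-Solomon coding."""
    return (data_blocks + parity_blocks) / data_blocks

def rs_failures_tolerated(parity_blocks):
    """Blocks per stripe that can be lost and still be reconstructed."""
    return parity_blocks

def rs_repair_reads(data_blocks):
    """Blocks read to rebuild ONE lost block with classic Reed-Solomon."""
    return data_blocks

# Assumed parameters (not from the slides): 10 data + 5 parity blocks.
DATA_BLOCKS, PARITY_BLOCKS = 10, 5

if __name__ == "__main__":
    print("3x replication:", replication_overhead(3), "x storage,",
          replication_failures_tolerated(3), "losses tolerated, 1 block read per repair")
    print("Reed-Solomon:  ", rs_overhead(DATA_BLOCKS, PARITY_BLOCKS), "x storage,",
          rs_failures_tolerated(PARITY_BLOCKS), "losses tolerated,",
          rs_repair_reads(DATA_BLOCKS), "blocks read per repair")
```

Under these assumptions the coded layout halves the storage cost and survives more losses per stripe, but rebuilding a single block reads ten blocks instead of copying one replica. That repair traffic is the cost the slide warns about, and it is what locally repairable codes aim to reduce.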
  • 38. Open Source
      • HMC Metadata
          – Long term goal: standardize on open source technologies (HCatalog)
          – Short term: explore what should be open sourced
      • Hadoop Management Console
          – Hadoop access request automation
          – Batch user creation and management
          – Metadata management
          – Code generation of dataset to Hive tables and Java POJOs
      • ldap-admin tools
          – Very useful but tightly coupled to eBay’s LDAP configuration
          – Willing to open source if there is interest