SlideShare a Scribd company logo
1 of 19
Hadoop @ eBay Marketplaces
Ming Ma
June 27th, 2013
Overview
• Hadoop growth @ eBay Marketplaces
• Availability study
• Opportunities ahead
Big Data @ eBay Marketplaces
120+ Million Active users
300+ Million search queries every single day
350+ Million items available
hadoop @ eBay Marketplaces 3
Data Sets
•Inventory Data
– Product Listings, Catalogue, Quantity etc.
•Transactional Data
– Buying, Returning etc.
•User Behavioral Data
– Click stream, comments, suggestions, user activities etc.
•Customer profiles
– Buyer, Seller, Partner information etc.
•Machine data
– Logs, application data etc.
hadoop @ eBay Marketplaces 4
Hadoop Evolution @ eBay Marketplaces
2007
Single digit
nodes
2010
Shared
cluster
• 100s nodes
• 1000s +
core
• PB
• CDH2
2011
• Shared
clusters
• 1000s node
• 10,000+ core
• 10s PB
• Wilma (0.20)
2012
• Shared
clusters
• 1000s node
• 10,000+ core
• 10s PB
2013
• Shared
clusters
• 4k+ node
• 40,000+ core
• 50s PB
• HDP
2009
Search
• 10s-
nodes
hadoop @ eBay Marketplaces 5
Shared vs. Dedicated Clusters
Shared clusters
– 10s of PB and 10s of thousands of slots per cluster
– Run HDP 1.2
– Used primarily for analytics of user behavior and inventory
– Mix of production and ad-hoc jobs
– Mix of MR, Hive, PIG, Cascading etc.
– Hadoop and HBase security enabled
Dedicated clusters
– Very specific use cases like Index Building
– Tight SLAs for jobs (in order of minutes)
– Immediate revenue impact
– Usually smaller than our shared clusters, but still big (100s of nodes…)
hadoop @ eBay Marketplaces 6
Job Distribution by Type
hadoop @ eBay Marketplaces 7
Use Case Examples
•Cassini, full re-write of eBay’s search engine:
– Use MR to build full and incremental near-real-time indexes
– Data for indexing is stored in HBase for efficient updates and random read
– Strong SLAs
– Run on dedicated clusters
•Related and similar Items recommendations:
– Use transactional data, click stream data, search index, etc.
– Production MR jobs on a shared cluster
•Analytics dashboard:
– Run Mobius MR jobs to join click stream data and transactional data
– Store summary data in HBase
– Web application to query HBase
hadoop @ eBay Marketplaces 8
eBay Hadoop Data Platform
hadoop @ eBay Marketplaces 9
Data Ingest
Extract
Load Validate
Transform
Clients
Java
Scala
Pig
Hive Cascading
Mobius
Hadoop Behavioral Transactional Inventory
Metadata Metastore Type System ServiceAPI
Data Access
Java POJO
Pig UDF
Hive UDF
Tools
ETL Monitor
Metadata Mgmt
Data Catalog
User Mgmt
Platform Innovation
•Many reliability improvements
•New Security features
– Multi-realm support
– Encryption
– https in hadoop 1
•Hadoop 2.0
– MR 1 and YARN binary compatibility
•Automation for operations
– Machine decommission and re-commission process
•Data and user management
– Metadata management
– User account provisioning
hadoop @ eBay Marketplaces 10
Overview
• Hadoop growth @ eBay
• Availability study
• Next steps
Case study – defective applications
•HBase: A test app created heavy write load
– Test app used all region server RPC threads
– All RPCs are blocked by region flush
– RPC requests from production HBase MR job timed out
•HDFS: An app created lots of small files inside map tasks
– NN RPC Queue length spiked
– DN heartbeat RPC can’t be processed
– HDFS replication storm
hadoop @ eBay Marketplaces 12
Case study – platform bugs
•Hadoop:
– DFSClient.LeaseChecker thread leak in job tracker -> bi-weekly JT restart
– dfs.datanode.balance.bandwidthPerSec set to 200MB -> big performance impact
•JVM:
– leap second bug -> All clusters were down the same time
– GC setting -> NN full GC happened regularly
•OS:
– “Divide by zero” in CentOS and RH 6.1 -> machine reboot
hadoop @ eBay Marketplaces 13
Case study – cluster maintenance
•Code rollout:
– NN SPOF
– RPC compatibility between old and new versions
•Hadoop configuration change:
– Likely required Hadoop JVM restart
– Rolling restart has impact on job latency
– Datanode rolling restart caused HBase region servers to exit
•Machines re-commission:
– Hadoop version drift
– OS configuration bug reappeared
hadoop @ eBay Marketplaces 14
Metrics
•Definition:
– Availability = MTBF ( mean time between failure ) / MTBF + MDT ( mean down time )
– Down time includes planned maintenance
•Measurement:
– Synthetic transaction approach
– Run regular canary work count MR job
– Canary job times out in X minutes
hadoop @ eBay Marketplaces 15
More about metrics
•Availability != MTTR ( mean time to recover )
– MTTR is more important for applications like Cassini index build
•What is considered “available”?
– Performance degradation
– % of live slave nodes
– Other entry points such as Web UI
– Core data set availability
– Multi-tenancy scenario
hadoop @ eBay Marketplaces 16
Ways to improve availability
•Automation
– Use puppet and daemontools
– Monitor system health
•Redundancy
– Namenode HA
– Hot standby region server
•Isolation
– HDFS federation
– Region server grouping
•Congestion control
– RPC congestion control, Hadoop-9640
– Apply to both HDFS and HBase
•Features to enable “no downtime maintenance”
– Dynamic configuration update
– RPC compatibility
– Better ways to do rolling restart
hadoop @ eBay Marketplaces 17
Overview
• Hadoop growth @ eBay
• Availability study
• Next steps
Opportunities ahead
•More automation
•Availability and scalability
– Hadoop 2.0
– HBase fast recovery time
•Multi-tenancy
– Run production jobs with strong SLAs in big shared clusters
– QoS in HDFS and HBase
•New scenarios
– Interactive Analysis with SQL language
– Direct Hadoop Access from dev machines
hadoop @ eBay Marketplaces 19

More Related Content

What's hot

Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data FabricAlan McSweeney
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemPetr Novotný
 
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...Data Con LA
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?James Serra
 
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...Salesforce Engineering
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data SolutionJames Serra
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Keynote: Art of the Possible - Chandra Rangan
Keynote: Art of the Possible - Chandra RanganKeynote: Art of the Possible - Chandra Rangan
Keynote: Art of the Possible - Chandra RanganNeo4j
 
Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"
Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"
Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"Bohdan Maherus
 

What's hot (20)

Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Toward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFSToward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFS
 
Introduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 SystemIntroduction to Big Data & Big Data 1.0 System
Introduction to Big Data & Big Data 1.0 System
 
Big Data
Big DataBig Data
Big Data
 
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
Data Science Out of The Box : Case Studies in the Telecommunication by Anand ...
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
High Scale Relational Storage at Salesforce Built with Apache HBase and Apach...
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
It service management
It service managementIt service management
It service management
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
IaaS and PaaS
IaaS and PaaSIaaS and PaaS
IaaS and PaaS
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Keynote: Art of the Possible - Chandra Rangan
Keynote: Art of the Possible - Chandra RanganKeynote: Art of the Possible - Chandra Rangan
Keynote: Art of the Possible - Chandra Rangan
 
Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"
Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"
Power BI On AIR - Melissa Coates: "What You Need to Know to Administer Power BI"
 

Viewers also liked

Process of Inventory management & control
Process of Inventory management & controlProcess of Inventory management & control
Process of Inventory management & controlRashmiranjan Das
 
Inventory control & management
Inventory control & managementInventory control & management
Inventory control & managementGoa App
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesData Con LA
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Alex Levenson
 
What You Should Know About Buying A Lake House
What You Should Know About Buying A Lake HouseWhat You Should Know About Buying A Lake House
What You Should Know About Buying A Lake HouseTrusted Choice
 
3.3.3.4 lab using wireshark to view network traffic
3.3.3.4 lab   using wireshark to view network traffic3.3.3.4 lab   using wireshark to view network traffic
3.3.3.4 lab using wireshark to view network trafficAransues
 
Coinsurance & Builder's Risk Insurance
Coinsurance & Builder's Risk InsuranceCoinsurance & Builder's Risk Insurance
Coinsurance & Builder's Risk InsuranceSeth Row
 
Teradata Demand Chain Management (DCM): Version 4
Teradata Demand Chain Management (DCM): Version 4Teradata Demand Chain Management (DCM): Version 4
Teradata Demand Chain Management (DCM): Version 4Teradata
 

Viewers also liked (12)

Process of Inventory management & control
Process of Inventory management & controlProcess of Inventory management & control
Process of Inventory management & control
 
Inventory control & management
Inventory control & managementInventory control & management
Inventory control & management
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Comparision
ComparisionComparision
Comparision
 
Lapsed policy
Lapsed policyLapsed policy
Lapsed policy
 
Pets Health Insurance
Pets Health InsurancePets Health Insurance
Pets Health Insurance
 
What You Should Know About Buying A Lake House
What You Should Know About Buying A Lake HouseWhat You Should Know About Buying A Lake House
What You Should Know About Buying A Lake House
 
3.3.3.4 lab using wireshark to view network traffic
3.3.3.4 lab   using wireshark to view network traffic3.3.3.4 lab   using wireshark to view network traffic
3.3.3.4 lab using wireshark to view network traffic
 
Food & Beverage Liability Insurance
Food & Beverage Liability InsuranceFood & Beverage Liability Insurance
Food & Beverage Liability Insurance
 
Coinsurance & Builder's Risk Insurance
Coinsurance & Builder's Risk InsuranceCoinsurance & Builder's Risk Insurance
Coinsurance & Builder's Risk Insurance
 
Teradata Demand Chain Management (DCM): Version 4
Teradata Demand Chain Management (DCM): Version 4Teradata Demand Chain Management (DCM): Version 4
Teradata Demand Chain Management (DCM): Version 4
 

Similar to Hadoop and HBase @eBay

DC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysDC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysRahul Agarwal
 
A Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemSerendio Inc.
 
A Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop EcosystemA Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop EcosystemDataWorks Summit
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsAbhishekKumarAgrahar2
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectSoftServe
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Joan Novino
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experienceVitaliy Bashun
 
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...DataWorks Summit
 

Similar to Hadoop and HBase @eBay (20)

Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
DC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysDC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion Days
 
A Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem
 
A Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop EcosystemA Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop Ecosystem
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Big data
Big dataBig data
Big data
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
 
Membase Meetup - Silicon Valley
Membase Meetup - Silicon ValleyMembase Meetup - Silicon Valley
Membase Meetup - Silicon Valley
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
Lessons Learned from Migration of a Large-analytics Platform from MPP Databas...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Hadoop and HBase @eBay

  • 1. Hadoop @ eBay Marketplaces Ming Ma June 27th, 2013
  • 2. Overview • Hadoop growth @ eBay Marketplaces • Availability study • Opportunities ahead
  • 3. Big Data @ eBay Marketplaces 120+ Million Active users 300+ Million search queries every single day 350+ Million items available hadoop @ eBay Marketplaces 3
  • 4. Data Sets •Inventory Data – Product Listings, Catalogue, Quantity etc. •Transactional Data – Buying, Returning etc. •User Behavioral Data – Click stream, comments, suggestions, user activities etc. •Customer profiles – Buyer, Seller, Partner information etc. •Machine data – Logs, application data etc. hadoop @ eBay Marketplaces 4
  • 5. Hadoop Evolution @ eBay Marketplaces 2007 Single digit nodes 2010 Shared cluster • 100s nodes • 1000s + core • PB • CDH2 2011 • Shared clusters • 1000s node • 10,000+ core • 10s PB • Wilma (0.20) 2012 • Shared clusters • 1000s node • 10,000+ core • 10s PB 2013 • Shared clusters • 4k+ node • 40,000+ core • 50s PB • HDP 2009 Search • 10s- nodes hadoop @ eBay Marketplaces 5
  • 6. Shared vs. Dedicated Clusters Shared clusters – 10s of PB and 10s of thousands of slots per cluster – Run HDP 1.2 – Used primarily for analytics of user behavior and inventory – Mix of production and ad-hoc jobs – Mix of MR, Hive, PIG, Cascading etc. – Hadoop and HBase security enabled Dedicated clusters – Very specific use cases like Index Building – Tight SLAs for jobs (in order of minutes) – Immediate revenue impact – Usually smaller than our shared clusters, but still big (100s of nodes…) hadoop @ eBay Marketplaces 6
  • 7. Job Distribution by Type hadoop @ eBay Marketplaces 7
  • 8. Use Case Examples •Cassini, full re-write of eBay’s search engine: – Use MR to build full and incremental near-real-time indexes – Data for indexing is stored in HBase for efficient updates and random read – Strong SLAs – Run on dedicated clusters •Related and similar Items recommendations: – Use transactional data, click stream data, search index, etc. – Production MR jobs on a shared cluster •Analytics dashboard: – Run Mobius MR jobs to join click stream data and transactional data – Store summary data in HBase – Web application to query HBase hadoop @ eBay Marketplaces 8
  • 9. eBay Hadoop Data Platform hadoop @ eBay Marketplaces 9 Data Ingest Extract Load Validate Transform Clients Java Scala Pig Hive Cascading Mobius Hadoop Behavioral Transactional Inventory Metadata Metastore Type System ServiceAPI Data Access Java POJO Pig UDF Hive UDF Tools ETL Monitor Metadata Mgmt Data Catalog User Mgmt
  • 10. Platform Innovation •Many reliability improvements •New Security features – Multi-realm support – Encryption – https in hadoop 1 •Hadoop 2.0 – MR 1 and YARN binary compatibility •Automation for operations – Machine decommission and re-commission process •Data and user management – Metadata management – User account provisioning hadoop @ eBay Marketplaces 10
  • 11. Overview • Hadoop growth @ eBay • Availability study • Next steps
  • 12. Case study – defective applications •HBase: A test app created heavy write load – Test app used all region server RPC threads – All RPCs are blocked by region flush – RPC requests from production HBase MR job timed out •HDFS: An app created lots of small files inside map tasks – NN RPC Queue length spiked – DN heartbeat RPC can’t be processed – HDFS replication storm hadoop @ eBay Marketplaces 12
  • 13. Case study – platform bugs •Hadoop: – DFSClient.LeaseChecker thread leak in job tracker -> bi-weekly JT restart – dfs.datanode.balance.bandwidthPerSec set to 200MB -> big performance impact •JVM: – leap second bug -> All clusters were down the same time – GC setting -> NN full GC happened regularly •OS: – “Divide by zero” in CentOS and RH 6.1 -> machine reboot hadoop @ eBay Marketplaces 13
  • 14. Case study – cluster maintenance •Code rollout: – NN SPOF – RPC compatibility between old and new versions •Hadoop configuration change: – Likely required Hadoop JVM restart – Rolling restart has impact on job latency – Datanode rolling restart caused HBase region servers to exit •Machines re-commission: – Hadoop version drift – OS configuration bug reappeared hadoop @ eBay Marketplaces 14
  • 15. Metrics •Definition: – Availability = MTBF ( mean time between failure ) / MTBF + MDT ( mean down time ) – Down time includes planned maintenance •Measurement: – Synthetic transaction approach – Run regular canary work count MR job – Canary job times out in X minutes hadoop @ eBay Marketplaces 15
  • 16. More about metrics •Availability != MTTR ( mean time to recover ) – MTTR is more important for applications like Cassini index build •What is considered “available”? – Performance degradation – % of live slave nodes – Other entry points such as Web UI – Core data set availability – Multi-tenancy scenario hadoop @ eBay Marketplaces 16
  • 17. Ways to improve availability •Automation – Use puppet and daemontools – Monitor system health •Redundancy – Namenode HA – Hot standby region server •Isolation – HDFS federation – Region server grouping •Congestion control – RPC congestion control, Hadoop-9640 – Apply to both HDFS and HBase •Features to enable “no downtime maintenance” – Dynamic configuration update – RPC compatibility – Better ways to do rolling restart hadoop @ eBay Marketplaces 17
  • 18. Overview • Hadoop growth @ eBay • Availability study • Next steps
  • 19. Opportunities ahead •More automation •Availability and scalability – Hadoop 2.0 – HBase fast recovery time •Multi-tenancy – Run production jobs with strong SLAs in big shared clusters – QoS in HDFS and HBase •New scenarios – Interactive Analysis with SQL language – Direct Hadoop Access from dev machines hadoop @ eBay Marketplaces 19

Editor's Notes

  1. Need to identify User or Usage MetricsClick ratesVolume of data in the hub Cluster sizeSize of data in the cluster----- Meeting Notes (5/15/13 16:22) -----numbers needs to be adjusted - Charles Cox/Bass Chong
  2. This list needs updated – Stephen lee – Data domains