Hadoop @ eBay Marketplaces
Ming Ma
June 27th, 2013
Hadoop and HBase @eBay

eBay has one of the largest Hadoop clusters in the industry with many petabytes of data. This talk will give an overview of how Hadoop and HBase have been used within eBay, the lessons we have learned from supporting large-scale production clusters, as well as how we plan to use and improve Hadoop and HBase moving forward. Specific use cases, production issues and platform improvement work will be discussed.

  • Need to identify user or usage metrics: click rates, volume of data in the hub, cluster size, size of data in the cluster. Meeting notes (5/15/13 16:22): numbers need to be adjusted (Charles Cox / Bass Chong).
  • This list needs to be updated (Stephen Lee, data domains).
  • Hadoop and HBase @eBay

    1. Hadoop @ eBay Marketplaces
       Ming Ma, June 27th, 2013
    2. Overview
       • Hadoop growth @ eBay Marketplaces
       • Availability study
       • Opportunities ahead
    3. Big Data @ eBay Marketplaces
       • 120+ million active users
       • 300+ million search queries every single day
       • 350+ million items available
    4. Data Sets
       • Inventory data
         – Product listings, catalogue, quantity etc.
       • Transactional data
         – Buying, returning etc.
       • User behavioral data
         – Click stream, comments, suggestions, user activities etc.
       • Customer profiles
         – Buyer, seller, partner information etc.
       • Machine data
         – Logs, application data etc.
    5. Hadoop Evolution @ eBay Marketplaces
       • 2007: single-digit nodes
       • 2009: Search, 10s of nodes
       • 2010: shared cluster
         – 100s of nodes
         – 1000s+ cores
         – PB
         – CDH2
       • 2011: shared clusters
         – 1000s of nodes
         – 10,000+ cores
         – 10s PB
         – Wilma (0.20)
       • 2012: shared clusters
         – 1000s of nodes
         – 10,000+ cores
         – 10s PB
       • 2013: shared clusters
         – 4k+ nodes
         – 40,000+ cores
         – 50s PB
         – HDP
    6. Shared vs. Dedicated Clusters
       • Shared clusters
         – 10s of PB and 10s of thousands of slots per cluster
         – Run HDP 1.2
         – Used primarily for analytics of user behavior and inventory
         – Mix of production and ad-hoc jobs
         – Mix of MR, Hive, Pig, Cascading etc.
         – Hadoop and HBase security enabled
       • Dedicated clusters
         – Very specific use cases like index building
         – Tight SLAs for jobs (on the order of minutes)
         – Immediate revenue impact
         – Usually smaller than our shared clusters, but still big (100s of nodes…)
    7. Job Distribution by Type
       [chart]
    8. Use Case Examples
       • Cassini, a full rewrite of eBay’s search engine:
         – Use MR to build full and incremental near-real-time indexes
         – Data for indexing is stored in HBase for efficient updates and random reads
         – Strong SLAs
         – Runs on dedicated clusters
       • Related and similar item recommendations:
         – Use transactional data, click stream data, search index, etc.
         – Production MR jobs on a shared cluster
       • Analytics dashboard:
         – Run Mobius MR jobs to join click stream data and transactional data
         – Store summary data in HBase
         – Web application queries HBase
    9. eBay Hadoop Data Platform
       [architecture diagram; labels as extracted]
       • Data Ingest: Extract, Load, Validate, Transform
       • Clients: Java, Scala, Pig, Hive, Cascading, Mobius
       • Hadoop: Behavioral, Transactional, Inventory
       • Metadata: Metastore, Type System, Service API
       • Data Access: Java POJO, Pig UDF, Hive UDF
       • Tools: ETL, Monitor, Metadata Mgmt, Data Catalog, User Mgmt
    10. Platform Innovation
       • Many reliability improvements
       • New security features
         – Multi-realm support
         – Encryption
         – HTTPS in Hadoop 1
       • Hadoop 2.0
         – MR 1 and YARN binary compatibility
       • Automation for operations
         – Machine decommission and re-commission process
       • Data and user management
         – Metadata management
         – User account provisioning
    11. Overview
       • Hadoop growth @ eBay
       • Availability study
       • Next steps
    12. Case study – defective applications
       • HBase: a test app created a heavy write load
         – The test app used all region server RPC threads
         – All RPCs were blocked by region flush
         – RPC requests from a production HBase MR job timed out
       • HDFS: an app created lots of small files inside map tasks
         – NN RPC queue length spiked
         – DN heartbeat RPCs could not be processed
         – HDFS replication storm
    13. Case study – platform bugs
       • Hadoop:
         – DFSClient.LeaseChecker thread leak in the job tracker -> bi-weekly JT restart
         – dfs.datanode.balance.bandwidthPerSec set to 200MB -> big performance impact
       • JVM:
         – Leap second bug -> all clusters were down at the same time
         – GC setting -> NN full GC happened regularly
       • OS:
         – “Divide by zero” in CentOS and RH 6.1 -> machine reboot
    14. Case study – cluster maintenance
       • Code rollout:
         – NN SPOF
         – RPC compatibility between old and new versions
       • Hadoop configuration change:
         – Likely required Hadoop JVM restart
         – Rolling restart has an impact on job latency
         – Datanode rolling restart caused HBase region servers to exit
       • Machine re-commission:
         – Hadoop version drift
         – OS configuration bug reappeared
    15. Metrics
       • Definition:
         – Availability = MTBF / (MTBF + MDT), where MTBF is mean time between failures and MDT is mean down time
         – Down time includes planned maintenance
       • Measurement:
         – Synthetic transaction approach
         – Run a regular canary word count MR job
         – Canary job times out in X minutes
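The definition and measurement approach on the Metrics slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used at eBay: the function names, the outage list, and the canary command are hypothetical, and MTBF is taken here to mean average up time between failures, which is what the formula MTBF / (MTBF + MDT) implies.

```python
import subprocess


def availability(total_hours, outage_hours):
    """Availability = MTBF / (MTBF + MDT) over an observation window.

    total_hours  -- length of the window
    outage_hours -- duration of each outage in the window, planned
                    maintenance included (as on the slide)
    """
    failures = len(outage_hours)
    if failures == 0:
        return 1.0
    downtime = sum(outage_hours)
    mtbf = (total_hours - downtime) / failures  # mean up time per failure
    mdt = downtime / failures                   # mean down time per failure
    return mtbf / (mtbf + mdt)


def run_canary(command, timeout_seconds):
    """Run one synthetic transaction; True only if it exits 0 in time.

    `command` is a placeholder for whatever submits the canary job
    (hypothetical; the slide does not show the actual command).
    """
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except (subprocess.TimeoutExpired, OSError):
        return False  # a timed-out or unlaunchable canary counts as down
    return result.returncode == 0
```

For a 720-hour month with outages of 2, 4 and 1.2 hours, `availability(720, [2, 4, 1.2])` gives 712.8 / 720, that is 99%. Treating a canary timeout as a failure means severe slowness is counted as down time even when the cluster is technically up.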
    16. More about metrics
       • Availability != MTTR (mean time to recover)
         – MTTR is more important for applications like the Cassini index build
       • What is considered “available”?
         – Performance degradation
         – % of live slave nodes
         – Other entry points such as the Web UI
         – Core data set availability
         – Multi-tenancy scenario
    17. Ways to improve availability
       • Automation
         – Use Puppet and daemontools
         – Monitor system health
       • Redundancy
         – Namenode HA
         – Hot standby region server
       • Isolation
         – HDFS federation
         – Region server grouping
       • Congestion control
         – RPC congestion control, HADOOP-9640
         – Apply to both HDFS and HBase
       • Features to enable “no downtime maintenance”
         – Dynamic configuration update
         – RPC compatibility
         – Better ways to do rolling restart
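To make the congestion control bullet concrete, here is a minimal Python sketch of the general idea: queue RPC calls per caller and serve callers in round-robin order, so one heavy client (like the test app in the defective applications case study) cannot occupy the whole queue. It is not the design proposed in HADOOP-9640 and not Hadoop or HBase code; the class name, the per-caller limit, and the caller labels are all assumptions made for illustration.

```python
from collections import OrderedDict, deque


class RoundRobinCallQueue:
    """Hypothetical per-caller call queue with round-robin service."""

    def __init__(self, per_caller_limit):
        self.per_caller_limit = per_caller_limit
        self.queues = OrderedDict()  # caller -> deque of pending calls

    def offer(self, caller, call):
        """Enqueue a call; return False if the caller is over its limit."""
        pending = self.queues.setdefault(caller, deque())
        if len(pending) >= self.per_caller_limit:
            return False  # rejected: the caller is expected to back off
        pending.append(call)
        return True

    def poll(self):
        """Return the next call, rotating fairly across callers."""
        if not self.queues:
            return None
        caller, pending = next(iter(self.queues.items()))
        call = pending.popleft()
        del self.queues[caller]
        if pending:
            self.queues[caller] = pending  # move caller to the back
        return call
```

With a limit of 3, a caller that offers five calls has the last two rejected, and a single call from another caller is served second rather than after the whole backlog.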
    18. Overview
       • Hadoop growth @ eBay
       • Availability study
       • Next steps
    19. Opportunities ahead
       • More automation
       • Availability and scalability
         – Hadoop 2.0
         – HBase fast recovery time
       • Multi-tenancy
         – Run production jobs with strong SLAs in big shared clusters
         – QoS in HDFS and HBase
       • New scenarios
         – Interactive analysis with SQL
         – Direct Hadoop access from dev machines