Hadoop at eBay
Aroop Maliakkal Padmanabhan
Rajesh Chandramohan
Vikas Kanth
Hadoop @ eBay
2007: 1-10 nodes
2009: 10+ nodes
2010: 100+ nodes, 1,000+ cores, 1 PB
2011: 1,000+ nodes, 10,000+ cores, 10+ PB
2013: 4,000+ nodes, 40,000+ cores, 50+ PB
2015: 10,000+ nodes, 150,000+ cores, 150+ PB
Hadoop @ eBay
10+ large Hadoop clusters
10,000+ nodes
50,000+ jobs per day
50,000,000+ tasks per day
500+ types of Hadoop/HBase metrics
Billions of audit events per day
Hadoop @ eBay
Dedicated clusters
• Very specific use cases such as index building (Near Real Time Indexing)
• Tight SLAs for jobs (seconds to a few minutes)
• Immediate revenue impact
• TSDB clusters for monitoring
Shared clusters
• Used primarily for analytics of user behavior and inventory
• Batch and ad-hoc jobs
• YARN, HBase, Hive, Pig, Hue, Spark, etc.
• Security enabled with Kerberos
HAAS clusters
• Used primarily for DEV and QA
Hadoop Platform Use Case: Search Backend
5.5 PB of data generates a 650 million item index in only 2.5 hours
1.68 million items processed in 3 minutes
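The figures on this slide imply average throughput rates. The following back-of-the-envelope calculation is a sketch that assumes the quoted counts and durations are exact and that processing runs at a uniform rate, which the slide does not state. The function and variable names are illustrative only.

```python
# Average processing rates implied by the slide's figures.
# Assumption: counts and durations are exact and the rate is uniform.

def items_per_second(items: float, seconds: float) -> float:
    """Average number of items processed per second."""
    return items / seconds

# 650 million items indexed in 2.5 hours
index_build_rate = items_per_second(650e6, 2.5 * 3600)
# 1.68 million items processed in 3 minutes
short_run_rate = items_per_second(1.68e6, 3 * 60)

print(f"Index build: ~{index_build_rate:,.0f} items/s")   # ~72,222 items/s
print(f"3-minute run: ~{short_run_rate:,.0f} items/s")    # ~9,333 items/s
```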
Indexing Pipeline Overview
Diagram components:
• DB (Oracle)
• BES
• Bulk Data Loader
• HBase (Item + Seller Table)
• Agents
• Query Nodes
• Mini Index (built every few minutes)
• Bulk Index (built every 6 hours)
• eBay World Wide
Update paths: event-based update; batch update (every few hours)
Ecosystem Requirements
• Cluster availability
• Security
• Supporting NRT on commodity hardware
• Lights-out management
• Organic growth leads to a heterogeneous environment (SKUs, operating systems, vendors, network topology, etc.)
• Resource management: quota management, queue management
Eagle Data Activity Monitoring
Perimeter Security
Enable 2FA
Data Loss Prevention
Authorization and Access Control
Deploy Ranger for centralized Access Control
Kerberized cluster
Architecture
Real Time Incremental Hadoop Image Processing
Use Case:
• Analyze HDFS file/directory metadata to find anomalies in users' HDFS usage patterns in pseudo real time
• Retrieve HDFS metadata such as permissions, size, and block-level properties, which are not visible in HDFS audit logs
• Block unauthorized operations and send alerts
Challenges:
• RPC calls to the NameNode for this add overhead
• OIV (Offline Image Viewer) is slow
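As an illustration of the use case above, the following is a minimal sketch of a rule-based check over HDFS metadata records. The record fields, the small-file threshold, and the function names are hypothetical and are not taken from Eagle or from the slides. The sketch only shows the kind of signal (permissions, file size) that audit logs alone do not expose.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileMeta:
    # Hypothetical record shape; real fsimage metadata carries more fields.
    path: str
    owner: str
    permission: int   # POSIX mode bits, e.g. 0o644
    size_bytes: int

SMALL_FILE_BYTES = 1024 * 1024   # assumed threshold: 1 MiB

def find_anomalies(records: List[FileMeta]) -> List[str]:
    """Return human-readable alerts for suspicious metadata."""
    alerts = []
    for r in records:
        if r.permission & 0o002:                  # world-writable bit is set
            alerts.append(f"world-writable: {r.path} (owner {r.owner})")
        if 0 < r.size_bytes < SMALL_FILE_BYTES:   # non-empty but very small
            alerts.append(f"small file: {r.path} ({r.size_bytes} bytes)")
    return alerts

records = [
    FileMeta("/data/a.parquet", "alice", 0o644, 256 * 1024 * 1024),
    FileMeta("/tmp/scratch", "bob", 0o777, 512),
]
for alert in find_anomalies(records):
    print(alert)
```

A production system would read these records from an incrementally processed image rather than an in-memory list, and would likely learn per-user baselines instead of using fixed rules.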
Solution
Hadoop Robot
Hadoop Robot is the action and auto-remediation center for Hadoop maintenance.
Data Flow
What’s Eagle
A uniform monitoring and alerting framework for monitoring large-scale distributed systems such as Hadoop, Spark, and cloud platforms in real time.
Eagle Ecosystem
• Apps: DAM, JPA
• Interface: Web Portal, REST Services, Ambari Plugin
• Integration: Kafka, Storm, HBase, Druid, Elasticsearch
• Eagle Framework
Eagle Framework
Provides a full-stack monitoring framework for efficiently developing highly scalable real-time monitoring applications.
Eagle Apps
Provides built-in monitoring applications for domains such as Hadoop, Storm, and cloud.
Eagle Integration
Integrates with distributed real-time execution environments such as Storm, message buses such as Kafka, and storage layers such as HBase, and also supports extensions.
Eagle Interface
Allows users to access or manage Eagle through a REST service, web UI, or Ambari plugin.
JPA: Job Performance Analyser
Monitor and analyze job performance in real time
• Historical job analysis
• Running job analysis
• Anomaly host detection
• Job data skew detection
• Job performance suggestion
• Anomaly prediction based on machine learning
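One simple way to think about job data skew detection is to compare the largest task input with the median task input. The sketch below is illustrative only: the function name and the default ratio threshold of 4 are assumptions, not JPA's actual algorithm.

```python
from statistics import median
from typing import Sequence

def is_skewed(task_input_bytes: Sequence[int], ratio_threshold: float = 4.0) -> bool:
    """Flag a job whose largest task reads far more data than the median task.

    ratio_threshold is an assumed tuning knob, not a value from the slides.
    """
    if not task_input_bytes:
        return False
    med = median(task_input_bytes)
    if med == 0:
        # Typical task reads nothing, so any non-empty input counts as skew.
        return max(task_input_bytes) > 0
    return max(task_input_bytes) / med > ratio_threshold

balanced = [100, 110, 95, 105]
skewed = [100, 110, 95, 2000]
print(is_skewed(balanced))   # False
print(is_skewed(skewed))     # True
```

Using the median rather than the mean keeps a single oversized task from inflating the baseline it is compared against.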
JPA Solutions
HDFS Tiered Storage
Logs Visualizer
Hadoop ElasticSearch
THANK YOU
