SlideShare a Scribd company logo
1 of 21
at ebay
Aroop Maliakkal Padmanabhan
Rajesh Chandramohan
Vikas Kanth
2
1-10 nodes1-10 nodes
20072007
100+ nodes100+ nodes
1000 + core1000 + core
1 PB1 PB
20102010
20112011
1000+ node1000+ node
10,000+ core10,000+ core
10+ PB10+ PB
4000+ node4000+ node
40,000+ core40,000+ core
50+ PB50+ PB
20132013
20152015
10,000+ nodes10,000+ nodes
150,000+ cores150,000+ cores
150+ PB150+ PB
20092009
10+ nodes10+ nodes
Hadoop @ ebay
Hadoop @ ebay
10+ large Hadoop clusters
10,000+ nodes
50,000+ jobs per day
50,000,000+ tasks per day
500+ types of Hadoop/Hbase metrics
Billions of audit events per day
3
Dedicated clusters
• Very specific use case like index building (Near Real Time Indexing)
• Tight SLAs for jobs (seconds to few minutes)
• Immediate revenue impact
• TSDB clusters for monitoring
Shared clusters
• Used primarily for analytics of user behavior and inventory
• Batch and ad-hoc jobs
• YARN, Hbase, Hive, Pig, Hue, Spark, etc.
• Security enabled with Kerberos
HAAS clusters
• Used primarily for DEV and QA
Hadoop @ ebay
5
5.5PBof data generates a 650
million item index in only 2.5
hours
1.68 million items
processed in
3 minutes
Hadoop Platform Use Case: Search Backend
DB
(Oracle)
DB
(Oracle)
HBASE
Item +
Seller
Table
HBASE
Item +
Seller
Table
BESBES
BULK DATA
LOADER
BULK DATA
LOADER
A
G
E
N
T
S
Query
Nodes
Mini Index
(Built every few mins)
Bulk Index
(Built every 6 hours)
EBay World Wide
Event based
update
Batch Update
(Every few hours)
Indexing Pipeline Overview
• Cluster Availability
• Security
• Supporting NRT on commodity hardware
• Lights out Management.
• Organic growth leads to heterogeneous environment
(SKUs, operating systems, vendors, network topology etc)
• Resource Management: Quota management, Queue management
Ecosystem Requirements
Eagle Data Activity Monitoring
Perimeter Security
Enable 2FA
Data Loss Prevention
Authorization and Access Control
Deploy Ranger for centralized Access Control
Kerberoized cluster
Architecture
 Use Case:
 Analyze HDFS file/directory metadata to find anomalies
in users' HDFS usage patterns in pseudo real time
 Retrieve HDFS Metadata like permissions, size, block
level properties etc which are not visible in HDFS Audit
logs
 Block unauthorized operations and send alerts
 Challenges:
 RPC to Namenode for this is an overhead !
 OIV is SLOW !!
Real Time Incremental Hadoop Image Processing
Solution
Hadoop Robot is the action and auto-remediation center for Hadoop
maintenance
Hadoop Robot
Data Flow
What’s Eagle
The uniform monitoring and alerting framework
to monitor large-scale distributed system like
hadoop, spark, cloud, etc. in real time.
Eagle Ecosystem
Apps
DAM
JPA
Interface
Web Portal
REST Services
Ambari Plugin
Integration
Kafka
Storm
HBase
Druid
Elastic Search
Eagle Framework
Provide full-stack monitoring framework for efficiently
developing highly scalable real-time monitoring
applications.
Eagle Apps
Provide built-in monitoring applications for domains like
hadoop, storm and cloud.
Eagle Integration
Integrate with distributed real-time execution environment
like storm, message bus like kafka and storage layer like
hbase, and also support extensions.
Eagle Interface
Allow to access or manage eagle through REST service,
web UI or Ambari plugin.
Eagle
Framework
JPA: Job Performance Analyser
Historical job analysis
Running job analysis
Anomaly host detection
Job data skew detection
Job performance suggestion
Anomaly Prediction based on machine learning
Monitor and analyze job performance in real-time
JPA Solutions
HDFS Tiered Storage
Logs Visualizer
Hadoop ElasticSearch
THANK YOU

More Related Content

What's hot

Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At SpotifyAdam Kawa
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Distributed Trace & Log Analysis using ML
Distributed Trace & Log Analysis using MLDistributed Trace & Log Analysis using ML
Distributed Trace & Log Analysis using MLJorge Cardoso
 
Tableau desktop & server
Tableau desktop & serverTableau desktop & server
Tableau desktop & serverChris Raby
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Amazon Web Services
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Casesnzhang
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentationDavid Rice
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseJames Serra
 
API Security - OWASP top 10 for APIs + tips for pentesters
API Security - OWASP top 10 for APIs + tips for pentestersAPI Security - OWASP top 10 for APIs + tips for pentesters
API Security - OWASP top 10 for APIs + tips for pentestersInon Shkedy
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentDatabricks
 
What is Business Objects
What is Business Objects What is Business Objects
What is Business Objects BigClasses.com
 

What's hot (20)

Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big Data At Spotify
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Unit-3_BDA.ppt
Unit-3_BDA.pptUnit-3_BDA.ppt
Unit-3_BDA.ppt
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
Distributed Trace & Log Analysis using ML
Distributed Trace & Log Analysis using MLDistributed Trace & Log Analysis using ML
Distributed Trace & Log Analysis using ML
 
Tableau desktop & server
Tableau desktop & serverTableau desktop & server
Tableau desktop & server
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentation
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
API Security - OWASP top 10 for APIs + tips for pentesters
API Security - OWASP top 10 for APIs + tips for pentestersAPI Security - OWASP top 10 for APIs + tips for pentesters
API Security - OWASP top 10 for APIs + tips for pentesters
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
What is Business Objects
What is Business Objects What is Business Objects
What is Business Objects
 

Viewers also liked

内容组织
内容组织内容组织
内容组织flydream
 
Fame analysis techniques
Fame analysis techniquesFame analysis techniques
Fame analysis techniquesSarfaraz Nasri
 
Common Practices in Religion
Common Practices in ReligionCommon Practices in Religion
Common Practices in ReligionStacey Troup
 
7 steps to cloud onboarding
7 steps to cloud onboarding7 steps to cloud onboarding
7 steps to cloud onboardingInterxion
 
Treatment plan for Implants in funtional zone
Treatment plan for Implants in funtional zoneTreatment plan for Implants in funtional zone
Treatment plan for Implants in funtional zoneVinay Kadavakolanu
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Alex Levenson
 
오픈소스로 구축하는 클라우드 이야기
오픈소스로 구축하는 클라우드 이야기오픈소스로 구축하는 클라우드 이야기
오픈소스로 구축하는 클라우드 이야기Nalee Jang
 
Focus Strategy(Short Tutorial)
Focus Strategy(Short Tutorial)Focus Strategy(Short Tutorial)
Focus Strategy(Short Tutorial)debrizz
 
Enterprise Security Architecture
Enterprise Security ArchitectureEnterprise Security Architecture
Enterprise Security ArchitectureKris Kimmerle
 
Energy conservation week celebration
Energy conservation week celebrationEnergy conservation week celebration
Energy conservation week celebrationSudha Arun
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Clusterairbots
 
Cloud Computing v.s. Cyber Security
Cloud Computing v.s. Cyber Security Cloud Computing v.s. Cyber Security
Cloud Computing v.s. Cyber Security Bahtiyar Bircan
 
Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...
Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...
Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...Dr.Choen Krainara
 
Making Display Advertising Work for Auto Dealers
Making Display Advertising Work for Auto DealersMaking Display Advertising Work for Auto Dealers
Making Display Advertising Work for Auto DealersSpeed Shift Media
 
Real-World Data Governance: Data Governance Roles & Responsibilities
Real-World Data Governance: Data Governance Roles & ResponsibilitiesReal-World Data Governance: Data Governance Roles & Responsibilities
Real-World Data Governance: Data Governance Roles & ResponsibilitiesDATAVERSITY
 

Viewers also liked (20)

内容组织
内容组织内容组织
内容组织
 
Fame analysis techniques
Fame analysis techniquesFame analysis techniques
Fame analysis techniques
 
Common Practices in Religion
Common Practices in ReligionCommon Practices in Religion
Common Practices in Religion
 
Kasus penggelapan uang para nasabah citibank oleh melinda dee
Kasus penggelapan uang para nasabah citibank oleh melinda deeKasus penggelapan uang para nasabah citibank oleh melinda dee
Kasus penggelapan uang para nasabah citibank oleh melinda dee
 
Lesson #1 Share
Lesson #1 ShareLesson #1 Share
Lesson #1 Share
 
7 steps to cloud onboarding
7 steps to cloud onboarding7 steps to cloud onboarding
7 steps to cloud onboarding
 
Treatment plan for Implants in funtional zone
Treatment plan for Implants in funtional zoneTreatment plan for Implants in funtional zone
Treatment plan for Implants in funtional zone
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Filter sterilization
Filter sterilization Filter sterilization
Filter sterilization
 
ERP components
ERP componentsERP components
ERP components
 
오픈소스로 구축하는 클라우드 이야기
오픈소스로 구축하는 클라우드 이야기오픈소스로 구축하는 클라우드 이야기
오픈소스로 구축하는 클라우드 이야기
 
Focus Strategy(Short Tutorial)
Focus Strategy(Short Tutorial)Focus Strategy(Short Tutorial)
Focus Strategy(Short Tutorial)
 
Enterprise Security Architecture
Enterprise Security ArchitectureEnterprise Security Architecture
Enterprise Security Architecture
 
Energy conservation week celebration
Energy conservation week celebrationEnergy conservation week celebration
Energy conservation week celebration
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
 
Cloud Computing v.s. Cyber Security
Cloud Computing v.s. Cyber Security Cloud Computing v.s. Cyber Security
Cloud Computing v.s. Cyber Security
 
Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...
Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...
Export-Oriented Industrialization (EOI): Arguments For and Against What Have ...
 
Making Display Advertising Work for Auto Dealers
Making Display Advertising Work for Auto DealersMaking Display Advertising Work for Auto Dealers
Making Display Advertising Work for Auto Dealers
 
Real-World Data Governance: Data Governance Roles & Responsibilities
Real-World Data Governance: Data Governance Roles & ResponsibilitiesReal-World Data Governance: Data Governance Roles & Responsibilities
Real-World Data Governance: Data Governance Roles & Responsibilities
 

Similar to Hadoop at Ebay

Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Ashish Narasimham
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformHortonworks
 
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...nnakasone
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionEtu Solution
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Stac summit june 14th - goodbye datalakes
Stac summit june 14th - goodbye datalakesStac summit june 14th - goodbye datalakes
Stac summit june 14th - goodbye datalakesiguazio
 
How leading financial services organisations are winning with tech
How leading financial services organisations are winning with techHow leading financial services organisations are winning with tech
How leading financial services organisations are winning with techMongoDB
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Building Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaBuilding Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaAshish Thapliyal
 
Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?Glenn Renfro
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 

Similar to Hadoop at Ebay (20)

Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data PlatformModernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
Modernize Your Existing EDW with IBM Big SQL & Hortonworks Data Platform
 
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
Ai tour 2019 Mejores Practicas en Entornos de Produccion Big Data Open Source...
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Stac summit june 14th - goodbye datalakes
Stac summit june 14th - goodbye datalakesStac summit june 14th - goodbye datalakes
Stac summit june 14th - goodbye datalakes
 
How leading financial services organisations are winning with tech
How leading financial services organisations are winning with techHow leading financial services organisations are winning with tech
How leading financial services organisations are winning with tech
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Building Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and KafkaBuilding Big Data Applications using Spark, Hive, HBase and Kafka
Building Big Data Applications using Spark, Hive, HBase and Kafka
 
Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
hadoop resume
hadoop resumehadoop resume
hadoop resume
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 

Recently uploaded

RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 

Recently uploaded (20)

RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 

Hadoop at Ebay

  • 1. at ebay Aroop Maliakkal Padmanabhan Rajesh Chandramohan Vikas Kanth
  • 2. 2 1-10 nodes1-10 nodes 20072007 100+ nodes100+ nodes 1000 + core1000 + core 1 PB1 PB 20102010 20112011 1000+ node1000+ node 10,000+ core10,000+ core 10+ PB10+ PB 4000+ node4000+ node 40,000+ core40,000+ core 50+ PB50+ PB 20132013 20152015 10,000+ nodes10,000+ nodes 150,000+ cores150,000+ cores 150+ PB150+ PB 20092009 10+ nodes10+ nodes Hadoop @ ebay
  • 3. Hadoop @ ebay 10+ large Hadoop clusters 10,000+ nodes 50,000+ jobs per day 50,000,000+ tasks per day 500+ types of Hadoop/Hbase metrics Billions of audit events per day 3
  • 4. Dedicated clusters • Very specific use case like index building (Near Real Time Indexing) • Tight SLAs for jobs (seconds to few minutes) • Immediate revenue impact • TSDB clusters for monitoring Shared clusters • Used primarily for analytics of user behavior and inventory • Batch and ad-hoc jobs • YARN, Hbase, Hive, Pig, Hue, Spark, etc. • Security enabled with Kerberos HAAS clusters • Used primarily for DEV and QA Hadoop @ ebay
  • 5. 5 5.5PBof data generates a 650 million item index in only 2.5 hours 1.68 million items processed in 3 minutes Hadoop Platform Use Case: Search Backend
  • 6. DB (Oracle) DB (Oracle) HBASE Item + Seller Table HBASE Item + Seller Table BESBES BULK DATA LOADER BULK DATA LOADER A G E N T S Query Nodes Mini Index (Built every few mins) Bulk Index (Built every 6 hours) EBay World Wide Event based update Batch Update (Every few hours) Indexing Pipeline Overview
  • 7. • Cluster Availability • Security • Supporting NRT on commodity hardware • Lights out Management. • Organic growth leads to heterogeneous environment (SKUs, operating systems, vendors, network topology etc) • Resource Management: Quota management, Queue management Ecosystem Requirements
  • 8. Eagle Data Activity Monitoring Perimeter Security Enable 2FA Data Loss Prevention Authorization and Access Control Deploy Ranger for centralized Access Control Kerberoized cluster
  • 10.  Use Case:  Analyze HDFS file/directory metadata to find anomalies in users' HDFS usage patterns in pseudo real time  Retrieve HDFS Metadata like permissions, size, block level properties etc which are not visible in HDFS Audit logs  Block unauthorized operations and send alerts  Challenges:  RPC to Namenode for this is an overhead !  OIV is SLOW !! Real Time Incremental Hadoop Image Processing
  • 12. Hadoop Robot is the action and auto-remediation center for Hadoop maintenance Hadoop Robot
  • 14. What’s Eagle The uniform monitoring and alerting framework to monitor large-scale distributed system like hadoop, spark, cloud, etc. in real time.
  • 15. Eagle Ecosystem Apps DAM JPA Interface Web Portal REST Services Ambari Plugin Integration Kafka Storm HBase Druid Elastic Search Eagle Framework Provide full-stack monitoring framework for efficiently developing highly scalable real-time monitoring applications. Eagle Apps Provide built-in monitoring applications for domains like hadoop, storm and cloud. Eagle Integration Integrate with distributed real-time execution environment like storm, message bus like kafka and storage layer like hbase, and also support extensions. Eagle Interface Allow to access or manage eagle through REST service, web UI or Ambari plugin. Eagle Framework
  • 16. JPA: Job Performance Analyser Historical job analysis Running job analysis Anomaly host detection Job data skew detection Job performance suggestion Anomaly Prediction based on machine learning Monitor and analyze job performance in real-time