SlideShare a Scribd company logo
Self-Serve Reporting Platform on Hadoop
Shirshanka Das
Strata Singapore 2015
4
Ingest Process Serve Visualize
Reporting Pipelines
5
Ingest Process Serve Visualize
Reporting at LinkedIn: Evolution
Sources
Oracle MSTR
Tableau
Internal
Tools
Espresso
Kafka
External
Custom
Custom
Custom
Hadoop Voldemort
Pinot
MySQL
INFA + MSTR OracleTeradataon+ Scripts
Jobs
on
Infra Scale
6
Number of Hadoop clusters: 12
Total number of machines: ~7k
Largest Cluster: ~3k machines
Data volume generated per day: XX Terabytes
Total accumulated data: XX Petabytes
People Scale
7
Reporting Platform Team: ~10
Core Warehouse Team: 1x
Data Scientists: 10x
Business Analysts: 10x
Product Managers: 10x
Sales and Marketing: 100x
8
Ingest Process Serve Visualize
Challenges
Disjointed efforts, unreliable systems
Unpredictable SLA across all systems
Fragmented data pipelines with inconsistent data
9
Ingest Process Serve Visualize
Houston
we have a problem
Step 1
Central transport pipeline
Still have
a problem
Step 2
Central
Ingestion
Framework
13
Stream + Batch
REST
SFTP
JDBC
Diverse Sources
Open source @ github.com/linkedin/gobblin
In production @ LinkedIn, Intel, Swisscom,
NerdWallet
@LinkedIn
~20 distinct source types
Hundreds of TB per day
Hundreds of datasets
Data Quality
15
Ingest Process Serve Visualize
Unified
Metrics
Platform
16
Single
Source
of
Truth
Easy
Onboarding Operability
Requirements
Workflow Metric
Definition
Sandbox
Code
Repository
Metric
Owner
System Jobs
Build
Core Metrics
Job
Central Team,
Relevant
Stakeholders
1. iterate
2. create 3. review
4. check in
Metric Definition
Name Description TagsOwners
Dataset
Dimensions
Time
Script
Metrics
Entity Ids
Tier
Formulas
Entity
Dimensions
Input
Datasets
Temporality
An example: video play analysis
name: "video"
description: “Metrics for video tracking”
label: “video”
tags: [flagship, feed]
owners: [jdoe, jsmith]
enabled: true
retention: 90d
timestamp: timestamp
frequency: daily
script: video_play.pig
output_window: 1d
dimensions:[
{
name: platform
doc: “phone, tablet or desktop"
}
{
name: action_type
doc: “click play or auto-play“
}
]
input_datasets
[
{
name: actionsRaw
path: Tracking.ActionEvent
range: 1d
}
]
An example contd…
metrics: [
name: unique_viewers
doc: “Count of unique viewers”
formula: “unique(member_id)”
tier: 2
good_direction: "up"
}
{
name: play_actions
doc: “Sum of play actions"
tier: 2
formula: “sum(play_actions)"
good_direction: "up"
}
]
entity_ids: [
{
name: member_id
category: member
}
{
name:video_id
category: video
}
]
UMP Data FlowUmp
Monitor
Primary
Data
(tracking,
databases,
external)
UMP Raw
Data
UMP
Aggregated
Data Relevance
Experiment
analysis
Ad-hoc
Metrics
Script
Data Prep
agg
cube
dimension
verify
HDFS
+ Pinot
Dashboards
…
First version in production since early 2014
Significant redesign in 2015
Total amount of data being scanned per day: Hundreds of TBs
Total number of metrics being computed: 2k+
Total number of scripts: ~ 400
Number of authors for these metrics: ~ 200
Maximum number of dimensions per dataset: ~ 30
Number of people responsible for upkeep of pipeline: 2
UMP by the numbers
22
Learnings so far
23
Ease of onboarding
Hard when you have > 1000 users with different skill sets
Need great UX to complement developer friendly
alternatives
Single source of truth
Not just a technology challenge
Organization needs to rally around it
Operability
Multi-tenant Hadoop pipeline with SLA-s and QoS: hard
Cost 2 Serve: Managing metrics lifecycle is important
The Next Big Things
Bridging streaming and batch
Code-free metrics
Sessions, Funnels, Cohorts
Open source
24
Ingest Process Serve Visualize
P not
SQL-like
interface
(minus joins)
Sub second
query latency
Data load
from Hadoop
and Kafka
Capabilities
Pinot Data Flow
Kafka Hadoop
Samza Process
Pinot
minutes
hour +
Pinot@LinkedIn
Site-­‐facing	
  Apps Reporting	
  dashboards Monitoring
In production since 2012
Open source @ github.com/linkedin/pinot
28
Ingest Process Serve Visualize
Raptor
Standardize Visualization
29
Leverage
- Standalone app, with support for embedding
- Can use existing analytics backend: Pinot
Strategic
- Reduces dependency on 3rd party BI tools
- Closer integration with LinkedIn’s ecosystem of
experimentation, anomaly detection solutions
30
Requirements
Support
apps
ecosystem
Core
Visualization
Capabilities
Metadata
Integration
Raptor 1.0
31
First version built by 3 engineers in a quarter
Features
- Integration with UMP, Pinot
- Time series, bar charts, …
- Create, Publish, Clone, Discover
Dashboards
Numbers
- Number of dashboards: ~100
- Weekly unique users: ~400
The Future for Raptor
39
Social Collaboration features
Intelligence
- Anomaly detection
- Dashboards You May Like
Embedding into data products
Open Source
40
Ingest Process Serve Visualize
A Few Good Hammers
Unified
Metrics
Platform
P not Raptor
41
Ingest Process Serve Visualize
What we’re excited about
Unified
Metrics
Platform
P not Raptor
Metadata Bus
42
Metadata driven e2e Optimizations
Dynamic prioritization of data ingest
Surface source data quality issues in dashboard
Surface backfill status on dashboard
Cascading deprecation of dashboards,
computation and data sources through lineage
43
Shirshanka Das
@shirshanka
Catch me offline to chat about…
What we’re doing for
- Views on Hadoop
- Data Quality
- Metadata

More Related Content

What's hot

All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the Databus
Amy W. Tang
 
Continus sql with sql stream builder
Continus sql with sql stream builderContinus sql with sql stream builder
Continus sql with sql stream builder
Timothy Spann
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
Data Con LA
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
Amy W. Tang
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Rick Bilodeau
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
DataWorks Summit
 
Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...
Bilgin Ibryam
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
Zoomdata
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
DataWorks Summit
 
DevNation Live: Kafka and Debezium
DevNation Live: Kafka and DebeziumDevNation Live: Kafka and Debezium
DevNation Live: Kafka and Debezium
Red Hat Developers
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
Chris Nauroth
 
The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit
DataWorks Summit/Hadoop Summit
 
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
FlyData Inc.
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Data Con LA
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Databricks
 
SQL Server on Linux - march 2017
SQL Server on Linux - march 2017SQL Server on Linux - march 2017
SQL Server on Linux - march 2017
Sorin Peste
 
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede
dotScale 2017 Keynote: The Rise of Real Time by Neha NarkhededotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede
confluent
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Guido Schmutz
 
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storageMongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
DataWorks Summit
 

What's hot (20)

All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the Databus
 
Continus sql with sql stream builder
Continus sql with sql stream builderContinus sql with sql stream builder
Continus sql with sql stream builder
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn Data Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco IntercloudCase Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
Case Study: Elasticsearch Ingest Using StreamSets at Cisco Intercloud
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...Application modernization patterns with apache kafka, debezium, and kubernete...
Application modernization patterns with apache kafka, debezium, and kubernete...
 
Spark meetup - Zoomdata Streaming
Spark meetup  - Zoomdata StreamingSpark meetup  - Zoomdata Streaming
Spark meetup - Zoomdata Streaming
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
 
DevNation Live: Kafka and Debezium
DevNation Live: Kafka and DebeziumDevNation Live: Kafka and Debezium
DevNation Live: Kafka and Debezium
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
 
The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit
 
What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
SQL Server on Linux - march 2017
SQL Server on Linux - march 2017SQL Server on Linux - march 2017
SQL Server on Linux - march 2017
 
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede
dotScale 2017 Keynote: The Rise of Real Time by Neha NarkhededotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede
dotScale 2017 Keynote: The Rise of Real Time by Neha Narkhede
 
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
 
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storageMongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 

Viewers also liked

Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Shirshanka Das
 
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Shirshanka Das
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Shirshanka Das
 
Aksyon radyo
Aksyon radyoAksyon radyo
Aksyon radyo
OnlinRadioTune
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Shirshanka Das
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data Application
Amy W. Tang
 
Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4William Myers
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
Amy W. Tang
 
Personal branding playbook
Personal branding playbookPersonal branding playbook
Personal branding playbook
Online Business
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsPerficient, Inc.
 
Unlocking the Experts
Unlocking the ExpertsUnlocking the Experts
Unlocking the Experts
LinkedIn
 
Participatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your ProcessParticipatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your Process
David Sherwin
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Edureka!
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
SlideShare
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Carol Smith
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Edureka!
 
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
Edureka!
 

Viewers also liked (19)

Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
Apache Gobblin: Bridging Batch and Streaming Data Integration. Big Data Meetu...
 
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
Taming the ever-evolving Compliance Beast : Lessons learnt at LinkedIn [Strat...
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Bigger Faster Easier: LinkedIn Hadoop Summit 2015
Bigger Faster Easier: LinkedIn Hadoop Summit 2015
 
Aksyon radyo
Aksyon radyoAksyon radyo
Aksyon radyo
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
LinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data ApplicationLinkedIn Segmentation & Targeting Platform: A Big Data Application
LinkedIn Segmentation & Targeting Platform: A Big Data Application
 
Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4Resume- William Myers FD2016.1.4
Resume- William Myers FD2016.1.4
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Personal branding playbook
Personal branding playbookPersonal branding playbook
Personal branding playbook
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
Unlocking the Experts
Unlocking the ExpertsUnlocking the Experts
Unlocking the Experts
 
Participatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your ProcessParticipatory Design: Bringing Users Into Your Process
Participatory Design: Bringing Users Into Your Process
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
Introduction To TensorFlow | Deep Learning Using TensorFlow | TensorFlow Tuto...
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
What to Upload to SlideShare
What to Upload to SlideShareWhat to Upload to SlideShare
What to Upload to SlideShare
 
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
Making Great User Experiences, Pittsburgh Scrum MeetUp, Oct 17, 2017
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
What is Artificial Intelligence | Artificial Intelligence Tutorial For Beginn...
 

Similar to Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop

Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafka
confluent
 
Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?
Glenn Renfro
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
AboutYouGmbH
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
Hortonworks
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and More
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and MoreWSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and More
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and More
WSO2
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
confluent
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
Jim Haughwout
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanDataWorks Summit
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
David Chen
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
Sriskandarajah Suhothayan
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
Hortonworks
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Alluxio, Inc.
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
kgshukla
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
iguazio
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
Revolution Analytics
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
Keiichiro Ono
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
confluent
 

Similar to Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop (20)

Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafka
 
Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?Big Data Applications Made Easy: Fact Or Fiction?
Big Data Applications Made Easy: Fact Or Fiction?
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and More
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and MoreWSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and More
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and More
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
Spark Streaming the Industrial IoT
Spark Streaming the Industrial IoTSpark Streaming the Industrial IoT
Spark Streaming the Industrial IoT
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
 
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
WSO2 Stream Processor: Graphical Editor, HTTP & Message Trace Analytics and m...
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 

Recently uploaded

一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 

Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop