Bringing OLTP with OLAP: Lumos on Hadoop

DataWorks Summit
Scaling ETL on Hadoop: Bridging OLTP with OLAP
Agenda
- Data Ecosystem @ LinkedIn
- Problem: Bridging OLTP with OLAP
- Solution
- Details
- Conclusion and Future Work
Data Ecosystem @ LinkedIn
Data Ecosystem - Overview
[Diagram: Serving Apps write to Online Stores (Espresso, Oracle, MySQL) and Logs; data flows into the Analytics Infra, which feeds Business users, Serving Engines, and OLAP]
Data Ecosystem – Data
- Tracking Data
  - Tracks user activity at the web site
  - Append only
  - Example: Page View
- Database Data
  - Member-provided data in online stores
  - Inserts, Updates and Deletes
  - Example: Member Profiles, Likes, Comments
Problem
Scaling ETL on Hadoop
Bridging OLTP to OLAP
- Integrating site-serving data stores with Hadoop at scale with low latency
- Critical to LinkedIn's
  - Member engagement
  - Business decision making
[Diagram: OLTP side — Tracking Data via Kafka and Databases (Espresso, Oracle, MySQL) — feeding the OLAP side of Serving Engines]
Challenge - Scalable ETL
- 600+ Tracking topics
- 500+ Database tables
- XXX TB of data at rest
- X TB of new data generated per day
- 5000 nodes, several Hadoop clusters
[Diagram: OLTP — Tracking Data via Kafka and Databases (Espresso, Oracle, MySQL) — to OLAP Serving Engines]
Challenge – Consistent Snapshot with SLA
- Apply updates, deletes
- Copy full tables
  - But, resource overheads
  - Small fraction of data changes
[Diagram: OLTP — Tracking Data via Kafka and Databases (Espresso, Oracle, MySQL) — to OLAP Serving Engines]
Requirements
- Refresh data on HDFS frequently
- Seamless handling of schema evolution
- Optimal resource usage
- Handle multiple data centers
- Efficient change capture on the source
- Ensure Last-Update semantics
- Handle deletes
[Diagram: OLTP stores (Oracle, Espresso) with Database Data and Tracking Data flowing to OLAP Serving Engines]
Solution
Lumos
Data Capture
- Can use commit logs
- Delta processing
- Latencies in minutes
- Schema agnostic framework
[Diagram: Databases in Colo-1 and Colo-2 are extracted — via Databus and other mechanisms — into extract files in the Hadoop data center, where Lumos builds the databases (HDFS) and dbchanges (HDFS) datasets]
Lumos – Multi-Datacenter
Data Capture
- Handle multi-datacenter stores
- Resolve updates via commit order
[Diagram: the same extract pipeline, pulling from databases in both Colo-1 and Colo-2 into the Hadoop data center]
Lumos – Data Organization
- Virtual Snapshot: HDFS layout, InputFormat, Pig & Hive loaders
- Database Snapshot
  - Entire database on HDFS
  - With added latency
- Database Virtual Snapshot
  - Previous Snapshot + Delta
  - Enables faster refresh
[HDFS layout: /db/table/snapshot-0 with a _delta directory containing dir-1, dir-2, dir-3]
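The "Previous Snapshot + Delta" idea can be sketched as a merge that prefers the newest row per primary key. This is a minimal illustration, not the actual MR implementation; the dict-based merge and the field names (`id`, `deleted`) are assumptions:

```python
def virtual_snapshot(snapshot_rows, delta_rows):
    """Merge a base snapshot with a delta, keeping the latest row per key.

    snapshot_rows / delta_rows: iterables of dicts with a primary key 'id'
    and an optional 'deleted' flag for hard deletes. Field names are
    illustrative, not Lumos's actual schema.
    """
    merged = {row["id"]: row for row in snapshot_rows}
    for row in delta_rows:          # delta wins: it is newer by definition
        merged[row["id"]] = row
    # Mask rows whose latest version is a delete
    return [r for r in merged.values() if not r.get("deleted", False)]
```

Because only the delta needs to be rewritten on each refresh, this is what makes the virtual snapshot cheaper to maintain than re-copying the full table.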
Lumos - High Level Architecture
[Architecture diagram: on the ETL Hadoop cluster, Change Capture delivers Increments and Full Drops; a Pre-Process step lands them in an internal Staging area; the Lazy Snapshot Builder and Virtual Snapshot Builder publish the Virtual Snapshot to HDFS; user jobs read it through MR/Pig/Hive Loaders; a Compactor periodically rewrites snapshots]
Alternative Approaches
- Sqoop
- HBase
- Hive Streaming
Details
Change Capture – File Based
- File Format
  - Compressed CSV
  - Metadata
- Full Drop
  - Via Fast Reader (Oracle, MySQL)
  - Via MySQL backups (Espresso)
  - Runs for hours with dirty reads
- Increments
  - Via SQL
  - Transactional
[Diagram: DB extract files are exposed by a Web Service and pulled over HTTPS into HDFS; a Full Drop taken at 1am is followed by hourly increments (Inc h-1 through h-4 at 2am, 3am, 4am, ...), each advancing the previous high-water mark to a new one]
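Pulling a transactional increment against a high-water mark can be sketched as below. The query shape, the `modified_at` column name, and the DB-API-style connection are assumptions; the deck only says increments come via SQL between a previous and a new high-water mark:

```python
def pull_increment(conn, table, prev_hw):
    """Fetch rows modified since the previous high-water mark and
    return them plus the new high-water mark to checkpoint.

    Assumes a modification-timestamp column 'modified_at' (hypothetical)
    as the last selected column, and a qmark-paramstyle DB-API connection.
    The table name is interpolated for brevity only; real code must not
    format untrusted identifiers into SQL.
    """
    cur = conn.cursor()
    cur.execute(
        f"SELECT * FROM {table} WHERE modified_at > ? ORDER BY modified_at",
        (prev_hw,),
    )
    rows = cur.fetchall()
    # New high-water mark = the largest timestamp actually seen, so a
    # failed run can safely re-pull from prev_hw (idempotent retries).
    new_hw = rows[-1][-1] if rows else prev_hw
    return rows, new_hw
```

Checkpointing `new_hw` only after the files land on HDFS is what makes the hourly pulls restartable.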
Change Capture – Databus Based
- Reads database commit logs
- Multi-datacenter via Databus Relay
- Runs as MR Job
- Output: date-time partitioned with multiple versions
- True change capture (including hard deletes)
[Diagram: a Databus Relay per database feeds Databus Consumers embedded in Mappers; Reducers write the dbchanges (HDFS) dataset on Hadoop]
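Resolving the multiple captured versions of a row down to one by commit order (as the multi-datacenter slides describe) could look like the reducer-side sketch below. The event shape — a dict with an `scn` commit sequence number and an `op` field — is an assumption, not the actual Databus consumer API:

```python
def resolve_latest(events):
    """Given change events for one primary key, possibly from several
    datacenters, keep the event with the highest commit SCN.

    Each event is a dict with 'scn' (commit sequence number) and
    'op' ('upsert' or 'delete'); returns None for a hard delete.
    Field names are illustrative.
    """
    latest = max(events, key=lambda e: e["scn"])
    return None if latest["op"] == "delete" else latest
```

Returning `None` for deletes is how a true change-capture stream (including hard deletes) propagates removals downstream.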
Pre-Processing
- Data format conversion
- Field-level transformations
  - Privacy
  - Cleansing – e.g. remove recursive schema
- Metadata annotation
  - Add row counts for data validation
[Architecture diagram as on slide 15: Change Capture → Pre-Process → internal Staging → Snapshot Builders → published Virtual Snapshot → MR/Pig/Hive Loaders, with the Compactor alongside]
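A map-only pass combining a field-level privacy transformation with row-count annotation might look like this. It is a sketch only; the field names, the masking rule, and the metadata shape are assumptions (the deck just lists "privacy" transformations and "add row counts"):

```python
def preprocess(records, private_fields=("email",)):
    """Map-only style pass: mask private fields and annotate row count.

    'private_fields' and the '***' masking rule are hypothetical stand-ins
    for whatever the real privacy transformation does.
    """
    out = []
    for rec in records:
        out.append({k: ("***" if k in private_fields else v)
                    for k, v in rec.items()})
    metadata = {"row_count": len(out)}   # later used for data validation
    return out, metadata
```

The row count travels with the output so a downstream validator can detect dropped rows without re-reading the source.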
Snapshotting – Lazy Materializer
- One MR job per table, consumes full drops
- Supports dirty reads
- Hash partition on primary key
  - Number of partitions based on data size
- Sorts on primary key
- Results published into a staging directory
[Architecture diagram as on slide 15]
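The partition-then-sort step above can be sketched in a few lines; it mirrors the shuffle and sort of the per-table MR job. Treating `id` as the primary key and using Python's built-in `hash` are assumptions (a production job would use a stable, language-independent hash):

```python
def materialize(rows, num_partitions):
    """Hash-partition rows on primary key and sort each partition by key.

    rows: iterable of dicts with primary key 'id' (illustrative name).
    Returns a list of num_partitions sorted partitions.
    """
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row["id"]) % num_partitions].append(row)
    for p in parts:
        p.sort(key=lambda r: r["id"])   # sorted runs enable merge-on-read
    return parts
```

Keeping each partition sorted on the primary key is what later lets the loaders stream-merge a snapshot partition with its delta.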
Snapshotting – Virtual Snapshot Builder
- One MR Job for all tables
- Identifies all existing snapshots, both published and staged
- Creates appropriate delta partitions for every snapshot
  - Delta partition count equals snapshot partition count
  - Clubs multiple partitions into one file
- Outputs the latest row using the delta column
- Publishes staged snapshots with new deltas
- Previously published snapshots are updated with new deltas
[Architecture diagram as on slide 15]
Snapshotting – Virtual Snapshot Builder
- Incremental data is small
- Rolls increments to avoid creating small files
- Equi-partitions increments the same way as the snapshot
- Seek and read a single partition via index files
[HDFS layout: /db/table/snapshot-0 holds 10 partitions in 10 Avro files (Part-0 ... Part-9); under _delta, inc-1 and inc-2 each pack the same 10 partitions into 2 Avro files — e.g. Part-0.avro carries partitions 0-4 and Part-5.avro carries partitions 5-9 — each file accompanied by an index file]
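The "seek and read a partition" step implied by the index files can be sketched like this. The index format — partition number mapped to a byte offset and length — is an assumption about what such an index file would carry, not Lumos's actual on-disk format:

```python
def read_partition(path, index, partition):
    """Seek directly to one partition inside a multi-partition file.

    'index' maps partition number -> (byte_offset, byte_length), the kind
    of information an index file beside each packed Avro file could hold
    (hypothetical format). Returns that partition's bytes without
    scanning the rest of the file.
    """
    offset, length = index[partition]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

This is why several partitions can be clubbed into one file without losing the ability to read just one of them: the index turns the packed file back into addressable partitions.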
Snapshotting – Loaders
- Custom InputFormat (MR)
  - Uses the Index file to create Splits
  - RecordReader merges partition-0 of the Snapshot and the Delta
    - Returns the latest row from the Delta if present
    - Masks the row if deleted
    - Otherwise returns the row from the Snapshot
- Pig Loader enables reading a virtual snapshot via Pig
- Storage handler enables reading a virtual snapshot via Hive
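Because both the snapshot partition and its delta are sorted on the primary key, the RecordReader's merge can be a two-pointer stream merge. The sketch below follows the rules on this slide (delta wins, deletes mask); the dict shape with `id` and `deleted` fields, and one delta row per key, are assumptions:

```python
def merge_read(snapshot, delta):
    """Stream-merge one sorted snapshot partition with its sorted delta,
    yielding the latest surviving row per primary key.

    snapshot / delta: lists of dicts sorted on 'id'; delta rows may carry
    a 'deleted' flag. Assumes at most one delta row per key (the builder
    already kept only the latest). Names are illustrative.
    """
    i = j = 0
    while i < len(snapshot) or j < len(delta):
        if j >= len(delta) or (i < len(snapshot)
                               and snapshot[i]["id"] < delta[j]["id"]):
            yield snapshot[i]           # key present only in the snapshot
            i += 1
        else:
            if i < len(snapshot) and snapshot[i]["id"] == delta[j]["id"]:
                i += 1                  # delta shadows the snapshot row
            row = delta[j]
            j += 1
            if not row.get("deleted", False):
                yield row               # latest row comes from the delta
            # deleted rows are masked: neither version is yielded
```

The merge is O(snapshot + delta) with no random access, which is what makes reading a virtual snapshot nearly as cheap as reading a materialized one.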
Snapshotting – Loaders (2)
- Delta-1.Part-0 contains partitions 0 to 4
- Delta-2.Part-5 contains partitions 5 to 9
- Snapshot-0.Part-0 contains partition 0
- Both sorted on primary key
[Diagram: ten Mappers (Mapper-0 ... Mapper-9), each using the Custom InputFormat and the index files to pair partition k of snapshot-0 with its slice of the delta files]
Snapshotting – Compactor
- Required when partition size exceeds a threshold
- Materializes the Virtual Snapshot into a Snapshot
  - With more partitions
- MR job with a Reducer
[Architecture diagram as on slide 15]
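Compaction combines the two earlier ideas: merge the virtual snapshot down to its latest live rows, then re-partition with a larger partition count. A sketch under the same assumptions as before (`id` as primary key, Python's `hash` standing in for a stable hash):

```python
def compact(snapshot_parts, delta_parts, new_num_partitions):
    """Materialize a virtual snapshot into a plain snapshot with more
    partitions. Illustrative only; not the actual reducer logic.
    """
    merged = {}
    for part in snapshot_parts:
        for row in part:
            merged[row["id"]] = row
    for part in delta_parts:
        for row in part:
            merged[row["id"]] = row     # delta wins over the snapshot
    live = [r for r in merged.values() if not r.get("deleted", False)]
    # Re-partition and re-sort at the new, larger partition count
    parts = [[] for _ in range(new_num_partitions)]
    for row in live:
        parts[hash(row["id"]) % new_num_partitions].append(row)
    for p in parts:
        p.sort(key=lambda r: r["id"])
    return parts
```

After compaction the delta directory starts empty again, so subsequent virtual-snapshot reads merge against a small delta once more.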
Operating billions of rows per day
- Dude, where's my row?
  - Automatic data validation
- When data misses the bus
  - Handling late data
  - Look-back window
- Cluster downtime
  - Restartability
  - Active-active
  - Idempotent processing
Conclusion and Future Work
- Conclusion
  - Lumos: a scalable ETL framework
  - Battle-tested in production
- Future Work
  - Unify internal and external data
  - Open source
Q & A
Appendix


Editor's Notes

  1. Today, Talk about Scaling ETL in order to consolidate and democratize data and analytics on Hadoop at LinkedIn.
  2. Let’s start with the overall Data Ecosystem Then focus on the specific problem of integrating online data-stores with Hadoop and go over the solution
  3. Members interact with the site apps, generating actions and data mutations, which get persisted in the LOG store and the ONLINE data stores. Espresso, MySQL and Oracle are the primary online data stores; Espresso is a home-grown, document-oriented, partitioned data store with transactional support. Kafka is used as the LOG store. Online data sources are periodically replicated to Hadoop for creating cubes and enrichments. Cubes are used externally on the site as well as internally in reports/insights for analysts (e.g. "Who viewed your profile", campaign performance reports, member sign-up reports). Cubes are delivered via cube-serving engines; there are primarily three cube-serving stacks. Voldemort, a key-value store, is used to deliver static reports with pre-computed metrics. Pinot, a search technology, is used for delivering somewhat dynamic reports with pre-computed metrics (drill). Finally, the traditional BI stack of TD + Tableau + MSTR delivers insights to business users.
  4. Explain interactively what action generated what data - a real use case. Tracking: user activity at the site turns into tracking data (e.g. PageView, AdClick); append-only, since each user activity generates new data; immutable, since once generated it does not change but grows over time; usually organized by time and accessed over a time range. Database: user-provided data stored in online stores; this data is mutable over time (e.g. Member Profile, Education); organized as a full table as of some time and accessed in full.
  5. The problem is simply replicating the data from ONLINE to HADOOP. But LinkedIn has 300m members and generates a humongous amount of data. Fresh data directly impacts member engagement and business decision making.
  6. PROD is the data center that is accessible from outside; HADOOP is in the CORP data center.
  7. Deletes are needed for compliance. Moving the data entirely works, but it puts load on the source system, the network and Hadoop resources.
  8. Since tracking data is append-only, it is easier to handle and arrange in time windows. DB data can have updates or deletes, and reflecting those on HDFS with low latency and optimal resource usage is a challenge.
  9. Talk about schema evolution.
  10. Talk about schema evolution.
  11. This is not an HDFS snapshot, nor an HBase snapshot.
  12. Schema changes require rewriting the complete data. Sqoop: cross-colo database connections are not allowed, and it may put load on the production databases. HBase: write the change logs and periodically snapshot and replicate; but not all companies run HBase as part of the standard deployment, and it is not clear this would meet the low-latency requirement. Hive Streaming looks similar to what we do; caveat: it only supports ORC.
  13. Change to Data Extract
  14. Bottom right
  15. TODO: cluster of databases and Relay Reading off of databus With a picture Checkpoint  Scn to time mapping Backup slides towards the end
  16. Db Dump format to Avro Oracle data types Map-Only Job Field Level transformation Eliminate recursive schema Avro Schema Attribute JSON Meta info Key and delta column begin_date, end_date, drop_date, full_drop date Row counts
  20. Change to Data Extract