Bridging OLTP with OLAP: Lumos on Hadoop

Slide notes
  • Today we’ll talk about scaling ETL in order to consolidate and democratize data and analytics on Hadoop at LinkedIn.
  • Let’s start with the overall Data Ecosystem,
    then focus on the specific problem of integrating online data stores with Hadoop,
    and go over the solution.
  • Members interact with the site apps
    and generate actions and data mutations,
    which get persisted in the LOG store and the ONLINE data stores.
    Espresso, MySQL and Oracle are the primary online data stores.
    Espresso is a home-grown, document-oriented, partitioned data store with transactional support.
    Kafka is used as the LOG store.
    Online data sources are periodically replicated to Hadoop for creating cubes & enrichments.
    Cubes are used externally on the site as well as internally in reports/insights for analysts.
    (E.g. “Who viewed your profile”, “Campaign performance reports”, member sign-up reports)
    Cubes are delivered via cube-serving engines. There are primarily three cube-serving stacks:
    Voldemort, a key-value store, used to deliver static reports with pre-computed metrics;
    Pinot, a search technology, used to deliver somewhat dynamic reports with pre-computed metrics (drill);
    and finally the traditional BI stack of TD + Tableau + MSTR, which delivers insights to business users.
  • Explain interactively what action generated what data, using a real use case.

    Tracking: user activity at the site turns into tracking data.
    Example -> Tracking -> PageView, AdClick
    Append -> each user activity generates new data
    Immutable -> once generated, it does not change but grows over time
    Usually organized by time and accessed over a time range

    Database: user-provided data stored in online stores.
    This data is mutable over time.
    Example -> Member Profile, Education
    Organized as a full table as of some time and accessed in full
  • The problem is simply replicating the data from ONLINE to HADOOP.
    But LinkedIn has 300M members and generates a humongous amount of data.
    Fresh data directly impacts member engagement and business decision making.
  • PROD is the data center that is accessible from outside;
    Hadoop runs in the CORP data center.
  • Deletes are required for compliance.
    We could move the data entirely, but that puts load on the source system, the network and Hadoop resources.
  • Commit time or

    Since tracking data is append-only, it is easier to handle and arrange in time windows.
    DB data can have updates or deletes,
    and reflecting that on HDFS with low latency and optimal resource usage is a challenge.
  • TALK about schema evolution
  • TALK about schema evolution

  • This is neither an HDFS snapshot nor an HBase snapshot.
  • Schema changes + rewriting the complete data

    Sqoop: cross-colo database connections are not allowed
    Sqoop: may put load on the production databases
    HBase:
    write the change logs and periodically do a snapshot and replicate;
    not all companies run HBase as part of the standard deployment;
    not clear if this will meet the low-latency requirement
    Hive Streaming:
    looks similar to what we do;
    caveat: it only supports ORC
  • Change to Data Extract
  • Bottom right
  • TODO: cluster of databases and Relay
    Reading off of Databus
    With a picture
    Checkpoint
    SCN-to-time mapping
    Backup slides towards the end
  • DB dump format to Avro
    Oracle data types
    Map-only job
    Field-level transformations
    Eliminate recursive schemas

    Avro schema attributes (JSON)
    Meta info:
    key and delta column,
    begin_date, end_date,
    drop_date, full_drop date,
    row counts
    (see the sketch below)


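    A minimal sketch of how such metadata could be carried as custom attributes on a table's Avro schema. The attribute names (key_column, delta_column, begin_date, ...) follow the note above, but both the names and the Java approach are assumptions for illustration, not the actual Lumos code.

      import org.apache.avro.Schema;
      import org.apache.avro.SchemaBuilder;

      public class SchemaAnnotator {
          // Attach Lumos-style metadata to a table's Avro schema as custom attributes.
          // Attribute names are illustrative assumptions based on the note above.
          public static Schema annotate(Schema table, String keyCol, String deltaCol,
                                        String beginDate, String endDate, long rowCount) {
              table.addProp("key_column", keyCol);        // primary key of the table
              table.addProp("delta_column", deltaCol);    // column used to order changes
              table.addProp("begin_date", beginDate);     // start of the extract window
              table.addProp("end_date", endDate);         // end of the extract window
              table.addProp("row_count", Long.toString(rowCount)); // for data validation
              return table;
          }

          public static void main(String[] args) {
              Schema member = SchemaBuilder.record("Member").fields()
                  .requiredLong("member_id").requiredLong("modified_date").endRecord();
              System.out.println(annotate(member, "member_id", "modified_date",
                  "2014-06-01 00:00", "2014-06-01 01:00", 12345L).toString(true));
          }
      }
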
  • Change to Data Extract
  • Transcript

    • 1. Scaling ETL on Hadoop: Bridging OLTP with OLAP
    • 2. Agenda  Data Ecosystem @ LinkedIn  Problem : Bridging OLTP with OLAP  Solution  Details  Conclusion and Future Work 2
    • 3. Data Ecosystem @ LinkedIn 3
    • 4. Data Ecosystem - Overview 4 Serving App Online Stores Espresso Oracle MySQL Logs Analytics Infra Business Engines Serving OLAP
    • 5. Data Ecosystem – Data 5  Tracking Data  Tracks user activity at web site  Append only  Example: Page View  Database Data  Member provided data in online-stores  Inserts, Updates and Deletes  Example: Member Profiles, Likes, Comments
    • 6. Problem Scaling ETL on Hadoop 6
    • 7. Bridging OLTP to OLAP 7 OLTP OLAP  Integrating site-serving data stores with Hadoop at scale with low latency.  Critical to LinkedIn’s  Member engagement  Business decision making Kafka Engines Serving OLAP Databases Tracking Data Espresso Oracle MySQL
    • 8. Challenge - Scalable ETL 8  600+ Tracking topics  500+ Database tables  XXX TB of Data at rest  X TB of new data generated per day  5000 Nodes, Several Hadoop clusters Kafka Engines Serving OLAP Databases Tracking Data Espresso Oracle MySQL OLTP OLAP
    • 9. Challenge – Consistent Snapshot with SLA 9  Apply updates, deletes  Copy full tables  But, resource overheads  Small fraction of data changes Kafka Engines Serving OLAP Databases Tracking Data Espresso Oracle MySQL OLTP OLAP
    • 10. Engines Requirements 10 OLTP Oracle Espresso OLAP  Refresh data on HDFS frequently  Seamless handling of schema evolution  Optimal resource usage  Handle multi data centers  Efficient change capture on source  Ensure Last-Update semantics  Handle deletes Serving OLAP Database Data Tracking Data
    • 11. Solution 11
    • 12. Lumos 12 Data Capture  Can use commit logs  Delta processing  Latencies in minutes  Schema agnostic framework Databus Others Hadoop : Data Center DB Extract Files Data Center Colo-1 Databases Colo-2 Databases Lumos databases (HDFS) dbchanges (HDFS)
    • 13. Lumos – Multi-Datacenter 13 Data Capture  Handle multi-datacenter stores  Resolve updates via commit order Databus Others Hadoop : Data Center DB Extract Files Data Center Colo-1 Databases Colo-2 Databases Lumos databases (HDFS) dbchanges (HDFS)
    • 14. Lumos – Data Organization 14 - Virtual Snapshot HDFS Layout InputFormat Pig&Hive Loaders  Database Snapshot - Entire database on HDFS - With added latency  Database Virtual Snapshot - Previous Snapshot + Delta - Enables faster refresh /db/table/snapshot-0 _delta dir-1 dir-2 dir-3
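      One way to read the virtual snapshot idea above: at read time it resolves to the base snapshot directory plus every delta directory published since that snapshot. A minimal sketch against the layout shown on the slide (/db/table/snapshot-0 plus _delta sub-directories); the code and path handling are illustrative assumptions, not the actual Lumos implementation.

        import java.util.ArrayList;
        import java.util.Arrays;
        import java.util.List;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class VirtualSnapshotResolver {
            // Resolve a virtual snapshot as: base snapshot dir + ordered delta dirs.
            public static List<Path> resolve(Configuration conf, Path tableDir) throws Exception {
                FileSystem fs = tableDir.getFileSystem(conf);
                List<Path> inputs = new ArrayList<>();
                inputs.add(new Path(tableDir, "snapshot-0"));      // full base snapshot
                Path deltaRoot = new Path(tableDir, "_delta");
                if (fs.exists(deltaRoot)) {
                    FileStatus[] deltas = fs.listStatus(deltaRoot);
                    Arrays.sort(deltas);                           // dir-1, dir-2, dir-3, ...
                    for (FileStatus d : deltas) {
                        if (d.isDirectory()) inputs.add(d.getPath()); // newer deltas override older rows
                    }
                }
                return inputs;
            }
        }
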
    • 15. Lumos - High Level Architecture 15 Virtual Snapshot Builder ETL Hadoop Cluster Staging (internal) Lazy Snapshot Builder User Jobs HDFS Published Virtual Snapshot MR/Pig/Hive Loaders Compactor Change Capture Increments Pre-Process Full Drops
    • 16. Alternative Approaches  Sqoop  Hbase  Hive Streaming 16
    • 17. Details 17
    • 18. Change Capture – File Based 18  File Format  Compressed CSV  Metadata  Full Drop  Via Fast Reader (Oracle, MySQL)  Via MySQL backups (Espresso)  Runs for hours with Dirty reads  Increments  Via SQL  Transactional Full Drop 1am 4am Inc h-1 Inc h-2 Inc h-3 2am 3am Prev. HW New High-water mark DB Files Web Service HDFS HTTPS Pulls Inc H-4
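      The hourly increments above amount to a windowed SQL extract between the previous high-water mark and the new one. A minimal JDBC sketch, assuming an illustrative member_profile table with a modified_date audit column (table, column and class names are assumptions, not the actual Lumos configuration).

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.sql.Timestamp;

        public class IncrementalExtract {
            // Pull only rows changed after the previous high-water mark, up to the new one.
            public static void pull(String jdbcUrl, Timestamp prevHwm, Timestamp newHwm) throws Exception {
                String sql = "SELECT member_id, first_name, modified_date FROM member_profile "
                           + "WHERE modified_date > ? AND modified_date <= ?";
                try (Connection conn = DriverManager.getConnection(jdbcUrl);
                     PreparedStatement stmt = conn.prepareStatement(sql)) {
                    stmt.setTimestamp(1, prevHwm);   // previous high-water mark
                    stmt.setTimestamp(2, newHwm);    // new high-water mark for this run
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            // The real pipeline writes compressed CSV for HDFS upload over HTTPS;
                            // here we just print each changed row.
                            System.out.println(rs.getLong("member_id") + ","
                                + rs.getString("first_name") + "," + rs.getTimestamp("modified_date"));
                        }
                    }
                }
            }
        }
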
    • 19. Change Capture – Databus Based 19 Databus Relay Mapper Databus Consumer dbchanges (HDFS) Reducer Database Mapper Databus Consumer Reducer  Reads Database commit logs  Multi datacenter via Databus Relay  Runs as MR Job  Output : date-time partitioned with multiple versions  True change capture (including hard deletes) Databus Relay Database Hadoop
    • 20. Pre-Processing 20  Data format conversion  Field level transformations  Privacy  Cleansing – e.g. remove recursive schema  Metadata annotation  Add row counts for data validation Virtual Snapshot Builder (HDFS) Internal Staging Lazy Snapshot Builder User Jobs (HDFS) Published Virtual Snapshot MR/Pig/Hive Loaders Compactor Change Capture Increments Pre-Process Full Drops
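      A minimal sketch of a map-only pre-processing pass as described above: one field-level transformation (a hypothetical privacy mask on a CSV column) plus a Hadoop counter whose value can later be recorded as the row count for validation. The column layout and the mask are assumptions for illustration only.

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        public class PreProcessMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
            enum RowCounters { OUTPUT_ROWS }

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split(",", -1);
                if (fields.length > 2) {
                    fields[2] = "***";                 // hypothetical privacy mask on the third column
                }
                context.write(NullWritable.get(), new Text(String.join(",", fields)));
                context.getCounter(RowCounters.OUTPUT_ROWS).increment(1);  // row count for validation
            }
        }
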
    • 21. Snapshotting – Lazy Materializer 21  One MR job per table, consumes full drops  Supports dirty reads  Hash Partition on primary key  Number of partitions based on data size  Sorts on primary key  Results published into staging directory Virtual Snapshot Builder (HDFS) Internal Staging Lazy Snapshot Builder User Jobs (HDFS) Published Virtual Snapshot MR/Pig/Hive Loaders Compactor Change Capture Increments Pre-Process Full Drops
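      The hash-partition-on-primary-key step above can be pictured with a standard Hadoop Partitioner; the partition count is whatever the driver derived from the table size. A sketch assuming a numeric primary key, not the actual Lumos job.

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Partitioner;

        public class PrimaryKeyPartitioner extends Partitioner<LongWritable, Text> {
            // Route every version of a row to the same partition by hashing its primary key,
            // so each partition file can then be sorted on that key independently.
            @Override
            public int getPartition(LongWritable primaryKey, Text row, int numPartitions) {
                return (primaryKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }
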
    • 22. Snapshotting – Virtual Snapshot Builder 22  One MR Job for all tables  Identifies all existing snapshots, both published and staged  Creates appropriate delta partitions for every snapshot  Delta partition count equals Snapshot partition count  Clubs multiple partitions into one file  Outputs latest row using delta column  Publishes staged snapshots with new deltas  Previously published snapshots updated with new deltas Virtual Snapshot Builder (HDFS) Internal Staging Lazy Snapshot Builder User Jobs (HDFS) Published Virtual Snapshot MR/Pig/Hive Loaders Compactor Change Capture Increments Pre-Process Full Drops
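      "Outputs latest row using delta column" is essentially last-update-wins resolution: among all versions of a key, keep the one with the largest delta-column value, and drop the key entirely if that latest version is a delete. A self-contained sketch with an illustrative Row shape (field names are assumptions, not the actual Lumos types).

        import java.util.List;

        public class DeltaResolver {
            public static class Row {
                final long key;        // primary key
                final long deltaValue; // e.g. modified_date / commit order
                final boolean deleted; // tombstone for hard deletes
                Row(long key, long deltaValue, boolean deleted) {
                    this.key = key; this.deltaValue = deltaValue; this.deleted = deleted;
                }
            }

            // Among all versions of one key, return the latest, or null if the latest change is a delete.
            public static Row resolve(List<Row> versionsOfOneKey) {
                Row latest = null;
                for (Row r : versionsOfOneKey) {
                    if (latest == null || r.deltaValue > latest.deltaValue) latest = r;
                }
                return (latest != null && latest.deleted) ? null : latest;
            }
        }
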
    • 23. Snapshotting – Virtual Snapshot Builder 23 /db/table/snapshot-0 (10 partitions, 10 Avro files) _delta inc-1 (10 partitions, 2 Avro files) Part-0 . . . Part-9 Index files Inc-2 (10 partitions, 2 Avro files) Part-0 Part-5 Part-0  Incremental data is small  Rolls increments  Avoid creating small files  Equi-partitions INC as Snapshot  Seek and Read a partition Partition-0 Part-0.avro File Partition-4 Partition-5 Partition-9 Index file Index files Part-5 Index file Part-5.avro File
    • 24. Snapshotting – Loaders 24  Custom InputFormat (MR)  Uses the Index file to create Splits  RecordReader merges partition-0 of Snapshot and Delta  Returns latest row from Delta if present  Masks row if deleted  Otherwise returns row from snapshot  Pig Loader enables reading virtual snapshot via Pig  Storage handler enables reading virtual snapshot via Hive
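      The RecordReader merge described above can be sketched as a merge of two streams that are both sorted on the primary key: a delta row overrides the snapshot row with the same key, and a delete in the delta masks it entirely. Illustrative only; the Row shape is an assumption, not the actual InputFormat code.

        import java.util.ArrayList;
        import java.util.Iterator;
        import java.util.List;

        public class SnapshotDeltaMerge {
            public static class Row {
                final long key; final String value; final boolean deleted;
                Row(long key, String value, boolean deleted) {
                    this.key = key; this.value = value; this.deleted = deleted;
                }
            }

            // Merge a snapshot partition with its delta partition, both sorted on the primary key.
            public static List<Row> merge(Iterator<Row> snap, Iterator<Row> delta) {
                List<Row> out = new ArrayList<>();
                Row s = snap.hasNext() ? snap.next() : null;
                Row d = delta.hasNext() ? delta.next() : null;
                while (s != null || d != null) {
                    if (d == null || (s != null && s.key < d.key)) {
                        out.add(s);                                    // row only in snapshot
                        s = snap.hasNext() ? snap.next() : null;
                    } else if (s == null || d.key < s.key) {
                        if (!d.deleted) out.add(d);                    // new row only in delta
                        d = delta.hasNext() ? delta.next() : null;
                    } else {                                           // same key: delta wins,
                        if (!d.deleted) out.add(d);                    // and a delete masks the row
                        s = snap.hasNext() ? snap.next() : null;
                        d = delta.hasNext() ? delta.next() : null;
                    }
                }
                return out;
            }
        }
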
    • 25. Snapshotting – Loaders (2) 25 /db/table/snapshot-0 (10 partitions, 10 Avro files) _delta Part-0 Part-9 Delta-1 (10 partitions, 2 Avro files) Part-5 Part-0 Custom InputFormat Index files Part-1 Part-2 . . . Mapper-0 Custom InputFormat Mapper-9  Delta-1.Part-0 contains partitions 0 to 4  Delta-2.Part-5 contains partitions 5 to 9  Snapshot-0.Part-0 contains partition 0  Both sorted on primary key
    • 26. Snapshotting – Compactor 26  Required when partition size exceeds threshold  Materializes Virtual Snapshot to Snapshot  With more partitions  MR job with Reducer Virtual Snapshot Builder (HDFS) Internal Staging Lazy Snapshot Builder User Jobs (HDFS) Published Virtual Snapshot MR/Pig/Hive Loaders Compactor Change Capture Increments Pre-Process Full Drops
    • 27. Operating billions of rows per day  Dude, where’s my row? – Automatic Data validation  When data misses the bus – Handling late data – Look back window  Cluster downtime – Restart-ability – Active-active – Idempotent processing 27
    • 28. Conclusion and Future Work  Conclusion  Lumos : Scalable ETL framework  Battle tested in production  Future Work  Unify Internal and External data  Open source 28
    • 29. Q & A 29 Questions?
    • 30. Appendix 30
