Data Lineage Solution in MOMO

Data Lineage in MOMO
Mr. Hải Nguyễn
Head of Data Platform

Growth
● Founded in 2007
● #1 e-wallet with the most connected banks
● F&B, transportation, e-commerce, bill payment, supermarkets, ...
● ~13.5 million users

Source: https://analyticks.wordpress.com/2016/11/14/the-perfect-big-data-platform-my-blueprint/

Tech stacks
● Streaming sources: Kafka, Google Pub/Sub
● Origin sources: RDBMS, Cassandra
● Migration tools: Beam on DataFlow, Spark on DataProc.
● Data Lake: GCS, BigQuery.
● ETL tools: Jupyter Notebook, Apache Airflow
● Report tools: Google Data Studio
● Tools: GK8

Achievements
● Number of datasets: ~80
● Number of tables: ~500
● Number of records generated (daily): 300 million
● Number of queries (daily): ~40,000
● Data scanned (daily): ~100 TB
● Number of ETL jobs: ~500

ETL
Source: https://www.astera.com/type/blog/etl-vs-elt-whats-the-difference/

Apache Airflow
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
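Airflow models each workflow as a DAG (directed acyclic graph) of tasks, where a task runs only after the tasks it depends on. The plain-Python sketch below is not Airflow's API; it only illustrates that core idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract_transactions": set(),
    "extract_users": set(),
    "clean_transactions": {"extract_transactions"},
    "daily_report": {"clean_transactions", "extract_users"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task comes after all of its upstreams.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task in execution_order(workflow):
        print("run", task)
```

Independent tasks (here the two extracts) may appear in either order; only the dependency constraints are guaranteed.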

[Figure: Department A, Department B, Department C]

What happens if an exception occurs somewhere in the middle of the pipeline?

Challenges
● Downstream jobs still run => wasted resources
● Downstream jobs consume wrong data => data correctness issues
● A failure in preprocessing impacts every downstream job

Enki
Data Lineage
Source: https://www.ancient.eu/Enki/

Data Lineage
● Data lineage includes the data origin, what happens to it, and where it moves over time [1]. Data lineage gives visibility while greatly simplifying the ability to trace errors back to the root cause in a data analytics process.
Source: [1] Hoang, Natalie (2017-03-16). "Data Lineage Helps Drives Business Value | Trifacta". Trifacta. Retrieved 2017-09-20.
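Tracing an error back to its root cause amounts to walking the lineage graph upstream. A minimal sketch, assuming lineage is stored as a mapping from each data source to the sources it reads from, and that we already know which sources hold bad data: the root causes are the bad sources none of whose own upstreams are bad. The function and table names are hypothetical.

```python
from collections import deque


def root_causes(upstreams: dict[str, set[str]], bad: set[str], start: str) -> set[str]:
    """Walk upstream (breadth-first) from `start` and return every bad source
    whose own upstreams are all healthy, i.e. where the bad data most likely entered."""
    causes: set[str] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        parents = upstreams.get(node, set())
        if node in bad and not (parents & bad):
            causes.add(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return causes


if __name__ == "__main__":
    # Hypothetical lineage: each table maps to the tables it is built from.
    lineage = {
        "daily_report": {"clean_transactions", "users"},
        "clean_transactions": {"raw_transactions"},
        "raw_transactions": set(),
        "users": set(),
    }
    bad = {"daily_report", "clean_transactions", "raw_transactions"}
    print(root_causes(lineage, bad, "daily_report"))  # {'raw_transactions'}
```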

Our Approach
Collecting all of this information gives us:
- Data source dependencies
- Data source status
- Data source logic
=> Metadata
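One way to picture the collected metadata is a record per data source holding its dependencies, status and logic. The sketch below assumes that shape; the class and field names are illustrative, not MoMo's actual schema. Storing only upstream edges is enough, because the downstream view ("who reads from this source?") can be derived by inverting them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSourceMetadata:
    """Hypothetical metadata record for one data source (e.g. one table)."""
    name: str
    upstreams: set[str] = field(default_factory=set)  # dependencies
    status: Optional[str] = None                       # e.g. "ok" / "failed" (illustrative)
    logic: str = ""                                    # SQL or code that builds the source


def downstream_index(catalog: dict[str, DataSourceMetadata]) -> dict[str, set[str]]:
    """Invert the dependency edges to answer "who reads from this source?"."""
    index: dict[str, set[str]] = {name: set() for name in catalog}
    for meta in catalog.values():
        for up in meta.upstreams:
            index.setdefault(up, set()).add(meta.name)
    return index


if __name__ == "__main__":
    catalog = {
        "raw_transactions": DataSourceMetadata("raw_transactions", status="ok"),
        "clean_transactions": DataSourceMetadata(
            "clean_transactions",
            upstreams={"raw_transactions"},
            status="ok",
            logic="SELECT * FROM raw_transactions WHERE amount > 0",
        ),
    }
    print(downstream_index(catalog))
```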

Data Lineage Service
Core information

Data Heartbeat Service
A service that keeps track of data source status
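A heartbeat-style status keeper can be sketched as a store of the latest partition each source has been successfully written for; a downstream job asks whether all of its upstreams are fresh before it starts. This in-memory class is a hypothetical stand-in for illustration only: the slides do not describe the real service's storage or API, and the sketch assumes partitions are written in date order.

```python
from datetime import date
from typing import Dict, Iterable


class HeartbeatStore:
    """Hypothetical in-memory status keeper: remembers the latest partition
    date for which each data source was successfully written."""

    def __init__(self) -> None:
        self._latest: Dict[str, date] = {}

    def beat(self, source: str, partition: date) -> None:
        """Called by a job after it finishes writing `partition` of `source`."""
        current = self._latest.get(source)
        if current is None or partition > current:
            self._latest[source] = partition  # never move the watermark backwards

    def is_fresh(self, source: str, partition: date) -> bool:
        # Assumes partitions are written in date order, so a later heartbeat
        # implies the requested partition is also in place.
        latest = self._latest.get(source)
        return latest is not None and latest >= partition

    def ready_to_run(self, upstreams: Iterable[str], partition: date) -> bool:
        """A downstream job should start only when every upstream is fresh."""
        return all(self.is_fresh(up, partition) for up in upstreams)


if __name__ == "__main__":
    store = HeartbeatStore()
    store.beat("raw_transactions", date(2021, 3, 1))
    print(store.ready_to_run({"raw_transactions"}, date(2021, 3, 1)))           # True
    print(store.ready_to_run({"raw_transactions", "users"}, date(2021, 3, 1)))  # False
```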

Achievements
● No more wrong data sources
● The ability to backfill all data sources in a single step
● Debugging time reduced from a day to within an hour
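Backfilling in a single step follows from the lineage graph: given a source that was fixed, collect everything that transitively depends on it and rebuild those sources in dependency order. A minimal sketch, assuming upstream edges are available as a dictionary; `backfill_plan` and the table names are hypothetical.

```python
from graphlib import TopologicalSorter


def backfill_plan(upstreams: dict[str, set[str]], changed: str) -> list[str]:
    """Return `changed` plus every source that transitively depends on it,
    ordered so each one is rebuilt only after all of its upstreams."""
    # Grow the affected set until no further source reads from it.
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for node, parents in upstreams.items():
            if node not in affected and parents & affected:
                affected.add(node)
                grew = True
    # Order only the affected sub-graph; unaffected upstreams need no rebuild.
    sub_graph = {node: upstreams.get(node, set()) & affected for node in affected}
    return list(TopologicalSorter(sub_graph).static_order())


if __name__ == "__main__":
    lineage = {
        "raw_transactions": set(),
        "clean_transactions": {"raw_transactions"},
        "daily_report": {"clean_transactions", "users"},
        "users": set(),
        "user_segments": {"users"},
    }
    # Prints ['raw_transactions', 'clean_transactions', 'daily_report']
    print(backfill_plan(lineage, "raw_transactions"))
```

Sources that do not depend on the fixed one (`users`, `user_segments` above) are left untouched, which keeps the backfill as small as the lineage allows.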

What next
● Business information
● Lineage version control

Thank you!
We're hiring!
hai.nguyen1@mservice.com.vn
