Hari Karthikeyan
Data Engineering Intern
Fall Intern Presentation
December 15th, 2021
Manager: Ronak Shah, Director, Data Engineering
Mentor: Alok Shenoy, Senior Data Engineer
Agenda
1 Introduction
2 Starter Projects
3 Argos Project
4 MAT 16 Project
5 Acknowledgements
About Me
- Computer Engineering @ University of Waterloo
- Graduating April 2022
- Huge soccer and basketball fan!
Introduction Starter Projects Argos Project MAT 16 Project Acknowledgements
Starter Projects
Implemented a MEGA Export job in Airflow
● Created an Airflow operator and a test DAG to move data between two Redshift instances
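The slide does not say how the operator moved the data. A common pattern for copying a table between two Redshift clusters is to UNLOAD from the source to S3 and then COPY into the target; the sketch below assumes that pattern and only builds the two SQL statements. The function name, bucket, and IAM role are hypothetical, and running the SQL (for example through one Airflow hook per cluster) is not shown.

```python
from typing import Optional, Tuple


def build_redshift_transfer_sql(
    source_table: str,
    target_table: str,
    s3_prefix: str,
    iam_role_arn: str,
    where: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (unload_sql, copy_sql) for a hypothetical UNLOAD -> S3 -> COPY transfer."""
    query = f"SELECT * FROM {source_table}"
    if where:
        query += f" WHERE {where}"
    # UNLOAD takes the query as a quoted string literal, so single quotes
    # inside the query must be doubled.
    escaped_query = query.replace("'", "''")
    unload_sql = (
        f"UNLOAD ('{escaped_query}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET;"
    )
    # COPY on the target cluster loads every file written under the same prefix.
    copy_sql = (
        f"COPY {target_table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET;"
    )
    return unload_sql, copy_sql


if __name__ == "__main__":
    unload, copy = build_redshift_transfer_sql(
        "analytics.events",
        "analytics.events",
        "s3://example-bucket/mega-export/events/",  # hypothetical bucket
        "arn:aws:iam::123456789012:role/example-redshift-role",  # hypothetical role
        where="ds = '2021-12-01'",
    )
    print(unload)
    print(copy)
```

Staging through S3 keeps the two clusters decoupled: the source only needs write access to the bucket and the target only needs read access.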
Expanded the LaForge module to send out alerts via PagerDuty
● Integrated the PagerDuty (PD) plugin into LaForge; documented, tested, and deployed the changes so that alerts fire when DAGs miss their SLAs
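A minimal sketch of the alerting side, assuming the PagerDuty Events API v2 is used (the slide only says "PD plugin"). In Airflow 2.x, an SLA miss can be hooked through a DAG's `sla_miss_callback`; that wiring and the LaForge integration are not shown, and the function names and dedup-key scheme below are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_sla_miss_event(
    routing_key: str, dag_id: str, task_ids: List[str], severity: str = "error"
) -> Dict[str, object]:
    """Build a PagerDuty Events API v2 'trigger' event for a DAG that missed its SLA."""
    allowed = {"critical", "error", "warning", "info"}
    if severity not in allowed:
        raise ValueError(f"severity must be one of {sorted(allowed)}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing one dedup_key per DAG groups repeated misses into a single incident.
        "dedup_key": f"sla-miss-{dag_id}",
        "payload": {
            "summary": f"SLA missed for DAG {dag_id}: {', '.join(task_ids)}",
            "source": "airflow",
            "severity": severity,
            "custom_details": {"dag_id": dag_id, "task_ids": list(task_ids)},
        },
    }


def send_event(event: Dict[str, object], timeout: float = 10.0) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the HTTP call lets the alert content be unit-tested without network access.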
How it started…
Argos Project Overview
Goal: build a data quality and anomaly detection system that will be central to implementing the "trust but verify" principle across all data pipelines at Coursera
Process: Planning -> Development -> Testing -> Demo -> Deployment
Planning
● Technical design document
● Storage and persistence layer
● Architectural design sessions
● Stage-check-exchange principle
Development
● Data quality check persistence operator
● Anomaly detection operator
● Predictions Databricks notebook
● Argos plugin
Testing
● Airflow DAGs and data backfilling
● DAGs to perform end-to-end testing
● Backfilling of test data in EDW/EDS
Demo
● Impressions demo pipeline with dbt
● Integration with dbt to handle anomalies
● Circuit-breaker functionality of Argos
Deployment
● Productionizing Argos
● Databricks jobs to move Argos metadata from raw layer -> L0 -> L1
● Extensive documentation, including an operator development guideline
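The stage-check-exchange principle can be illustrated with Python's built-in sqlite3 module standing in for the warehouse: new data is written to a staging table, checked, and swapped into the production table only if the check passes. This is a sketch under those assumptions; the table schema, the `__staging` suffix, and the row count threshold are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def stage_check_exchange(
    conn: sqlite3.Connection, target: str, rows: Iterable[Tuple[int, str]], min_rows: int = 1
) -> bool:
    """Stage rows, check them, and swap them into `target` only if the check passes.

    Returns True if the exchange happened, False if the check failed.
    Assumes `conn` was opened with isolation_level=None (autocommit) and that
    `target` already exists with columns (id INTEGER, value TEXT).
    """
    staging = f"{target}__staging"
    # Stage: write the new data to a table that consumers never read from.
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} (id INTEGER, value TEXT)")
    conn.executemany(f"INSERT INTO {staging} (id, value) VALUES (?, ?)", rows)

    # Check: a row count check stands in for any data quality check.
    (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {staging}").fetchone()
    if row_count < min_rows:
        conn.execute(f"DROP TABLE {staging}")
        return False  # the production table is left untouched

    # Exchange: replace the production table inside a single transaction.
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE {target}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {target}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    return True
```

Because readers only ever see the production table, a failed check leaves yesterday's data in place instead of publishing a bad load.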
Argos
Motivation
● As more pipelines are built on the core data sets, data quality is increasingly at risk
● Moving away from MEGA means we need a system that relies on Airflow to carry out blocking and non-blocking checks
● Need for an extensible framework that acts as a circuit breaker in all data pipelines
● Ability to run data quality checks and anomaly detection on EDW/EDS tables
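A sketch of blocking versus non-blocking checks acting as a circuit breaker, assuming a failed blocking check simply raises. In Airflow, a task that raises is marked failed, and downstream tasks with the default `all_success` trigger rule do not run, which is what stops bad data from propagating. The `Check` class and `DataQualityError` are hypothetical names, not part of Argos or Airflow.

```python
from dataclasses import dataclass
from typing import Callable, List


class DataQualityError(Exception):
    """Raised when a blocking check fails, which stops downstream tasks."""


@dataclass
class Check:
    name: str
    passed: Callable[[], bool]
    blocking: bool = True


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check; raise if any blocking check fails, else return warnings."""
    warnings: List[str] = []
    blocking_failures: List[str] = []
    for check in checks:
        if check.passed():
            continue
        if check.blocking:
            blocking_failures.append(check.name)
        else:
            # Non-blocking failures are reported but let the pipeline continue.
            warnings.append(f"non-blocking check failed: {check.name}")
    if blocking_failures:
        raise DataQualityError(f"blocking checks failed: {', '.join(blocking_failures)}")
    return warnings
```

All checks run before raising, so one failure report lists every blocking problem instead of only the first.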
Argos
Implementation
● Plugin that parses a config file and injects tasks, with their dependencies, into a DAG
● Operator that persists check results as JSON in the raw-layer S3 bucket
● Operator that performs anomaly detection by comparing today's check result to the latest prediction result in the L1 layer
● Databricks notebook that generates lower and upper prediction bounds from historical check results
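The plugin's config-to-tasks step could look roughly like this sketch. It assumes a JSON config listing checks per table (the real config format is not shown) and returns a dependency graph of hypothetical task ids: check, then persist, then anomaly detection, all downstream of the task that loads the table. In a real DAG each edge would become an Airflow `upstream >> downstream` dependency; here `graphlib.TopologicalSorter` just produces a valid execution order.

```python
import json
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_task_graph(config_text: str, upstream_task: str) -> Dict[str, Set[str]]:
    """Map each injected task id to the set of task ids it depends on."""
    config = json.loads(config_text)
    graph: Dict[str, Set[str]] = {upstream_task: set()}
    for check in config["checks"]:
        suffix = f"{check['table'].replace('.', '_')}__{check['type']}"
        check_id = f"argos_check__{suffix}"
        persist_id = f"argos_persist__{suffix}"
        detect_id = f"argos_detect__{suffix}"
        # The check runs after the task that produced the table,
        graph[check_id] = {upstream_task}
        # its result is persisted, and then compared against the latest prediction.
        graph[persist_id] = {check_id}
        graph[detect_id] = {persist_id}
    return graph


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(graph).static_order())
```

Driving task creation from config means adding a check to a table is a config change rather than a DAG code change.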
Argos
Challenges
● Finalizing the technical design and project plan with so many moving parts
● Issues with the local Airflow environment setup; used the dev Airflow cluster for testing instead
● Connecting Airflow to Databricks in order to trigger a notebook run
● Many edge cases to consider; handled them through iterative development
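One way to trigger a notebook run from Airflow is the Databricks provider's `DatabricksSubmitRunOperator`, whose `json` argument is a Jobs API `runs/submit` request body. The sketch below only builds such a body in the Jobs API 2.1 shape; whether Argos used this operator is an assumption, and the notebook path, cluster id, task key, and parameter names are hypothetical.

```python
from typing import Dict


def build_notebook_run_payload(
    notebook_path: str, cluster_id: str, params: Dict[str, object]
) -> Dict[str, object]:
    """Build a Jobs API 2.1 runs/submit body for a one-time notebook run."""
    return {
        "run_name": f"argos-predictions-{params.get('ds', 'adhoc')}",
        "tasks": [
            {
                "task_key": "argos_predictions",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    # Notebook widgets receive parameters as strings.
                    "base_parameters": {key: str(value) for key, value in params.items()},
                },
            }
        ],
    }
```

The resulting dict could be passed as `DatabricksSubmitRunOperator(task_id=..., databricks_conn_id="databricks_default", json=payload)`, with the workspace host and token stored in that Airflow connection.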
Argos check results metadata: example
This table (l1.argos_check_results) holds the row count check results for all tables, from checks run in various DAGs
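A sketch of how one row count check result might travel from a JSON document in the raw layer to a row in l1.argos_check_results. The field names, S3 key layout, and L1 column order are hypothetical; the slides only state that results are persisted as JSON in a raw-layer S3 bucket and moved raw -> L0 -> L1 by Databricks jobs.

```python
import json
from datetime import date
from typing import Dict, Tuple


def build_check_result(dag_id: str, table: str, check_type: str, value: float, run_date: date) -> Dict[str, object]:
    """Build the JSON-serializable record that a persistence operator might write."""
    return {
        "dag_id": dag_id,
        "table": table,
        "check_type": check_type,
        "value": value,
        "run_date": run_date.isoformat(),
    }


def raw_s3_key(result: Dict[str, object]) -> str:
    """Hypothetical raw-layer key, partitioned by check type and run date."""
    return (
        f"argos/raw/check_results/check_type={result['check_type']}/"
        f"run_date={result['run_date']}/{result['dag_id']}__{result['table']}.json"
    )


def to_l1_row(raw_json: str) -> Tuple[str, str, str, str, float]:
    """Flatten one raw JSON document into a row with a hypothetical L1 column order."""
    record = json.loads(raw_json)
    return (
        record["run_date"],
        record["dag_id"],
        record["table"],
        record["check_type"],
        float(record["value"]),
    )
```

Partitioning the raw keys by date makes it cheap for a downstream job to pick up only the newest results.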
Argos prediction results metadata: example
This table (l1.argos_prediction_results) holds the row count prediction results for all tables, from checks run in various DAGs
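The slides do not say how the Databricks notebook computes its prediction bounds. As a stand-in, this sketch uses the mean plus or minus k sample standard deviations of the historical check values, and flags today's value as anomalous when it falls outside those bounds; the method, the default k = 3, and the function names are assumptions.

```python
import statistics
from typing import Sequence, Tuple


def prediction_bounds(history: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper bounds as mean +/- k sample standard deviations of the history."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    return mean - k * spread, mean + k * spread


def is_anomaly(value: float, bounds: Tuple[float, float]) -> bool:
    """True when today's check value falls outside the predicted bounds."""
    lower, upper = bounds
    return not (lower <= value <= upper)
```

For row counts, a negative lower bound carries no information, so clipping it at 0 may be worthwhile; more sophisticated predictions are listed as a next step.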
Argos Project: Next Steps
● UI to add checks on tables and a Looker dashboard to visualize them
● Argos as a central microservice (Flask application)
● Consolidation of Argos logs by integrating with project Helios
● Sophisticated predictions and anomaly detection
Key Learnings and Takeaways
● Value of extensive documentation and testing, and of monitoring ETL pipelines after deployment
● Importance of architectural design sessions, teamwork, and collaboration
● Python, SQL, Airflow, Databricks, AWS, dbt, Docker, Terraform
MAT 16 Project: SkillsMatch
● Full-service destination hub for career discovery, skills development, and job matching
● Learner portal that maps learners' skills and proficiency levels to job openings and addresses skill gaps
● Employer portal that filters Coursera learners by skill and targets specific learner segments to fill job openings with qualified candidates
● Extracting real-world job data (skills, description, URL, salary, etc.) from Burning Glass APIs
● Backend data model to support the matching algorithm and job/course recommendations based on users' skill scores
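The slide does not describe the matching algorithm. As an illustration only, this sketch treats each job as a map of required skill proficiencies (0 to 1) and each learner as a map of skill scores, computes per-skill gaps, and ranks jobs by the share of required proficiency the learner already covers. All names and the scoring rule are hypothetical.

```python
from typing import Dict, List, Tuple

Skills = Dict[str, float]


def skill_gaps(learner: Skills, job: Skills) -> Skills:
    """Per-skill shortfall between a job's required proficiency and the learner's score."""
    return {
        skill: required - learner.get(skill, 0.0)
        for skill, required in job.items()
        if learner.get(skill, 0.0) < required
    }


def match_score(learner: Skills, job: Skills) -> float:
    """Fraction of the job's total required proficiency the learner covers, in [0, 1]."""
    total = sum(job.values())
    if total == 0:
        return 1.0
    covered = sum(min(learner.get(skill, 0.0), required) for skill, required in job.items())
    return covered / total


def rank_jobs(learner: Skills, jobs: Dict[str, Skills]) -> List[Tuple[str, float]]:
    """Jobs sorted from best to worst match for this learner."""
    scored = [(job_id, match_score(learner, required)) for job_id, required in jobs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The gaps returned by `skill_gaps` are the natural input for course recommendations: each missing skill points to content that would raise the learner's match score.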
Thank you!
● Mentor: Alok
● Manager: Ronak
● Demo: Chibu
● MAT: Steven, Simon (DE), entire SkillsMatch team
● Entire DE team
● Brown Bag Series