SlideShare a Scribd company logo
1 of 9
Download to read offline
Data
Engineering
101
for absolute beginner
Who is Data Engineer?
Data engineering field could be thought of as a superset of business
intelligence and data warehousing that brings more elements from software
engineering. This discipline also integrates specialization around the operation of so
called “big data” distributed systems, along with concepts around the extended
Hadoop ecosystem, stream processing, and in computation at scale.
-Maxime Beauchemin, the original author of Airflow
Think of Artificial Intelligence as the top of a pyramid of needs. Yes, self-
actualization (AI) is great, but you first need food, water, and shelter (data literacy,
collection, and infrastructure).
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
What Data
Engineer
Do?
one of their highly sought-after
skills is the ability to design, build,
and maintain data warehouses.
A data warehouse is a place
where raw data is transformed
and stored in query-able forms.
In many ways, data warehouses
are both the engine and the fuels
that enable higher level analytics,
be it business intelligence, online
experimentation, or machine
learning.
ETL: Extract,
Transform,
and Load
Extract: this is the step where sensors wait
for upstream data sources to land (e.g. a
upstream source could be machine or
user-generated logs, relational database
copy, external dataset … etc). Upon
available, we transport the data from their
source locations to further transformations.
Transform: This is the heart of any ETL job,
where we apply business logic and perform
actions such as filtering, grouping, and
aggregation to translate raw data into
analysis-ready datasets. This step requires
a great deal of business understanding and
domain knowledge.
Load: we load the processed data and
transport them to a final destination. Often,
this dataset can be either consumed
directly by end-users or it can be treated as
yet another upstream dependency to
another ETL job, forming the so called data
lineage.
These three conceptual steps are how most
data pipelines are designed and structured.
They serve as a blueprint for how raw data is
transformed to analysis-ready data.
OLTP vs OLAP
We can divide IT systems into transactional (OLTP) and
analytical (OLAP). In general we can assume that OLTP systems
provide source data to data warehouses, whereas OLAP systems
help to analyze it.
Data Engineer Data Scientist
Source
https://medium.com/@rchang/a-beginners-guide-to-data-
engineering-part-i-4227c5c457d7
https://www.datawarehouse4u.info/OLTP-vs-OLAP.html
https://www.datacamp.com/community/blog/data-scientist-vs-
data-engineer
This deck created using canva.com

More Related Content

What's hot

Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
BI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyBI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyShivam Dhawan
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachDATAVERSITY
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
What is data engineering?
What is data engineering?What is data engineering?
What is data engineering?yongdam kim
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 

What's hot (20)

Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
BI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and StrategyBI Consultancy - Data, Analytics and Strategy
BI Consultancy - Data, Analytics and Strategy
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected Approach
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
What is data engineering?
What is data engineering?What is data engineering?
What is data engineering?
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 

Similar to Data Engineering 101 for Beginners

MapR Data Hub White Paper V2 2014
MapR Data Hub White Paper V2 2014MapR Data Hub White Paper V2 2014
MapR Data Hub White Paper V2 2014Erni Susanti
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapersKai Zhao
 
Data Warehousing And Data Mining Presentation Transcript
Data Warehousing And Data Mining   Presentation TranscriptData Warehousing And Data Mining   Presentation Transcript
Data Warehousing And Data Mining Presentation TranscriptSUBODH009
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita DubeyAnkita Dubey
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 
Information On Line Transaction Processing
Information On Line Transaction ProcessingInformation On Line Transaction Processing
Information On Line Transaction ProcessingStefanie Yang
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsRhonda Cetnar
 
Informatica Interview Questions & Answers
Informatica Interview Questions & AnswersInformatica Interview Questions & Answers
Informatica Interview Questions & AnswersZaranTech LLC
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfDatacademy.ai
 
Gerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and InvestmentGerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and Investmentvijayk23x
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesCindy Irby
 
TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)ruchabhandiwad
 
Data junction tool
Data junction toolData junction tool
Data junction toolSara shall
 

Similar to Data Engineering 101 for Beginners (20)

MapR Data Hub White Paper V2 2014
MapR Data Hub White Paper V2 2014MapR Data Hub White Paper V2 2014
MapR Data Hub White Paper V2 2014
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Data Warehousing And Data Mining Presentation Transcript
Data Warehousing And Data Mining   Presentation TranscriptData Warehousing And Data Mining   Presentation Transcript
Data Warehousing And Data Mining Presentation Transcript
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita Dubey
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
 
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
[IJET-V1I5P5] Authors: T.Jalaja, M.Shailaja
 
Information On Line Transaction Processing
Information On Line Transaction ProcessingInformation On Line Transaction Processing
Information On Line Transaction Processing
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL Tools
 
Informatica Interview Questions & Answers
Informatica Interview Questions & AnswersInformatica Interview Questions & Answers
Informatica Interview Questions & Answers
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
DMDW 1st module.pdf
DMDW 1st module.pdfDMDW 1st module.pdf
DMDW 1st module.pdf
 
Gerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and InvestmentGerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and Investment
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_services
 
TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)TheETLBottleneckinBigDataAnalytics(1)
TheETLBottleneckinBigDataAnalytics(1)
 
Data junction tool
Data junction toolData junction tool
Data junction tool
 

More from Novita Sari

Summary python coding
Summary python codingSummary python coding
Summary python codingNovita Sari
 
Summary machine learning and model deployment
Summary machine learning and model deploymentSummary machine learning and model deployment
Summary machine learning and model deploymentNovita Sari
 
Summary data visualization
Summary data visualizationSummary data visualization
Summary data visualizationNovita Sari
 
Summary data modelling
Summary data modellingSummary data modelling
Summary data modellingNovita Sari
 
Summary business knowledge for data professional
Summary business knowledge for data professionalSummary business knowledge for data professional
Summary business knowledge for data professionalNovita Sari
 
Practice case legal for data professional
Practice case legal for data professionalPractice case legal for data professional
Practice case legal for data professionalNovita Sari
 
Git, Docker, Python Package and Module
Git, Docker, Python Package and ModuleGit, Docker, Python Package and Module
Git, Docker, Python Package and ModuleNovita Sari
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixNovita Sari
 
Python Function and Looping
Python Function and LoopingPython Function and Looping
Python Function and LoopingNovita Sari
 
Basic Data Engineering
Basic Data EngineeringBasic Data Engineering
Basic Data EngineeringNovita Sari
 

More from Novita Sari (13)

Advanced python
Advanced pythonAdvanced python
Advanced python
 
Summary python coding
Summary python codingSummary python coding
Summary python coding
 
Summary machine learning and model deployment
Summary machine learning and model deploymentSummary machine learning and model deployment
Summary machine learning and model deployment
 
Summary legal
Summary legalSummary legal
Summary legal
 
Summary data visualization
Summary data visualizationSummary data visualization
Summary data visualization
 
Summary data modelling
Summary data modellingSummary data modelling
Summary data modelling
 
Summary business knowledge for data professional
Summary business knowledge for data professionalSummary business knowledge for data professional
Summary business knowledge for data professional
 
Practice case legal for data professional
Practice case legal for data professionalPractice case legal for data professional
Practice case legal for data professional
 
Big data tools
Big data toolsBig data tools
Big data tools
 
Git, Docker, Python Package and Module
Git, Docker, Python Package and ModuleGit, Docker, Python Package and Module
Git, Docker, Python Package and Module
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/Unix
 
Python Function and Looping
Python Function and LoopingPython Function and Looping
Python Function and Looping
 
Basic Data Engineering
Basic Data EngineeringBasic Data Engineering
Basic Data Engineering
 

Recently uploaded

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 

Recently uploaded (20)

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 

Data Engineering 101 for Beginners

  • 2. Who is Data Engineer?
  • 3. Data engineering field could be thought of as a superset of business intelligence and data warehousing that brings more elements from software engineering. This discipline also integrates specialization around the operation of so called “big data” distributed systems, along with concepts around the extended Hadoop ecosystem, stream processing, and in computation at scale. -Maxime Beauchemin, the original author of Airflow
  • 4. Think of Artificial Intelligence as the top of a pyramid of needs. Yes, self- actualization (AI) is great, but you first need food, water, and shelter (data literacy, collection, and infrastructure). https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
  • 5. What Data Engineer Do? one of their highly sought-after skills is the ability to design, build, and maintain data warehouses. A data warehouse is a place where raw data is transformed and stored in query-able forms. In many ways, data warehouses are both the engine and the fuels that enable higher level analytics, be it business intelligence, online experimentation, or machine learning.
  • 6. ETL: Extract, Transform, and Load Extract: this is the step where sensors wait for upstream data sources to land (e.g. a upstream source could be machine or user-generated logs, relational database copy, external dataset … etc). Upon available, we transport the data from their source locations to further transformations. Transform: This is the heart of any ETL job, where we apply business logic and perform actions such as filtering, grouping, and aggregation to translate raw data into analysis-ready datasets. This step requires a great deal of business understanding and domain knowledge. Load: we load the processed data and transport them to a final destination. Often, this dataset can be either consumed directly by end-users or it can be treated as yet another upstream dependency to another ETL job, forming the so called data lineage. These three conceptual steps are how most data pipelines are designed and structured. They serve as a blueprint for how raw data is transformed to analysis-ready data.
  • 7. OLTP vs OLAP We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.
  • 8. Data Engineer Data Scientist