Introduction to data engineering
Yasmine Chelly
ML GDE
“Data is the new oil because data can be used to derive insights. Depending on what a company does, insights can drive customer retention, upselling, new revenue models, advertising, etc. If data is the new oil, insights are the new money.” - Forbes
What is a data engineer?
A developer with a tester mindset
Data engineers are the people who design the system that unifies data and can help others navigate it.
Data engineers perform many different tasks, including acquisition, cleansing, conversion, and deduplication.
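The cleansing, conversion, and deduplication tasks named above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the sample records, the field names (`id`, `email`, `signup`), and the helper functions are all hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical raw records, as they might arrive from two source systems.
raw_records = [
    {"id": "1", "email": " Ada@Example.com ", "signup": "2024-01-05"},
    {"id": "2", "email": "bob@example.com", "signup": "2024-02-10"},
    {"id": "1", "email": "ada@example.com", "signup": "2024-01-05"},  # duplicate id
    {"id": "3", "email": "", "signup": "2024-03-01"},  # missing email
]


def cleanse(record):
    """Cleansing: trim and lowercase the email; return None if it is empty."""
    email = record["email"].strip().lower()
    if not email:
        return None
    return {**record, "email": email}


def convert(record):
    """Conversion: turn string fields into typed values."""
    return {
        "id": int(record["id"]),
        "email": record["email"],
        "signup": date.fromisoformat(record["signup"]),
    }


def deduplicate(records):
    """Deduplication: keep the first record seen for each id."""
    seen = set()
    unique = []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


def prepare(records):
    """Apply the three steps in order to the acquired records."""
    cleansed = [c for c in (cleanse(r) for r in records) if c is not None]
    converted = [convert(r) for r in cleansed]
    return deduplicate(converted)


clean_records = prepare(raw_records)
```

In this sketch the record with the empty email is dropped during cleansing and the second record with id 1 is dropped during deduplication, so two records remain.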
Simplified pipeline
ETL: Extract, Transform, Load
Automate
Monitor
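A minimal sketch of such a pipeline, using only the Python standard library, might look like the following. The CSV content, the `orders` table, and every function name here are assumptions made for illustration. In a real setup the extract step would read from an actual source system, and a scheduler would call `run_pipeline` to automate it.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this might be a file or an API response.
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,7.25,eur
3,not_a_number,usd
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and count rows that fail validation."""
    good, rejected = [], 0
    for row in rows:
        try:
            good.append(
                (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
            )
        except ValueError:
            rejected += 1
    return good, rejected


def load(conn, rows):
    """Load: write the transformed rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text):
    """Run one pass and return simple counters that a monitor could track."""
    rows = extract(text)
    good, rejected = transform(rows)
    load(conn, good)
    return {"extracted": len(rows), "loaded": len(good), "rejected": rejected}


conn = sqlite3.connect(":memory:")  # in-memory database stands in for a warehouse
stats = run_pipeline(conn, SOURCE_CSV)
print(stats)
```

The returned counters are the monitoring hook in this sketch: a sudden rise in `rejected`, or `loaded` falling to zero, is the kind of signal a monitoring step would alert on.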
A real pipeline
Let’s get to know some GCP tools
Google’s role in modern
data engineering
What is Google Cloud Platform (GCP)?
Google’s DE services
Lab
https://shorturl.at/fkpqA
Thank You!
Yasmine Chelly
ML GDE
Bibliography
https://www.dremio.com/resources/guides/intro-data-engineering/
https://medium.com/analytics-vidhya/the-5-vs-of-big-data-2758bfcc51d
https://cloud.google.com/blog/products/data-analytics/building-the-data-engineering-driven-organization?hl=en
https://maelfabien.github.io/bigdata/gcps_1/#history-and-context

First workshop: Introduction to Data Engineering

Editor's Notes

  1. https://medium.com/analytics-vidhya/the-5-vs-of-big-data-2758bfcc51d
  2. ETL Tools: ETL (extract, transform, load) tools move data between systems. They access data, then apply rules to “transform” the data through steps that make it more suitable for analysis.
     SQL: Structured Query Language (SQL) is the standard language for querying relational databases.
     Python: Python is a general-purpose programming language. Data engineers may choose to use Python for ETL tasks.
     Cloud Data Storage: Including Amazon S3, Azure Data Lake Storage (ADLS), Google Cloud Storage, etc.
     Query Engines: Engines run queries against data to return answers. Data engineers may work with engines like Dremio Sonar, Spark, Flink, and others.
  3. 1980s: Servers on-premises. You own everything, and you manage it.
     2000s: Data Centers. Rent the space, but pay for and manage the hardware. No direct physical access to the computers.
     Now: First Generation Cloud with Virtualized Data Centers. You rent hardware and space, still controlling and configuring virtual machines. Pay only for what you provision.
     Next: Managed Services. Completely elastic storage, processing, ML. Pay for what you use.