SlideShare a Scribd company logo
Adi Polak
Microsoft
ETL – Everything
you need to know
About me – Adi Polak
@adipolak
https://medium.com/@adipolak
https://dev.to/adipolak
Use cases
AD tech Automotive
Cyber Security
… EVERYTHING WITH
DATA!
@adipolak
Agenda
• ETL
• Data Pipelines
• Data Challenges
• Big Data
• Stream vs Batch processing
• Architecture
• Tools
• Learn More!
@adipolak
E T L
Extract Transform Load
@adipolak
Data Pipeline != ETL
@adipolak
ETL
ETL
Data Pipeline
VS ETL
Most complex
data pipeline demo
Data Challenges –
Collection of data
@adipolak
Data Challenges - Collection of data
Audio, video, images. Meaningless
without adding some structure
Unstructured
JSON, XML, sensor data, social media,
device data, web logs. Flexible data
model structure
Semi-Structured
Structured CSV, Columnar Storage (Parquet, ORC).
Strict data model structure
@adipolak
Data Challenges –
Duplication of data
@adipolak
Data Challenges –
Inconsistency of data
@adipolak
Schema
Data representation
Data value
Data Challenges –
Variety of data
@adipolak
What about Big Data ?
@adipolak
Big Data Processing Pipelines
Visualize
Azure
Machine
Learning
Why is processing Big Data challenging ?
Variety
Velocity
Volume
@adipolak
Data
Engineers
Analytics tools
Knowledge of SQL
Data Warehousing
Data Architecture
Coding Skills
Machine Learning *
@adipolak
Streaming vs. Batch
Processing
Batch processing
NRT / Stream processing
Why not both?
Lambda Architecture
BATCH LAYER SPEED LAYER SERVICE LAYER
Batch Layer
Apache Spark
Azure Batch
Speed Layer
Spark Streaming
Kafka Stream
Azure stream analytics
Service Layer
Prometheus
Graphana
PowerBI
Tools
Many more…
@adipolak
What did we learn today:
• ETL vs Data Pipelines
• Big Data and Data challenges
• Data Engineers
• Streaming vs Batch processing
• Architectures
• Tools
@adipolak
Learn More !
aka.ms/data-guide
aka.ms/stream-processing
aka.ms/building-blocks
aka.ms/start-with-the-cloud
@adipolak

More Related Content

Similar to ETL – Everything you need to know

Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data Sharing
Safe Software
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Deepak Chandramouli
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
Trivadis
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access Layer
Todd Anglin
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine
50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine
50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine
Lucas Jellema
 
Lighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris PeetersLighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris Peeters
Data Science Leuven
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24
Martin Bém
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
Databricks
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
Guido Schmutz
 
The Power of Unified Analytics with Ali Ghodsi
The Power of Unified Analytics with Ali Ghodsi The Power of Unified Analytics with Ali Ghodsi
The Power of Unified Analytics with Ali Ghodsi
Databricks
 
Big Data Trend and Open Data
Big Data Trend and Open DataBig Data Trend and Open Data
Big Data Trend and Open Data
Jongwook Woo
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
James Serra
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
Spark Summit
 
Tableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of ThoughtTableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of Thought
MongoDB
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics Webinar
Bill Wong
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 

Similar to ETL – Everything you need to know (20)

Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data Sharing
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
 
Trivadis Azure Data Lake
Trivadis Azure Data LakeTrivadis Azure Data Lake
Trivadis Azure Data Lake
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access Layer
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine
50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine
50 Shades of Data - JEEConf 2018 - Kyiv, Ukraine
 
Lighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris PeetersLighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris Peeters
 
Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24Pitfalls of Data Warehousing_2019-04-24
Pitfalls of Data Warehousing_2019-04-24
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Fundamentals Big Data and AI Architecture
Fundamentals Big Data and AI ArchitectureFundamentals Big Data and AI Architecture
Fundamentals Big Data and AI Architecture
 
The Power of Unified Analytics with Ali Ghodsi
The Power of Unified Analytics with Ali Ghodsi The Power of Unified Analytics with Ali Ghodsi
The Power of Unified Analytics with Ali Ghodsi
 
Big Data Trend and Open Data
Big Data Trend and Open DataBig Data Trend and Open Data
Big Data Trend and Open Data
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Tableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of ThoughtTableau & MongoDB: Visual Analytics at the Speed of Thought
Tableau & MongoDB: Visual Analytics at the Speed of Thought
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics Webinar
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 

More from Adi Polak

Demystifying Apache Spark
Demystifying Apache SparkDemystifying Apache Spark
Demystifying Apache Spark
Adi Polak
 
Burst workloads Cutting costs with Kubernetes and Virtual Kubelet
Burst workloads Cutting costs with Kubernetes and Virtual KubeletBurst workloads Cutting costs with Kubernetes and Virtual Kubelet
Burst workloads Cutting costs with Kubernetes and Virtual Kubelet
Adi Polak
 
AI at Scale
AI at ScaleAI at Scale
AI at Scale
Adi Polak
 
From desktop to the cloud, cutting costs with Virtual kubelet and ACI
From desktop to the cloud, cutting costs with Virtual kubelet and ACIFrom desktop to the cloud, cutting costs with Virtual kubelet and ACI
From desktop to the cloud, cutting costs with Virtual kubelet and ACI
Adi Polak
 
Evolution of VS code Java ecosystem
Evolution of VS code Java ecosystemEvolution of VS code Java ecosystem
Evolution of VS code Java ecosystem
Adi Polak
 
Make it clean - scala clean code
Make it clean - scala clean codeMake it clean - scala clean code
Make it clean - scala clean code
Adi Polak
 
Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!
Adi Polak
 

More from Adi Polak (7)

Demystifying Apache Spark
Demystifying Apache SparkDemystifying Apache Spark
Demystifying Apache Spark
 
Burst workloads Cutting costs with Kubernetes and Virtual Kubelet
Burst workloads Cutting costs with Kubernetes and Virtual KubeletBurst workloads Cutting costs with Kubernetes and Virtual Kubelet
Burst workloads Cutting costs with Kubernetes and Virtual Kubelet
 
AI at Scale
AI at ScaleAI at Scale
AI at Scale
 
From desktop to the cloud, cutting costs with Virtual kubelet and ACI
From desktop to the cloud, cutting costs with Virtual kubelet and ACIFrom desktop to the cloud, cutting costs with Virtual kubelet and ACI
From desktop to the cloud, cutting costs with Virtual kubelet and ACI
 
Evolution of VS code Java ecosystem
Evolution of VS code Java ecosystemEvolution of VS code Java ecosystem
Evolution of VS code Java ecosystem
 
Make it clean - scala clean code
Make it clean - scala clean codeMake it clean - scala clean code
Make it clean - scala clean code
 
Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!
 

Recently uploaded

SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
Yara Milbes
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
pavan998932
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Undress Baby
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
lorraineandreiamcidl
 

Recently uploaded (20)

SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
What is Augmented Reality Image Tracking
What is Augmented Reality Image TrackingWhat is Augmented Reality Image Tracking
What is Augmented Reality Image Tracking
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdfRevolutionizing Visual Effects Mastering AI Face Swaps.pdf
Revolutionizing Visual Effects Mastering AI Face Swaps.pdf
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
 

ETL – Everything you need to know

Editor's Notes

  1. Simple data pipeline
  2. Show command line example ps –A | grep java
  3. Relational databases (RDBMS) work with structured data. Non-relational databases (NoSQL) work with semi-structured data Relational data and non-relational data are data models, describing how data is organized. Structured, semi-structured, and unstructured data are data types
  4. Relational databases (RDBMS) work with structured data. Non-relational databases (NoSQL) work with semi-structured data Relational data and non-relational data are data models, describing how data is organized. Structured, semi-structured, and unstructured data are data types
  5. Relational databases (RDBMS) work with structured data. Non-relational databases (NoSQL) work with semi-structured data Relational data and non-relational data are data models, describing how data is organized. Structured, semi-structured, and unstructured data are data types
  6. Relational databases (RDBMS) work with structured data. Non-relational databases (NoSQL) work with semi-structured data Relational data and non-relational data are data models, describing how data is organized. Structured, semi-structured, and unstructured data are data types
  7. Relational databases (RDBMS) work with structured data. Non-relational databases (NoSQL) work with semi-structured data Relational data and non-relational data are data models, describing how data is organized. Structured, semi-structured, and unstructured data are data types
  8. Variety: It can be structured, semi-structured, or unstructured Velocity: It can be streaming, near real-time or batch Volume: It can be 1GB or 1ZB
  9. Apache Hadoop-based analytics - it is an open source platform for distributed storage and distributed processing of data sets. These services provide data storage, processing, data access, security, governance, and operations. You need to have a good grasp on tools like MapReduce, Hadoop and HBase etc.  Knowledge of SQL - you need to have strong knowledge in relational database management system such as SQL to manage data. Data Warehousing - learning how to construct and use a data warehouse is a must. Data warehousing helps you aggregate unstructured data from one or more sources to compare and analyse for better business. Data Architecture - having knowledge of building complex database systems for companies. This term also refers to processes that address the data at rest, data at motion, data sets, and how they relate to data dependent applications and processes. Coding skills - you need to have good coding skills in Python, Java, Perl etc. Machine Learning - Although, it is said that machine learning is an integral part of data science but having some level of understanding of how to put the data into use using statistical analysis and data modelling is a huge advantage. Therefore, knowing is machine learning can be like cherry to the cake. Operating system - extensive knowledge in operating systems such as Linux, UNIX and Solaris etc. can be very helpful since most of the tools will be based on these systems.