1. Learning Progress Review
Week 1 - Data Engineer Introduction and Basic Python
Techies SkolaClass Data Engineer Batch 6 - DEputty
2. Learning Progress Review
We Learn, We Grow
Objectives
To relearn what has been taught by the mentor
To review our progress in learning
To summarize our lesson last week
3. What is ... ?
There are 2 main focuses in this learning progress review:
DATA ENGINEER
PROGRAMMING
5. What is a Data Engineer?
The person in charge of setting up, creating and
managing the data architecture in a company.
Data Engineer: PROGRAMMING, DISTRIBUTED SYSTEM, ANALYTIC
6. Role of Data Engineer
Source                 Data Pipeline   Data Warehouse
More than 1 Database   ETL Tools*      Data Warehouse
*ETL = Extract, Transform, Load
7. Data Engineer Workflow
ETL / ELT Process
ETL, which stands for Extract, Transform and
Load, is a data integration process that
combines data from multiple data sources into
a single, consistent data store that is loaded into
a data warehouse.
ETL: Extract -> Transform -> Load
ELT: Extract -> Load -> Transform (SQL)
ETL example: csv -> python -> data warehouse
ELT example: csv -> python -> data warehouse
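The difference between the two orderings can be sketched in Python. This is only a minimal illustration under assumptions: the CSV content, the table names (sales_etl, sales_raw, sales_elt) and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical CSV source; in practice this would be a file or a database export.
RAW_CSV = "name,amount\nalice,10\nbob,20\n"


def extract(text):
    """Extract: read CSV rows into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: capitalize names and convert amounts to integers."""
    return [(row["name"].title(), int(row["amount"])) for row in rows]


def run_etl(conn, text):
    """ETL: transform in Python first, then load the clean rows."""
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", transform(extract(text)))


def run_elt(conn, text):
    """ELT: load the raw rows first, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, amount TEXT)")
    raw = [(row["name"], row["amount"]) for row in extract(text)]
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name, "
        "CAST(amount AS INTEGER) AS amount FROM sales_raw"
    )


# In-memory SQLite is an assumption standing in for the data warehouse.
conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_CSV)
run_elt(conn, RAW_CSV)

etl_rows = conn.execute("SELECT name, amount FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT name, amount FROM sales_elt ORDER BY name").fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same clean table; what changes is where the transform step runs (in Python before loading, or in SQL after loading).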
8. Data Engineer Workflow
Type of Data Processing
[Diagram: Input Data goes through Streaming Tools (Streaming Processing) or ETL Tools (Batch Processing) into a Database and a Data Warehouse; Output Data feeds an Analytic Dashboard. Arrows are labeled Extract Data and Load Data.]
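The two processing types named on this slide can be contrasted with a small Python sketch. It is a simplified illustration only: the process function and the sample data are hypothetical, and real streaming or batch tools work at a much larger scale.

```python
def process(record):
    """Hypothetical transformation applied to a single record."""
    return record * 2


def batch_process(records):
    """Batch: the full data set is collected first, then processed in one run."""
    return [process(r) for r in records]


def stream_process(source):
    """Streaming: each record is processed as soon as it arrives."""
    for record in source:
        yield process(record)


data = [1, 2, 3]
batch_result = batch_process(data)
stream_result = list(stream_process(iter(data)))
print(batch_result, stream_result)
```

The results are identical here; the difference is that the streaming version hands over each result without waiting for the rest of the input.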
11. Role of Programming in Data Engineering
INPUT -> PROCESS (Extract, Transform, Load) -> OUTPUT
Data engineers use a variety of tools. Most
of them use programming languages
such as Python to run the ETL/ELT process.
12. Why do Data Engineers use Python?
The Python language is easy to use
and learn for beginners.
It is one of the most
accessible programming languages
available.
Mature and Supportive Python Community
Hundreds of Python Libraries and
Frameworks
The Python language is very convenient to
use in data processing
14. Data Types in Python
In programming, data type is an important concept.
Variables can store data of different types, and different types can do different
things.
Python has the following basic data types built-in by default:
Data Types       Values
Bool (bool)      True, False
Float (float)    -1.0, -0.5, 0.5, 1.0, etc
Integer (int)    1, 2, 3, 10, 11, 12, 100, 101, 102, etc
String (str)     "A", "a", "Data Engineer"
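The basic types listed on this slide can be inspected with the built-in type() function. The variable names below are made up for illustration.

```python
is_active = True          # bool
ratio = 0.5               # float
count = 100               # int
title = "Data Engineer"   # str

for value in (is_active, ratio, count, title):
    print(type(value).__name__, value)

type_names = [type(v).__name__ for v in (is_active, ratio, count, title)]

# Different types can do different things: + adds numbers but joins strings.
total = count + 1
label = title + " Batch"
```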
15. Data Types in Python - Collection
Collections are container data types that can store more than one
item, with the same or even different data types.
Collection: list, tuple, set, dict
16. Data Types in Python - Collection
list
Lists are used to store multiple items in a single variable. Lists are one of 4
built-in data types in Python used to store collections of data. Lists are
written with square brackets.
tuple
A tuple is a collection which is ordered and unchangeable. Tuples are
written with round brackets.
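A short sketch of the difference between the two: a list can be changed after it is created, while trying to change a tuple raises a TypeError. The values are arbitrary examples.

```python
tools = ["python", "sql"]   # list: square brackets, changeable
tools.append("spark")       # add an item at the end
tools[0] = "Python"         # replace an item by index

point = (3, 4)              # tuple: round brackets, unchangeable
try:
    point[0] = 10           # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(tools, point, tuple_changed)
```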
17. Data Types in Python - Collection
set
A set is a collection which is unordered and unindexed. Its items cannot be
changed in place, but items can be removed and new items can be added. Sets are
written with curly brackets.
dict
Dictionaries are used to store data values in key: value pairs.
A dictionary is a collection which is ordered (as of Python 3.7), changeable and does not allow
duplicate keys. Dictionaries are written with curly brackets, and have keys and values.
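A small sketch of both types, using made-up values: a set silently drops repeated items, and assigning to an existing dictionary key overwrites its value instead of creating a duplicate key.

```python
sources = {"csv", "api", "csv"}   # set: the repeated "csv" is stored only once
sources.add("database")           # new items can be added
sources.remove("api")             # existing items can be removed

engineer = {"name": "Ana", "batch": 6}   # dict: key: value pairs
engineer["batch"] = 7                    # changeable: update a value by key
engineer["name"] = "Budi"                # existing key is overwritten, not duplicated

print(sources)
print(engineer)
```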
18. Library/Modules
A Python library is a collection of related
modules. It contains bundles of code that
can be used repeatedly in different
situations. It makes Python coding simpler
and more convenient for the data engineer.
Before installing a library, make sure
the package manager is installed
properly.
Python's default package manager is
pip.
To use a library that is already installed
on the system, it must be
imported in the code.
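As a minimal sketch, importing looks like this. The example uses modules from Python's standard library, so nothing needs to be installed; a third-party library would first be installed from the command line with pip install followed by the package name.

```python
import math                   # import a whole module
from statistics import mean   # import a single name from a module

root = math.sqrt(16)          # call a function through the module name
average = mean([1, 2, 3])     # call the imported function directly

print(root, average)
```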