Big Data Analytics: From ETL to Data Engineering
By Dmitry Tolpeko – November 2016

Is Data Engineering Just a New Term?
• Is there any difference between ETL and Data Engineering, or do they mean the same thing?
• I think they are very different, and I will try to explain why.

What is ETL?
• ETL stands for Extract-Transform-Load. It is a very important part of building and maintaining a data warehouse:
  • Extract: export data from one or more sources.
  • Transform (at various stages): clean, transform, aggregate, etc.
  • Load (at various stages): load data into the data warehouse and data marts, i.e., make it available to BI tools.
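
The three ETL steps above can be sketched end to end with only the Python standard library. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV export, the `sales_by_region` table, and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (input to the "Extract" step).
RAW_CSV = """order_id,region,amount
1,north,100.50
2,South,
3,north,49.50
4, south ,200
"""

def extract(text):
    """Extract: read rows from a CSV export (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean values, drop incomplete rows, aggregate per region."""
    totals = {}
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # drop rows with a missing amount
            continue
        region = row["region"].strip().lower()  # normalize region names
        totals[region] = totals.get(region, 0.0) + float(amount)
    return sorted(totals.items())

def load(conn, aggregates):
    """Load: write the aggregated rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", aggregates)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract(RAW_CSV)))
result = conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(result)  # [('north', 150.0), ('south', 200.0)]
```

The row with a missing amount is dropped during Transform, and the inconsistently written " south " and "South" values would be merged by normalization; a real pipeline would typically log or quarantine rejected rows rather than silently skip them.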

ETL Specifics
• Transform: mostly SQL and procedural SQL.
  • SQL analytic (window) functions such as LEAD/LAG handle complex calculations.
• Data in the data warehouse: often redundant, duplicated, and pre-aggregated for performance reasons. In most cases, however, the data is in first normal form, i.e., columns contain atomic values. The data is also organized in a star schema: a set of dimension and fact tables.
• Data consumers: BI tools that mostly visualize data in interactive reports and dashboards (filtering, drill-down, etc.).
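
A small star schema and a LAG query make the points above concrete. In this sketch the `dim_product` dimension table and the `fact_sales` fact table are hypothetical, and SQLite (through Python's `sqlite3` module) stands in for a warehouse database; window functions such as LAG require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: descriptive attributes, one row per product.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: numeric measures keyed by dimension ids (atomic values only).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_month TEXT,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Toys');
INSERT INTO fact_sales VALUES
  (1, '2016-09', 100), (1, '2016-10', 130), (1, '2016-11', 120),
  (2, '2016-10', 80),  (2, '2016-11', 95);
""")

# Join the fact table to its dimension, then use LAG to compare each
# month's revenue with the previous month of the same product.
query = """
SELECT p.name,
       f.sale_month,
       f.revenue,
       f.revenue - LAG(f.revenue) OVER (
           PARTITION BY f.product_id ORDER BY f.sale_month
       ) AS revenue_change
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
ORDER BY p.name, f.sale_month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first month of each product has no predecessor, so LAG returns NULL there (`None` in Python) and the difference is NULL as well; LEAD works the same way but looks at the following row.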

Big Data Analytics
• Data scientists are now the first consumers of data.
• Feature engineering is the major step in building statistical and machine learning models for advanced analytics:
  • Feature vectors contain hundreds of elements, and each element can be a list, a map, nested structs, buckets, a list of maps, a map of maps, and so on.
  • Complex calculations that create feature vectors within a data window (the reduce phase) require a non-SQL approach.
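
To illustrate building nested feature vectors inside a reduce phase, here is a pure-Python sketch of the map, shuffle, and reduce steps. The event schema, the per-user window, and the bucket thresholds are hypothetical; in practice a framework such as Spark or MapReduce would run these steps across a cluster, and the window might be bounded by time rather than covering all of a user's events.

```python
from collections import defaultdict

# Hypothetical raw events: (user_id, event_type, page, seconds_spent)
EVENTS = [
    ("u1", "view", "home", 12),
    ("u1", "click", "pricing", 3),
    ("u2", "view", "home", 45),
    ("u1", "view", "pricing", 70),
    ("u2", "click", "docs", 8),
]

def map_phase(events):
    """Map: emit (key, value) pairs keyed by user."""
    for user_id, event_type, page, seconds in events:
        yield user_id, (event_type, page, seconds)

def shuffle(pairs):
    """Shuffle: group all values for the same key (the framework does this in MapReduce/Spark)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def bucket(seconds):
    """Bucketize a numeric value into a coarse label (hypothetical thresholds)."""
    if seconds < 10:
        return "short"
    if seconds < 60:
        return "medium"
    return "long"

def reduce_phase(user_id, values):
    """Reduce: build one nested feature vector from all events in the user's window."""
    pages = []                                                    # list element
    counts_by_type = defaultdict(int)                             # map element
    seconds_by_type_page = defaultdict(lambda: defaultdict(int))  # map of maps
    duration_buckets = defaultdict(int)                           # bucketized counts
    for event_type, page, seconds in values:
        pages.append(page)
        counts_by_type[event_type] += 1
        seconds_by_type_page[event_type][page] += seconds
        duration_buckets[bucket(seconds)] += 1
    return {
        "user_id": user_id,
        "pages": pages,
        "counts_by_type": dict(counts_by_type),
        "seconds_by_type_page": {t: dict(p) for t, p in seconds_by_type_page.items()},
        "duration_buckets": dict(duration_buckets),
    }

features = [reduce_phase(k, v) for k, v in sorted(shuffle(map_phase(EVENTS)).items())]
for vector in features:
    print(vector)
```

Each output dictionary mixes a list (`pages`), a map (`counts_by_type`), a map of maps (`seconds_by_type_page`), and bucketized counts (`duration_buckets`), the kinds of element types listed on the slide; accumulating them in one pass over a key's values is straightforward in procedural code.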

Skills: ETL Developer vs Data Engineer
• ETL Developer:
  • SQL and procedural SQL
  • Analytic SQL functions
  • Data warehouse design and modelling
• Data Engineer:
  • Python (Scala, Java), Spark, Pig, SQL
  • Distributed data processing concepts (MapReduce, Spark)
  • Statistics and machine learning concepts

Thank you!
Dmitry Tolpeko
dmtolpeko@gmail.com
http://www.dmtolpeko.com
@dmtolpeko
