ETL for the Masses
Régis Baccaro – IBM
@regbac

ETL for the masses with Power Query and M

This is the slide deck for my presentation given at SQL Saturday 280 in Vienna on March 6th 2014.

  1. ETL for the Masses
     Régis Baccaro – IBM
     @regbac
  2. Our Sponsors
  3. Introduction
     Régis Baccaro
     @regbac
     http://Theblobfarm.wordpress.com
     http://Thelovefarm.wordpress.com
     regis@baccaro.com
     • Founder and lead organizer of SQL Saturday Denmark
     • PASS Regional Mentor
     • Works for IBM
     • Passionate about the community
     • .Net developer, BI dude, SharePoint fellow and accidental DBA
  4. Agenda
     • Power Query and the M language
     • E and T and L with Power Query
     • Data refresh techniques with PQ
     • Next step
  5. Introduction
     • Power Query
       • Get data experience
       • Filter and combine
       • Embedded M for repeatable mashup
     • Power Query Formula Language (aka M)
       • Mostly pure
       • Higher-order
       • Dynamically typed
       • Partially lazy
       • Functional programming language
  6. Elements of language
     • Expressions – central construct
       • Evaluated to a single value
     • Values
       • Primitives
       • List – ordered seq.
       • Record – set of fields
       • Table
       • Function
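The kinds of values listed on this slide can be written as literals. The following is a minimal sketch; every name and sample value in it is made up for illustration and does not come from the talk.

```powerquery
let
    // Primitive values
    ANumber = 42,
    SomeText = "hello",
    ALogical = true,
    // List: an ordered sequence of values
    AList = {1, 2, 3},
    // Record: a set of named fields
    ARecord = [Name = "Ada", Age = 36],
    // Table: built here with the #table constructor (column names, then rows)
    ATable = #table({"Id", "City"}, {{1, "Vienna"}, {2, "Copenhagen"}}),
    // Function: maps its parameter to a single value
    AddOne = (x) => x + 1
in
    [
        SecondItem = AList{1},          // 2, list positions are zero-based
        Name = ARecord[Name],           // "Ada"
        Rows = Table.RowCount(ATable),  // 2
        Three = AddOne(2)               // 3
    ]
```

The whole block is itself one expression that evaluates to a single value, here a record.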
  7. Evaluation
     • Excel-like (surprise!)
     • Nested records
       • In Records
       • In Lists
     • Lazy evaluation
       • Lists and Records (and let)
     • Eager evaluation
       • Everything else
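A small sketch of what lazy evaluation means for records, lists and let. The names are illustrative only. Each failing expression is written so that it would raise an error if it were ever evaluated, and the query is expected to succeed because none of them is needed for the result.

```powerquery
let
    // A let variable that is never referenced is never evaluated
    Unused = error "Unused was evaluated",
    // Record fields are evaluated only when accessed, so reading A
    // never touches the failing field B
    Rec = [A = 1 + 1, B = error "B was evaluated"],
    FromRecord = Rec[A],
    // List items behave the same way: only the requested item is evaluated
    Items = {10, error "item 1 was evaluated", 30},
    FromList = Items{0}
in
    [FromRecord = FromRecord, FromList = FromList]
```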
  8. Functions and Standard Library
     • Mapping from a set of values to a single value
     • (named parameters) => function body
     • Common set of definitions
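As a sketch of the `(parameters) => body` form next to two standard library functions (`List.Transform` and `List.Sum`); the function name `Add` is an illustrative assumption.

```powerquery
let
    // A function value: parameters on the left of =>, body on the right
    Add = (x as number, y as number) as number => x + y,
    // "each" is shorthand for a one-parameter function: (_) => ...
    Doubled = List.Transform({1, 2, 3}, each _ * 2),
    // Standard library functions are ordinary function values
    Total = List.Sum(Doubled)
in
    [Sum = Add(2, 3), Doubled = Doubled, Total = Total]
    // Sum = 5, Doubled = {2, 4, 6}, Total = 12
```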
  9. Operators
     • Meaning varies depending on kind of value
     • & = text or list concatenation and records merge
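A short illustration of how the same operator behaves for different kinds of values. The sample values are made up for this sketch.

```powerquery
[
    TextConcat = "Power" & " " & "Query",      // "Power Query"
    ListConcat = {1, 2} & {3},                 // {1, 2, 3}
    // On a name clash the field from the right-hand record wins
    RecordMerge = [A = 1, B = 2] & [B = 3],    // [A = 1, B = 3]
    // + also depends on the operand kinds
    NumberSum = 1 + 2,                         // 3
    NextDay = #date(2014, 3, 6) + #duration(1, 0, 0, 0)  // #date(2014, 3, 7)
]
```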
  10. Metadata
      • Information about a value that is associated with a value
      • A record
      • Exists for every value
      • Unobtrusive way to add information
      • Accessed with Value.Metadata
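A minimal sketch of attaching and reading metadata. The `meta` operator and `Value.Metadata` are part of the language; the field names and values in the metadata record are hypothetical.

```powerquery
let
    // Attach a metadata record with the meta operator
    Price = 100 meta [Currency = "DKK", Source = "hypothetical price list"],
    // The value itself still behaves like the plain number 100
    Doubled = Price * 2,
    // Read the metadata record back
    Info = Value.Metadata(Price)
in
    [
        Doubled = Doubled,             // 200
        Currency = Info[Currency],     // "DKK"
        Plain = Value.Metadata(42)     // [] : a value without explicit metadata has an empty record
    ]
```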
  11. Let ... in expression
      • So far only literal values
      • Let allows a set of values to be:
        • Computed
        • Named
        • Used in the subsequent expression that follows the in

      let
          Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......")),
          RowCount = Table.RowCount(Source)
      in
          RowCount

  12. IF expression
      • Select between 2 expressions based on a logical condition
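A sketch of an if expression inside a small function. `Classify`, its parameters and the 0.5 threshold are hypothetical and chosen only to show the syntax.

```powerquery
let
    // if ... then ... else is an expression, so it always yields a value;
    // the else branch is mandatory
    Classify = (hits as number, total as number) as text =>
        if total = 0 then "no data"
        else if hits / total >= 0.5 then "good match rate"
        else "poor match rate"
in
    {Classify(0, 0), Classify(150, 200), Classify(50, 200)}
    // {"no data", "good match rate", "poor match rate"}
```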
  13. Error expression
      • When an expression evaluation cannot yield a value
      • Raised with error
      • Handled with try
        • Produces an Error record
      • try...otherwise
        • Used with default values
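The following sketch shows raising and handling errors. The failing input and the messages are illustrative. Note that a division such as `1 / 0` would not serve as an example here, because number division by zero yields infinity in M instead of raising an error.

```powerquery
let
    // try returns a record: HasError, plus either Value or Error
    Attempt = try Number.FromText("not a number"),
    // try ... otherwise supplies a default when evaluation fails
    WithDefault = try Number.FromText("not a number") otherwise 0,
    // error raises an error; try captures it as an Error record
    Raised = try error "Something went wrong"
in
    [
        Failed = Attempt[HasError],          // true
        WithDefault = WithDefault,           // 0
        Message = Raised[Error][Message]     // "Something went wrong"
    ]
```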
  14. Keywords and Operators
      • and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time
      • , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
  15. The “E” - Why is Power Query great for extracting data
      • Multiple data sources
      Hey wait! Where is PDW?
  16. Query folding - A step toward declarative ETL approach
      • Declarative vs Imperative
      • Query folding similar to predicate pushdown
      • Does Power Query have a Query Optimizer?
      • Demo
      Query folding - the unofficial list:
      • SQL Databases
      • OData and OData based sources, such as the Windows Azure Marketplace and SharePoint Lists
      • Active Directory
      • Hdfs.Files, Folder.Files, and Folder.Contents (for basic operations on paths)
      Operations:
      • Column removal
      • Renaming
      • Joins
      • Type conversions
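A sketch of a query whose steps are candidates for folding, using three of the operations from the list above. The server, database, table and column names are all hypothetical. Whether a given step actually folds is decided by the engine and the data source, so treat this as an illustration of the shape of such a query, not a guarantee.

```powerquery
let
    // Hypothetical SQL Server instance and database
    Source = Sql.Database("myserver", "SalesDb"),
    // Navigate to a hypothetical table dbo.Orders
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Column removal
    Trimmed = Table.RemoveColumns(Orders, {"InternalNote"}),
    // Renaming
    Renamed = Table.RenameColumns(Trimmed, {{"OrderDate", "Date"}}),
    // Type conversion
    Typed = Table.TransformColumnTypes(Renamed, {{"Amount", type number}})
in
    Typed
```

The steps are written imperatively, one after the other, but when folding applies the source receives a single statement describing the final result, which is the declarative part of the approach.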
  17. Real life scenario – ETL for the masses
      • Seen a lot of demos
      • Built a lot of demos
      • They are always so clean!
  18. Real life scenario
  19. Transform
      • M is how the magic happens!
      • Data manipulation
        • Records
        • Lists
        • Tables
      • Merging
      • Function calls
  20. What about our scenario?
      • Where should I get my data from?
        • Pure Excel
        • Excel and MDS/DQS/SSIS/SQL
        • Web, SQL, XML, ?
      • Let me show you!
      Input
      • (cvr web)
  21. Let’s go to homegrown data?
      • Bad web service
      • Bad HTML structure
      • Let’s go with local data that we can control
      Isolated DB
      • SQL Server
      • Excel
      • Let’s Query!
      Local storage
  22. Clean up before you merge!
      • DQS
        Knowledge base with CVR + Cleansing project with LinkedIn input = Demo2.1_AndreasStrandbyClean
      • Hit ratio increased...
      [Chart: Hit and Total counts with hit percentage; categories as extracted: Clean join Nested Merge join]
  23. Smarter Power Query
      • Expression.Evaluate()
      • Examples
        • Load query text from file
        • Load function from file
        • Passing parameters (as constants)
      • Demo
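A sketch of the two ideas on this slide: passing constants through the environment record and loading a function from a text file. The file path, the file's content and the name `CleanName` are assumptions made for illustration; the demo from the talk is not reproduced here.

```powerquery
let
    // Evaluate M source text; free names are resolved from the
    // environment record passed as the second argument
    Inline = Expression.Evaluate("x + 1", [x = 41]),   // 42
    // Hypothetical file containing the text of an M function, for example:
    //   (name as text) as text => Text.Upper(Text.Trim(name))
    Path = "C:\Queries\CleanName.m",
    Code = Text.FromBinary(File.Contents(Path)),
    // #shared makes the standard library visible to the loaded code
    CleanName = Expression.Evaluate(Code, #shared),
    FromFile = CleanName("  acme a/s ")
in
    [Inline = Inline, FromFile = FromFile]
```

Keeping the function text in a file lets several workbooks share one definition, at the cost of each workbook needing access to that path.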
  24. Refreshing Power Query data
      • Different solutions
      • All with flaws!
  25. Refreshing Power Query data – with VB6!
      • Back from 2006
      • From Kim GreenLee
      Plus
      • Can be scheduled
      • More robust than the non-technical solution
      Minus
      • VB6 – are you kidding?
  26. Refreshing Power Query data – with PowerShell
      Plus
      • Robust
      Minus
      • Hard to troubleshoot
      • Cannot run as a task in Windows Task Scheduler unless the task is set to run only when the user is logged on
  27. Refreshing Power Query data – The non-technical way
      • Let me show you!
      Plus
      • Very easy
      Minus
      • Not very corporate!
      • The spreadsheet needs to be open
      • Excel file not saved
      • Locked out when it refreshes
  28. Refreshing Power Query data – The non-technical way part 2
      • Let me show you!
      Plus
      • Very easy
      • Uses technique from previous
      Minus
      • Not very corporate!
      • The spreadsheet needs to be open
  29. Refreshing Power Query data – with SSIS
      Plus
      • Robust
      Minus
      • Requires a SQL Server (wait, it’s a plus!)
      • Needs an SSIS / C# developer
  30. Refreshing Power Query data – with SSIS
      • Using DQS for cleansing input
      • Let me show you!
  31. How is Power Query going to be used?
      • Data store accumulating interesting data points
      • Hook into read only data for reporting purposes or data marts
      • One file to accumulate (Produce)
      • Multiple files or programs to report (Consume)
      • I don’t believe in “Data Steward”
      • I believe someone will be in charge of procuring and monitoring data stores of disparate data (such as IT or DBAs).
  32. Conclusion
      • A step toward declarative ETL approach
      • Still much work to do!
      We have
      • A declarative data integration language
      • Only surfaced in Power Query
      • Can push data to an Excel spreadsheet
      Imagine.....
      • Connection to heterogeneous data sources
  33. THANK YOU!
      @REGBAC
      HTTP://THEBLOBFARM.WORDPRESS.COM
      REGIS@BACCARO.COM