2. marco kiesewettermarco kiesewetter
OVERVIEW
Why use manual ETL from your desktop?
A common issue to deal with
Example: Salesforce to SQL - a simple way to load fresh
data
Recap: What is ETL?
Extract data from an outside source, e.g. Salesforce.com
Transform data to fit operational needs
Load data into a data storage system such as a SQL database, data mart or
data warehouse
WHY USE MANUAL ETL
Business Analysts, Financial Analysts and BI Developers run
into two common situations where a manual desktop
upload is needed:
The data does not yet exist in SQL
Often it is helpful to get a sample dataset into SQL to test and justify a
new data source that is needed.
Development using this new data source can begin immediately, even before
the automated ETL job is set up by your IT department
The data needs to be refreshed off-schedule
Many ETL jobs run once or twice a day. If an extra refresh is needed at
times, a manual upload of ‘fresh’ data can be the quickest way
A COMMON ISSUE TO DEAL WITH
One of the most common issues is the formatting
of the raw data
Field delimiters & row delimiters may not be standard
The use of quotations and commas in text fields can cause delimiter
recognition to fail
SQL Server only supports CSV upload if the data is in a specific
format
Salesforce row delimiters are not recognized
Quotes only work if all fields in a column are enclosed in quotes
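To illustrate the delimiter problem, here is a minimal sketch using a made-up record (the account name, comment and column names are assumptions for illustration only). Splitting on every comma breaks a quoted text field apart, while a CSV-aware parser keeps it whole:

```powershell
# Hypothetical sample record: the first field contains a comma inside quotes
$line = '"Acme, Inc.","Great customer",100'

# Naive approach: split on every comma, ignoring the quotes
$naive = $line.Split(',')

# CSV-aware approach: ConvertFrom-Csv respects the quotes
# (the header names are invented for this example)
$parsed = $line | ConvertFrom-Csv -Header 'Account', 'Comment', 'Amount'

"Naive split found $($naive.Count) fields"      # 4 fields instead of 3
"Parsed account name: $($parsed.Account)"       # Acme, Inc.
```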
STEP 1: THE EXTRACT FILE
Download the Salesforce report results you want
to upload
Download as CSV
Save in an accessible network path
The SQL server has to be able to access it.
Use a filename that does not change
Avoid dates etc. in the file name.
Scripts will access this filename in this folder. For an update simply
overwrite this file.
STEP 2: STANDARDIZE THE FILE
Now we use Windows PowerShell to prepare the CSV for upload.
Often we will have commas in comments or other text fields. We should change the
field delimiter to a different symbol. The pipe "|" is a good option.
First we need to change all existing pipes to something else in order to make the
pipe symbol unique as the field delimiter.
In the example below I simply remove the pipes by replacing them with an empty
string; they can be replaced with anything else, though.
# Define the file, but note that I do not add ".csv"
$csvfile = '\\server\folder\salesforceExtract'
# Now we remove all pipes. -replace takes a regular expression, so the pipe must be escaped
Get-Content ($csvfile + ".csv") | % { $_ -replace '\|', '' } |
Out-File ($csvfile + " (no Pipes).csv") -Force -Encoding ascii
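A note on the pattern used for removing pipes: the -replace operator interprets its first argument as a regular expression, where an unescaped pipe means "or" and matches only empty strings. The short check below, on a made-up sample string, shows the difference between the unescaped pattern, the escaped pattern, and the literal .Replace() method:

```powershell
# Hypothetical sample text containing pipes
$sample = 'a|b|c'

# Unescaped pipe: the regex matches empty strings only, so nothing is removed
$unescaped = $sample -replace '|', ''

# Escaped pipe: matches the literal character
$escaped = $sample -replace '\|', ''

# String.Replace works on literal text and needs no escaping
$literal = $sample.Replace('|', '')

"$unescaped / $escaped / $literal"   # a|b|c / abc / abc
```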
STEP 2: STANDARDIZE THE FILE
Next we will change the delimiter and standardize the CSV file:
# Make standard CSV but use pipes as delimiter
# -NoTypeInformation keeps Windows PowerShell from writing a "#TYPE" line above the header
Import-Csv -Path ($csvfile + " (no Pipes).csv") -Delimiter ',' |
Export-Csv -Path ($csvfile + " (standardized).csv") -Delimiter '|' -NoTypeInformation
The file salesforceExtract (standardized).csv now has all fields in quotes.
Since we do not use commas as delimiters and all pipes were removed
from all text fields, we can safely remove all quotes from the file:
# Remove all quotes (")
Get-Content ($csvfile + " (standardized).csv") | % { $_ -replace '"', '' } |
Set-Content ($csvfile + " (upload).csv")
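The whole standardization step can be tried on a tiny in-memory sample before touching real files. This is only a sketch: the two records are invented, and ConvertFrom-Csv / ConvertTo-Csv stand in for the file-based Import-Csv / Export-Csv used in the slides.

```powershell
# Hypothetical raw extract: a comma and a pipe hidden inside quoted text fields
$raw = @(
    '"Account","Comment","Amount"'
    '"Acme, Inc.","call back|urgent","100"'
)

# 1. Remove existing pipes so the pipe is unique as a delimiter
$noPipes = $raw | ForEach-Object { $_ -replace '\|', '' }

# 2. Re-write the records with the pipe as field delimiter
$standardized = $noPipes | ConvertFrom-Csv |
    ConvertTo-Csv -Delimiter '|' -NoTypeInformation

# 3. Remove the quotes, which are no longer needed
$upload = $standardized | ForEach-Object { $_ -replace '"', '' }

$upload
# Account|Comment|Amount
# Acme, Inc.|call backurgent|100
```

The comma in the account name survives as ordinary text, because the comma no longer acts as a delimiter.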
STEP 3: UPLOAD TO SQL
Finally, we upload this prepared CSV to SQL server using
Microsoft SQL Server Management Studio
For this we use the BULK INSERT command
In most cases we upload a complete new data set. The easiest way to
handle this is by deleting the old table and re-creating it.
Another advantage of doing this is the ease with which new fields can be
added or the data types of fields can be changed. Re-creating
the table allows for any such adjustments.
Note that the BULK INSERT functionality is a server-side permission
setting and may need to be activated for your login.
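Before running BULK INSERT it can be worth checking that every row has the same number of pipe-delimited fields as the header, since a row with a different field count will not line up with the table columns. Below is a minimal sketch; the function name Get-BadFieldCountLines and the sample rows are hypothetical and not part of the original scripts.

```powershell
# Hypothetical helper: reports lines whose field count differs from the header
function Get-BadFieldCountLines {
    param(
        [string[]]$Lines,
        [char]$Delimiter = '|'
    )
    # The header line defines the expected number of fields
    $expected = $Lines[0].Split($Delimiter).Count
    for ($i = 1; $i -lt $Lines.Count; $i++) {
        $count = $Lines[$i].Split($Delimiter).Count
        if ($count -ne $expected) {
            [pscustomobject]@{
                LineNumber = $i + 1      # 1-based, as shown in a text editor
                Fields     = $count
                Expected   = $expected
            }
        }
    }
}

# Invented sample: the third line is missing a field
$sample = @('Account|Comment|Amount', 'Acme|ok|100', 'Beta|broken', 'Gamma|fine|7')
$bad = @(Get-BadFieldCountLines -Lines $sample)
$bad | Format-Table
```

For a real file, the lines could be passed in with Get-Content ($csvfile + " (upload).csv").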
STEP 3: UPLOAD TO SQL
Here is an example SQL script you can adjust to fit your needs:
use YourDB
go
-- Drop the old table only if it exists, so the first run does not fail
if object_id('dbo.YourTable', 'U') is not null
drop table [dbo].[YourTable]
go
create table [dbo].[YourTable](
Field1 nvarchar(255) null,
Field2 datetime null,
Field3 float null
)
go
bulk insert [dbo].[YourTable]
from '\\server\folder\salesforceExtract (upload).csv'
with (
fieldterminator = '|',
rowterminator = '\n',
firstrow = 2
)
go
STEP 4: PUTTING IT ALL TOGETHER
Put all the PowerShell commands into a text file with the extension .ps1
Put the SQL script into a text file with the extension .sql
Now create a batch script (example below, file extension .bat) that runs
all of the above commands and place everything in the same folder in
which you save your extracted CSV from Salesforce
@echo off
cls
echo Standardizing the CSV file...
powershell.exe -NoProfile -ExecutionPolicy Bypass -File "My PowerShell Script.ps1"
echo.
echo Is SQL Server Management Studio running and logged in to server ?
pause
echo.
echo Loading the SQL Query
"My SQL Script.sql"
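As a possible variation that is not part of the original workflow, the SQL script could be run without opening Management Studio by calling the sqlcmd command-line utility. The sketch below only builds the argument list; the function name Get-SqlcmdArguments and the server and database names are hypothetical, and it assumes sqlcmd is installed and your Windows login has the required permissions.

```powershell
# Hypothetical helper: builds the argument list for sqlcmd
function Get-SqlcmdArguments {
    param(
        [string]$Server,
        [string]$Database,
        [string]$ScriptPath
    )
    # -S server, -d database, -E Windows authentication,
    # -i input script file, -b return an error code if the script fails
    return @('-S', $Server, '-d', $Database, '-E', '-i', $ScriptPath, '-b')
}

# Placeholder values: replace with your own server, database and script
$sqlArgs = Get-SqlcmdArguments -Server 'YourServer' -Database 'YourDB' -ScriptPath 'My SQL Script.sql'
$sqlArgs -join ' '

# Uncomment to actually run the script (requires sqlcmd on the PATH):
# & sqlcmd.exe @sqlArgs
```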
SIMPLE ETL
Your automated ETL solution is ready.
All you have to do now is save the report results
under the same file name, run the batch script and
hit "Execute" in SQL Server Management Studio once the
script has loaded.