Advanced ETL: MS SSIS 2012 & Talend
Advanced ETL MS SSIS 2012 & Talend Document Transcript

  • 1. Advanced ETL: SSIS 2012 & Talend, by Sunny Okoro
  • 2. Contents: Database Systems; Applications; Microsoft SQL Server Integration Services 2012; Talend Open Studio 5.4
  • 3. Database Systems: Microsoft SQL Server 2008 R2; Microsoft SQL Server 2012
  • 4. Applications: Microsoft Visio; Microsoft Visual Studio 2010
  • 5. Microsoft SQL Server Integration Services 2012
  • 8. Schema diagram: AdventureWorksDW2008R2 (key and index markers from the diagram are shown in brackets)
      FactInternetSales: SalesOrderNumber [PK,I1], SalesOrderLineNumber [PK,I1], ProductKey [FK6,U6], OrderDateKey [FK3,U5], DueDateKey [FK4,U4], ShipDateKey [FK5,U1], CustomerKey [FK2,U3], PromotionKey [U7], CurrencyKey [FK1,U2], SalesTerritoryKey [FK7], RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber
      DimCurrency: CurrencyKey [PK,I2], CurrencyAlternateKey [I1], CurrencyName
      DimDate: DateKey [PK,I2], FullDateAlternateKey [I1], DayNumberOfWeek, EnglishDayNameOfWeek, SpanishDayNameOfWeek, FrenchDayNameOfWeek, DayNumberOfMonth, DayNumberOfYear, WeekNumberOfYear, EnglishMonthName, SpanishMonthName, FrenchMonthName, MonthNumberOfYear, CalendarQuarter, CalendarYear, CalendarSemester, FiscalQuarter, FiscalYear, FiscalSemester
      DimProduct: ProductKey [PK,I2], ProductAlternateKey [I1], ProductSubcategoryKey [FK1,U1], WeightUnitMeasureCode, SizeUnitMeasureCode, EnglishProductName, SpanishProductName, FrenchProductName, StandardCost, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, ListPrice, Size, SizeRange, Weight, DaysToManufacture, ProductLine, DealerPrice, Class, Style, ModelName, LargePhoto, EnglishDescription, FrenchDescription, ChineseDescription, ArabicDescription, HebrewDescription, ThaiDescription, GermanDescription, JapaneseDescription, TurkishDescription, StartDate [I1], EndDate, Status
      DimCustomer: CustomerKey [PK,I2], GeographyKey [FK1,U1], CustomerAlternateKey [I1], Title, FirstName, MiddleName, LastName, NameStyle, BirthDate, MaritalStatus, Suffix, Gender, EmailAddress, YearlyIncome, TotalChildren, NumberChildrenAtHome, EnglishEducation, SpanishEducation, FrenchEducation, EnglishOccupation, SpanishOccupation, FrenchOccupation, HouseOwnerFlag, NumberCarsOwned, AddressLine1, AddressLine2, Phone, DateFirstPurchase, CommuteDistance
      DimProductCategory: ProductCategoryKey [PK,I2], ProductCategoryAlternateKey [I1], EnglishProductCategoryName, SpanishProductCategoryName, FrenchProductCategoryName
      DimProductSubcategory: ProductSubcategoryKey [PK,I2], ProductSubcategoryAlternateKey [I1], EnglishProductSubcategoryName, SpanishProductSubcategoryName, FrenchProductSubcategoryName, ProductCategoryKey [FK1]
      DimSalesTerritory: SalesTerritoryKey [PK,I2], SalesTerritoryAlternateKey [I1], SalesTerritoryRegion, SalesTerritoryCountry, SalesTerritoryGroup
      DimGeography: GeographyKey [PK,I1], City, StateProvinceCode, StateProvinceName, CountryRegionCode, EnglishCountryRegionName, SpanishCountryRegionName, FrenchCountryRegionName, PostalCode, SalesTerritoryKey [FK1]
  • 17. Example of flat file creation
  • 19. The connection string ensures that the file is created in the right folder, with the right name, as declared in the SSIS variable.
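As a rough illustration of the idea (not the author's actual SSIS expression), the sketch below builds an output path from values held in variables, much as a flat-file connection string expression might. The folder, the report name, and the file name pattern are hypothetical.

```python
from pathlib import PureWindowsPath


def build_flat_file_path(output_folder: str, country_code: str,
                         state_code: str, report_name: str) -> str:
    """Compose the target path of a flat file from variable values."""
    # Hypothetical naming pattern: <country>_<state>_<report>.csv
    file_name = f"{country_code}_{state_code}_{report_name}.csv"
    return str(PureWindowsPath(output_folder) / file_name)


# Hypothetical variable values for one package execution.
path = build_flat_file_path(r"C:\ETL\Output", "AU", "VIC", "SalesByCity")
print(path)
```

Because the path is derived from the variables, changing the country or state code for a run changes where and under what name the file is written, without editing the connection itself.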
  • 24. Example of pivot creation
  • 26. This data flow task contains many tables, files, aggregations, and derived columns; not all of them will be illustrated. The previous demonstrations illustrate some of the key components in this data flow. The following illustrations demonstrate the major expressions used in derived columns to transform the data.
  • 30. The stored procedure, executed from SQL Server Management Studio, returns null data that will be transformed to a specific value using an expression in SSIS.
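The null handling described here can be sketched outside SSIS as follows. This is only an analogy for a derived-column expression that substitutes a fixed value for nulls; the column names, sample rows, and the replacement value "Unknown" are assumptions, not taken from the package.

```python
from typing import Optional


def replace_null(value: Optional[str], default: str = "Unknown") -> str:
    """Return the default when the value is null (None), else the value."""
    return default if value is None else value


# Hypothetical rows as they might come back from the stored procedure.
rows = [
    {"City": "Melbourne", "PromotionName": None},
    {"City": "Geelong", "PromotionName": "Volume Discount"},
]

# Apply the replacement to every column of every row.
cleaned = [{col: replace_null(val) for col, val in row.items()} for row in rows]
print(cleaned)
```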
  • 32. Execution with countrycode = AU (Australia) and statecode = VIC (Victoria)
  • 38. Results abridged
  • 39. Results abridged
  • 41. Results abridged
  • 56. Results abridged
  • 62. Results abridged
  • 64. The countrycode is changed to US (USA) and the statecode to CA for this execution. The [SalesRpt_FiscalYr_City] table does not contain any Australian cities from the previous demonstration because the table is truncated at the beginning of each package execution. For the next execution, the countrycode remained the same but the statecode was changed to IL. The data contents for the state of Illinois were created in the same folder as the state contents for Victoria. The prefix of each file name was changed to IL to reflect the countrycode and statecode, which was done using file connection strings.
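The truncate-then-load behaviour can be sketched with an in-memory SQLite database. Only the table name [SalesRpt_FiscalYr_City] comes from the slides; the columns, the sample rows, and the `run_package` helper are hypothetical, and SQLite's `DELETE` stands in for `TRUNCATE TABLE`, which SQLite does not have.

```python
import sqlite3

# Hypothetical source rows: (country, state, city, fiscal year, sales).
SOURCE_ROWS = [
    ("AU", "VIC", "Melbourne", 2008, 1200.0),
    ("US", "CA", "Los Angeles", 2008, 900.0),
    ("US", "IL", "Chicago", 2008, 450.0),
]


def run_package(conn: sqlite3.Connection, country_code: str, state_code: str) -> list:
    """Empty the report table, then load only the selected country and state."""
    conn.execute("DELETE FROM SalesRpt_FiscalYr_City")
    matching = [r for r in SOURCE_ROWS
                if r[0] == country_code and r[1] == state_code]
    conn.executemany(
        "INSERT INTO SalesRpt_FiscalYr_City VALUES (?, ?, ?, ?, ?)", matching)
    conn.commit()
    cursor = conn.execute("SELECT City FROM SalesRpt_FiscalYr_City ORDER BY City")
    return [row[0] for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesRpt_FiscalYr_City ("
    "CountryCode TEXT, StateCode TEXT, City TEXT, "
    "FiscalYear INTEGER, SalesAmount REAL)")

first_run = run_package(conn, "AU", "VIC")
second_run = run_package(conn, "US", "CA")
print(first_run, second_run)
```

After the second run the Australian city is gone, which mirrors why the table holds no rows from the previous demonstration.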
  • 66. No data is found for the California cities from the previous execution of this package. I will now change the countrycode to CA and the statecode to BC.
  • 68. The output folder is cluttered, so SSIS will delete everything in the output folder at the beginning of each execution.
  • 70. The previous content has been deleted by SSIS using the File System Task, which can also be used to create directories, copy files, etc. The output folder has no content for Great Britain.
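The folder clean-up step can be approximated in a few lines of Python. This is a sketch of the same idea, not the File System Task itself; the folder is a temporary directory and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def clear_folder(folder: Path) -> None:
    """Delete every file and subdirectory inside folder, keeping the folder."""
    for entry in folder.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


# Set up a hypothetical output folder with leftovers from an earlier run.
output_folder = Path(tempfile.mkdtemp())
(output_folder / "AU_VIC_report.csv").write_text("stale data")
(output_folder / "old_run").mkdir()

clear_folder(output_folder)
remaining = list(output_folder.iterdir())
print(remaining)
```

Running the clean-up first means each execution starts from an empty folder, so files from a previous country or state cannot be mistaken for current output.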
  • 77. These files will be imported into the MS SQL Server database using a Foreach Loop that picks up each CSV file and loads it into the product tables.
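A loop of this kind can be sketched as follows, with SQLite standing in for SQL Server. The file names, the two-column layout, and the `Product` table are hypothetical; the point is only the pattern of enumerating every CSV file in a folder and loading each one.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_files(folder: Path, conn: sqlite3.Connection) -> int:
    """Load every *.csv file in folder into the Product table; return row count."""
    loaded = 0
    for csv_path in sorted(folder.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            rows = [(r["ProductKey"], r["EnglishProductName"]) for r in reader]
        conn.executemany("INSERT INTO Product VALUES (?, ?)", rows)
        loaded += len(rows)
    conn.commit()
    return loaded


# Hypothetical input files.
folder = Path(tempfile.mkdtemp())
(folder / "products_1.csv").write_text(
    "ProductKey,EnglishProductName\n1,Road Bike\n2,Helmet\n")
(folder / "products_2.csv").write_text(
    "ProductKey,EnglishProductName\n3,Gloves\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductKey INTEGER, EnglishProductName TEXT)")
total = load_csv_files(folder, conn)
print(total)
```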
  • 97. Schema diagram: AdventureWorksDW 2012 (key and index markers from the diagram are shown in brackets)
      DimCurrency: CurrencyKey INTEGER [PK,I2], CurrencyAlternateKey WCHAR(3) [I1], CurrencyName WCHAR(50)
      FactInternetSales: SalesOrderNumber WCHAR(20) [PK,I1], SalesOrderLineNumber UTINYINT [PK,I1], ProductKey INTEGER [FK6,U6], OrderDateKey INTEGER [FK3,U5], DueDateKey INTEGER [FK4,U4], ShipDateKey INTEGER [FK5,U1], CustomerKey INTEGER [FK2,U3], PromotionKey INTEGER [U7], CurrencyKey INTEGER [FK1,U2], SalesTerritoryKey INTEGER [FK7], RevisionNumber UTINYINT, OrderQuantity SMALLINT, UnitPrice CURRENCY, ExtendedAmount CURRENCY, UnitPriceDiscountPct DOUBLE, DiscountAmount DOUBLE, ProductStandardCost CURRENCY, TotalProductCost CURRENCY, SalesAmount CURRENCY, TaxAmt CURRENCY, Freight CURRENCY, CarrierTrackingNumber WCHAR(25), CustomerPONumber WCHAR(25), OrderDate TIMESTAMP, DueDate TIMESTAMP, ShipDate TIMESTAMP
      DimDate: DateKey INTEGER [PK,I2], FullDateAlternateKey WCHAR(10) [I1], DayNumberOfWeek UTINYINT, EnglishDayNameOfWeek WCHAR(10), SpanishDayNameOfWeek WCHAR(10), FrenchDayNameOfWeek WCHAR(10), DayNumberOfMonth UTINYINT, DayNumberOfYear SMALLINT, WeekNumberOfYear UTINYINT, EnglishMonthName WCHAR(10), SpanishMonthName WCHAR(10), FrenchMonthName WCHAR(10), MonthNumberOfYear UTINYINT, CalendarQuarter UTINYINT, CalendarYear SMALLINT, CalendarSemester UTINYINT, FiscalQuarter UTINYINT, FiscalYear SMALLINT, FiscalSemester UTINYINT
      DimCustomer: CustomerKey INTEGER [PK,I2], GeographyKey INTEGER [FK1,U1], CustomerAlternateKey WCHAR(15) [I1], Title WCHAR(8), FirstName WCHAR(50), MiddleName WCHAR(50), LastName WCHAR(50), NameStyle BOOL, BirthDate WCHAR(10), MaritalStatus WCHAR(1), Suffix WCHAR(10), Gender WCHAR(1), EmailAddress WCHAR(50), YearlyIncome CURRENCY, TotalChildren UTINYINT, NumberChildrenAtHome UTINYINT, EnglishEducation WCHAR(40), SpanishEducation WCHAR(40), FrenchEducation WCHAR(40), EnglishOccupation WCHAR(100), SpanishOccupation WCHAR(100), FrenchOccupation WCHAR(100), HouseOwnerFlag WCHAR(1), NumberCarsOwned UTINYINT, AddressLine1 WCHAR(120), AddressLine2 WCHAR(120), Phone WCHAR(20), DateFirstPurchase WCHAR(10), CommuteDistance WCHAR(15)
      DimProduct: ProductKey INTEGER [PK,I2], ProductAlternateKey WCHAR(25) [I1], ProductSubcategoryKey INTEGER [FK1,U1], WeightUnitMeasureCode WCHAR(3), SizeUnitMeasureCode WCHAR(3), EnglishProductName WCHAR(50), SpanishProductName WCHAR(50), FrenchProductName WCHAR(50), StandardCost CURRENCY, FinishedGoodsFlag BOOL, Color WCHAR(15), SafetyStockLevel SMALLINT, ReorderPoint SMALLINT, ListPrice CURRENCY, Size WCHAR(50), SizeRange WCHAR(50), Weight DOUBLE, DaysToManufacture INTEGER, ProductLine WCHAR(2), DealerPrice CURRENCY, Class WCHAR(2), Style WCHAR(2), ModelName WCHAR(50), LargePhoto BINARY(524287), EnglishDescription WCHAR(400), FrenchDescription WCHAR(400), ChineseDescription WCHAR(400), ArabicDescription WCHAR(400), HebrewDescription WCHAR(400), ThaiDescription WCHAR(400), GermanDescription WCHAR(400), JapaneseDescription WCHAR(400), TurkishDescription WCHAR(400), StartDate TIMESTAMP [I1], EndDate TIMESTAMP, Status WCHAR(7)
      DimProductCategory: ProductCategoryKey INTEGER [PK,I2], ProductCategoryAlternateKey INTEGER [I1], EnglishProductCategoryName WCHAR(50), SpanishProductCategoryName WCHAR(50), FrenchProductCategoryName WCHAR(50)
      DimProductSubcategory: ProductSubcategoryKey INTEGER [PK,I2], ProductSubcategoryAlternateKey INTEGER [I1], EnglishProductSubcategoryName WCHAR(50), SpanishProductSubcategoryName WCHAR(50), FrenchProductSubcategoryName WCHAR(50), ProductCategoryKey INTEGER [FK1]
      DimGeography: GeographyKey INTEGER [PK,I1], City WCHAR(30), StateProvinceCode WCHAR(3), StateProvinceName WCHAR(50), CountryRegionCode WCHAR(3), EnglishCountryRegionName WCHAR(50), SpanishCountryRegionName WCHAR(50), FrenchCountryRegionName WCHAR(50), PostalCode WCHAR(15), SalesTerritoryKey INTEGER [FK1], IpAddressLocator WCHAR(15)
      DimSalesTerritory: SalesTerritoryKey INTEGER [PK,I2], SalesTerritoryAlternateKey INTEGER [I1], SalesTerritoryRegion WCHAR(50), SalesTerritoryCountry WCHAR(50), SalesTerritoryGroup WCHAR(50), SalesTerritoryImage BINARY(524287)
  • 98. For this demonstration, the Talend ETL application will be used to transform the data into an XML format that can be recognized by SSIS.
  • 101. Data mapping
  • 107. The Adworks XML document and the Adworks XSD document are created in the XML folder.
  • 140. Data validation: Only the pivot-based reports are displayed in full. The remaining reports are snapshots, not the entire data set extracted from the database.
  • 159. Another way to create the XML format is to use T-SQL XML features such as FOR XML AUTO with the ELEMENTS option to turn the query result into XML, and then write it to an XML file that can be read by SSIS. This method is much faster for smaller data sets, but not for big data in a laptop environment.
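To show what the ELEMENTS option changes, the sketch below builds the two XML shapes in Python: attribute-centric rows (the default shape of FOR XML AUTO) and element-centric rows (the shape produced when ELEMENTS is added). It does not run any T-SQL; the root element name, the row element name, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: one dict per row, values already as text.
rows = [
    {"CustomerKey": "1", "FirstName": "Alex"},
    {"CustomerKey": "2", "FirstName": "Sam"},
]


def attribute_centric(rows, row_name="DimCustomer"):
    """Each row becomes one element; columns become attributes."""
    root = ET.Element("Customers")
    for row in rows:
        ET.SubElement(root, row_name, attrib=row)
    return ET.tostring(root, encoding="unicode")


def element_centric(rows, row_name="DimCustomer"):
    """Each row becomes one element; columns become child elements."""
    root = ET.Element("Customers")
    for row in rows:
        node = ET.SubElement(root, row_name)
        for column, value in row.items():
            ET.SubElement(node, column).text = value
    return ET.tostring(root, encoding="unicode")


print(attribute_centric(rows))
print(element_centric(rows))
```

The element-centric shape is more verbose, but each column maps to its own element, which tends to be easier to describe in an XSD.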
  • 171. All of the results are abridged.
  • 172. Instead of inserting data for every country when the package is executed, SSIS will insert data using the countrycode and statecode highlighted above, and will use the additional countrycode to determine which table to populate.
  • 173. Only the Australian table is populated. The remaining tables were ignored because the condition of the expression on the Conditional Split did not evaluate to true for them.
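The routing logic of a conditional split can be sketched as below. This is an approximation of the behaviour described, not the package's actual expressions; the destination table names, the column name, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from country code to destination table.
ROUTES = {
    "AU": "Customer_Australia",
    "CA": "Customer_Canada",
    "US": "Customer_USA",
}


def conditional_split(rows, selected_country):
    """Send a row to its country table only when its condition is true."""
    outputs = defaultdict(list)
    for row in rows:
        code = row["CountryRegionCode"]
        # The condition holds only for the country chosen for this run.
        if code == selected_country and code in ROUTES:
            outputs[ROUTES[code]].append(row)
    return dict(outputs)


rows = [
    {"CustomerKey": 1, "CountryRegionCode": "AU"},
    {"CustomerKey": 2, "CountryRegionCode": "US"},
    {"CustomerKey": 3, "CountryRegionCode": "AU"},
]
populated = conditional_split(rows, "AU")
print(populated)
```

Tables whose condition never evaluates to true receive no rows, which is why only one destination appears in the result.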
  • 175. Australian customer data. All of the results are abridged.
  • 177. Canadian customer data. All of the results are abridged.
  • 178. American customer data. All of the results are abridged.
  • 184. Talend Open Studio 5.4
  • 185. Schema diagram: AdventureWorksDW 2012 (DimCurrency, FactInternetSales, DimDate, DimCustomer, DimProduct, DimProductCategory, DimProductSubcategory, DimGeography, DimSalesTerritory), identical to the diagram on slide 97.
  • 187. JDBC drivers have to be loaded manually to connect to platforms such as Oracle, MySQL, Sybase SQL Anywhere, PostgreSQL, and DB2. Talend also allows ODBC to be used for connections instead of the traditional JDBC. I have worked with Java-based applications such as Oracle SQL Developer and JDeveloper, and in my experience ODBC does not work well in these environments, even where the option is available.
  • 199. All of the results are abridged.
  • 209. Results abridged
  • 211. Results abridged
  • 218. Results abridged
  • 219. Results abridged
  • 224. Results abridged
  • 235. Results abridged
  • 239. Results abridged
  • 240. Results abridged
  • 241. Results abridged
  • 242. Results abridged
  • 243. Results abridged
  • 248. Results abridged
  • 253. Results abridged
  • 254. Results abridged
  • 255. Results abridged
  • 256. Results abridged
  • 262. Results abridged
  • 269. Only the Excel files will be read and uploaded into the database.
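Selecting only the Excel files from a mixed folder can be sketched as a simple extension filter, similar in effect to a file mask. The file names and the accepted extensions are assumptions for illustration; the Talend job itself is not reproduced here.

```python
from pathlib import Path

# Assumed set of extensions that count as Excel files.
EXCEL_SUFFIXES = {".xls", ".xlsx"}


def select_excel_files(names):
    """Keep only the names whose extension marks them as Excel files."""
    return [n for n in names if Path(n).suffix.lower() in EXCEL_SUFFIXES]


# Hypothetical folder listing with mixed file types.
files = ["sales_AU.xlsx", "sales_AU.csv", "notes.txt", "sales_US.XLS"]
excel_files = select_excel_files(files)
print(excel_files)
```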
  • 271. Results abridged
  • 279. Results abridged
  • 280. Results abridged
  • 281. Results abridged
  • 284. Results abridged
  • 286. Results abridged
  • 289. Results abridged. All the file names include the countrycode passed through the context.