Data Warehousing (DAY 3) Siwawong W. Project Manager 2010.05.26
Agenda
09:00 – 09:15 Registration
09:15 – 09:30 Review of 2nd Day class
09:30 – 10:00 Building Business Intelligence
10:00 – 10:30 Introduction to SSIS
10:30 – 10:45 Break & Morning Refreshment
10:45 – 12:00 SSIS Workshop & Exercise
12:00 – 13:00 Lunch Break
13:00 – 15:00 Introduction to SSAS
15:00 – 15:15 Break
15:15 – 16:00 SSAS Workshop & Exercise
2nd Day Review
Data Warehouse: Architecture Review Loading (aka, ETL) Refreshing: When & How? Structure/Modeling: star vs snowflake schema. Data Marts Query Processing Indexing: Bitmap vs Join Pre-Computed Aggregates SQL Extensions
OLAP: Review ROLAP vs MOLAP Slicing & Dicing Success Factors vs Pitfalls
Building Business Intelligence With MS-SQL server 2005
What’s Business Intelligence (BI)? Used in spotting, digging out, and analyzing business data to provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies include reporting, online analytical processing, analytics, data mining, etc. Reference: http://en.wikipedia.org/wiki/Business_intelligence
What’s Business Intelligence (BI)? Business intelligence (BI)  is more of a concept than a  single  technology. The goal is to gain insight into the business by bringing together data, formatting it in a way that enables better analysis, and then providing tools that give users power—not just to examine and explore the data, but to quickly understand it.   The above definition from  Business Intelligence with Microsoft Office PerformancePoint Server 2007
Related to Data Warehouse Data Warehouse = Business Intelligence To me, DW and BI are the same thing. A data warehouse is useless if users cannot access the data in an easy manner. Therefore, for this presentation these two terms are interchangeable.
Why Business Intelligence? Poor Visibility and Reaction to Market Events High Business and IT Operation Costs Poor Understanding of Customer Needs Inefficient Supply Chains and Demand Chains Poor Business Performance Management by Spreadsheets Data Privacy Concerns and Information Overload Compliance (Basel, Sarbanes-Oxley) Today’s information technology: 80% of the IS budget goes to ‘business as usual’
What Products Are Involved? Database Technologies MS-SQL Server 2005 (Database Engine) MS-SQL Server Analysis Services (SSAS) MS-SQL Server Integration Services (SSIS) User Interface Technologies MS-SQL Server Reporting Services (SSRS) MS-SQL Server Management Studio (SSMS) Excel
OLAP Leadership http://www.olapreport.com
BI Platform Selection Requirements Focus on operational BI Extending the reach of BI Scope of functionality Scalability Availability Simplicity
MS-SQL SERVER 2005 BI (refer from: http://www.renaissance.co.il/ivbug/meeting74/SQL%20Server2005%204%20VB%20group.ppt): SQL Server Relational Engine, Data Transformation Services, Analysis Services (OLAP & Data Mining), Reporting Services, Management Tools, Development Tools
BI vs Users: Information Consumers (65-80% of users), Information Explorers (15-25% of users), Analysts (5-10% of users); served by Reporting Services and Analysis Services
Microsoft BI Components (by component, SQL Server 2000 → SQL Server 2005):
Database Management Tools: Enterprise Manager, Analysis Manager → SQL Server 2005 Management Studio
Database Development Tools: Enterprise Manager, Analysis Manager, Query Manager… → SQL Server 2005 Business Intelligence Development Studio
Ad hoc Query and Analysis: Microsoft Office Products → SQL Server 2005 Report Builder, Business Scorecard Manager, Microsoft Office Products
Managed Reporting: SQL Server 2000 Reporting Services → SQL Server 2005 Reporting Services
Data Mining: SQL Server 2000 Analysis Services → SQL Server 2005 Analysis Services
Multidimensional Database: SQL Server 2000 Analysis Services → SQL Server 2005 Analysis Services
Relational Data Warehouse: SQL relational database → SQL relational database
Extraction, Transformation and Loading: SQL Server 2000 Data Transformation Services (DTS) → SQL Server 2005 Integration Services
Business Intelligence Opportunity. Integrate: data acquisition from source systems and integration; data transformation and synthesis. Analyze: data enrichment with business logic and hierarchical views; data discovery via data mining. Report: data presentation and distribution; data access for the masses.
Information Delivery: source systems (CRM, LOB, ERP) feed an Enterprise ETL layer into the Data Warehouse and Data Marts; data analysis (OLAP, Data Mining) and enterprise reporting sit on top, delivering performance scorecards, interactive reports, and business insights to clients, portals, devices, and third-party applications through familiar, powerful BI tools. Comprehensive ability to integrate any data (improved data completeness); highly intuitive, visual tools (greater productivity from developers to users); tightly integrated “all-in-one” technology solution (increased manageability and the best economics).
SQL Server Integration Services (SSIS)
MS-SQL2005: SSIS  Introduction to SSIS The Import and Export Wizard Creating A Package Components of Package Saving & Running Packages
Introduction to SSIS A feature of SQL Server 2005 Latest incarnation of Data Transformation Services (DTS) Used to transform and move data into and out of files and databases SSIS is a platform for building high-performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing SSIS provides a way to build packages made up of tasks that can move data around from place to place and alter it on the way
DTS today – a little history DTS - SQL Server 7.0 “ Visual BCP” – a useful utility DTS - SQL Server 2000 Easy (but slow) workflow & transform engine Customizable SSIS – SQL Server 2005 A completely new codebase Enterprise class ETL Exceptional BI integration – and more Rich APIs and extensibility
SSIS Components SSIS has two broad groups of components Server-Side Extensions to the DBMS that enable advanced SSIS tasks (simpler tasks are supported on most DBMSs via standard drivers and SQL commands) Extensions are “invisible” to the user/programmer unless they’re absent or incorrectly installed Client-Side Software components for both low-level and high-level integration tasks Interfaces to data sources (e.g., spreadsheets, text files, and various DBMSs) Interfaces to data destinations A toolkit (Visual Studio 2005 plus required components and templates) that enables users/programmers to combine components and interfaces to accomplish specific high-level tasks
SSIS Architecture
Development Environment Visual Studio (Business Intelligence Development Studio) aka: BIDS  Visual Studio 2005, .NET Framework 2.0 SQL Server Integration Services components need to be installed in both Visual Studio and SQL Server
Import & Export Wizard Though SSIS is almost infinitely customizable, Microsoft has produced a simple wizard to handle some of the most common ETL tasks: importing data to or exporting data from a SQL Server database. The Import and Export Wizard protects you from the complexity of SSIS while allowing you to move data between any of these data sources: SQL Server databases Flat files Microsoft Access databases Microsoft Excel worksheets Other OLE DB providers
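Behind the wizard, a one-table copy between databases on the same instance amounts to little more than a SELECT INTO; a minimal sketch, assuming a destination database named TestDB already exists (the name is only an example):

USE TestDB;
-- SELECT INTO creates the destination table and copies every row from the source
SELECT *
INTO dbo.Department
FROM AdventureWorks.HumanResources.Department;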
Demo: Import & Export Wizard (1) Start on a new DB (2) Select the data source (input): select “AdventureWorks” (3) Select the destination (output): the default destination is the selected DB
Demo: Import & Export Wizard (Cont’d) (4) Select the tables to import: HumanResources.Department, HumanResources.Employee, HumanResources.EmployeeAddress, HumanResources.EmployeeDepartmentHistory, HumanResources.EmployeePayHistory (5) Selected tables (6) Edit column mappings
Demo: Import & Export Wizard (Cont’d) (7) Save & Execute (8) Confirm before starting (9) Result status
Creating A Package The Import and Export Wizard is easy to use, but it only taps a small part of the functionality of SSIS. To really appreciate the full power of SSIS, you’ll need to use BIDS to build an SSIS package. A package is a collection of SSIS objects including: Connections to data sources. Data flows, which include the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Control flows, which include tasks and containers that execute when the package runs. You can organize tasks in sequences and in loops. Event handlers, which are workflows that run in response to the events raised by a package, task, or container.
Demo: Creating A Package (1) Click here (2) File → New → Project… (3) Business Intelligence Projects → Integration Services Project, selected here
Components of Package  Main panel Control flows Data flows Event handlers Variables, expressions, package configurations Connection managers
Connections Practically any data source can be used Data Source Views Allows multiple sources to be viewed as a single logical source Reusable and can use friendly names Static – underlying structure is not updated
Working with Connection Managers Control and data flow objects may require a connection manager Various types (OLE DB, flat file, ADO, Excel, FTP) Available properties depend on type All have a connection string Browse to create a connection string
Demo for Connection Manager 1) Right-click in this area 2) Select OLE DB Connection 3) Show existing connections 4) Create a new connection 5) Select the SQL Native Client provider 6) Configure the link to the new DB and test the connection to confirm
Demo for Connection Manager 7) Finally, we get the new connection; select it and press OK 8) Then we see the new connection 9) Create a new data source of type “Flat File” (we use department.txt as a sample) 10) Input the name & information 11) Select department.txt (sample) 12) Check it!
Demo for Connection Manager 13) Click on Columns 14) Verify the data in the text file 15) Click on the Advanced icon to go to the Advanced page
Demo for Connection Manager 16) Click on the New button 17) The new column is displayed 18) Change the column name to DepartmentName 19) Click on the OK button 20) The new connection appears
Demo for Connection Manager 21) Right-click on DepartmentList (the new connection) 22) Select copy & paste on the connection manager 23) Select the new connection and check the Properties panel 24) Change values: change the Name property to DepartmentListBackup; change the ConnectionString property to C:\DepartmentsBackup.txt 25) Finally, we get..
Building Control Flows File system tasks (copy, delete, rename files, FTP) Execute tasks (SQL Stored Procedure, Windows process task , SSIS package task) Control structures (For loop container, for each loop container) Data flow tasks
Control Flow Objects Tasks   are things that SSIS can do, such as execute SQL statements or transfer objects from one SQL Server to another.  Maintenance Plan tasks   are a special group of tasks that handle jobs such as checking database integrity and rebuilding indexes.  The  Data Flow Task   is a general purpose task for ETL (extract, transform, and load) operations on data. There’s a separate design tab for building the details of a Data Flow Task. Containers   are objects that can hold a group of tasks.
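For orientation only (these statements are not from the slides), the kind of T-SQL that an Execute SQL or Maintenance Plan task ends up issuing looks like this; the database and table names are just examples:

USE AdventureWorks;
-- Check database integrity (roughly what a Check Database Integrity task runs)
DBCC CHECKDB ('AdventureWorks');
-- Rebuild all indexes on one table (roughly what a Rebuild Index task runs, table by table)
ALTER INDEX ALL ON HumanResources.Department REBUILD;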
SSIS Container For Loop: Repeat a task a fixed number of times Foreach : Repeat a task by enumerating over a group of objects Sequence: Group multiple tasks into a single unit for easier management
View - Control Flow Process steps or events Steps are linked together by precedence constraints Value – success (green), failure (red), completion (blue) Evaluation operation – constraint, expression Multiple constraints - logical AND, OR
Demo for Building Control Flow 1) Pin the toolbar 2) Select the File System Task 3) Drag it to the Control Flow area 4) Drag a Data Flow Task to the Control Flow area 5) Connect them with the green arrow
Demo for Building Control Flow 6) Double-click on link. 7) Change the Value from Success to Completion, because you want the Data Flow Task to execute whether the File System Task succeeds or not. 8) Click OK
Demo for Building Control Flow 9) Double-click on this task 10) Change properties: set the Source property to DepartmentList; set the Destination property to DepartmentListBackup; set the OverwriteDestinationFile property to True 11) Click on the OK button
Demo for Building Control Flow 12) Run it!!! All are green! But there is a warning in the output dialog Overall status Check the result in the folder: a new file has been created!
Building the Data Flows Data Flow Sources & Destinations Data Flow Transformations
Data Flow Sources & Destinations Data is strongly typed Editor – connection manager, columns, error handling Advanced editor – column mappings, data type properties, error handling Input and output column mappings are generated automatically Error handling can be defined for individual column and error type Errors can be directed to specific output files
Data Flow Transformations Derived Column Transformation Editor useful for adding new columns or changing data type Original and derived columns Expression builder
Transformations and Their Effects Aggregate: Aggregates data from a transform or source. Character Map: Makes string data changes for you, such as changing data from lowercase to uppercase. Data Conversion: Converts a column's data type to another data type. Data Mining Query: Performs a data-mining query against Analysis Services. Fuzzy Grouping: Performs data cleansing by finding rows that are likely duplicates. Fuzzy Lookup: Matches and standardizes data based on fuzzy logic. For example, this can transform the name Jon to John. Merge Join: Merges two data sets into a single data set using a join function. OLE DB Command: Executes an OLE DB command for each row in the data flow. etc.
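As a rough relational analogy only (SSIS applies these transformations in the data flow pipeline, not in SQL), the Character Map and Data Conversion transformations behave like the expressions below, written against the AdventureWorks table used in the next demo:

-- Character Map (Uppercase) and Data Conversion (int -> string), expressed as T-SQL
SELECT UPPER(Name) AS DepartmentName,                       -- Character Map: lowercase to uppercase
       CAST(DepartmentID AS nvarchar(10)) AS DepartmentKey  -- Data Conversion: change the data type
FROM AdventureWorks.HumanResources.Department;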
Demo for Building the Data Flows 1) Select the Data Flow tab 2) Select the Data Flow task 3) Drag & drop an OLE DB Source 4) Drag & drop a Character Map 5) Drag & drop a Flat File Destination 6) Drag the green arrow to the box below
Demo for Building the Data Flows 7) Double-click on OLEDB source 8) Select table/views: HumanResources.Department  9) Click on OK button
Demo for Building the Data Flows 10) Double-click on Character Map 11) Select the Name column 12) Change to in-place change 13) Change the operation to Uppercase, then press the OK button
Demo for Building the Data Flows 14) Double-click on the Flat File Destination 15) Select the DepartmentList Flat File Connection Manager 16) Drag the Name column to the DepartmentName column, then press the OK button
Demo for Building the Data Flows 17) Run it!!! All are green! Overall status Result: the package transforms one of the columns in that table to all uppercase characters, and then writes that transformed column out to a flat file.
Creating Event Handlers Event handler tasks can be defined for each executable Events include OnPostValidate, OnTaskFailed, OnVariableValueChanged An event handler is itself a control flow and can include multiple steps
Demo for Creating Event Handlers 1) Create a new table on our test DB:
CREATE TABLE DepartmentExports (
    ExportID int IDENTITY(1,1) NOT NULL,
    ExportTime datetime NOT NULL
        CONSTRAINT DF_DepartmentExports_ExportTime DEFAULT (GETDATE()),
    CONSTRAINT PK_DepartmentExports PRIMARY KEY CLUSTERED (ExportID ASC)
)
2) Click on the Event Handlers tab 3) Click on Data Flow Task 4) Select OnPostExecute 5) Click on this link
Demo for Creating Event Handlers 1) Click on the Execute SQL Task 2) Drag and drop it in this area 3) Double-click on this object 4) Change ConnectionType to OLE DB 5) Select the OLE DB connection 6) Input the SQL statement: INSERT INTO DepartmentExports (ExportTime) VALUES (GETDATE()) 7) Click on the OK button
Demo for Creating Event Handlers 8) Run it!!! (Please delete the existing files and restore the original files before running) All are green! Overall status Check the result after running the package
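To confirm the OnPostExecute handler actually fired, the audit table created in step 1 can be queried; one new row per package run is expected:

-- Each run of the Data Flow Task should append one row here
SELECT ExportID, ExportTime
FROM DepartmentExports
ORDER BY ExportTime DESC;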
Execution Results (in BIDS) See Progress tab  (after running) Progress is OK Warning message
Saving and Running Packages When you work in BIDS, your  SSIS package  is saved as an XML file (with the extension  dtsx ) directly in the normal Windows file system Storing  SSIS packages  in the  Package Store  or the  msdb  database makes it easier to access and manage them from SQL Server’s administrative and command-line tools without needing to have any knowledge of the physical layout of the server’s hard drive.
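Packages saved to msdb can also be listed with plain T-SQL; a sketch, assuming the SQL Server 2005 system table msdb.dbo.sysdtspackages90 (the table SSIS uses for msdb package storage in this release):

-- List SSIS packages stored in the msdb database, newest first
SELECT name, description, createdate
FROM msdb.dbo.sysdtspackages90
ORDER BY createdate DESC;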
Package Features in SQL Server Packages must be imported, and re-imported if changed, even though they are displayed as soon as they are copied to the package directory Runtime options: packages lack configuration files – these can be added at run time; command files can be specified; connection manager connection strings; other options
Demo for Saving Packages (1) Save the current package as… (2) Input information as in the picture below (3) Specify the path as /File System/ExportDepartments (4) Click on the OK button
Demo for Running Packages (1) Select Connect → Integration Services (2) Input login information and press Connect (3) Expand & find the stored package (4) Select Run Package
Demo for Running Packages (5) Click on Execute (6) Display result on pop-up window
SSIS: Exercise One common use of SSIS is in data warehousing - collecting data from a variety of different sources into a single database that can be used for unified reporting. In this exercise you’ll use SSIS to perform a simple data warehousing task. Use SSIS to create a text file,  EmployeeList.txt , containing the last names and network logins of the  AdventureWorks  employees. Retrieve the last names from the  Person.Contact  table in the  AdventureWorks  database.  Retrieve the logins from the  HumanResources.Employee  table in the Sample database. You can use the  Merge Join  data flow transformation to join data from two sources. One tip: the inputs to this transformation need to be sorted on the joining column.
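The join the Merge Join transformation has to reproduce can be prototyped first in T-SQL to check the expected output; a sketch assuming the standard AdventureWorks 2005 schema, where HumanResources.Employee.ContactID references Person.Contact:

-- Last name + network login: the same result the SSIS package should write to EmployeeList.txt
SELECT c.LastName, e.LoginID
FROM AdventureWorks.HumanResources.Employee AS e
JOIN AdventureWorks.Person.Contact AS c
    ON c.ContactID = e.ContactID   -- in SSIS, both sources must be sorted on ContactID before the Merge Join
ORDER BY c.LastName;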
SQL Server Analysis Services (SSAS)
MS-SQL2005: SSAS Introduction MS-SQL2005 SSAS Understanding Analysis Services Creating a Data Cube Exploring a Data Cube
Introduction to MS-SQL2005 SSAS
Analysis Services: Why OLAP and Data Mining Matter Powerful business information modeling Cross-platform data integration Integrated Relational & OLAP views The best of MOLAP and ROLAP Data enrichment and advanced analytics Key Performance Indicators and Perspectives Real-time, high performance Real-time data in OLAP Cubes Very fast and flexible analytics XML standards for Data Access and Web Services integration Cost and time savings for customers integrating with other systems
Analysis Services High-level Architecture: BI front ends (dashboards, rich reports, spreadsheets, ad hoc reports) connect via XML/A or ODBO to the UDM in Analysis Services, whose cache draws on SQL Server, Teradata, Oracle, DB2, LOB systems, the DW, and data marts.
SQL Server Analysis Services: New Paradigm for the Analytics Platform Unified Dimensional Model: powerful business information modeling; cross-platform data integration; integrated Relational & OLAP views; KPIs & Perspectives. Proactive caching: real-time data in OLAP cubes; very fast and flexible analytics. Business intelligence enhancements: auto-generation of time and other dimensions based on type; KPIs, MDX scripts, translations, currency… Data Mining: 10 mining algorithms; smart applications. XML standards for data access & web services integration: $$ savings for customers integrating our solution with other systems.
Understanding Analysis Services Cube Dimension table Dimension Level Fact table Measure Schema
Cube A collection of data that’s been aggregated to allow queries to return data quickly Cubes are ordered into  dimensions   and  measures . Dimensions come from  dimension tables , while measures come from  fact tables .
Dimension table & Dimension Dimension Table : Contains hierarchical data by which you’d like to summarize Dimension :  Each cube has one or more  dimensions , each based on one or more dimension tables. A dimension represents a category for analyzing business data. Typically, a dimension has a natural hierarchy so that lower results can be “ rolled up ” into higher results
Level, Fact Table, Measure & Schema Each type of summary that can be retrieved from a single dimension is called a level. Fact Table: contains the basic information that you wish to summarize. Every cube will contain one or more measures, each based on a column in a fact table that you’d like to analyze, e.g. Unit Sales or Profit. Schema: the arrangement by which the dimension tables are used to group information from the fact table.
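A cube is essentially a set of pre-computed aggregates over this kind of star-schema query; a sketch against the AdventureWorksDW sample used in the next demo, assuming its standard table and column names:

-- Measure (SalesAmount) from the fact table, rolled up to a level (CalendarYear) of the time dimension
SELECT d.CalendarYear,
       SUM(f.SalesAmount) AS TotalSales
FROM AdventureWorksDW.dbo.FactInternetSales AS f
JOIN AdventureWorksDW.dbo.DimTime AS d
    ON d.TimeKey = f.OrderDateKey   -- star-schema join from fact to dimension
GROUP BY d.CalendarYear
ORDER BY d.CalendarYear;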
Creating a Data Cube (1) File → New → Project… (2) Select “Analysis Services Project”
Creating a Data Cube (3) From the new solution, right-click on Data Sources (4) Create a new connection for AdventureWorksDW (5) Select Default impersonation information to use the credentials you just supplied for the connection and click Next
Create Data Source View (1) From the new solution, right-click on Data Source Views (2) Select Data Source → Select Tables (3) Assign the name “Finance” and click on Finish
Create Data Source View (4) View the schema of the new data source Questions: What kind of schema is it? Which are the fact/dimension tables?
Invoking the Cube Wizard (1) Right-click on Cubes → New Cube… (2) Select options as shown in the pictures below and click the Next button (3) Select the data source and click Next
Invoking the Cube Wizard (4) Wait until cube processing finishes, then click on the Next button (5) Identify dimension & fact tables: select the table “DimTime”, select the fact & dimension tables (as shown in the picture), and click on the Next button
Invoking the Cube Wizard (6) Select time periods (7) Accept the default measures (8) Wait until the Cube Wizard has detected hierarchies, then click on the Next button
Invoking the Cube Wizard (9) Accept the default dimensions (10) Assign the cube name and click on the Finish button
Deploying and Processing a Cube (1) Build → Deploy (2) Wait while processing One of the tradeoffs of cubes is that SQL Server does not attempt to keep your OLAP cube data synchronized with the OLTP data that serves as its source. As you add, remove, and update rows in the underlying OLTP database, the cube will get out of date → re-run processing
Exploring a Data Cube Click on Browser Drop a measure in the Totals/Detail area to see the aggregated data for that measure. Drop a dimension or level in the Row Fields area to summarize by that level or dimension on rows.  Drop a dimension or level in the Column Fields area to summarize by that level or dimension on columns  Drop a dimension or level in the Filter Fields area to enable filtering by members of that dimension or level. Use the controls at the top of the report area to select additional filtering expressions Same as  PivotTable in Excel
Exploring a Data Cube Drag into the Totals/Detail area Drag into the Row Fields area Define a hierarchy from DimTime: Calendar Year - Calendar Quarter - Month Number of Year Filter area: Scenario Name
SSAS: Exercise Create a data cube, based on the data in the  AdventureWorksDW  sample database, to answer the following question: what were the internet sales by country and product name for married customers only?
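One way to sanity-check the cube is to prototype the answer relationally first; a sketch of the equivalent query, assuming the standard AdventureWorksDW tables and that DimCustomer.MaritalStatus = 'M' marks married customers:

-- Internet sales by country and product name, married customers only
SELECT g.EnglishCountryRegionName AS Country,
       p.EnglishProductName       AS ProductName,
       SUM(f.SalesAmount)         AS InternetSales
FROM AdventureWorksDW.dbo.FactInternetSales AS f
JOIN AdventureWorksDW.dbo.DimCustomer  AS c ON c.CustomerKey  = f.CustomerKey
JOIN AdventureWorksDW.dbo.DimGeography AS g ON g.GeographyKey = c.GeographyKey
JOIN AdventureWorksDW.dbo.DimProduct   AS p ON p.ProductKey   = f.ProductKey
WHERE c.MaritalStatus = 'M'
GROUP BY g.EnglishCountryRegionName, p.EnglishProductName
ORDER BY Country, ProductName;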
References/External Links
(1) SSIS Tutorial: SQL Server 2005 Integration Services Tutorial, http://www.accelebrate.com/sql_training/ssis_tutorial.htm
(2) SSAS Tutorial: SQL Server 2005 Analysis Services Tutorial, http://www.accelebrate.com/sql_training/ssas_tutorial.htm
(3) MS SQL Server Data Transformation Services & Integration Services, Chris Riley, March 29, 2007
(4) Intro to SQL Server Integration Services & SQL Agent Jobs, Stu Yarfitz, Analyst/Programmer, Collaborative Data Services, FHCRC, 6/18/2008
Thank you for your attention! [email_address] www.blueballgroup.com

Editor's Notes

• Analysis Services (Why OLAP and Data Mining Matter): The Analysis Services technology remains at the heart of Microsoft’s BI capabilities. It is now in its third generation and has delivered a wide set of new options, both functional and management/scalability focused. We’ll focus on three key areas: the UDM, a new approach to modeling the inputs to the OLAP capabilities of SQL Server Analysis Services that can help eliminate costly data staging areas and business-function-specific data marts; KPIs, a new architecture for delivering goal-based metrics to the organization; and deep data mining, moving beyond slice & dice and drilldown to provide tools that help you catch complex relationships and patterns in your data and predict outcomes based on past results.
• Analysis Services High-level Architecture: ODBO = OLE DB for OLAP; XML/A = XML for Analysis.
• Business Intelligence Enhancements: UDM: SQL Server 2005 introduces the Unified Dimensional Model; this technology provides the mechanism to describe data sources in a BI-friendly manner without requiring changes to the source data. This can eliminate the need for staging areas, as data can be consumed directly from the source systems. The models can then be used to drive multiple cubes (or live caches of the underlying data). By looking at the cubes as a series of high-performance caches, the best of the OLAP & OLTP worlds can be combined. Cache: the cache is a MOLAP data store that manages the retrieval of data from back-end data sources. You can control how frequently the multidimensional cache is rebuilt, whether stale data can be queried while the cache is being refreshed, and whether data is retrieved on a schedule or when it changes in the database. Business intelligence smarts: SQL Server 2000 included time-dimension awareness; this is extended in SQL Server 2005 to cover auto-generation of several key calculations that really help you to jump-start any BI system: Time (period-to-date, period-over-period growth, moving average, and parallel-period comparisons), Accounting (cost, balance & other accounting calculations), Dimension (choose from a list of known dimension types and automatically set additional attributes to auto-generate appropriate calculations; set your own aggregation operator or calculation and control updateability). Data mining: we’ll cover more on this later. XML/A: SQL Server 2005 implements the open standard XML for Analysis, optimized for scalable web access to query and define OLAP data.