SQL Server 2005 New Features



  1. SQL Server 2005 New Features & Business Intelligence
     Kleanthis Georgaris, Technology Specialist, Microsoft Hellas
  2. SQL Server 2005: A Complete Enterprise Data Management and BI Solution
     • Development Tools
     • Management Tools
     • Reporting Services
     • Analysis Services (OLAP & Data Mining)
     • Data Transformation Services (ETL)
     • SQL Server Relational Engine
  3. Agenda
     • XML Support in SQL Server 2005
     • .NET Inside the Database
       • A step towards Object Oriented Programming
       • User Defined Types
     • Business Intelligence
       • OLAP
       • Data Mining
  4. Agenda
     • XML Support in SQL Server 2005
     • .NET Inside the Database
       • A step towards Object Oriented Programming
       • User Defined Types
     • Business Intelligence
       • OLAP
       • Data Mining
  5. Data Representations
     • Data can be represented in two ways
       • Relational (databases): requires infrastructure
       • Structured (XML): it's simply text
     • Data is exchanged in XML format but stored in relational form
     • We need convergence of the two models
     • Three alternatives
       • XML can be stored as text
         • loses much of the value of the XML representation
       • XML can be decomposed into multiple relational tables
         • allows use of relational technologies
       • XML can be stored as an xml data type
         • allows use of XML technologies
  6. Mapping Data Models
     • Sometimes you need to mix data models
       • middle-tier processing done with XML tools
       • web service requires message content in XML
       • browser requires XML for client-side processing
     • But you have relational data
       • most data is stored using the relational model
     • Content and identifiers are mapped from the company table in the database to the XML required for the message

       company table (database):
         id  name  company
         37  Joe   D Inc.
         41  May   A Co.
         14  Sam   H Inc.
         58  Bev   K Inc.

       XML required for message:
         <organization>
           <title sn="37" org="D Inc."/>
           <title sn="41" org="A Co."/>
           <title sn="14" org="H Inc."/>
           ...
         </organization>
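The mapping on this slide can be sketched outside the database as well. The Python snippet below is only an illustration, not SQL Server's FOR XML mechanism: it assumes the four rows of the company table are already in memory and builds the <organization> message with the standard library. The function name rows_to_organization is hypothetical.

```python
import xml.etree.ElementTree as ET

# Rows of the company table from the slide: (id, name, company).
rows = [
    (37, "Joe", "D Inc."),
    (41, "May", "A Co."),
    (14, "Sam", "H Inc."),
    (58, "Bev", "K Inc."),
]


def rows_to_organization(rows):
    """Map each relational row to a <title sn=... org=...> element."""
    root = ET.Element("organization")
    for emp_id, _name, company in rows:
        # The id column becomes the sn attribute, company becomes org.
        ET.SubElement(root, "title", {"sn": str(emp_id), "org": company})
    return root


message = ET.tostring(rows_to_organization(rows), encoding="unicode")
print(message)
```

The name column is dropped on purpose, because the message on the slide carries only the identifier and the company.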
  7. XML as a data type
     • The XML data type is a native database type
       • used as type of column in table
       • used as type of parameter in stored procedure
       • used as type of return value of a user-defined function
       • used as type of a variable
  8. XML data type - Example

       CREATE TABLE xml_tab (
         the_id INTEGER,
         xml_col XML)
       GO

       -- auto conversion
       INSERT INTO xml_tab VALUES(1, '<doc/>')
       INSERT INTO xml_tab VALUES(2, N'<doc/>')

       SELECT CAST(xml_col AS VARCHAR(MAX))
       FROM xml_tab WHERE the_id < 10

       -- fails, not well-formed
       INSERT INTO xml_tab VALUES(3, '<doc><x1><x2></x1></x2></doc>')
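The third INSERT is rejected because the tags overlap instead of nesting. As a rough client-side illustration of the same well-formedness rule (this is Python's XML parser, not the check SQL Server runs), the two documents from the slide can be tested like this. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched or overlapping tags, among other errors.
        return False


print(is_well_formed("<doc/>"))                         # True
print(is_well_formed("<doc><x1><x2></x1></x2></doc>"))  # False
```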
  9. XML column usage
     • An XML column is not just a TEXT column
     • XML technologies supported
       • the contents can be validated using XML Schema
       • XML-aware indexes are supported
       • XQuery and XPath 2.0 supported
     • in-database XML-related functionality works on the type
       • FOR XML
       • OpenXML
  10. XML Demo
  11. Agenda
      • XML Support in SQL Server 2005
      • .NET Inside the Database
      • Business Intelligence
        • OLAP
        • Data Mining
  12. Hosted CLR
      • .NET CLR hosted inside SQL Server to improve performance
        • applications run in same address space as SQL Server
        • stored procedures in any language supported by CLR
        • web services can run inside of SQL Server
      [Diagram: SQL Server process hosting a T-SQL function, user .NET code, and the database]
  13. .NET and Visual Studio Integration: Breakthrough in Developer Productivity
      • Choice of programming language
        • T-SQL for data-intensive functions and procedures
        • .NET languages for CPU-intensive functions and procedures
      • Choice of where to run logic
        • Database or mid-tier
        • Symmetric data access model: ADO.NET
      • Integrated debugging experience across mid-tier and database tier
        • Seamlessly step cross-language: T-SQL and .NET
        • Set breakpoints anywhere, inspect anything
      • Flexible and extensible
        • User-defined functions, procedures, triggers
        • User-defined types and aggregates
        • XML data type
  14. Development Environment
      • New SQL Server Project template in VS 2005 for SQL Server 2005 managed code
      • Server debug integration
        • Full debugger visibility
        • Set breakpoints anywhere
        • Single step support
          • Between languages
          • Between deployment tiers
      • Auto-deployment
        • Attributes
  15. The Developer Experience
      • VB, C#, C++ → VS .NET Project → Build → Assembly: "TaxLib.dll"
      • SQL Data Definition:
          create assembly …
          create function …
          create procedure …
          create trigger …
          create type …
      • SQL Server: runtime hosted by SQL (in-proc)
      • SQL Queries:
          select sum(tax(sal,state)) from Emp where county = 'King'
  16. SQL Web Services
      • Native SOAP access
        • Standards-based access to SQL Server
        • No client dependency
      • Improved interoperability
      • New “ENDPOINT AS HTTP” object
        • Configure connection info
        • Configure authentication
        • Expose functions & SPs
        • Expose T-SQL batches
      [Diagram: Kernel Mode Listener with the URLs http://server1/aspnet/default.aspx and http://server1/sql/pubs?wsdl]
  17. Why user-defined types?
      • Add scalars that extend the type system
        • used in sorts, aggregates
        • customized sort orders and arithmetic calculations
      • Allows scalars to be implemented efficiently
        • compact representation
        • operations written in compiled language
  18. UDTs on the client
      • SQL Server UDTs are "normal" .NET classes
        • can be used in clients as
          • parameters
          • DataReader column values
      • Methods can be used on the client or server
      • Code can be
        • locally available to clients
        • stored in GAC
  19. Using UDTs with T-SQL
      • Using UDTs through Transact-SQL involves nothing new

        /* assuming a UDT called Point has m_x and m_y properties */
        CREATE TABLE point_tab(
          oid integer,
          point_col POINT)

        SqlConnection conn = new SqlConnection("my connect string");
        SqlCommand cmd = new SqlCommand();
        cmd.Connection = conn;
        conn.Open();
        cmd.CommandText =
          "insert into point_tab values(1, convert(Point, '10:10'))";
        int i;
        i = cmd.ExecuteNonQuery();
        cmd.CommandText =
          "update point_tab set point_col::m_x = 15 where oid = 1";
        i = cmd.ExecuteNonQuery();
  20. UDTs and procedural code

        -- TSQL Procedure
        CREATE PROCEDURE GetPoints (@a PointCls)
        AS
          SELECT thepoint::m_x, thepoint::m_y
          FROM point_tab
          WHERE thepoint::m_x > @a::m_x
        GO

        DECLARE @p PointCls
        SET @p = CONVERT(PointCls, '1:1')
        EXEC GetPoints @p

        -- .NET function
        CREATE FUNCTION AddPoints (
          @a PointCls, @b PointCls)
        RETURNS PointCls
        EXTERNAL NAME Point:PointCls::AddPoints
        GO

        DECLARE @a PointCls, @b PointCls, @c PointCls
        SET @a = CONVERT(PointCls, '100:200')
        SET @b = CONVERT(PointCls, '3:4')
        SET @c = dbo.AddPoints(@a, @b)
        SELECT @c::m_x
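The slides do not show the body of the Point type. As a rough stand-in for the kind of logic such a class might carry, the Python sketch below parses the 'x:y' string form used in the CONVERT calls and adds two points. It is not CLR code, and component-wise addition is an assumption, since the deck never defines what AddPoints does. The names Point.parse and add_points are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Python stand-in for the deck's Point UDT with m_x and m_y members."""
    m_x: int
    m_y: int

    @classmethod
    def parse(cls, text):
        # Mirrors CONVERT(PointCls, '100:200'): "x:y" becomes a Point.
        x_text, y_text = text.split(":")
        return cls(int(x_text), int(y_text))

    def __str__(self):
        return f"{self.m_x}:{self.m_y}"


def add_points(a, b):
    # Assumption: AddPoints adds the coordinates component-wise.
    return Point(a.m_x + b.m_x, a.m_y + b.m_y)


c = add_points(Point.parse("100:200"), Point.parse("3:4"))
print(c.m_x)  # 103 under the component-wise assumption
```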
  21. Agenda
      • XML Support in SQL Server 2005
      • .NET Inside the Database
      • Business Intelligence
        • OLAP
        • Data Mining
  22. What is a Data Warehouse?
      • Defined in many different ways, but not rigorously
        • A decision support database that is maintained separately from the organization’s operational database
        • Supports information processing by providing a solid platform of consolidated, historical data for analysis
      • “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
      • Data warehousing
        • The process of constructing and using data warehouses
  23. Data Warehouse: Subject-Oriented
      • Organized around major subjects, such as customer, product, sales
      • Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
      • Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
  24. Data Warehouse: Integrated
      • Constructed by integrating multiple, heterogeneous data sources
        • relational databases, flat files, on-line transaction records
      • Data cleaning and data integration techniques are applied
        • Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
          • E.g., hotel price: currency, tax, breakfast covered, etc.
        • When data is moved to the warehouse, it is converted
  25. Data Warehouse: Time Variant
      • The time horizon for the data warehouse is significantly longer than that of operational systems
        • Operational database: current value data
        • Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
      • Every key structure in the data warehouse
        • Contains an element of time, explicitly or implicitly
        • But the key of operational data may or may not contain a “time element”
  26. Data Warehouse: Non-Volatile
      • A physically separate store of data transformed from the operational environment
      • Operational update of data does not occur in the data warehouse environment
        • Does not require transaction processing, recovery, and concurrency control mechanisms
        • Requires only two operations in data accessing:
          • initial loading of data and access of data
  27. OLTP vs. OLAP

                           OLTP                                  OLAP
      users                clerk, IT professional                knowledge worker
      function             day to day operations                 decision support
      DB design            application-oriented                  subject-oriented
      data                 current, up-to-date, detailed,        historical, summarized, multidimensional,
                           flat relational, isolated             integrated, consolidated
      usage                repetitive                            ad-hoc
      access               read/write,                           lots of scans
                           index/hash on prim. key
      unit of work         short, simple transaction             complex query
      # records accessed   tens                                  millions
      # users              thousands                             hundreds
      DB size              100MB-GB                              100GB-TB
      metric               transaction throughput                query throughput, response
  28. Conceptual Modeling of Data Warehouses
      • Modeling data warehouses: dimensions & measures
        • Star schema: a fact table in the middle connected to a set of dimension tables
        • Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
        • Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation
  29. Example of Star Schema
      • time: time_key, day, day_of_the_week, month, quarter, year
      • item: item_key, item_name, brand, type, supplier_type
      • branch: branch_key, branch_name, branch_type
      • location: location_key, street, city, state_or_province, country
      • Sales Fact Table
        • keys: time_key, item_key, branch_key, location_key
        • measures: units_sold, dollars_sold, avg_sales
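To make the star shape concrete, here is a small sketch using SQLite through Python rather than SQL Server. It keeps only the time and item dimensions (branch and location are left out for brevity), names the tables time_dim and item_dim, and loads a few invented rows. All data values and those two table names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
-- The fact table holds foreign keys to the dimensions plus the measures.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2004), (2, 'Q2', 2004);
INSERT INTO item_dim VALUES (1, 'TV', 'A'), (2, 'PC', 'B');
INSERT INTO sales VALUES (1, 1, 10, 5000.0), (1, 2, 4, 4000.0),
                         (2, 1, 7, 3500.0), (2, 1, 3, 1500.0);
""")

# A typical star join: the fact table joined to its dimensions, then grouped.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time_dim AS t ON t.time_key = s.time_key
    JOIN item_dim AS i ON i.item_key = s.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()

for row in rows:
    print(row)
```

Every analytical query follows the same pattern: join the central fact table to whichever dimensions the question mentions, then aggregate the measures.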
  30. Example of Snowflake Schema
      • time: time_key, day, day_of_the_week, month, quarter, year
      • item: item_key, item_name, brand, type, supplier_key
        • supplier: supplier_key, supplier_type
      • branch: branch_key, branch_name, branch_type
      • location: location_key, street, city_key
        • city: city_key, city, state_or_province, country
      • Sales Fact Table
        • keys: time_key, item_key, branch_key, location_key
        • measures: units_sold, dollars_sold, avg_sales
  31. Example of Fact Constellation
      • time: time_key, day, day_of_the_week, month, quarter, year
      • item: item_key, item_name, brand, type, supplier_type
      • branch: branch_key, branch_name, branch_type
      • location: location_key, street, city, province_or_state, country
      • shipper: shipper_key, shipper_name, location_key, shipper_type
      • Sales Fact Table
        • keys: time_key, item_key, branch_key, location_key
        • measures: units_sold, dollars_sold, avg_sales
      • Shipping Fact Table
        • keys: time_key, item_key, shipper_key, from_location, to_location
        • measures: dollars_cost, units_shipped
  32. Multidimensional Data
      • Sales volume as a function of product, month, and region
      • Dimensions: Product, Location, Time
      • Hierarchical summarization paths
        • Product: Industry > Category > Product
        • Location: Region > Country > City > Office
        • Time: Year > Quarter > Month / Week > Day
      [Figure: cube with axes Product, Region, and Month]
  33. A Concept Hierarchy: Dimension (location)
      • all: all
      • region: Europe, ..., North_America
      • country: Germany, ..., Spain (Europe); Canada, ..., Mexico (North_America)
      • city: Frankfurt, ...; Vancouver, ..., Toronto
      • office: L. Chan, ..., M. Wind
  34. A Sample Data Cube
      • Dimensions
        • Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
        • Product: TV, PC, VCR, sum
        • Country: U.S.A, Canada, Mexico, sum
      • Example cell: total annual sales of TV in U.S.A.
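The "sum" members on each axis are what make this a cube and not just a table: every combination of a specific value or "sum" per dimension is a cell. The Python sketch below fills such a cube from a handful of invented fact rows; the sales figures and the function name build_cube are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (quarter, product, country, sales).
facts = [
    ("1Qtr", "TV", "U.S.A", 100),
    ("2Qtr", "TV", "U.S.A", 120),
    ("3Qtr", "TV", "U.S.A", 90),
    ("4Qtr", "TV", "U.S.A", 150),
    ("1Qtr", "PC", "Canada", 80),
    ("2Qtr", "VCR", "Mexico", 40),
]


def build_cube(facts):
    """Aggregate sales into every cell, using "sum" as the all-member."""
    cube = defaultdict(int)
    for quarter, prod, country, sales in facts:
        # Each fact contributes to 2 x 2 x 2 = 8 cells: for every dimension,
        # either its own value or the "sum" member.
        for cell in product((quarter, "sum"), (prod, "sum"), (country, "sum")):
            cube[cell] += sales
    return cube


cube = build_cube(facts)
print(cube[("sum", "TV", "U.S.A")])  # total annual sales of TV in U.S.A.: 460
print(cube[("sum", "sum", "sum")])   # grand total: 580
```

Precomputing these cells is, roughly, what lets a MOLAP engine answer summary queries by lookup instead of by scanning the facts.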
  35. OLAP Server Architectures
      • Relational OLAP (ROLAP)
        • Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces
        • Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services
        • greater scalability
      • Multidimensional OLAP (MOLAP)
        • Array-based multidimensional storage engine (sparse matrix techniques)
        • fast indexing to pre-computed summarized data
      • Hybrid OLAP (HOLAP)
        • User flexibility, e.g., low level: relational, high level: array
      • Specialized SQL servers
        • specialized support for SQL queries over star/snowflake schemas
  36. Data Warehouse Usage
      • Three kinds of data warehouse applications
        • Information processing
          • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
        • Analytical processing
          • multidimensional analysis of data warehouse data
          • supports basic OLAP operations: slice-dice, drilling, pivoting
        • Data mining
          • knowledge discovery from hidden patterns
          • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
      • Differences among the three tasks
  37. IT for the Past, Present and Future
      • Archiving the past: storage, writing, etc.
      • Awareness of the present: networking, telecom, etc.
      • Predicting the future: this is where the action is!
      • What is needed?
        • Data about the past and present
        • Models for how systems evolve
        • Ability to associate data with system models
        • Predict the future and develop a course of action
      • Let’s enumerate some applications...
  38. Necessity Is the Mother of Invention
      • Data explosion problem
        • Automated data collection tools and mature database technology lead to tremendous amounts of data accumulated and/or to be analyzed in databases, data warehouses, and other information repositories
      • We are drowning in data, but starving for knowledge!
      • Solution: data warehousing and data mining
        • Data warehousing and on-line analytical processing
        • Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
  39. What Is Data Mining?
      • Data mining (knowledge discovery from data)
        • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
        • Data mining: a misnomer?
      • Alternative names
        • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
  40. Data Mining Process
      • Data mining: core of the knowledge discovery process
      [Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
  41. Complete Set of Algorithms
      • Introduced in SQL Server 2000: Decision Trees, Clustering
      • Time Series
      • Sequence Clustering
      • Association
      • Naïve Bayes
      • Neural Net
  42. What Is Association Mining?
      • Association rule mining
        • Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
        • Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database [AIS93]
      • Motivation: finding regularities in data
        • What products were often purchased together? Beer and diapers?!
        • What are the subsequent purchases after buying a PC?
        • What kinds of DNA are sensitive to this new drug?
        • Can we automatically classify web documents?
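"Frequently" is usually made precise with two numbers, support and confidence. The slide does not define them, so the sketch below uses the standard textbook definitions on an invented basket dataset. The transactions and function names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the baskets containing antecedent, the share that also has consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"beer", "diapers"}, transactions))       # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
```

A rule such as diapers → beer is reported only when both numbers clear user-chosen thresholds.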
  43. Classification vs. Prediction
      • Classification
        • predicts categorical class labels (discrete or nominal)
        • classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
      • Prediction
        • models continuous-valued functions, i.e., predicts unknown or missing values
      • Typical applications
        • credit approval
        • target marketing
        • medical diagnosis
        • treatment effectiveness analysis
  44. Classification Process (1): Model Construction
      • Training Data → Classification Algorithms → Classifier (Model)

        Training Data:
          NAME  RANK            YEARS  TENURED
          Mike  Assistant Prof  3      no
          Mary  Assistant Prof  7      yes
          Bill  Professor       2      yes
          Jim   Associate Prof  7      yes
          Dave  Assistant Prof  6      no
          Anne  Associate Prof  3      no

        Classifier (Model):
          IF rank = 'professor' OR years > 6
          THEN tenured = 'yes'
  45. Classification Process (2): Use the Model in Prediction
      • Testing Data → Classifier
      • Unseen Data → Classifier: (Jeff, Professor, 4) → Tenured?

        Testing Data:
          NAME     RANK            YEARS  TENURED
          Tom      Assistant Prof  2      no
          Merlisa  Associate Prof  7      no
          George   Professor       5      yes
          Joseph   Assistant Prof  7      yes
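The two-step process can be traced by hand. In the sketch below the rule from the model construction slide is written as a Python function and scored first on the training data, then on the testing data, and finally applied to the unseen record for Jeff. The function names predict_tenured and accuracy are hypothetical; the data is copied from the two slides.

```python
# (name, rank, years, tenured) rows from the model construction slide.
training = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

# Rows from the testing data on the prediction slide.
testing = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def predict_tenured(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(rows):
    """Share of rows whose predicted label matches the recorded one."""
    correct = sum(1 for _, rank, years, label in rows
                  if predict_tenured(rank, years) == label)
    return correct / len(rows)


print(accuracy(training))            # 1.0: the rule fits all six training rows
print(accuracy(testing))             # 0.75: Merlisa is misclassified
print(predict_tenured("Professor", 4))  # yes, the prediction for Jeff
```

The gap between the two accuracies is the reason a model is judged on data it was not built from.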
  46. Training Dataset
      • This follows an example from Quinlan’s ID3

        age     income  student  credit_rating  buys_computer
        <=30    high    no       fair           no
        <=30    high    no       excellent      no
        31…40   high    no       fair           yes
        >40     medium  no       fair           yes
        >40     low     yes      fair           yes
        >40     low     yes      excellent      no
        31…40   low     yes      excellent      yes
        <=30    medium  no       fair           no
        <=30    low     yes      fair           yes
        >40     medium  yes      fair           yes
        <=30    medium  yes      excellent      yes
        31…40   medium  no       excellent      yes
        31…40   high    yes      fair           yes
        >40     medium  no       excellent      no
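ID3 picks the attribute to split on by information gain. The slide does not show that calculation, so the following is a sketch of the standard entropy-based formula applied to the 14 rows above. The age range is written as "31..40" in ASCII, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

COLUMNS = ("age", "income", "student", "credit_rating", "buys_computer")
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column_index):
    """Entropy of the class label minus the weighted entropy after the split."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column_index] for row in rows}:
        subset = [row[-1] for row in rows if row[column_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
best = max(gains, key=gains.get)
print(best)  # age
```

Age has the largest gain (about 0.247 bits), which is why a tree built from this data tests age at the root.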
  47. Output: A Decision Tree for “buys_computer”

        age?
        • <=30: student?
          • no: no
          • yes: yes
        • 31…40: yes
        • >40: credit rating?
          • excellent: no
          • fair: yes
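A tree like this is just nested lookups. As a minimal sketch, the snippet below encodes it as nested Python dictionaries and walks it for a few records taken from the training dataset. The dictionary layout and the classify function are illustrative choices, not how Analysis Services stores a mining model, and the age range is written as "31..40" in ASCII.

```python
# Inner nodes are dicts with an attribute to test; leaves are class labels.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student",
                 "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"attribute": "credit_rating",
                "branches": {"excellent": "no", "fair": "yes"}},
    },
}


def classify(node, record):
    """Follow the branch matching the record until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


young_student = {"age": "<=30", "student": "yes", "credit_rating": "fair"}
older_excellent = {"age": ">40", "student": "no", "credit_rating": "excellent"}
middle = {"age": "31..40", "student": "no", "credit_rating": "fair"}

print(classify(TREE, young_student))    # yes
print(classify(TREE, older_excellent))  # no
print(classify(TREE, middle))           # yes
```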