This document provides an overview and introduction to NoSQL databases. It discusses key-value stores like Dynamo and BigTable, which are distributed, scalable databases that sacrifice complex queries for availability and performance. It also explains column-oriented databases like Cassandra that scale to massive workloads. The document compares the CAP theorem and consistency models of these databases and provides examples of their architectures, data models, and operations.
Information technology has led us into an era in which the production, sharing and use of information are part of everyday life, and in which we are often almost unaware actors: it is now nearly impossible not to leave a digital trail with many of the actions we perform every day, for example through digital content such as photos, videos and blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). Added to this, with the "Internet of Things" we see a growing number of devices (watches, bracelets, thermostats and many other items) that can connect to the network and therefore generate large data streams. This explosion of data explains the emergence of the term Big Data: data produced in large quantities, at remarkable speed and in many different formats, whose processing requires technologies and resources that go far beyond conventional data management and storage systems. It is immediately clear that, in these contexts, 1) storage models based on the relational model and 2) processing systems based on stored procedures and grid computing are no longer adequate. As regards point 1, RDBMSs, though widely used for a great variety of applications, run into problems once the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when dealing with big data, variability, that is, the lack of a fixed structure, is itself a significant problem. This has given a strong boost to the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."
These databases are distributed, open source, horizontally scalable, schema-free (key-value, column-oriented, document-based and graph-based), easily replicable, without full ACID guarantees, and able to handle large amounts of data. They are typically integrated with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open-source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations compared to the demands of new applications, which use Big Data and NoSQL databases to store data and MapReduce to process it.
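The MapReduce paradigm mentioned above can be illustrated with a minimal word-count sketch in plain Python. This is only a single-process illustration of the map, shuffle and reduce phases; in a real Hadoop cluster each phase runs distributed across many machines, and the function names here are invented for the example.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group the intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big analytics", "data stores"]
result = reduce_phase(shuffle(map_phase(docs)))
# e.g. result["big"] == 2 and result["data"] == 2
```

The framework's value lies not in this logic, which is trivial, but in running the map and reduce functions in parallel over petabytes while handling machine failures.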
Course Website http://pbdmng.datatoknowledge.it/
Contact me for further information and downloads.
In these slides we introduce column-oriented stores and analyze Google BigTable in depth: its features, data model, architecture, components and implementation. In the second part we discuss the major open-source implementations of column-oriented databases.
This document provides an overview of NoSQL databases and summarizes key information about several NoSQL databases, including HBase, Redis, Cassandra, MongoDB, and Memcached. It discusses concepts like horizontal scalability, the CAP theorem, eventual consistency, and data models used by different NoSQL databases like key-value, document, columnar, and graph structures.
NoSQL databases take a different approach to data storage than traditional RDBMS systems. There are several categories of NoSQL databases including key-value stores, wide column stores, document stores, and graph databases. Each has different strengths such as flexibility, performance, or suitability for certain types of data. Choosing the right data model depends on factors like the relationships between data elements, scalability needs, and query requirements.
HBase is a column-oriented NoSQL database that provides random real-time read/write access to big data stored in Hadoop's HDFS. It is modeled after Google's Bigtable and sits on top of HDFS to allow fast access to large datasets. HBase architecture includes HMaster, HRegionServers, ZooKeeper, and HDFS. HMaster manages metadata and load balancing while HRegionServers serve read/write requests directly from clients. ZooKeeper coordinates the cluster and HDFS provides storage. Data is stored in tables divided into regions hosted by HRegionServers.
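The multidimensional data model described above, where a value is addressed by (row key, column family, qualifier, timestamp), can be sketched as a toy in-memory structure. This is a hypothetical illustration of the model only, not the HBase client API, and class and method names are invented for the example.

```python
import time
from collections import defaultdict

class MiniColumnStore:
    """Toy in-memory sketch of the BigTable/HBase data model:
    (row key, column family, qualifier, timestamp) -> value.
    Column families are fixed at table creation; qualifiers are not."""

    def __init__(self, families):
        self.families = set(families)
        # row -> family -> qualifier -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = ts if ts is not None else time.time()
        self.rows[row][family][qualifier].insert(0, (ts, value))

    def get(self, row, family, qualifier):
        """Return the most recently written version, like a default Get."""
        versions = self.rows[row][family][qualifier]
        return versions[0][1] if versions else None

t = MiniColumnStore(families=["info", "metrics"])
t.put("row1", "info", "name", "alice")
t.put("row1", "info", "name", "alice-updated")
t.get("row1", "info", "name")  # the newest version wins
```

In real HBase the rows are additionally kept sorted by row key and split into regions, which is what lets HRegionServers serve contiguous key ranges.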
An overview of various database technologies and their underlying mechanisms over time.
Presentation delivered internally at Alliander to inspire the use of, and foster interest in, new (NoSQL) technologies. 18 September 2012
Oracle has evolved from its first release in 1979 to become a leading database with various editions that can be used by individuals, workgroups or enterprises, and it provides developer tools and supports different database structures, security mechanisms, SQL for data access and transactions. Key components of an Oracle database include control files, data files, redo log files, tablespaces that logically organize storage, and various memory and file structures.
This document provides an overview of different database types including relational, NoSQL, document, key-value, graph, and column family databases. It discusses the history and drivers behind the development of NoSQL databases, as well as concepts like horizontal scaling, the CAP theorem, and eventual consistency. Specific databases are also summarized, including MongoDB, Redis, Neo4j, and HBase.
The presentation provides an overview of NoSQL databases, including a brief history of databases, the characteristics of NoSQL databases, and different data models such as key-value, document, column family and graph databases. It discusses why NoSQL databases were developed, given that relational databases do not scale well for distributed applications. The CAP theorem is also explained: in a distributed system, at most two of the three guarantees of consistency, availability and partition tolerance can be achieved at once.
Cloud Deployments with Apache Hadoop and Apache HBase (DATAVERSITY)
The document discusses deploying Apache Hadoop and Apache HBase in the cloud. It begins by introducing the speaker and their background with Cloudera and various Apache projects, then gives an overview of Cloudera and what the company does. The majority of the document covers Apache Hadoop and Apache HBase: what they are, and how they are open source and horizontally scalable. It also describes deploying a Hadoop and HBase cluster on Amazon EC2, using Apache Whirr to provision the machines. Real-world uses of these technologies include building a web index for a search engine.
In this lecture we analyze document-oriented databases. In particular, we consider why they were among the first approaches to NoSQL and what their main features are. Then we analyze MongoDB as an example, covering its data model, CRUD operations, write concerns, and scaling (replication and sharding).
Finally, we present other document-oriented databases and discuss when to use, and when not to use, a document-oriented database.
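The CRUD operations on a document store can be sketched with a toy in-memory collection. This is not the real MongoDB/pymongo API; the class and its equality-only query matching are invented to illustrate the style of interaction (documents are schema-free dicts, queried by their contents).

```python
import itertools

class MiniDocumentCollection:
    """Toy in-memory stand-in for a document collection, illustrating
    MongoDB-style CRUD. Not the real pymongo API."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert_one(self, doc):
        _id = next(self._ids)
        self._docs[_id] = {**doc, "_id": _id}
        return _id

    def find(self, query):
        """Return documents whose fields equal every pair in `query`."""
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in query.items())]

    def update_one(self, query, changes):
        for d in self.find(query):
            d.update(changes)   # mutate the stored document in place
            return 1
        return 0

    def delete_one(self, query):
        for d in self.find(query):
            del self._docs[d["_id"]]
            return 1
        return 0

users = MiniDocumentCollection()
users.insert_one({"name": "ada", "lang": "python"})
users.insert_one({"name": "bob", "lang": "go"})
users.update_one({"name": "ada"}, {"lang": "rust"})
users.find({"lang": "rust"})  # matches ada's updated document
```

Real document databases add richer query operators, secondary indexes, and write-concern levels on top of this basic shape.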
The document discusses object-relational impedance mismatch and various data source patterns for mapping objects to relational databases in a way that minimizes this mismatch. It describes the table data gateway, row data gateway, active record, and data mapper patterns. The table data gateway acts as a gateway to a database table, while the row data gateway acts as a gateway to a single record. Active record wraps a database row and adds domain logic, and data mapper provides object-relational mapping to keep the object model independent from the database schema. Spring JDBC is also introduced as a framework that can help implement these patterns.
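The Active Record pattern described above can be sketched in a few lines: the object wraps one database row and carries both its persistence logic and its domain logic. The `Person` class, table name and columns are hypothetical; SQLite is used here only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

class Person:
    """Active Record sketch: one instance == one row of `people`."""

    def __init__(self, name, email, id=None):
        self.id, self.name, self.email = id, name, email

    def save(self):
        """Persistence logic lives on the object itself."""
        if self.id is None:
            cur = conn.execute(
                "INSERT INTO people (name, email) VALUES (?, ?)",
                (self.name, self.email))
            self.id = cur.lastrowid
        else:
            conn.execute(
                "UPDATE people SET name = ?, email = ? WHERE id = ?",
                (self.name, self.email, self.id))

    @classmethod
    def find(cls, id):
        row = conn.execute(
            "SELECT id, name, email FROM people WHERE id = ?", (id,)).fetchone()
        return cls(row[1], row[2], id=row[0]) if row else None

    def display_name(self):
        """Domain logic lives on the same object."""
        return f"{self.name} <{self.email}>"

p = Person("Ada", "ada@example.org")
p.save()
Person.find(p.id).display_name()  # -> 'Ada <ada@example.org>'
```

A Data Mapper, by contrast, would move `save` and `find` into a separate mapper class so that `Person` stays ignorant of the database schema.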
This document provides an introduction to NoSQL databases. It begins by explaining what a database management system (DBMS) and relational database management system (RDBMS) are. It then discusses some limitations of relational databases and how NoSQL databases address those limitations by being non-relational, schema-free, and offering simple APIs. The document provides a brief history of NoSQL databases and defines what NoSQL is and why it was developed to handle large, growing amounts of unstructured data from sources like social networks. It outlines some key features of NoSQL databases.
The initiation of The Hadoop Apache Hive began in 2007 by Facebook due to its data growth.
This ETL system began to fail over few years as more people joined Facebook.
In August 2008, Facebook decided to move to scalable a more scalable open-source Hadoop environment; Hive
Facebook, Netflix and Amazons support the Apache Hive SQL now known as the HiveQL
This document provides an overview of the course content for an online SAS training course. The course covers topics such as SAS basics, statistical analysis, data management, SQL, macro programming, and debugging SAS programs. It explores how to use SAS for clinical research studies and banking analysis. The course aims to teach students how to manage, analyze, and report on data with SAS.
This document provides an overview of NoSQL databases, including:
- Key-value stores store data as maps or hashmaps and are efficient for data access but limited in query capabilities.
- Column-oriented stores group attributes into column families and store data efficiently but are operationally challenging.
- Document databases store loosely structured data like JSON and allow retrieving documents by keys or contents.
- Graph databases are suited for interaction networks and path finding but are less suited for tabular data.
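The trade-off named in the first bullet, fast access by key but limited query capability, can be made concrete with a toy key-value store. The class and method names are invented for the sketch; real stores like Redis add persistence, expiry and rich value types on top of the same core idea.

```python
class MiniKeyValueStore:
    """Key-value sketch: O(1) access by key, but no secondary queries.
    Finding entries by their *value* requires scanning everything,
    which is why key-value stores are called limited in query power."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan_for_value(self, predicate):
        # the only way to "query" without a key: a full scan
        return [k for k, v in self._data.items() if predicate(v)]

kv = MiniKeyValueStore()
kv.put("session:42", {"user": "ada"})
kv.get("session:42")                                  # fast, by key
kv.scan_for_value(lambda v: v.get("user") == "ada")   # slow, full scan
```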
The document discusses the MariaDB CONNECT storage engine, which allows querying external file formats from within MariaDB. It was created by database expert Olivier Bertrand and brings business intelligence capabilities to MariaDB by enabling access to data sources like CSV, XML, Excel and other formats without needing ETL processes. The storage engine uses the MySQL plugin architecture and implements features like indexing, condition pushdown, and support for ODBC, MySQL tables, and various file types.
This document compares SQL and NoSQL databases. It defines databases, describes different types including relational and NoSQL, and explains key differences between SQL and NoSQL in areas like scaling, modeling, and query syntax. SQL databases are better suited to projects with logically related, discrete data and strong data-integrity needs, while NoSQL is more appropriate for projects with unrelated or evolving data where speed and scalability matter most. MongoDB is given as an example of a NoSQL database, and the CAP theorem is introduced to explain trade-offs in distributed systems.
Apache Storm is a distributed, real-time computational framework used to process unbounded streams of data from sources like messaging systems or databases. It allows building topologies with spouts that act as data sources and bolts that perform computations. Data flows between nodes as tuples through streams. Apache Kafka is a distributed publish-subscribe messaging system that stores feeds of messages in topics, allowing producers to write data and consumers to read it.
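The spout/bolt topology described above can be sketched with plain Python generators: tuples flow from a source through a chain of processing stages. This is only a single-process analogy, not the real Storm API, and the function names are invented; Storm additionally parallelizes each stage and replays tuples on failure.

```python
def sentence_spout():
    """Spout sketch: a source emitting sentence tuples into the stream."""
    for line in ["to be or not", "to be"]:
        yield (line,)

def split_bolt(stream):
    """Bolt: split each sentence tuple into word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

def count_bolt(stream):
    """Bolt: keep running counts, emitting (word, count) tuples."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])

# wire the topology: spout -> split -> count
final = list(count_bolt(split_bolt(sentence_spout())))
# by the end, 'to' and 'be' have each been counted twice
```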
This document discusses different types of distributed databases. It covers data models like relational, aggregate-oriented, key-value, and document models. It also discusses different distribution models like sharding and replication. Consistency models for distributed databases are explained including eventual consistency and the CAP theorem. Key-value stores are described in more detail as a simple but widely used data model with features like consistency, scaling, and suitable use cases. Specific key-value databases like Redis, Riak, and DynamoDB are mentioned.
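The sharding mentioned above is often done with consistent hashing, the placement scheme used in spirit by stores like Riak and DynamoDB. The sketch below is a simplified illustration (class name and parameters are invented; real systems differ in detail): each key hashes onto a ring and is owned by the next node clockwise, so adding or removing a node moves only a fraction of the keys.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent-hash sharding. Virtual nodes (vnodes)
    smooth out the key distribution across physical nodes."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the next virtual node."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
ring.node_for("user:1001")  # the same key always lands on the same node
```

Replication then typically stores each key on the owning node plus the next N-1 distinct nodes around the ring.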
This presentation explains why NoSQL databases emerged despite SQL databases having been a successful technology for more than twenty years. It also discusses the characteristics and classifications of NoSQL databases. Finally, the slides briefly cover four NoSQL databases.
MySQL: Know More About the Open Source Database (Mahesh Salaria)
- As a developer, it is important to understand MySQL's storage engines, data types, indexing, and normalization to build high-performing applications.
- MySQL has several storage engines that handle different table types differently in terms of transactions, locking, storage, and memory usage. Choosing the right engine depends on data usage.
- Properly normalizing data, using optimal data types, and adding indexes improves performance by reducing storage needs, memory usage, and speeding up queries.
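The effect of an index on query execution can be shown with SQLite (used here instead of MySQL only because it ships with Python's standard library; the table and index names are invented, but the principle is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.org",) for i in range(1000)])

# Without an index, a lookup by email must scan the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user500@example.org",)).fetchall()
# the plan detail typically reads like 'SCAN users'

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index, the engine can seek directly to the matching row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("user500@example.org",)).fetchall()
# the plan detail typically reads like 'SEARCH users USING INDEX idx_users_email'
```

The trade-off is that every index also costs storage and slows writes, which is why indexes should follow actual query patterns.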
The document discusses factors to consider when selecting a NoSQL database management system (DBMS). It provides an overview of different NoSQL database types, including document databases, key-value databases, column databases, and graph databases. For each type, popular open-source options are described, such as MongoDB for document databases, Redis for key-value, Cassandra for columnar, and Neo4j for graph databases. The document emphasizes choosing a NoSQL solution based on application needs and recommends commercial support for production systems.
The document discusses moving from traditional ETL processes to "analytics with no ETL" using Hadoop. It describes how Hadoop currently supports some ETL functions by storing raw and transformed data together. However, this still requires periodic loading of new data. The vision is to support complex schemas, perform background format conversion incrementally, and enable schema inference and evolution to allow analyzing data as it arrives without explicit ETL steps. This would provide an up-to-date, performant single view of all data.
This presentation discusses using WorldView-2 satellite imagery to classify land cover in Atlanta, Georgia. It combined multi-spectral data with multi-angle observations from 13 images. Four experiments classified imagery using a nadir multi-spectral image only, full multi-angle data, and dimensionality reduction techniques. The multi-angle data improved classification accuracy by 14% over using a single nadir image alone. Specific classes like cars and highways benefited more from the multi-angle information.
This document presents an introduction to the municipality of Carral, Spain. It includes sections on its geographic location, its history from prehistory to the contemporary era, notable monuments, local gastronomy, and recommended tourist routes for visiting the region.
The group of students visits Madrid and takes part in several activities, such as seeing animals at the zoo, eating on a terrace near the Puerta de Alcalá, and having their photo taken with an elephant. Throughout the day they share opinions about the food, the cold weather, and how tired they are.
Windows Server 2008 and 2008 R2 provide greater control, protection, and flexibility for IT. Key features include Hyper-V virtualization, improved management with PowerShell and Server Core, enhanced security with BitLocker and OS hardening, and technologies like BranchCache and DirectAccess that improve the mobile workforce experience. The products also integrate well with Windows 7 to provide a consistent experience both inside and outside the corporate network.
The document introduces Microsoft's Windows Azure cloud platform. It summarizes that Windows Azure provides an operating system for the cloud that abstracts away hardware and provides services for automated management, scalable computing and storage. It allows developers to build applications and services that can easily scale across large, connected data centers. The talk demonstrates how Windows Azure allows building complex service architectures from simple components like web and worker roles that interact through a durable storage system. It emphasizes that the platform aims to provide a familiar development experience while handling all the complexities of highly scalable cloud services.
This document discusses Windows Small Business Server and Windows Essential Business Server solutions. It provides an overview of the different editions available and who they are designed for based on business size and growth. The Small Business Server is designed for companies with up to 75 PCs, while Essential Business Server is designed for midsize businesses with up to 300 PCs and the ability to grow. Both solutions provide integrated server technologies at a lower price point than comparable standalone products and are aimed at reducing IT complexity for small and midsize businesses.
from the Commonwealth Transportation Board's June meeting
presented to the Hampton Roads Partnership Annual Meeting, June 19, 2009
by Pierce Homer, the Commonwealth's Secretary of Transportation
Transportation Sub-Committee Meeting of 10 Dec 08 reports on alternatives and their ramifications to improving road congestion in Hampton Roads and the need to be ready to claim infrastructure stimulus money from the Obama administration. http://www.vmasc.odu.edu
La Unión Europea ha propuesto un nuevo paquete de sanciones contra Rusia que incluye un embargo al petróleo ruso. El embargo se aplicaría gradualmente durante seis meses para el petróleo crudo y ocho meses para los productos refinados. Este paquete de sanciones requiere la aprobación unánime de los 27 estados miembros de la UE.
This document discusses xRM (extended relationship management) applications built using Microsoft Dynamics CRM. It provides examples of how CRM can be extended beyond traditional customer relationship management to manage other types of relationships. These include applications for healthcare patient relationship management, education student information systems, government constituent relationship management, and more. The document also discusses the benefits of building xRM applications on the Dynamics CRM platform in terms of leveraging existing investments, rapid development and deployment cycles compared to custom solutions.
Windows Server 2008 and 2008 R2 provide greater control, protection, and flexibility for IT. Key features include Hyper-V virtualization, improved management with PowerShell and Server Core, enhanced security with BitLocker and OS hardening, and technologies like BranchCache and DirectAccess that improve the mobile workforce experience. The products also integrate well with Windows 7 to provide a consistent experience both inside and outside the corporate network.
1) Plastic roads use shredded plastic waste that is mixed with hot bitumen and laid like conventional tar roads.
2) Laboratory studies have shown plastic roads have improved properties like increased stability and strength compared to ordinary roads.
3) Using plastic waste in road construction provides an effective solution for plastic disposal while enhancing road quality in a more environmentally friendly manner.
The document discusses different NoSQL data models including key-value, document, column family, and graph models. It provides examples of popular NoSQL databases that implement each model such as Redis, MongoDB, Cassandra, and Neo4j. The document argues that these NoSQL databases address limitations of relational databases in supporting modern web applications with requirements for scalability, flexibility, and high performance.
Enterprise geodatabase sql access and administrationbrentpierce
The document provides an overview of accessing and administering an enterprise geodatabase through SQL and Python. It discusses how the geodatabase is based on relational database principles with user data stored in tables and system metadata stored in system tables. It describes how spatial types store geometry data and the benefits of using SQL to access and edit geodatabase content. The document also outlines how Python can be used for geodatabase administration tasks like schema creation, maintenance, and publishing tools.
Business intelligence and data warehousesDhani Ahmad
This chapter discusses business intelligence and data warehouses. It covers how operational data differs from decision support data, the components of a data warehouse including facts, dimensions and star schemas, and how online analytical processing (OLAP) and SQL extensions support analysis of multidimensional decision support data. The chapter also discusses data mining, requirements for decision support databases, and considerations for implementing a successful data warehouse project.
This document provides a summary of Oracle OpenWorld 2014 discussions on database cloud, in-memory database, native JSON support, big data, and Internet of Things (IoT) technologies. Key points include:
- Database Cloud on Oracle offers pay-as-you-go pricing and self-service provisioning similar to on-premise databases.
- Oracle Database 12c includes an in-memory option that can provide up to 100x faster analytics queries and 2-4x faster transaction processing.
- Native JSON support in 12c allows storing and querying JSON documents within the database.
- Big data technologies like Oracle Big Data SQL and Oracle Big Data Discovery help analyze large and diverse data sets from sources like
Researching an alternative to the MS SQL database - first of all in order to gain additional technological benefits, secondly moving towards an open source way of development.
The idea behind this presentation was to introduce PostgreSQL (ver. 9.4+) in a different manner than a conventional "Pros Vs. Cons" style, it is more likely to be a "Buzz Word" thesaurus (of course based on a deep research).
P.S. Since it's a presentation, there was no intention going over and covering all of the PostgreSQL features - most of the interesting parts.
An AMIS Overview of Oracle database 12c (12.1)Marco Gralike
Presentation used by Lucas Jellema and Marco Gralike during the AMIS Oracle Database 12c Launch event on Monday the 15th of July 2013 (much thanks to Tom Kyte, Oracle, for being allowed to use some of his material)
M.
AMIS organiseerde op maandagavond 15 juli het seminar ‘Oracle database 12c revealed’. Deze avond bood AMIS Oracle professionals de eerste mogelijkheid om de vernieuwingen in Oracle database 12c in actie te zien! De AMIS specialisten die meer dan een jaar bèta testen hebben uitgevoerd lieten zien wat er nieuw is en hoe we dat de komende jaren gaan inzetten!
Deze presentatie is deze avond gegeven als een plenaire sessie!
This document provides an overview and summary of key concepts related to advanced databases. It discusses relational databases including MySQL, SQL, transactions, and ODBC. It also covers database topics like triggers, indexes, and NoSQL databases. Alternative database systems like graph databases, triplestores, and linked data are introduced. Web services, XML, and data journalism are also briefly summarized. The document provides definitions and examples of these technical database terms and concepts.
The document provides an overview of database systems, including their purpose, components, and history. It discusses how database systems address issues with using file systems to store data, such as data redundancy, difficulty of accessing data, integrity problems, and concurrent access. The key components of a database system are the database management system (DBMS), data models, data definition and manipulation languages, database design, storage and querying, transaction management, architecture, users, and administrators. The relational model and SQL are introduced as widely used standards. A brief history outlines the evolution from early data processing using tapes and cards to modern database systems.
Cheetah is a custom data warehouse system built on top of Hadoop that provides high performance for storing and querying large datasets. It uses a virtual view abstraction over star and snowflake schemas to provide a simple yet powerful SQL-like query language. The system architecture utilizes MapReduce to parallelize query execution across many nodes. Cheetah employs columnar data storage and compression, multi-query optimization, and materialized views to improve query performance. Based on evaluations, Cheetah can efficiently handle both small and large queries and outperforms single-query execution when processing batches of queries together.
SKILLWISE-SSIS DESIGN PATTERN FOR DATA WAREHOUSINGSkillwise Group
This document provides an overview of the SSIS design pattern for data warehousing and change data capture. It discusses what design patterns are and how they are commonly used for SSIS and data warehousing projects. It then covers 13 specific patterns including truncate and load, slowly changing dimensions, hashbytes, change data capture, merge, and master/child workflows. The document explains when each pattern is best used and provides pros and cons. It also provides guidance on configuring and using SQL Server change data capture functionality.
The document provides an overview of database systems and their components. It discusses the purpose of database systems, database languages, data models, database internals including storage management, query processing and transaction management. It also describes different types of database users and the role of the database administrator.
SQL, NoSQL, Distributed SQL: Choose your DataStore carefullyMd Kamaruzzaman
In modern Software Development and Software Architecture, selecting the right DataStore is one of the most challenging and important task. In this presentation, I have summarized the major DataStores and the decision criteria to select the right DataStore according to the use case.
NoSQL databases were developed to address the need for databases that can handle big data and scale horizontally to support massive amounts of data and high user loads. NoSQL databases are non-relational and support high availability through horizontal scaling and replication across commodity servers to allow for continuous availability. Popular types of NoSQL databases include key-value stores, document stores, column-oriented databases, and graph databases, each suited for different use cases depending on an application's data model and query requirements.
FOSSASIA 2015 - 10 Features your developers are missing when stuck with Propr...Ashnikbiz
Ashnik Database Solution Architect, Sameer Kumar, an Open Source evangelist presented at FOSSASIA 2015 about the features of open source database like PostgreSQL which are missed by developers stuck on proprietary databases.
10 Features you would love as an Open Source developer!
- New JSON Datatype
- Vast set of datatypes supported
- Rich support for foreign Data Wrap
- User Defined Operators
- User Defined Extensions
- Filter Based Indexes or Partial Indexes
- Granular control of parameters at User, Database, Connection or Transaction Level
- Use of indexes to get statistics
- JDBC API for COPY -Command
- Full Text Search
NoSQL databases provide an alternative to traditional relational databases that is well-suited for large datasets, high scalability needs, and flexible, changing schemas. NoSQL databases sacrifice strict consistency for greater scalability and availability. The document model is well-suited for semi-structured data and allows for embedding related data within documents. Key-value stores provide simple lookup of data by key but do not support complex queries. Graph databases effectively represent network-like connections between data elements.
This document discusses data-intensive computing and provides examples of technologies used for processing large datasets. It defines data-intensive computing as concerned with manipulating and analyzing large datasets ranging from hundreds of megabytes to petabytes. It then characterizes challenges including scalable algorithms, metadata management, and high-performance computing platforms and file systems. Specific technologies discussed include distributed file systems like Lustre, MapReduce frameworks like Hadoop, and NoSQL databases like MongoDB.
The document discusses data-intensive computing and provides details about related technologies. It defines data-intensive computing as concerned with large-scale data in the hundreds of megabytes to petabytes range. Key challenges include scalable algorithms, metadata management, high-performance computing platforms, and distributed file systems. Technologies discussed include MapReduce frameworks like Hadoop, Pig, and Hive; NoSQL databases like MongoDB, Cassandra, and HBase; and distributed file systems like Lustre, GPFS, and HDFS. The document also covers programming models, scheduling, and an example application to parse Aneka logs using MapReduce.
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
This document summarizes a survey of advanced non-relational database systems, their approaches, applications, and comparison to relational database management systems (RDBMS). It outlines the problem of scaling to meet new web-scale demands, describes how non-relational databases provide a solution by sacrificing consistency for availability and partition tolerance. Examples of non-relational databases are provided, including their data models, APIs, optimizations, and benefits compared to RDBMS such as improved scalability and fault tolerance.
Hyper-V is Microsoft's server virtualization technology that is included with Windows Server 2008. It allows multiple virtual machines to run on a single physical machine. Key capabilities of Hyper-V include support for large memory virtual machines up to 64GB, live migration of virtual machines between physical servers, and integration with the Windows hypervisor for security and isolation of virtual machines. System Center Virtual Machine Manager 2008 provides centralized management of virtualized and physical infrastructure across Hyper-V, Virtual Server and VMware environments.
Silverlight 2 allows developers to create rich internet applications using XAML and .NET code with the ability to access local storage, network resources, graphics, and multimedia; it provides a lightweight runtime for deploying smart client applications within web pages similar to Adobe Flash but with the added benefits of the .NET framework and local file access capabilities. Key controls and APIs include data binding, animations, graphics, audio/video playback and networking functionality for building rich interactive applications.
The document discusses Microsoft Dynamics NAV 5.0 SP1, including the overall strategy and new features. It outlines enhancements to application functionality, productivity tools, and integration capabilities. Key points include 300+ general improvements, planning engine refactoring, new document archiving and commenting features, and SQL Server performance optimizations. Upgrade considerations are also addressed.
Mesh Services are one of the underlying core services of the Live Framework. They manage users, devices, applications, and synchronization across a user's digital experiences. Key functions include identity management, directory services, storage, communications, and search capabilities to enable sharing of resources like contacts, files, and data across devices and applications.
Mogens Larsen will give a presentation on Dynamics AX 2009 Supply Chain Management from 14:15-15:15. The presentation will cover the user interface of Dynamics AX 2009, inventory management, order flow, and warehouse management. It will also include a short introduction to Dynamics AX 2009 through a PowerPoint presentation and discuss the product roadmap.
This document discusses how Visual Studio Team System can maximize ROI and drive IT governance through an integrated Application Lifecycle Management (ALM) solution. It provides concise summaries of key points, including how VSTS improves collaboration, ensures quality, integrates work frequently, and enables real-time decision making. IT governance is also discussed at a high level, focusing on compliance, metrics/reporting, and aligning IT with business needs. Examples are given of organizations seeing improvements in areas like productivity, quality, and cost reductions through an ALM approach.
This document discusses cloud computing options and dispels common myths about the cloud. It presents a cloud maturity model and suggests that CIOs should focus on reducing costs, attracting customers, and stimulating innovation. The document advocates evaluating different computing options based on needs and observing the evolving maturity of cloud computing. It warns of risks like dependency on vendors and issues with migrating systems. Overall, the document provides an overview of cloud computing and advice for CIOs on developing strategies regarding the cloud.
SOA involves exposing business functions as reusable services. This allows for greater agility, flexibility and reuse of services across different applications. SOA breaks down monolithic applications into discrete services that can be accessed over the network in a standardized way. This trend is driving the development of loosely coupled, interoperable services that can be discovered and orchestrated to meet business needs.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
2. SQL Server 2008 for relational and multi-dimensional solution developers
Silvano Coriani
silvano.coriani@microsoft.com
Developer Evangelist
Microsoft
3. Agenda
• SQL Server 2008 support for next generation application development
– Geospatial data type
– Filestream
– Date & Time
– Large UDT
• Simplifying existing application scenarios
– Table Valued Parameters
– Change Tracking
– Hierarchy ID
• Going multi-dimensional
– Developer's road map to SSIS, SSAS and SSRS
4. Relational and Non-Relational Data
• Relational data uses simple data types
– Each type has a single value
– Generic operations work well with the types
• Relational storage/query may not be optimal for
– Hierarchical data
– Spatial data
– Sparse, variable property bags
• Some types
– benefit from using a custom library
– use an extended type system (complex types, inheritance)
– use custom storage and non-SQL APIs
– use non-relational queries and indexing
5. Spatial Data
• Spatial data provides answers to location-based queries
– Which roads intersect the Microsoft campus?
– Does my land claim overlap yours?
– List all of the Italian restaurants within 5 kilometers
• Spatial data is part of almost every database
– If your database includes an address, it contains spatial data
6. SQL Server 2008 and Spatial Data
• SQL Server supports two spatial data types
– GEOMETRY - flat earth model
– GEOGRAPHY - round earth model
• Both types support all of the instantiable OGC types
– The InstanceOf method can distinguish between them
• Supports two-dimensional data
– X and Y or Lat and Long members
– Z member - elevation (user-defined semantics)
– M member - measure (user-defined semantics)
7. Sample Query
Which roads intersect Microsoft's main campus?

SELECT *
FROM roads
WHERE roads.geom.STIntersects(@ms) = 1
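The @ms variable in the sample query must already hold the campus shape. A minimal sketch of how it could be built from Well-Known Text before running the query (the polygon coordinates here are invented for illustration, not real campus geometry):

```sql
-- Hypothetical campus outline; real coordinates would replace these values
DECLARE @ms geometry = geometry::STGeomFromText(
    'POLYGON((0 0, 100 0, 100 100, 0 100, 0 0))', 0);

SELECT *
FROM roads
WHERE roads.geom.STIntersects(@ms) = 1;
```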
8. Filestream storage
• Storing large binary objects in databases is suboptimal
– Large objects take buffers in database memory
– Updating large objects causes database fragmentation
• In the file system, however, an "update" is a delete plus an insert
– The "before image" in an update is not deleted immediately
• Storing all related data in a database adds
– Transactional consistency
– Integrated, point-in-time backup and restore
– A single storage and query vehicle
9. SQL Server 2008 Filestream Implementation
• A filegroup for filestream storage is declared using DDL
– Filestream storage is tied to a database
• The filegroup is mapped to a directory
– Must be an NTFS file system
– Caution: files are deletable from the file system if you have appropriate permissions
• VARBINARY(MAX) columns can be defined with the FILESTREAM attribute
– The table must also have a UNIQUEIDENTIFIER column
– Filestream storage is not available for other large types
• Data is stored in the file system
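A minimal sketch of the declarations described on this slide (database, filegroup, file paths, and table names are all invented for illustration):

```sql
-- Database with a FILESTREAM filegroup mapped to an NTFS directory
CREATE DATABASE DocStore
ON PRIMARY (NAME = DocStoreData, FILENAME = 'C:\Data\DocStore.mdf'),
FILEGROUP FsGroup CONTAINS FILESTREAM
    (NAME = DocStoreFs, FILENAME = 'C:\Data\DocStoreFs')
LOG ON (NAME = DocStoreLog, FILENAME = 'C:\Data\DocStore.ldf');
GO

-- FILESTREAM column plus the required UNIQUEIDENTIFIER ROWGUIDCOL
CREATE TABLE Documents (
    DocId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    Name    NVARCHAR(260),
    Content VARBINARY(MAX) FILESTREAM NULL
);
```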
10. Programming with Filestreams
• Filestream columns are available with SQL methods
– If SQL is used, they are indistinguishable from varbinary(max)
• Filestream data can be accessed and modified using file IO
– The PathName function retrieves a symbolic path name
– Acquire a context with GET_FILESTREAM_TRANSACTION_CONTEXT
– Use OpenSqlFilestream to get a file handle based on
• File name
• Required access
• Access options
• FilestreamTransaction context
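The T-SQL half of that handshake could look like the sketch below; the returned path and context are then passed to OpenSqlFilestream on the client. Table and column names follow the hypothetical Documents example rather than anything from the deck:

```sql
BEGIN TRANSACTION;

-- Symbolic path and transaction context to hand to OpenSqlFilestream
SELECT Content.PathName() AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM Documents
WHERE Name = N'report.docx';

-- client performs streamed IO against the handle, then closes it
COMMIT;
```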
11. Filestream behaviors
• File IO methods are available using
– Win32 APIs (usually with SQL Native Client)
– A .NET wrapper
• The handle can use
– ReadFile, WriteFile, TransmitFile, FlushFileBuffers...
– The handle must be closed before the transaction commits
– File IO is supported with the ReadCommitted isolation level
• A file is required for a handle, so to insert
– Insert a zero-length value
– Retrieve the path and transaction context
– Write using streamed IO
14. Table-valued Parameters (TVP)
• Input parameters of Table type on SPs/Functions
• Optimized to scale and perform better for large data
• Behaves like BCP in server
• Simple programming model
• Strongly typed
• Reduce client/server round trips
• Do not cause a statement to recompile

CREATE TYPE myTableType AS TABLE
(id INT, name NVARCHAR(100), qty INT);
GO
CREATE PROCEDURE myProc (@tvp myTableType READONLY) AS
UPDATE i SET
    i.qty += tvp.qty
FROM Inventory AS i INNER JOIN @tvp AS tvp
    ON i.id = tvp.id;
GO
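Calling the procedure from T-SQL might look like this; a client would instead pass the rows as a SqlDbType.Structured parameter:

```sql
DECLARE @rows myTableType;

INSERT INTO @rows (id, name, qty)
VALUES (1, N'Hammer', 5),
       (2, N'Nails', 500);

EXEC myProc @tvp = @rows;
```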
15. Table-valued Parameters (TVP)
TVP Client Stack Support
• Fully supported in ADO.NET 3
• New Parameter type:
SqlDbType.Structured
• Parameters can be passed in multiple ways
– DataTable
– IEnumerable<SqlDataRecord> (fully streamed)
– DbDataReader
16. Hierarchical Data
• Hierarchical data consists of nodes and edges
– In employee-boss relationship, employee and boss are
each nodes, the relationship between them is an edge
• Hierarchical data can be modeled in relational as
– Adjacency model - separate column for edge
• Most common, column can either be in same or separate
table
– Path Enumeration model - column w/hierarchical path
– Nested Set model - adds "left" and "right" columns to
represent edges, which must be maintained
separately
17. SQL Server 2008 and Hierarchical Data
• New Built-In Data Type - HierarchyID
• SQLCLR based system UDT
– Useable on .NET clients directly as SqlHierarchyId
• An implementation of path enumeration
model
– Uses ORDPATH internally for speed
18. HierarchyID
• Depth-first indexing
• "Level" property - allows breadth-first indexing
• Methods for common hierarchical operations
– GetRoot
– GetLevel
– IsDescendantOf
– GetDescendant, GetAncestor
– Reparent
• Does not enforce tree structure
– Can enforce tree using constraints
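A minimal sketch of these methods; the table and names are hypothetical:

```sql
CREATE TABLE dbo.Org (
    Node    HIERARCHYID PRIMARY KEY,
    EmpName NVARCHAR(100)
);

-- GetRoot is static; GetDescendant(NULL, NULL) yields the first child
DECLARE @root HIERARCHYID = hierarchyid::GetRoot();
INSERT INTO dbo.Org VALUES (@root, N'CEO');
INSERT INTO dbo.Org VALUES (@root.GetDescendant(NULL, NULL), N'CTO');

-- GetLevel() supports breadth-first indexing
SELECT EmpName,
       Node.GetLevel() AS Lvl,
       Node.ToString() AS NodePath
FROM dbo.Org;
```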
20. Sparse Properties
• Many designs require sparse properties
– Hardware store has different attributes for each
product
– Lab tests have different readings for each test
– Directory systems have different attributes for
each item
• These are name-value pairs (property bags)
• Because they don't appear on each tuple
(row) they are difficult to model
21. Modeling Sparse Properties
• Sparse Properties often modeled as separate table
– Base table has one row per item - common properties
– Property table has N rows per item - one per property
– Known as Entity-Attribute-Value
• Can be modeled as sparse tables
– 256 table limit in a SQL Server JOIN
• Can be modeled as sparse columns
– 1024 column limit in SQL Server tables
• Can be modeled as XML
– Common properties are elements, sparse are attributes
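The Entity-Attribute-Value option above can be sketched with two hypothetical tables:

```sql
-- Base table: one row per item, common properties
CREATE TABLE dbo.Item (
    ItemId INT PRIMARY KEY,
    Name   NVARCHAR(100)
);

-- Property table: N rows per item, one per sparse property
CREATE TABLE dbo.ItemProperty (
    ItemId    INT REFERENCES dbo.Item(ItemId),
    PropName  NVARCHAR(50),
    PropValue NVARCHAR(200),
    PRIMARY KEY (ItemId, PropName)
);
```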
22. SQL Server 2008 and Sparse Columns
• Sparse Column extends column limit
• Still 1024 column limit for "non-sparse"
columns
• Limit is higher (10,000) for sparse columns
• Column marked as SPARSE in table definition
• Additional column represents all sparse
column name value pairs as attributes in a
single XML element
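A sketch of sparse columns with a column set; the table and columns are hypothetical:

```sql
CREATE TABLE dbo.Products (
    ProductId   INT PRIMARY KEY,
    Name        NVARCHAR(100),
    BladeLength DECIMAL(5,2) SPARSE NULL,  -- only saws have this
    Voltage     INT SPARSE NULL,           -- only power tools have this
    -- one XML column exposing all sparse name-value pairs
    AllProps    XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

INSERT INTO dbo.Products (ProductId, Name, Voltage)
VALUES (1, N'Drill', 220);
```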
23. Change Tracking
• 3 different "flavors" of tracking data changes in SQL Server 2008
– Change Tracking, CDC (used in DW), Auditing (security-oriented)
• Keeps track of data modifications in a table
– Lightweight (No trigger, No schema changes)
• Overhead similar to a traditional index
– Synchronous at commit time
– Gives you access to “net changes” from T0
• Doesn’t keep track of “historical” changes
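Enabling and querying Change Tracking might look like this; the database and table names are hypothetical:

```sql
-- Enable at the database level, then per table
ALTER DATABASE InventoryDB
SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Inventory
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);

-- Net changes since the version saved at the last sync
DECLARE @last_sync BIGINT = 0;
SELECT ct.id, ct.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.Inventory, @last_sync) AS ct;
```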
24. Why go multi-dimensional?
• Organizations have large volumes of related data stored in a
variety of data systems, often in different formats
• Data systems may not…
– Be optimized for analytical queries
– Contain all the data required by design or by time
– Manage historical context
– Be available or accessible
• Non-technical employees and managers may not have
sufficient skills, tools, or permissions to query data systems
• Systems may not have universal definitions of an entity
• Analytical queries & reporting can impact operational system
performance
25. A realistic scenario
• Data source
independence
– Can survive OLTP
system changes
– Heterogeneous data
source
• Single version of the truth
– Data Warehouse data
centralization
– Data Mart as specific
model for analysis
– Data Mart (not the Data Warehouse) is user
oriented
• Some tools can be used
also by OLTP solutions
– Reporting Services
– OLTP queries
26. The Microsoft BI Platform
SQL Server 2008: Integrate • Store • Analyze • Report
27. New with Microsoft SQL Server 2008
Integration & Data Warehousing
• Scale and manage large numbers of users and data
– Improved query performance on large tables (Enhanced Partitioning)
– Queries optimized for data warehousing scenarios (DW Query Optimizations)
– Increase I/O performance with efficient, cost-effective data storage (Data Compression)
– Manage concurrent workloads of ad-hoc queries, reporting and analysis (Resource Governor)
• Integrate growing volumes of data
– Optimize your ETL performance by identifying data in your largest tables (Persistent Lookups)
– Reduce data load volumes by capturing operational changes in data (Change Data Capture)
– Simplify your insert and update data processing (MERGE SQL Statement)
– Profile your information to identify dirty data (Data Profiling)
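The MERGE statement mentioned above folds the insert and update paths into one statement; a sketch with hypothetical staging and dimension tables:

```sql
MERGE INTO dbo.DimCustomer AS tgt
USING dbo.StagingCustomer AS src
    ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET tgt.Name = src.Name
WHEN NOT MATCHED THEN
    INSERT (CustomerId, Name)
    VALUES (src.CustomerId, src.Name);
```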
28. Enterprise-class Data Integration with
SQL Server Integration Services
• Scalable Integrations
– Connect to data
– Multi-threaded architecture
– Comprehensive transformations
– Profile your data
– Cleanse your data
• Data Quality
– Cleanse data
– Text Mining
– Identify dirty data
29. Rich Connectivity
• Extensive connectivity
– Standards-based support
– Unstructured data: XML, flat files and Excel; binary files
– Legacy data: binary files, BizTalk, MS Message Queues
– Application databases: Oracle, DB2 and SQL Server
– Partner ecosystem
• Change Data Capture
– Transparently capture changes from OLTP tables into change tables for the DW
– Real-time integration
30. Rich Connectivity: Data Providers
[Diagram: SQL Server Integration Services connects through ODBC, XML and OLE DB providers to SQL Server, SAP NetWeaver BI, Teradata, DB2, MySAP, Oracle and Hyperion Essbase, feeding SQL Server Report Server Models, SQL Server Data Mining Models and SQL Server Analysis Services.]
32. New with Microsoft SQL Server 2008
Analysis Services
Innovative Cube Designer
Best Practice Design Alerts
Enhanced Dimension Design
Enhanced Aggregation Design
New Subspace Computations
MOLAP Enabled Write-Back
Enhanced Back-Up Scalability
New Resource Monitor
Execution Plan
34. New with Microsoft SQL Server 2008
Reporting Services
New Report Designer
Enhanced Data Visualization
New Flexible Report Layout
Scalable Report Engine
Single Service Architecture
New Word Rendering
Improved Excel Rendering
New End User Design Experience
SharePoint Integration
35. The complete flow
[Diagram: OLTP sources (CRM, ERP, LOB) feed Integration Services (ETL) into the DW/ODS; Analysis Components (OLAP, DM) and Query and Reporting serve client portals, analytical applications (MBS, third-party), Office/SharePoint/PPS and devices. The analytic platform is built on the .NET Framework (IIS, ASP.NET, CLR) and SQL Server (relational, multidimensional, XML), with BI development and management tools and SQL Server management tools alongside.]
36. Languages, APIs, And SDKs
• MDX + DMX
• ADO MD.NET
– AdoMdClient and AdoMdServer
• XML/A
• AMO
• RDL
• Report Server Web Service, RS URL Access,
and RS Extensions
37. Develop Custom Client Applications
• Using ADO MD.NET, AMO, and XMLA in your
own applications
• Front-ending RS and ProClarity
• Integrating with AdoMdServer and
server-side assemblies
• Using Data Mining Model Viewer controls
• Visualization with WPF and Silverlight
38. Summary
• Microsoft SQL Server and its services are the foundation
for a complete solution, from data access to analysis, from
data consolidation to performance management
• Together with other Microsoft technologies, they can be used
by developers and IT professionals to build powerful and
flexible reporting and analysis solutions for end users
• Several class libraries and protocols help solution
developers integrate these components into line-of-business
applications in an easy and natural way
– .NET Framework languages and technologies are the glue that
connects these building blocks together
39. Don’t forget the evaluations!!
• Fill in the evaluations and you’ll get
– Windows Home Server (1st day)
– Windows 7 Beta (2nd day)