SSIS is a component of SQL Server that provides data integration and workflow capabilities. It has separate runtime and data flow engines: the runtime engine manages package execution and control flow, while the data flow engine extracts, transforms, and loads data in a parallel, buffered manner for improved performance. SSAS is the analysis component that builds multidimensional cubes from relational data sources. It uses an OLAP storage model and has components for querying, processing, and caching data and calculations. SSRS is the reporting component that allows users to build interactive, parameterized reports from various data sources and deliver them through a web portal.
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Parts 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products; previously Jeff was an independent architect for the US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and “Adaptive Information,” a frequent keynote speaker at industry conferences, an author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
An introduction to Microsoft Power BI, emphasizing the usability of Power Query and how useful it is for the Excel population. A session delivered at Orion India Systems Pvt. Ltd.
Build Real-Time Applications with Databricks Streaming – Databricks
In this presentation, we will study a use case we implemented recently. In this use case we are working with a large, metropolitan fire department. Our company has already created a complete analytics architecture for the department based upon Azure Data Factory, Databricks, Delta Lake, Azure SQL and SQL Server Analysis Services (SSAS). While this architecture works very well for the department, they would like to add a real-time channel to their reporting infrastructure.
This channel should serve up the following information:
• The most up-to-date locations and status of equipment (fire trucks, ambulances, ladders, etc.)
• The current locations and status of firefighters, EMT personnel and other relevant fire department employees
• The current list of active incidents within the city
The above information should be visualized through an automatically updating dashboard. The central component of the dashboard will be a map that automatically updates with the locations and incidents. This view should be as real-time as possible and will be used by the fire chiefs to assist with real-time decision-making on resource and equipment deployments.
In this presentation, we will leverage Databricks, Spark Structured Streaming, Delta Lake and the Azure platform to create this real-time delivery channel.
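As a hedged sketch of what such a channel could look like (not the department's actual code; the Kafka source, event schema, and storage paths below are made-up placeholders, and the Delta writer assumes the delta-spark package is available), a Structured Streaming job might continuously append incoming unit-location events to a Delta table that the map dashboard polls:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

# Minimal sketch, not the department's actual pipeline: stream unit-location
# events into a Delta table that a map dashboard can poll.
spark = SparkSession.builder.appName("fd-realtime").getOrCreate()

schema = (StructType()                      # hypothetical event schema
          .add("unit_id", StringType())
          .add("status", StringType())
          .add("lat", DoubleType())
          .add("lon", DoubleType())
          .add("event_time", TimestampType()))

events = (spark.readStream
          .format("kafka")                  # placeholder source and topic
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "unit-locations")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

(events.writeStream
       .format("delta")
       .outputMode("append")
       .option("checkpointLocation", "/delta/checkpoints/unit_locations")
       .start("/delta/unit_locations"))
```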
The Parquet Format and Performance Optimization Opportunities – Databricks
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
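As a hedged, minimal illustration of a few of the levers mentioned above (the file and column names are made up; pyarrow is used rather than Spark to keep the example self-contained), row-group size, dictionary encoding, page compression, and filter-based row-group skipping can all be exercised directly:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Toy table; column names are made up for illustration.
table = pa.table({"country": ["DE", "US", "DE", "FR"] * 250_000,
                  "amount":  list(range(1_000_000))})

# Dictionary encoding and page compression are set at write time; smaller
# row groups give the min/max statistics finer granularity for skipping.
pq.write_table(table, "sales.parquet",
               row_group_size=100_000,
               use_dictionary=True,
               compression="snappy")

# Predicate pushdown: row groups whose min/max statistics rule out the
# predicate are skipped entirely instead of being read and filtered.
de_only = pq.read_table("sales.parquet",
                        filters=[("country", "=", "DE")])
print(de_only.num_rows)
```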
Data Warehouse Design and Best Practices – Ivo Andreev
A data warehouse is a database designed for query and analysis rather than for transaction processing. An appropriate design leads to a scalable, balanced and flexible architecture that is capable of meeting both present and long-term future needs. This session covers a comparison of the main data warehouse architectures together with best practices for the logical and physical design that support staging, load and querying.
Continuous Data Replication into Cloud Storage with Oracle GoldenGate – Michael Rainey
Continuous flow. Streaming. Near real-time. These are all terms used to identify the business’s need for quick access to data. It’s a common request, even if the data must flow from on-premises to the cloud. Oracle GoldenGate is the data replication solution built for fast data. In this session, we’ll look at how GoldenGate can be configured to extract transactions from the Oracle database and load them into a cloud object store, such as Amazon S3. There are many different use cases for this type of continuous load of data into the cloud. We’ll explore these solutions and the various tools that can be used to access and analyze the data from the cloud object store, leaving attendees with ideas for implementing a full source-to-cloud data replication solution.
Presented at ITOUG Tech Days 2019
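As a hedged sketch of the consumption side only (the bucket, prefix, and record fields below are assumptions; GoldenGate itself is configured through its own parameter files, which are not shown), change records landing in an object store such as S3 can be read with standard tooling:

```python
import json
import boto3

# Assumes GoldenGate has been configured to write change records as
# line-delimited JSON to this bucket/prefix; both names are placeholders.
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="replication-landing", Prefix="orders/")

for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket="replication-landing", Key=obj["Key"])["Body"]
    for line in body.iter_lines():
        record = json.loads(line)
        # Field names depend on the formatter configuration.
        print(record.get("op_type"), record.get("table"))
```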
Power BI Governance - Access Management, Recommendations and Best Practices – Learning SharePoint
This document outlines permissions management for the Power BI Workspace and the features of the new Admin, Member and Contributor roles. Recommendations and best practices for sharing reports are also included. Free to download.
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021 – StreamNative
You may be familiar with the Presto plugin used to run fast interactive queries over Pulsar using ANSI SQL, with results that can be joined with other data sources. This plugin will soon get a rename to align with the rename of the PrestoSQL project to Trino. What is the purpose of this rename and what does it mean for those using the Presto plugin? We cover the history of the community shift from PrestoDB to PrestoSQL, as well as the future plans for the Pulsar community to donate this plugin to the Trino project. One of the connector maintainers will then demo the connector and show what is possible when using Trino and Pulsar!
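For a taste of what this enables (a hedged sketch: the host, catalog, schema, and topic name below are placeholders, and the Trino Python client package is assumed), a Pulsar topic can be queried like any other table:

```python
import trino

# Placeholders: adjust host/port/catalog/schema to your own deployment.
conn = trino.dbapi.connect(host="trino.example.com", port=8080,
                           user="analyst", catalog="pulsar",
                           schema="public/default")
cur = conn.cursor()

# The connector exposes a Pulsar topic as a table.
cur.execute('SELECT COUNT(*) FROM "orders-topic"')
print(cur.fetchone())
```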
Spark (Structured) Streaming vs. Kafka Streams – Guido Schmutz
Independent of the source of data, the integration and analysis of event streams is becoming more important in a world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably; they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. In this session we compare two popular streaming analytics solutions: Spark Streaming and Kafka Streams.
Spark is a fast and general engine for large-scale data processing and has been designed to provide a more efficient alternative to Hadoop MapReduce. Spark Streaming brings Spark's language-integrated API to stream processing, letting you write streaming applications the same way you write batch jobs. It supports both Java and Scala.
Kafka Streams is the stream processing solution that is part of Kafka. It is provided as a Java library and can therefore be easily integrated with any Java application.
This presentation shows how you can implement stream processing solutions with each of the two frameworks, discusses how they compare and highlights the differences and similarities.
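As a small, hedged taste of the Spark side only (Kafka Streams is a Java library, so it is not shown in Python), the canonical streaming word count illustrates how Structured Streaming reuses the batch DataFrame API; feed it with `nc -lk 9999`:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# A socket source keeps the example self-contained.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# The same DataFrame operations you would write for a batch job.
counts = (lines.select(explode(split(lines.value, " ")).alias("word"))
          .groupBy("word").count())

(counts.writeStream.outputMode("complete").format("console").start()
 .awaitTermination())
```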
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot – Altinity Ltd
While the demands for real-time analytics are growing in leaps and bounds, analytics software must rely on streaming platforms to ingest high volumes of data traveling at lightning speed down the pipeline. We will take a look at two powerful open-source Apache platforms, Pulsar and Pinot, which work hand-in-hand to deliver the analytical results that bring great value to your systems.
Presenters: Mary Grygleski – Streaming Developer Advocate & Mark Needham – Developer Relations Engineer at StarTree
Note: This webinar will be recorded and later posted on our Webinar page (https://altinity.com/webinarspage/) or the official Altinity YouTube channel (https://www.youtube.com/@Altinity).
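As a hedged hint of the ingestion side (the service URL and topic are placeholders; the Pinot side would be configured separately through a realtime table spec, which is not shown), publishing events into Pulsar with the Python client takes only a few lines:

```python
import json
import pulsar

# Placeholders: point at your own Pulsar cluster and topic. Pinot would
# consume this topic via a realtime table configured on its side.
client = pulsar.Client("pulsar://localhost:6650")
producer = client.create_producer("events")

for i in range(10):
    event = {"order_id": i, "amount": 10.0 * i}
    producer.send(json.dumps(event).encode("utf-8"))

client.close()
```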
Organizations adopt different databases for big data, which is huge in volume and has varied data models. Querying big data is challenging yet crucial for any business. Data warehouses traditionally built with On-line Transaction Processing (OLTP) centric technologies must be modernized to scale to the ever-growing demand for data. With rapidly changing requirements, it is important to have near real-time responses from the big data gathered so that business decisions needed to address new challenges can be made in a timely manner. The main focus of our research is to improve the performance of query execution for big data.
Manufacturer's product brochure: METASUITE is the most powerful software solution that enables organizations to gain access to the information that is hidden in the large amounts of operational data that reside in their business applications.
http://www.Minerva-SoftCare.de
A Review of Data Access Optimization Techniques in a Distributed Database Man... – Editor IJCATR
In today's computing world, accessing and managing data has become one of the most significant elements. Applications as varied as weather satellite feedback to military operation details employ huge databases that store graphics images, texts and other forms of data. The main concern in maintaining this information is to access it in an efficient manner. Database optimization techniques have been derived to address this issue, which may otherwise limit the performance of a database to an extent of vulnerability. We therefore discuss the aspects of performance optimization related to data access in distributed databases. We further look at the effect of these optimization techniques.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations; these goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
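As a hedged, minimal sketch of just one of these ideas, skipping rank updates for vertices that have already converged (the toy graph and tolerance are made up, and a production implementation would re-examine skipped vertices, since their in-neighbors can still change):

```python
# Minimal sketch of one optimization named above: skip rank updates for
# vertices whose rank has settled within a tolerance. Toy graph only.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # vertex -> out-links
N, d, tol = len(graph), 0.85, 1e-10

in_links = {v: [u for u in graph if v in graph[u]] for v in graph}
rank = {v: 1.0 / N for v in graph}
converged = set()

while len(converged) < N:
    for v in graph:
        if v in converged:
            continue                  # skip vertices that have settled
        new = (1 - d) / N + d * sum(rank[u] / len(graph[u])
                                    for u in in_links[v])
        if abs(new - rank[v]) < tol:
            converged.add(v)
        rank[v] = new

print(rank)
```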
Opendatabay - Open Data Marketplace.pptx – Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
2. What is MSBI
• “This suite is composed of tools that help provide the best solutions for Business Intelligence queries. These tools use Visual Studio along with SQL Server. It empowers users to gain access to accurate, up-to-date information for better decision making in an organization. It offers different tools for the different processes required in Business Intelligence (BI) solutions.”
• MSBI’s 3 components:
• SSIS – SQL Server Integration Services – Integration tool.
• SSAS – SQL Server Analysis Services – Analysis tool.
• SSRS – SQL Server Reporting Services – Reporting tool.
4. END USER TOOLS & PERFORMANCE MANAGEMENT APPS
[Slide diagram of the Microsoft BI stack:]
• End-user tools & performance management apps: Excel, PerformancePoint Server
• BI platform: SQL Server Reporting Services, SQL Server Analysis Services, SQL Server DBMS, SQL Server Integration Services, SharePoint Server
• Delivery: Reports, Dashboards, Excel Workbooks, Analytic Views, Scorecards, Plans
5. BI Supported Platforms
[Slide diagram of the SQL Server data platform:]
• Themes: Pervasive Insight, Dynamic Development, Beyond Relational, Enterprise Data Platform
• Data stores: OLAP, FILE, XML, RDBMS
• Services: Entity Data Model, Query, Analysis, Reporting, Integration, Synch, Search
• Deployment: Mobile and Desktop, Cloud, Server
6. Highlights of MSBI Technologies
• MSBI solutions are built on an enterprise data platform, fully integrated with the tools you’re using today to manage your IT operations and infrastructure.
• By leveraging the IT infrastructure you have in place today with SQL Server, you provide your users with the trust in the information they demand, the integration they require, and the insight they need to drive better business decisions.
• All done in an environment that is highly scalable and ready to meet the most demanding requirements of thousands of users throughout your enterprise.
• Importantly, it is a dynamic development environment that your IT department knows and uses today, allowing them to rapidly develop, author, and publish key BI deliverables to end users, from reports, to OLAP cubes, to analytic models embedded in other applications that drive increased insight and better business decisions.
• All on the Microsoft technology platform that you use and trust today.
9. SSIS is a component of SQL Server 2005/2008 and is the successor of DTS (Data Transformation Services), which formed part of SQL Server 7.0/2000. It segregates the Data Flow Engine from the Control Flow Engine (the SSIS Runtime Engine), a design intended to achieve a high degree of parallelism and improve overall performance.
SSIS Runtime Engine – The SSIS runtime engine handles the control flow of a package. It saves the layout of packages, runs packages and provides support for logging, breakpoints, configuration, connections and transactions. The runtime engine is a parallel control flow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks.
The SSIS runtime engine executes the tasks inside a package in an orderly fashion. When the runtime engine encounters a data flow task in a package during execution, it creates a data flow pipeline and lets that data flow task run in the pipeline.
10. SSIS Data Flow Engine/Pipeline – The SSIS Data Flow Engine (also called the Data Flow Pipeline or transformation pipeline engine) manages the flow of data from data sources, through transformations, and on to destination targets. When the Data Flow task executes, the SSIS data flow engine extracts data from one or more data sources, performs any necessary transformations on the extracted data, and then delivers the data to one or more destinations.
The data flow engine has a buffer-oriented architecture (more details will be discussed in a later section): it pulls data from the source, stores it in a buffer (an in-memory structure) and performs the transformations in the buffer/memory itself instead of processing on a row-by-row basis. The benefit of this in-memory processing is that it is much faster, as there is no need to physically copy/stage the data at each step of the data integration; the data flow engine manipulates data as it is transferred from source to destination.
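As a loose analogy in Python (not SSIS itself; the file names and the `amount` column are made up), the difference between buffered and row-by-row processing looks like this: transformations are applied to whole in-memory buffers that stream from source to destination without intermediate staging:

```python
import pandas as pd

def transform(buffer: pd.DataFrame) -> pd.DataFrame:
    # The transformation is applied to the whole in-memory buffer at once,
    # not to one row at a time.
    buffer["amount_with_tax"] = buffer["amount"] * 1.08
    return buffer

# Pull fixed-size buffers from the source and stream them to the
# destination without staging the full dataset between steps.
with open("destination.csv", "w") as dest:
    for i, buffer in enumerate(pd.read_csv("source.csv", chunksize=10_000)):
        transform(buffer).to_csv(dest, header=(i == 0), index=False)
```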
12. The diagram shows a typical Microsoft BI application architecture with different layers shown from left to right. On the left layer you have source systems or a relational data warehouse; in the middle layer you have the Analysis Services cube pulling data from the source systems and storing it in an Analysis Services cube/OLAP store; and on the right layer you have reporting applications which consume the data from the Analysis Services/OLAP cube.
13. Query Parser
The Query Parser has an XMLA listener that accepts requests, parses each request and passes it along to the Query Processor for execution.
Query Processor
Upon receiving the validated and parsed query from the Query Parser, the Query Processor prepares an execution plan that dictates how the requested results will be provided from the cube data and the calculations used. The Query Processor caches calculation results in the formula engine cache (a.k.a. the Query Processor Cache) so they can be reused on subsequent requests across users with the same security permissions.
This summarizes the Query Processor operations:
• Makes requests for sub-cube data from the Storage Engine
• Translates the request into sub-cube data requests
• Produces the result set by doing:
• Bulk calculation of sub-cubes
• Cell-by-cell calculations
14. Stores calculation results in the formula engine cache with varying scope:
• Query scope – the cache will not be shared across queries in a session
• Session scope – the cache will be shared across queries in a session
• Global scope – the cache can be shared across sessions if the sessions have the same security roles
Storage Engine
The Storage Engine responds to the sub-cube data requests (a sub-cube being a subset or logical unit of data for querying, caching and data retrieval) generated by the Query Processor. It first checks whether the requested sub-cube data is already available in the Storage Engine cache; if so, it serves it from there. If not, it checks whether an aggregation is already available for the request; if so, it takes the aggregations from the aggregation store, caches them in the Storage Engine cache and also sends them to the Query Processor to serve the request. Otherwise, it grabs the detail data, calculates the required aggregations, caches them in the Storage Engine cache and then sends them to the Query Processor to serve the request.
This summarizes the Storage Engine operations:
• Creates the attribute store (key store, relationship store, bitmap indexes, etc.)
• Creates the hierarchy store
• Creates the aggregation store
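A simplified sketch of that lookup order (not the actual SSAS implementation; a sub-cube request is modeled here as a frozenset of cells):

```python
# Sketch of the Storage Engine lookup order described above:
# cache, then aggregation store, then detail data.
storage_engine_cache = {}   # sub-cube request -> aggregated value
aggregation_store = {}      # sub-cube request -> pre-computed aggregation

def get_subcube(request: frozenset, detail_data: list) -> float:
    if request in storage_engine_cache:        # 1. serve from cache
        return storage_engine_cache[request]
    if request in aggregation_store:           # 2. use an existing aggregation
        result = aggregation_store[request]
    else:                                      # 3. aggregate the detail data
        result = sum(row["measure"] for row in detail_data
                     if row["cell"] in request)
    storage_engine_cache[request] = result     # cache for later requests
    return result

# Example: two detail rows, one sub-cube covering both cells.
rows = [{"cell": "A", "measure": 10.0}, {"cell": "B", "measure": 5.0}]
print(get_subcube(frozenset({"A", "B"}), rows))   # 15.0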
15. • Storage Engine Cache
• Loads data into the storage engine cache as queries execute
• Clears data from the storage engine cache via a cleaner thread (under memory pressure) or when partitions are processed
• Aggregation Data
• Responds to requests with aggregated values already in storage
• If not present, summarizes lower-level aggregated values on the fly as needed
• Fact Data
• Scans MOLAP partitions and partition segments in parallel
• Uses bitmap indexes to scan pages to find the requested data
17. • Data mining is described as a process of discovering or extracting interesting knowledge from large amounts of data stored in multiple data sources such as file systems, databases, data warehouses, etc. This knowledge contributes many benefits to business strategies, scientific and medical research, governments and individuals.
• Business data is collected explosively every minute through business transactions and stored in relational database systems. In order to provide insight about the business processes, data warehouse systems have been built to provide analytical reports that help business users make decisions.
• Data is now stored in databases and/or data warehouse systems, so should we design a data mining system that decouples from or couples with databases and data warehouse systems? This question leads to four possible architectures of a data mining system, as follows:
18. • No coupling: in this architecture, the data mining system does not utilize any functionality of a database or data warehouse system. A no-coupling data mining system retrieves data from a particular data source such as a file system, processes the data using major data mining algorithms and stores the results in the file system. The no-coupling architecture does not take any advantage of a database or data warehouse, which is already very efficient at organizing, storing, accessing and retrieving data. The no-coupling architecture is considered a poor architecture for a data mining system; however, it is used for simple data mining processes.
• Loose coupling: in this architecture, the data mining system uses a database or data warehouse for data retrieval. In the loose-coupling data mining architecture, the data mining system retrieves data from the database or data warehouse, processes the data using data mining algorithms and stores the results in those systems. This architecture is mainly for memory-based data mining systems that do not require high scalability and high performance.
• Semi-tight coupling: in the semi-tight coupling data mining architecture, besides linking to a database or data warehouse system, the data mining system uses several features of the database or data warehouse system to perform some data mining tasks, including sorting, indexing, aggregation, etc. In this architecture, some intermediate results can be stored in the database or data warehouse system for better performance.
19. • Tight coupling: in the tight coupling data mining architecture, the database or data warehouse is treated as the information retrieval component of the data mining system, using integration. All the features of the database or data warehouse are used to perform data mining tasks. This architecture provides system scalability, high performance and integrated information.
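A minimal sketch of the loose-coupling pattern described above, using Python's sqlite3 standard library: the database is used only for retrieval and for storing results, while the mining step (a trivial frequent-item count standing in for a real algorithm) runs in application memory. Table and column names are made up for illustration:

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect("sales.db")  # hypothetical database
conn.execute("CREATE TABLE IF NOT EXISTS transactions (customer TEXT, product TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS frequent_products (product TEXT, purchases INTEGER)")

# 1. Retrieve data from the database.
rows = conn.execute("SELECT customer, product FROM transactions").fetchall()

# 2. Mine in application memory (loose coupling: the DB does no mining work).
counts = Counter(product for _, product in rows)
frequent = [(p, n) for p, n in counts.items() if n >= 10]

# 3. Store the results back in the database.
conn.executemany("INSERT INTO frequent_products VALUES (?, ?)", frequent)
conn.commit()
conn.close()
```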
20. SQL Server Reporting Services (SSRS)
What is SSRS?
Microsoft SSRS, or Business Intelligence SSRS, lets you create very rich reports (tabular/graphical/interactive/free-form) from various data sources with rich data visualization (charts, maps, sparklines). All these reports can be viewed via a web browser. SSRS allows reports to be exported in various formats (Excel, PDF, Word, etc.) and allows reports to be delivered via email or dropped to a share location in an automated fashion.
SSRS Components:
• Report Server
• Report Builder
• Report Manager
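Reports hosted on a Report Server can also be fetched programmatically through SSRS URL access. A minimal, hedged sketch in Python (the server, folder, report name and the `Region` parameter are placeholders; the requests-ntlm package is assumed for Windows-integrated authentication):

```python
import requests
from requests_ntlm import HttpNtlmAuth  # assumes the requests-ntlm package

# SSRS URL access: the report path follows the "?", rs:Format picks the
# export format, and plain name=value pairs pass report parameters.
# Server, folder, report and the "Region" parameter are all placeholders.
url = ("http://myserver/ReportServer"
       "?/SalesReports/Sales&rs:Format=PDF&Region=West")

resp = requests.get(url, auth=HttpNtlmAuth("DOMAIN\\user", "password"))
resp.raise_for_status()

with open("Sales.pdf", "wb") as f:
    f.write(resp.content)
```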
24. Query Builder Reports
• The text-based query builder (the default) provides a simple workspace for specifying a query and viewing the results. You can specify multiple Transact-SQL statements, query or command syntax for custom data processing extensions, and queries that are specified as expressions. Because the generic query builder does not preprocess the query and can accommodate any kind of query syntax, it is the default query builder tool for Report Designer.
• The graphical query builder provides a richer visual experience. It is used in Visual Studio and in other parts of SQL Server. You can use the graphical query builder if you are not creating expressions or multi-part SQL statements.
• To switch to the graphical query builder, toggle the Edit As Text button in the top left corner of the window.
26. Drill Through and Drill Down Reports
DrillDown Reports
You can organize data in a variety of ways to show the relationship of the general to the detailed. You can put all the data in the report, but set it to be hidden until a user clicks to reveal details; this is a drilldown action.
DrillThrough Reports
You can display the data in a data region, such as a table or chart, which is nested inside another data region, such as a table or matrix. You can display the data in a subreport that is completely contained within a main report. Or, you can put the detail data in drillthrough reports: separate reports that are displayed when a user clicks a link.