This document discusses distributed data warehouses and online analytical processing (OLAP). It begins by describing different data warehouse architectures like enterprise data warehouses, data marts, and distributed enterprise data warehouses. It then outlines challenges for achieving performance in distributed OLAP systems, including dynamically managing aggregates, using partial aggregates, allocating data and balancing loads. The document proposes techniques like redundancy and patchworking queries across sites to optimize distributed querying.
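To make the partial-aggregate idea concrete, here is a minimal sketch in Python (my own illustration, not the paper's actual algorithm) of how per-site partial sums and counts can be combined into a global result without shipping raw rows between sites; the site data and column layout are invented.

```python
# Minimal sketch of distributed partial aggregation: each site computes a
# partial (sum, count) locally, and only these small tuples are shipped to
# the coordinator, which "patches" them together into the global average.

# Illustrative per-site fact rows: (product_id, sales_amount)
site_a = [(1, 10.0), (2, 5.5), (1, 7.25)]
site_b = [(1, 3.0), (3, 12.0)]

def partial_aggregate(rows):
    """Compute a partial aggregate (sum, count) at one site."""
    total = sum(amount for _, amount in rows)
    return total, len(rows)

def merge_partials(partials):
    """Coordinator step: combine partial aggregates into the global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0

partials = [partial_aggregate(site_a), partial_aggregate(site_b)]
print(merge_partials(partials))  # global average over both sites
```

Because only the (sum, count) pairs cross the network, the same pattern extends to other decomposable aggregates such as MIN, MAX and COUNT.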
The document discusses topics related to data warehousing. It covers:
1. The key components involved in getting data into a data warehouse, which include extraction, transformation, cleansing, loading, and summarization of data.
2. An overview of the main components of a data warehouse architecture, including source data, data staging, data storage, information delivery, metadata management, and control components.
3. Various topics to be covered related to data warehousing, such as data marts, ERP, knowledge management, and customer relationship management.
As a C-level executive, you are always looking for ways to reduce IT costs while increasing systems capability in order to grow sales and improve service. Leveraging automated business processes to build user-centric, interconnected applications, with interfaces that work natively on smart devices, is a proven way to improve the business.
Data warehousing change in a challenging environment - David Walker
This white paper discusses the challenges of managing change in a data warehousing environment. It describes a typical data warehouse architecture with source systems feeding data into a data warehouse and then into data marts or cubes, and outlines the common development, operations and data quality processes involved. The paper then discusses two major challenges: configuration and change management, since frequent changes to source systems, applications and technologies impact the data warehouse; and managing and improving data quality, since issues in source systems are often replicated in the data warehouse.
The document is a white paper about Cisco and Greenplum partnering to deliver high-performance Hadoop reference configurations. Key points:
- Cisco UCS and Greenplum's Greenplum MR provide an integrated Hadoop solution optimized for performance on Cisco UCS hardware.
- Greenplum MR is an Apache Hadoop distribution that offers high availability, real-time analytics, and direct data access.
- Cisco UCS is the exclusive hardware platform and provides a flexible, high-performance appliance for Hadoop workloads.
- Reference configurations include Cisco fabric interconnects, fabric extenders, and rack servers designed for scalability and simplified management.
This article discusses opportunities and challenges for efficient parallel data processing in cloud computing environments. It introduces Nephele, a new data processing framework designed specifically for clouds. Nephele is the first framework to leverage dynamic resource allocation in clouds for task scheduling and execution. The article analyzes how existing frameworks assume static resource environments unlike clouds, and how Nephele addresses this by dynamically allocating different compute resources during job execution. It then provides initial performance results for Nephele and compares it to Hadoop for MapReduce-style jobs on cloud infrastructure.
Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference ... - EMC
The document describes a partnership between Cisco and Greenplum to deliver optimized high-performance Hadoop reference configurations. Key elements include:
- Greenplum MR provides a high-performance distribution of Hadoop with features like direct data access, high availability, and advanced management.
- Cisco UCS is the exclusive hardware platform and provides a flexible, scalable computing platform optimized for Hadoop workloads.
- The Cisco Greenplum MR Reference Configuration combines these software and hardware components into an integrated solution for running Hadoop and big data analytics workloads.
IOUG93 - Technical Architecture for the Data Warehouse - Paper - David Walker
This document provides an overview of the technical architecture for implementing a data warehouse. It discusses the key elements needed, including business analysis, database schema design, and project management. It then focuses on the technical architecture, describing the process of data acquisition which includes extraction, transformation, collation, migration, and loading of data from source systems into the data warehouse. It notes some considerations for each step, such as common data formats, network bandwidth, and handling exceptions during the loading process.
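As a rough illustration of the acquisition steps the paper describes (extraction, transformation and loading, with exception handling during the load), here is a small Python sketch; the source records, field names and target list are all hypothetical.

```python
# Toy extract-transform-load pass: pull records from a "source", transform and
# cleanse them into a common format, and load the survivors, routing bad rows
# to an exception list instead of aborting the whole load.

source_records = [
    {"cust_id": "007", "amount": "12.50", "date": "2024-01-03"},
    {"cust_id": "  8 ", "amount": "oops", "date": "2024-01-04"},  # bad amount
]

def transform(record):
    """Cleanse and convert one source record to the common warehouse format."""
    return {
        "customer_id": int(record["cust_id"].strip()),
        "amount": float(record["amount"]),
        "load_date": record["date"],
    }

warehouse, exceptions = [], []
for rec in source_records:
    try:
        warehouse.append(transform(rec))    # load step (here: append to a list)
    except (ValueError, KeyError) as err:
        exceptions.append((rec, str(err)))  # exception handling during loading

print(len(warehouse), "rows loaded,", len(exceptions), "rows rejected")
```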
1) The document discusses big data analytics and introduces Greenplum, a massively parallel processing (MPP) database for big data analytics.
2) Greenplum allows for integrated analysis of structured and unstructured data at scale through its SQL database and Hadoop integration.
3) The architecture provides linear scalability, flexibility to handle various data types and schemas, and rich language support for analytics.
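To illustrate the MPP idea behind the architecture described above, here is a toy Python sketch (my own illustration, not Greenplum's implementation) of hash-distributing rows across segments by a distribution key, aggregating on each segment, and merging the partial results; segment count and data are made up.

```python
# Sketch of MPP-style processing: rows are hash-distributed across segments by
# a distribution key, each segment aggregates its local slice, and the master
# merges the per-segment results.
from collections import defaultdict

N_SEGMENTS = 4
rows = [("widget", 3), ("gadget", 5), ("widget", 2), ("gizmo", 7)]

# Distribute rows to segments by hashing the distribution key (product name).
segments = defaultdict(list)
for product, qty in rows:
    segments[hash(product) % N_SEGMENTS].append((product, qty))

def local_group_by(segment_rows):
    """Each segment aggregates locally (in a real MPP system this runs in parallel)."""
    acc = defaultdict(int)
    for product, qty in segment_rows:
        acc[product] += qty
    return acc

# Master merges the partial results; because the distribution key is also the
# grouping key, each product lives on exactly one segment.
merged = defaultdict(int)
for seg_rows in segments.values():
    for product, total in local_group_by(seg_rows).items():
        merged[product] += total

print(dict(merged))  # e.g. {'widget': 5, 'gadget': 5, 'gizmo': 7}
```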
The document provides an overview of basic concepts related to data warehousing and online analytical processing (OLAP). It discusses the key components of a corporate information factory including the data warehouse, operational data store, data marts, and ETL processes. It also covers multidimensional data modeling concepts such as dimensions and cubes. The data warehouse is designed to integrate and store enterprise data to support strategic decision making across the organization.
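To ground the multidimensional terms used above, here is a small illustrative sketch (not taken from the document) that builds a toy fact table with two dimensions and pivots it into a cube-like cross-tab with pandas; all table and column names are invented.

```python
# A tiny star-schema-style fact table with two dimensions (date, product) and
# one measure (sales), pivoted into a cube-like cross-tab.
import pandas as pd

fact_sales = pd.DataFrame({
    "date":    ["2024-01", "2024-01", "2024-02", "2024-02"],
    "product": ["widget",  "gadget",  "widget",  "gadget"],
    "sales":   [100, 150, 120, 90],
})

# "Slice and dice": aggregate the measure along both dimensions at once.
cube = fact_sales.pivot_table(index="date", columns="product",
                              values="sales", aggfunc="sum", margins=True)
print(cube)  # margins=True adds roll-up totals per row and column
```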
The document discusses the purpose and history of data warehousing. It defines a data warehouse as a centralized, well-managed environment for storing high-value data from various sources. The data warehouse processes this data into a format optimized for analysis and information processing. The data warehouse has evolved from mainframe-based systems in the 1970s to today's cost-effective solutions embedded in software. A data warehouse is not defined by its size but by its functionality and ability to meet business objectives through consolidated, consistent data.
The document discusses the evolution of decision support systems from ad hoc reports generated for management to more sophisticated executive information systems. It defines a data warehouse as a subject-oriented collection of integrated and nonvolatile data used to support management decision making. The document outlines key characteristics of data warehouses, including how they integrate data from multiple sources and focus on historical data analysis over transaction processing. It contrasts online transaction processing (OLTP) with online analytical processing (OLAP) and describes how data warehouses are tuned for complex OLAP queries rather than transactional access.
This white paper discusses the need for differentiated architectures in today's data centers. It outlines Juniper's vision of evolving data centers to a simplified, cloud-ready state. This involves consolidating resources, simplifying networks through a 3-2-1 architecture, and making networks more scalable and efficient for modern applications through techniques like Virtual Chassis technology and a unified fabric. The paper contrasts needs for cost-effective IT data centers versus high-performance production data centers.
Microsoft® SQL Azure™ Database is a cloud-based relational database service built for the Windows® Azure platform. It provides a highly available, scalable, multi-tenant database service hosted by Microsoft in the cloud. SQL Azure Database enables easy provisioning and deployment of multiple databases. Developers do not have to install, set up, patch or manage any software; high availability and fault tolerance are built in, and no physical administration is required. SQL Azure supports Transact-SQL (T-SQL), so customers can leverage existing tools and knowledge of the familiar T-SQL relational data model for building applications.
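As a hedged illustration of the point that existing T-SQL skills carry over, the snippet below connects to an Azure SQL database from Python with pyodbc and runs an ordinary T-SQL query; the server, database and credential values are placeholders, and it assumes the Microsoft ODBC Driver 17 is installed.

```python
# Connect to a SQL Azure / Azure SQL database and run plain T-SQL.
# Server name, database, and credentials below are placeholders.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Uid=myuser;Pwd=mypassword;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

with pyodbc.connect(conn_str) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT TOP 5 name, create_date FROM sys.tables")
    for name, created in cursor.fetchall():
        print(name, created)
```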
The document discusses the journey organizations take to establish trusted data through effective data management. It outlines common barriers to coordinating data initiatives and how the gap between IT and business needs can be closed. A maturity model is presented showing how organizations evolve their data practices from being IT-driven to enabling personalized customer experiences. The key is establishing repeatable processes through a single data management platform that provides data quality, integration and master data management capabilities.
CentreView is a solution that allows users to search across both electronic and physical records stored in different systems. It provides a consolidated search results list encompassing information from a Microsoft SharePoint repository and physical records stored in an organization's records centers. Users can then view and interact with electronic documents directly or request physical records. CentreView brings together disparate forms of information management typically addressed separately.
Collaboration by individuals, organizations, and communities with the right tools and resources is essential in achieving success with data science. Join us for a live demonstration of how you can leverage a data science platform, an open-source model, internal and external data, analytics tools, and visualization using Hadoop. See how unprecedented access to data scientists can deliver entirely new levels of insight to push the boundaries of what’s possible. Find out what you can do NOW to move your data science efforts forward.
The document discusses Dell's PowerEdge server portfolio and solutions for enterprise applications. It provides an overview of next-generation PowerEdge technologies including more processing power from Intel Xeon processors, high-capacity low-power memory, scalable efficient local storage, simplified intelligent management, and energy efficiency innovations. It also describes PowerEdge platforms for traditional and converged infrastructure and Dell's comprehensive enterprise solutions.
This document discusses maximizing returns from a data warehouse. It covers the need for real-time data integration to power business intelligence and enable timely, trusted decisions. It outlines challenges with traditional batch-based approaches and how Oracle's data integration solutions address these through products that enable real-time data capture and delivery, bulk data movement, and data quality profiling to build an enterprise data warehouse.
This white paper discusses Sun Microsystems' new virtualized network express module and blade server solution. It addresses ongoing customer needs to reduce datacenter costs related to power, cooling, management complexity and staffing. The solution aims to improve efficiency and lower costs by streamlining management, reducing cabling, improving energy efficiency, and providing a single-pane-of-glass management view.
Dear Students
Ingenious Techno Solution offers expert guidance on your Final Year IEEE and non-IEEE projects in the following domains:
JAVA
.NET
EMBEDDED SYSTEMS
ROBOTICS
MECHANICAL
MATLAB etc
For further details contact us:
enquiry@ingenioustech.in
044-42046028 or 8428302179.
Ingenious Techno Solution
#241/85, 4th floor
Rangarajapuram main road,
Kodambakkam (Power House)
http://www.ingenioustech.in/
During a period when various proposed solutions under consideration were either too expensive, too proprietary or functionally inadequate, FTEL was contacted by DataCore and introduced to the SANsymphony™ advanced storage networking and management software. Ian Batten, FTEL’s IT Director, explained, “The DataCore solution appeared to offer many of the aspects missing from other options, such as block level snapshot, easier device sharing, single point of administration, better caching and the prospect of interesting solutions to the backup issue.” FTEL decided to evaluate SANsymphony utilizing commodity RAID devices for storage. With even relatively low-end storage, the results were impressive enough that the solution moved forward into a production environment.
The document discusses the database environment and advantages of a database management system (DBMS). It describes how a DBMS provides a central repository of shared data that applications can access. This reduces data redundancy, improves data sharing and integrity, and increases development productivity compared to file-based data storage. The document provides examples of database applications from personal to enterprise-wide and outlines the typical components involved, from CASE tools to end users.
Our data services are based on the importance of the data (its category) and not just on its volume. This way we can build cost-effective data management solutions with you.
Savings of up to 35% typically achievable against in-house options.
This document summarizes Intechnology's multi-tier data management services. It offers managed replication for mission-critical tier 1 data, managed backup for critical tier 2 data, and managed archiving for legacy tier 3 data. By automatically matching different types of data to the appropriate storage tier and service, Intechnology helps customers reduce costs, improve data access and retention, and scale storage capacity as needed. The multi-tier approach provides business benefits like lower expenses, improved performance and security, and reduced infrastructure management burdens and carbon footprint.
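A minimal sketch (my own illustration, not Intechnology's implementation) of the category-driven idea described above: each dataset is tagged with an importance tier and routed to the matching service, regardless of its size.

```python
# Route datasets to a storage service by their importance tier rather than
# their volume: tier 1 -> replication, tier 2 -> backup, tier 3 -> archive.
TIER_SERVICE = {
    1: "managed replication (mission-critical)",
    2: "managed backup (critical)",
    3: "managed archiving (legacy)",
}

datasets = [
    {"name": "orders_db", "tier": 1, "size_gb": 40},
    {"name": "hr_reports", "tier": 2, "size_gb": 5},
    {"name": "2009_logs", "tier": 3, "size_gb": 900},
]

for ds in datasets:
    service = TIER_SERVICE[ds["tier"]]
    # Note the placement decision ignores size_gb: category, not volume, drives it.
    print(f'{ds["name"]} -> {service}')
```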
This document discusses goals, trends, and complexities of supporting an integrated student data warehouse. It covers project visions and baselines, Oracle Streams concepts for feeding data into the data warehouse, metadata, configuration issues, validation steps, and the importance of securing and monitoring the data stream for quality and compliance. Technical challenges include testing changes and their impact on the different systems.
Juniper Networks' QFabric is an innovative data center fabric that provides a flattened, single-tier network architecture with any-to-any connectivity between devices. This allows for rapid deployment of services by eliminating bottlenecks and simplifying network management. QFabric also improves cost efficiency by reducing complexity, scaling more easily, and lowering power consumption and space needs compared to traditional hierarchical network designs. The document examines the business benefits of QFabric, such as rapid service provisioning, lower costs, increased efficiency, and improved resiliency and security for data center networks.
Happy new year celebration style 1 powerpoint templates - SlideTeam.net
The document is a PowerPoint template for a "Happy New Year - Style 1" presentation. It includes diagrams, icons, and text boxes that can be edited by the user. The templates allow users to bring presentations to life, amaze audiences, and pitch ideas convincingly. All images in the templates are 100% editable in PowerPoint.
The most amazing moment, full of happiness and enjoyment, is almost here: the brand New Year 2016, whose celebration starts on New Year's Eve, December 31st, 2015, when people will welcome the New Year 2016. With this in view, before anyone sends you New Year Quotes and Wishes 2016,
"At the toolbar (menu, whatever) associated with a document there is a button marked "Oh, yeah?". You press it when you lose that feeling of trust. It says to the Web, 'so how do I know I can trust this information?'. The software then goes directly or indirectly back to metainformation about the document, which suggests a number of reasons."
Tim Berners-Lee, W3C Chair, Web Design Issues, September 1997
Provenance is focused on the description and understanding of where and how data is produced, the actors involved in producing it, and the processes by which the data was manipulated and transformed until it arrived at the collection from which it is being accessed. Provenance aims to provide the ability to trace the sources of data, enabling exploration not just of the relationships between datasets but also of their authors and affiliations, with the goal of preserving data ownership and establishing a notion of trust based on authenticity and reliability.
The Future Internet poses important challenges for provenance, derived from complex and rich scenarios characterized by large amounts of data stemming from heterogeneous sources such as user communities, services, and things. Such challenges span technical as well as socioeconomic dimensions. The former include aspects like vocabularies for representing provenance, interoperability and scalability issues, and means to produce, acquire, and reason with provenance in order to provide measures of trust and information quality. However, it is probably in the socioeconomic dimension where more significant efforts are needed, to address issues like the role of provenance in the overall picture of the Future Internet, entry barriers preventing the generation of provenance-aware internet content, the means required to incentivize the production of such content, and ways to prevent provenance forgery.
In this talk, we provide an overview of provenance and the above-mentioned challenges and introduce ongoing work to address trust issues from the provenance perspective in the Future Internet. We also link provenance to other aspects relevant to trust discussed in the session, such as security, legal frameworks, and economics.
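To make the notion of tracing sources concrete, here is a small generic sketch (my own illustration, not a specific standard such as W3C PROV) of provenance records and a trace that walks derivations back to the original sources; all dataset and agent names are invented.

```python
# Each record states where a dataset came from, who produced it, and by which
# process; trace() follows the derivation chain back to the original sources.
provenance = {
    "monthly_report": {"derived_from": ["sales_cube"], "agent": "analytics team",
                       "process": "aggregation"},
    "sales_cube":     {"derived_from": ["raw_sales"],  "agent": "ETL job",
                       "process": "cube build"},
    "raw_sales":      {"derived_from": [],             "agent": "POS system",
                       "process": "capture"},
}

def trace(dataset, depth=0):
    """Print the derivation chain of a dataset, indented by depth."""
    rec = provenance[dataset]
    print("  " * depth + f'{dataset} ({rec["process"]} by {rec["agent"]})')
    for parent in rec["derived_from"]:
        trace(parent, depth + 1)

trace("monthly_report")
```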
David Evans has experience training and grooming dogs as well as caring for horses. He has worked as a dog trainer, dog groomer, and horse caretaker. Evans has also assisted with an equine therapy program and supervised middle school campers at a summer camp. He is looking to earn a veterinary technician assistant certificate.
This document provides a template to record the mass in grams and volume in millilitres of various classroom objects such as crayons, rubbers, a stone, and a glass of water. Students are instructed to find and record the mass and volume of different objects in their classroom to complete the table.
This document discusses next challenges for semantic technologies in corporate knowledge management. It outlines how semantic applications can help manage knowledge as the most important corporate asset by explicitly representing concepts, properties, and relations in ontologies. Key challenges include using semantics to support open innovation, managing corporate knowledge, and optimizing enterprise processes.
The Illinois Health Information Exchange (ILHIE) has just released its Consumer Education Health IT Toolkit. It was developed to provide healthcare professionals with simple and informative educational material to share with their patients. Consumer education and engagement are cornerstones for the implementation of a successful state health IT program.
This document provides a summary of Frank L. Robinson's career experience including roles as a Solutions Architect, Field Application Engineer, Specialist in Infrastructure Engineering, Technical Lead for Storage and Backup Administration, Storage Solutions Architect, and Storage Administrator. He has over 15 years of experience working with various storage vendors such as EMC, HDS, HP, and Network Appliance on infrastructure projects for clients such as JPMorgan Chase, Nationwide Insurance, Honda, IBM, and Bank of America.
An updated version of this topic is available at
https://es.slideshare.net/rafalinares/produccin-audiovisual-cine-tema-2-el-plan-de-produccin-actualizado
Slides for Topic 2 of the Producción Audiovisual Cine course at URJC, for the CAV and CAV-PER degree programmes.
This document presents several notable right triangles with their sides and angles, the sides measured in units of k. It shows the measures of the sides opposite the 30°, 45°, 60° and other common angles, along with formulas for calculating the sides as a function of k.
The document outlines a 3-step process for developing coursework project ideas. Students are instructed to: 1) Come up with 4 initial ideas independently; 2) Write 2 paragraph synopses for each idea describing the plot and opening titles; 3) Present their favorite idea to the group as a short verbal pitch and receive feedback to help refine their 4 final ideas.
Use Big Data Technologies to Modernize Your Enterprise Data Warehouse - EMC
This EMC perspective provides an overview of the EMC Data Warehouse Modernization offering. It describes four tactics that can be implemented quickly, using an organization's existing skill sets, and rapidly show a return on investment.
SURVEY ON IMPLEMENTATION OF COLUMN ORIENTED NOSQL DATA STORES (BIGTABLE & CA... - IJCERT JOURNAL
NoSQL databases provide a mechanism for storage and retrieval of data modeled for the huge volumes of data used in big data and cloud computing. NoSQL systems are also called "Not only SQL" to emphasize that they may support SQL-like query languages. A basic classification of NoSQL is based on the data model: column, document, key-value, etc. The objective of this paper is to study and compare the implementation of various column-oriented data stores such as Bigtable and Cassandra.
This document provides an introduction to NoSQL databases. It discusses that NoSQL databases are non-relational, do not require a fixed table schema, and do not require SQL for data manipulation. It also covers characteristics of NoSQL such as not using SQL for queries, partitioning data across machines so JOINs cannot be used, and following the CAP theorem. Common classifications of NoSQL databases are also summarized such as key-value stores, document stores, and graph databases. Popular NoSQL products including Dynamo, BigTable, MongoDB, and Cassandra are also briefly mentioned.
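To illustrate the data-model distinctions these two summaries draw, here is a toy sketch (not any real product's API) contrasting a key-value view with a Bigtable/Cassandra-style column-family view, where each row key maps to column families holding sparse columns.

```python
# Key-value view: one opaque value per key.
kv_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# Column-family view (Bigtable/Cassandra style): row key -> column family ->
# sparse set of columns; different rows may hold different columns.
cf_store = {
    "user:42": {
        "profile":  {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2024-05-01"},
    },
    "user:43": {
        "profile":  {"name": "Alan"},          # no city column for this row
        "activity": {},
    },
}

def get_column(store, row_key, family, column):
    """Read one cell; missing columns simply return None (sparse rows)."""
    return store.get(row_key, {}).get(family, {}).get(column)

print(get_column(cf_store, "user:42", "profile", "city"))   # London
print(get_column(cf_store, "user:43", "profile", "city"))   # None
```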
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.[1] DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place and are used for creating analytical reports for knowledge workers throughout the enterprise.
The Pivotal Business Data Lake provides a flexible blueprint to meet your business's future information and analytics needs while avoiding the pitfalls of typical EDW implementations. Pivotal’s products will help you overcome challenges like reconciling corporate and local needs, providing real-time access to all types of data, integrating data from multiple sources and in multiple formats, and supporting ad hoc analysis.
This document provides an overview of key concepts in data warehousing architecture. It discusses how a data warehouse is an architecture, not a product, and describes some of the core components of a data warehouse system architecture including databases, applications, connectivity, interfaces, and time-series data. It emphasizes that the data warehouse architecture aligns dimensions like customers, products, time and location with the business. The document also discusses concepts like star schemas, metadata, aggregation, OLAP, and how a data warehouse supports strategic business goals like brand development and cross-selling.
This document discusses key concepts in data warehousing and modeling. It describes a multitier architecture for data warehousing consisting of a bottom tier warehouse database, middle tier OLAP server, and top tier front-end client tools. It also discusses different data warehouse models including enterprise warehouses, data marts, and virtual warehouses. The document outlines the extraction, transformation, and loading process used to populate data warehouses and the role of metadata repositories.
This document outlines the key issues in data warehousing and online analytical processing (OLAP) in e-business. It defines data warehousing as a centralized repository that extracts and integrates relevant information from multiple sources to support decision making. OLAP enables multi-dimensional analysis of data stored in a database. The problem is that operational databases are not optimized for analytical reporting. This leads to slow response times for reports needed by top management. The research aims to develop a tool to convert operational databases into a star schema structure for faster multi-dimensional analysis and reporting.
Challenges Management and Opportunities of Cloud DBA - inventy
Research Inventy provides an outlet for research findings and reviews in areas of Engineering and Computer Science found to be relevant for national and international development. Research Inventy is an open-access, peer-reviewed international journal whose primary objective is to publish research and applications related to Engineering, to stimulate new research ideas, and to foster practical application of research findings. The journal publishes original research of high enough quality to attract contributions from the relevant local and international communities.
A computer database is a collection of logically related data stored in a computer system so that a computer program, or a person using a query language, can use it to answer queries. An operational database (OLTP) contains up-to-date, modifiable, application-specific data. A data warehouse (OLAP) is a subject-oriented, integrated, time-variant and non-volatile collection of data used to make business decisions. The Hadoop Distributed File System (HDFS) allows storing large amounts of data on a cloud of machines. In this paper, we survey the literature related to operational databases, data warehouses and Hadoop technology.
This document provides an overview of open source data warehousing and business intelligence (DW/BI). It defines cloud computing and explains how open DW consists of pre-designed data warehouse architectures that are free to use. Open DW reduces costs and risks by shortening design and development time. While the architectures are free, vendors charge for services like customization, support, and maintenance. The document discusses the need for and benefits of open DW/BI, including faster deployment, lower costs, and mitigated risks through rapid development. It also outlines some popular open source databases, tools, and vendors in this space.
LEGO has embraced change by combining business intelligence (BI) with a flexible information system. The database plays a key role in SAP's three-tier architecture by storing data on LEGO's products, operations, supply chain, and employees. Distributed databases improve performance and flexibility by allowing data to be stored across multiple computers and accessed worldwide. SAP's business software includes BI features like supply chain management, product lifecycle management, and enterprise resource planning. While distributed databases provide advantages like data access from various locations, they also involve additional complexity and overhead.
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM - myteratak
Lego implemented SAP's three-tier client-server system with a flexible IT infrastructure to help management better forecast and plan. The system includes a presentation layer, application layer, and database layer. It allows distributed access to the database from different locations. Some key business intelligence features in SAP's suite include tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better decisions. While a distributed architecture with multiple databases improves scalability, fault tolerance, and workload distribution, it also increases security risks, requires more effort to ensure data quality and integrity, and has higher maintenance costs.
The document discusses current trends in database management. It describes how databases are increasingly bridging SQL and NoSQL structures to provide the capabilities of both. It also discusses how databases are moving to the cloud/Platform as a Service models and how automation is emerging to simplify database management tasks. The document emphasizes that security must remain a focus as well, with database administrators working closely with security teams to protect enterprise data from both external and internal threats.
Megastore: providing scalable, highly available storage for interactive services - João Gabriel Lima
The document describes Megastore, a storage system developed by Google to meet the requirements of interactive online services. Megastore blends the scalability of NoSQL databases with the features of relational databases. It uses partitioning and synchronous replication across datacenters using Paxos to provide strong consistency and high availability. Megastore has been widely deployed at Google to handle billions of transactions daily storing nearly a petabyte of data across global datacenters.
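The following is a deliberately simplified sketch of the synchronous-replication idea the summary mentions, in which a write succeeds only once a majority of replicas acknowledge it; it is not Megastore's actual Paxos protocol, and the replica names are invented.

```python
# Simplified majority-acknowledged write across datacenter replicas. A real
# system (e.g. Megastore's Paxos) also handles leader election, conflicting
# proposals, and recovery; this only shows the quorum rule.
import random

REPLICAS = ["dc-us", "dc-eu", "dc-asia"]

def replicate(replica, key, value):
    """Pretend to apply the write at one replica; randomly fail sometimes."""
    return random.random() > 0.2   # True = acknowledged

def quorum_write(key, value):
    acks = sum(replicate(r, key, value) for r in REPLICAS)
    committed = acks > len(REPLICAS) // 2   # strict majority required
    return committed, acks

ok, acks = quorum_write("photo:123", "album-7")
print(f"committed={ok} with {acks}/{len(REPLICAS)} acknowledgements")
```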
This document provides a sector roadmap for cloud analytic databases in 2017. It discusses key topics such as usage scenarios, disruption vectors, and an analysis of companies in the sector. Some main points:
- Cloud databases can now be considered the default option for most selections in 2017 due to economics and functionality.
- Several newer cloud-native offerings have been able to leapfrog more established databases through tight integration of cloud features like elasticity and separation of compute and storage.
- While traditional database functionality is still required, cloud dynamics are causing needs for capabilities like robust SQL support, diverse data support, and dynamic environment adaptation.
- Vendor solutions are evaluated on disruption vectors including SQL support, optimization, elasticity, environment
1. A data lake is a storage repository that holds vast amounts of raw data in its native format until it is needed for analysis. It addresses challenges of big data by allowing data to be stored and analyzed together without upfront structuring.
2. Traditional data warehouses structure data upfront, limiting flexibility. A data lake avoids this by storing all data as-is and analyzing data when questions arise. This provides greater analytic power on emerging big data sources.
3. While data lakes provide benefits like reduced costs and more flexibility, challenges remain around metadata management, governance, preparation, and security when storing all raw data in one place. Effective solutions are needed for these challenges to realize the full potential of data lakes.
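A small sketch of the "store raw, structure later" point made above (illustrative only; the events and fields are invented): raw JSON events are landed in the lake untouched, and a schema is imposed only when a question is asked.

```python
# Schema-on-read: land raw events as-is, then project the fields a particular
# analysis needs only at query time.
import json

raw_events = [
    '{"user": "a1", "action": "click", "ts": 1}',
    '{"user": "b2", "action": "buy", "ts": 2, "amount": 19.99}',  # extra field
]

# "Load" into the lake: no upfront structuring, just keep the raw strings.
data_lake = list(raw_events)

# Later, a specific question arrives: total purchase amount. Structure now.
def purchases(lake):
    for line in lake:
        event = json.loads(line)
        if event.get("action") == "buy":
            yield event.get("amount", 0.0)

print(sum(purchases(data_lake)))  # 19.99
```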
AtomicDB is a proprietary software technology that uses an n-dimensional associative memory system instead of a traditional table-based database. This allows information to be stored and related in a way analogous to human memory. The technology does not require extensive programming and can rapidly build and modify information systems to meet evolving needs. It provides significant cost and performance advantages over traditional databases for managing complex, relational data.
Conspectus data warehousing appliances – fad or future - David Walker
Data warehousing appliances aim to simplify and accelerate the process of extracting, transforming, and loading data from multiple source systems into a dedicated database for analysis. Traditional data warehousing systems are complex and expensive to implement and maintain over time as data volumes increase. Data warehousing appliances use commodity hardware and specialized database engines to radically reduce data loading times, improve query performance, and simplify administration. While appliances introduce new challenges around proprietary technologies and credibility of performance claims, organizations that have implemented them report major gains in query speed and storage efficiency with reduced support costs. As more vendors enter the market, appliances are poised to become a key part of many organizations' data warehousing strategies.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Introduction of Cybersecurity with OSS at Code Europe 2024 - Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Monitoring and Managing Anomaly Detection on OpenShift - Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
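As a tiny, self-contained complement to the tutorial's first topic above (the Kafka, ArgoCD and Prometheus pieces are out of scope here), this sketch flags anomalies in a stream of sensor readings with a simple z-score rule; the readings and threshold are invented.

```python
# Simple z-score anomaly detector: a reading is anomalous if it lies more than
# THRESHOLD standard deviations from the mean of the history seen so far.
import statistics

THRESHOLD = 3.0
readings = [20.1, 20.3, 19.8, 20.0, 35.7, 20.2]  # 35.7 is the injected anomaly

history = []
for value in readings:
    if len(history) >= 2:
        mean = statistics.mean(history)
        stdev = statistics.stdev(history) or 1e-9  # avoid division by zero
        z = abs(value - mean) / stdev
        if z > THRESHOLD:
            print(f"anomaly: {value} (z={z:.1f})")
    history.append(value)
```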
Dandelion Hashtable: beyond billion requests per second on a commodity server - Antonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
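To illustrate the closed-addressing, bounded-chaining idea in plain terms (this is a toy single-threaded model, nothing like DLHT's lock-free, cache-line-aware implementation), here is a sketch where each bucket holds at most a fixed number of entries and deletes free their slot immediately.

```python
# Toy closed-addressing hashtable with bounded per-bucket chains. Each bucket
# is a short list (stand-in for a cache line of slots); deletes free the slot
# immediately, and an over-full bucket signals that a resize would be needed.
BUCKETS = 8
SLOTS_PER_BUCKET = 4   # the "bounded chain" length

table = [[] for _ in range(BUCKETS)]

def put(key, value):
    bucket = table[hash(key) % BUCKETS]
    for slot in bucket:
        if slot[0] == key:        # update in place
            slot[1] = value
            return True
    if len(bucket) >= SLOTS_PER_BUCKET:
        return False              # bucket full: caller must resize/rebuild
    bucket.append([key, value])
    return True

def get(key):
    for k, v in table[hash(key) % BUCKETS]:
        if k == key:
            return v
    return None

def delete(key):
    bucket = table[hash(key) % BUCKETS]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket.pop(i)         # slot is reusable immediately
            return True
    return False

put("a", 1); put("b", 2)
print(get("a"), delete("a"), get("a"))  # 1 True None
```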
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of the presentation accompanying the talk I gave about the main changes introduced by CCS TSI 2023 at the largest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
ON-LINE ANALYTICAL PROCESSING IN DISTRIBUTED DATA WAREHOUSES
Jens Albrecht, Wolfgang Lehner
University of Erlangen-Nuremberg, Germany
{jalbrecht, lehner}@informatik.uni-erlangen.de
Abstract

The concepts of ‘Data Warehousing’ and ‘On-line Analytical Processing’ have seen growing interest in the research and commercial product communities. Today, the trend moves away from complex centralized data warehouses to distributed data marts integrated in a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision support systems in worldwide operating corporations. After showing the benefits and problems of the distributed approach, this paper outlines possibilities for achieving performance in distributed on-line analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.

1 Introduction

Today’s global economy has placed a premium on information, because in a dynamic market environment with many competitors it is crucial for an enterprise to have on-line information about its general business figures as well as detailed information on specific topics, so that it can make the right decisions at the right time. That is why almost all big companies today are trying to build data warehouses. In contrast to former management information systems based mostly on operational data, data warehouses contain integrated and non-volatile data and therefore provide a consistent basis for organizational decision making. In addition to classical reporting, the main application of data warehouses is On-line Analytical Processing (OLAP, [CoCS93]), i.e. the interactive exploration of the data. The data warehouse promise is getting accurate business information fast.

But the ultimate goal of data warehousing and OLAP goes even further; the vision is to “put a crystal ball on every desktop”. Today that ideal is far from being reality. First, building a single centralized data warehouse serving many different user groups takes a long time, and setup and maintenance are very expensive. All this contributes to a relatively inflexible architecture. Second, local access behavior is considered in the data warehouse design neither on the conceptual nor on the internal layer, but only on the external layer.

Therefore, many companies today decide to start with smaller, flexible data marts dedicated to specific business areas. To regain the possibility of cross-functional analysis, there are two options. The first is to create again a centralized data warehouse holding only cross-functional summary data. The other is to integrate the data marts into a common conceptual schema and thereby create a distributed data warehouse.

We will concentrate on the second alternative, which allows more flexible querying. To the user, a distributed data warehouse should behave exactly like a centralized data warehouse. For transparent and efficient OLAP on a distributed data warehouse several problems need to be solved. The intention of this article is not to present final solutions but to show the potential of the distributed approach and the challenges for researchers and software vendors alike.

In the following section we characterize different data warehouse architectures and their inherent drawbacks. After showing the benefits of a distributed approach to data warehousing, we show the problems resulting for distributed OLAP systems. Performance as the main problem is the issue of section 3. Finally, in section 4 we give an overview of the prototypical distributed OLAP system CUBESTAR.

2 Data Warehouse Architectures

According to [Inmo92], a data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of corporate data”. Since that definition is very generic, we now give an overview of three types of data warehouse architectures, partly based on [WeCa96].

2.1 Enterprise Data Warehouses

What most people think of when they use the term data warehouse is the enterprise or corporate data warehouse, where all business data is stored in a single database with a single corporate data model (figure 1). Enterprise data warehouses are created to provide knowledge workers with consistent and integrated data from all major business areas. They offer the possibility to correlate information across the enterprise.
But, because of the huge subject area (the corporation) and the vast number of different operational sources that need to be integrated, enterprise data warehouses are very difficult and time-consuming to implement. Too many different data sources and too many different user requirements are likely reasons for the failure of the whole data warehouse project.

Since the data warehousing setup strategy from the analyst's point of view is, as Inmon puts it, “give me what I want and I will tell you what I really want” [Inmo92], an incremental approach has proven to be the right choice.

[Fig. 1: Enterprise Data Warehouse]

2.2 Data Marts

The term data mart stands for the idea of a highly focused version of a data warehouse (figure 2). There is a need to distinguish data marts which are local extracts from an enterprise data warehouse from data marts which serve as the data warehouse for a small business group with well-defined needs. In the latter case, which is the definition we will use in this paper, the limitation in size and scope of the data warehouse, e.g. to only sales or marketing, dramatically reduces setup costs. Data marts can be deployed in weeks or months. That is why most current data warehousing projects include data marts.

[Fig. 2: Data Marts]

2.3 Distributed Enterprise Data Warehouses

Although the data mart idea is very attractive and potentially offers the opportunity to build a corporate-wide data warehouse from the bottom up, the benefits of data marts can easily be outweighed if there is no corporate-wide data warehouse strategy. If data mart design is not done carefully, independent islands of information are created and cross-functional analysis, as in the enterprise data warehouse, is not possible.

Therefore, the pragmatic approach to combine the virtues of both is to integrate the departmental data marts into a common schema. The result is a distributed enterprise data warehouse (figure 3), also called a federated data warehouse [WeCa96] or an enterprise data mart [Info97].

[Fig. 3: Distributed Enterprise Data Warehouse]

The benefits of this divide-and-conquer approach are clear. In the distributed enterprise data warehouse the individual data marts can be created and managed flexibly. This flexibility for the individual data marts is traded for additional complexity in the management of the global schema and in query processing across several data marts. Table 2.1 compares some characteristics to the centralized data warehouse.

Tab. 2.1: Characteristics of Centralized and Distributed Data Warehouses

Characteristics         | Centralized DW | Distributed DW (Local Area) | Distributed DW (Wide Area)
Conceptual Flexibility  | low            | low                         | moderate
Data Distribution       | none           | moderate                    | high
Query Processing        | easy           | moderate                    | complex
Communication Costs     | low            | moderate                    | high
Security Requirements   | none           | low                         | low
Local Maintenance       | hard           | easy                        | easy
Global Maintenance      | n/a            | easy                        | moderate

There are basically two scenarios for a distributed enterprise data warehouse. The local area scenario (e.g. the data marts of only the Asian division) characterizes what first comes to mind. The extension of this approach to a wide area scenario (the global setting depicted in figure 3) is possible, but the high degree of data distribution results in much more complexity for query processing, since communication costs become a major factor. The more global the distributed data warehouse is, the more important it is to increase local access by replication. Conversely, the more local the scenario is, the more likely techniques for load balancing in a shared-nothing server cluster can be applied.
2.4 Discussion

The current trend in the creation of data warehouses follows the bottom-up approach, where many data marts are created with a later integration into a distributed enterprise data warehouse in mind. The Ovum report [WeCa96] predicts that centralized data warehouses will become increasingly unpopular due to their inherent drawbacks and thus be replaced by a set of interoperating data marts.

The commercial product community seems to confirm the prediction. A first step towards easy integration was recently taken by Informatica Corp. with the release of its tools for enterprise data mart support. These tools are proposed to easily allow the creation and maintenance of distributed data marts and a seamless integration into a common schema. Moreover, query tools can directly access the metadata through a metadata exchange API. OLAP tool vendors like Microstrategy Inc. have already announced support for this feature. For querying, “it means that all data marts are working off the same sets of data definitions and business rules allowing users to query aggregate data that may be distributed across multiple disparate data marts” [Ecke97]. Companies like Cognos [Cogn97] and Hewlett Packard [HP97] also already offer rudimentary support for distributed data warehouses.

But to our knowledge, no query tool today is able to transparently and efficiently generate and execute queries in a distributed setting. An intelligent middleware layer as the basis for traffic routing, aggregate caching and load balancing is needed. This is the problem we will focus on in the remaining sections.

3 Achieving Performance in Distributed OLAP Systems

In order to deserve the attribute on-line, queries should never run much longer than a minute. In this section we outline some specific methods and necessary prerequisites to achieve performance in large distributed OLAP environments. Note that some of these also apply in the non-distributed case but become essential in the distributed case.

3.1 The Selection and Use of Aggregates

Researchers and vendors agree that the main approach to speed up OLAP queries is the use of aggregates or materialized views [LeRT96]. In the last two years many articles on the selection of views to materialize in a data warehouse were published. A good overview of techniques for the selection of aggregates can be found in [ThSe97].

Dynamic Management of Aggregates. None of the recent articles considers the dynamics of an on-line environment with dozens of completely different queries to execute every minute. If user behavior is considered at all, only a fixed number of queries is taken as the basis for the aggregation algorithms ([GHRU97], [Gupt97], [BaPT97]). On the commercial product side there are OLAP tools like Microstrategy's DSS Tools or Informix' Metacube which are able to give the DBA hints for the creation of aggregates based on monitored user behavior. However, if the DBA decides to create aggregates, these are static, and a change in the user behavior is not reflected.

Today, aggregates are more or less seen as another multidimensional access method, similar to an index, that must be managed by the DBA. But only a self-maintaining, dynamic aggregate management working like a (distributed) system buffer or cache for aggregates offers the needed flexibility and the potential for fast query response times and low human maintenance effort. The selection and the usage of aggregates should be completely transparent to the user. The DBA should only be able to set up parameters for the creation of aggregates, but not their content.

[Fig. 4: Patchworking algorithm — a query result is assembled from selected candidate aggregates located at sites A, B and C; other possible candidates cover further parts of the complete data cube]

Partial Aggregates. The algorithms for the selection of aggregates that explicitly support the multidimensional model compute preaggregates based on the full scope of the data cube. The query result in figure 4, if materialized, would be such a complete aggregate. Relationally speaking, only different attribute combinations in the group-by clause are considered; the where-clause in the materialized view definition is always empty. The benefit of this approach is that query processing is relatively simple. Therefore, this is the only kind of aggregate commercial products support today.

However, materialized views created in that manner may contain much data that is never needed, and therefore storage space is wasted. Even more important, in a distributed setting with inherent data fragmentation the handling of partial aggregates (the candidates in figure 4) becomes a necessary prerequisite. A data mart server has only parts of the whole data cube stored locally. In fact, that is the fundamental idea of having data marts in a distributed enterprise data warehouse. Thus, at least at the lower aggregation levels it does not even make sense to create aggregates with full, enterprise-wide scope.

The big drawback is clearly that query optimization in the presence of partial aggregates becomes more complex. A patchworking algorithm finding the least expensive aggregate partitions must be invoked (figure 4).
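To make the patchworking idea concrete, the following Python sketch shows one way such a step could look. It is a minimal illustration under assumed data structures (a fragment is reduced to the set of sub-cube cells it covers plus a cost); the greedy set-cover heuristic and all names are ours, not the algorithm actually used in CUBESTAR.

from dataclasses import dataclass

@dataclass
class Fragment:
    # A partial aggregate offered by some site: the sub-cube cells it can
    # contribute and the estimated aggregation + transfer cost of using it.
    site: str
    cells: frozenset
    cost: float

def patchwork(query_cells, candidates):
    # Greedily pick fragments by cost per newly covered cell until the
    # queried scope is fully covered (a classic set-cover heuristic).
    uncovered = set(query_cells)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for frag in candidates:
            gain = len(uncovered & frag.cells)
            if gain == 0:
                continue
            ratio = frag.cost / gain
            if ratio < best_ratio:
                best, best_ratio = frag, ratio
        if best is None:
            raise ValueError("scope cannot be assembled from the offered fragments")
        chosen.append(best)
        uncovered -= best.cells
    return chosen

# Example: cover four regional cells from two partial aggregates on two sites.
north = Fragment("Site A", frozenset({"North-1997", "West-1997"}), cost=4.0)
south = Fragment("Site B", frozenset({"South-1997", "East-1997"}), cost=3.0)
print(patchwork({"North-1997", "South-1997", "East-1997", "West-1997"}, [north, south]))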
Data Allocation and Load Balancing. Directly related to the question of selecting aggregates in a distributed environment is the allocation problem. The system should not only manage the dynamic creation of partial aggregates, but also decide where to place them on a tactical basis. Moreover, the traditional method to increase local access performance in distributed environments, the use of replicas, can and should also be applied in distributed OLAP.

Hence, redundancy in the distributed data warehouse can be distinguished into vertical redundancy, involving the creation of summary data with a coarser granularity than the detailed raw data, and horizontal redundancy, introduced by the replication of aggregates (figure 5).

[Fig. 5: Horizontal and Vertical Redundancy — vertical redundancy: aggregation operations derive summary data from raw data; horizontal redundancy: summary data is replicated across sites]

In order to allow the system to adapt to changing user requirements, there should be no fixed locations for any kind of redundant data fragment. Aggregate data should be close to the sites where it is needed in order to minimize communication costs. In order to increase parallel query execution, data fragments should be distributed over several data mart servers, especially if communication cost is low. Redundant fragments must be created and dropped dynamically according to the changing user behavior.

Thus, a flexible, adaptive migration and replication policy is essential for data-based load balancing at the data mart servers. Involved in the process of choosing and allocating aggregates are not only access characteristics but also communication cost, maintenance cost for redundant data and additional storage cost.

Differences to traditional distributed DBMS. Although the fragmentation and allocation problem is well known from traditional distributed database systems, there are several specifics for distributed data warehouses.

The fragmentation of the raw data is already fixed by the definition of the data marts; usually, there is not much of a choice here. The problems as well as the performance potential of distributed data warehouses arise from the mentioned techniques of dynamic aggregate management and dynamic data allocation. These techniques were not previously applicable because redundancy in transaction-oriented systems always requires expensive synchronization mechanisms.

The query optimizer of a distributed OLAP system first needs to be aware of the aggregations which would be helpful in order to speed up query processing. If this information is known, it must invoke a patchworking algorithm finding the least expensive aggregate partitions (figure 4). The choice of how to assemble the query result may vary from query to query, because the underlying redundant fragments may be dropped in favor of new ones by the resource management, and the system and network load may change.

An important point is that the determination of candidates as well as the patchworking itself, which includes the comparison of selection predicates, is only feasible in the context of a multidimensional data model. In contrast to the simple relational data model, selections (slices) can be defined by classification nodes (see section 4.2). Predicates defined in that manner carry much more semantics than a simple relational selection predicate. If one allowed arbitrary predicates for the definition of fragments, patchworking would not be possible.
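As a rough, purely illustrative reading of how these cost factors could be combined when placing a redundant fragment, consider the sketch below; the weights, cost dictionaries and replication degree are invented for the example and are not taken from the paper.

def placement_score(site, access_freq, comm_cost, maint_cost, store_cost,
                    weights=(1.0, 0.5, 0.3, 0.1)):
    # Score one candidate site for holding a redundant fragment (higher is better):
    # expected local access benefit minus communication, maintenance and storage cost.
    w_access, w_comm, w_maint, w_store = weights
    return (w_access * access_freq.get(site, 0.0)
            - w_comm * comm_cost.get(site, 0.0)
            - w_maint * maint_cost.get(site, 0.0)
            - w_store * store_cost.get(site, 0.0))

def choose_sites(sites, access_freq, comm_cost, maint_cost, store_cost, k=2):
    # Keep the fragment on the k best-scoring sites (replication degree k).
    ranked = sorted(sites,
                    key=lambda s: placement_score(s, access_freq, comm_cost,
                                                  maint_cost, store_cost),
                    reverse=True)
    return ranked[:k]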
3.2 Quality of Service Specification

Another point greatly distinguishes the distributed OLAP scenario from traditional distributed databases. In the analytical context, general figures are of interest, not individual entities. It is often the case that approximate numbers are sufficient if they can be delivered faster. Thus, it is feasible to trade correctness for performance. Some factors characterizing correctness are accuracy, actuality and consistency.

Accuracy. One approach to relaxing the accuracy of the reports is offering summary data that is either aggregated at a higher level than requested or covers only parts of the requested information. For example, if a query requesting sales figures for each city in Germany would take hours, an answer containing sales figures for each county, or even only for South Germany, might be enough for the moment. An algorithm dealing with these issues is given in [Dyre96].

Actuality. Absolute actuality in a worldwide distributed enterprise data warehouse is an expensive requirement. If each data mart is updated once a day, for example, it means for the distributed data warehouse that it is updated every few hours. Therefore, the maintenance cost for summary data can increase dramatically. But getting the most actual numbers may not be necessary for the analyst. Hence, a certain degree of staleness can be acceptable, allowing redundant data to be updated or dropped gradually [Lenz96].

Consistency. The consistency of the reports generated in a session should be another adjustable factor. In general, it is desirable that a sequence of drill-down and roll-up operations relies on the same base figures. However, this requirement might not be crucial if the analyst just wants to get an overview over some figures. Hence, this condition as well might be relaxed according to the analyst's needs. Note that generating consistent reports can be very expensive, because all reports in a session must have the same actuality.

Especially in a large distributed environment where communication costs cannot be neglected, these factors are critical for reasonable overall performance. A little inaccurate, a little stale or a little inconsistent data may be sufficient for the analyst, if he just gets the figures in a few seconds rather than minutes.
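A minimal sketch of what such a per-query quality-of-service specification might look like is given below; the field names, units and defaults are assumptions for illustration, not CUBESTAR's actual interface.

from dataclasses import dataclass

@dataclass
class QoSSpec:
    # Per-query quality-of-service settings that may be traded against response time.
    max_response_seconds: float = 60.0   # "on-line": answers within about a minute
    coarsen_levels: int = 0              # accuracy: acceptable levels above the requested granularity
    max_staleness_hours: float = 24.0    # actuality: how old the underlying aggregates may be
    session_consistent: bool = False     # consistency: reports of a session share the same base figures

def acceptable(offer_seconds, offer_level_gap, offer_age_hours, spec):
    # Check whether an offered (possibly relaxed) answer satisfies the user's QoS spec.
    return (offer_seconds <= spec.max_response_seconds
            and offer_level_gap <= spec.coarsen_levels
            and offer_age_hours <= spec.max_staleness_hours)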
4 The Distributed OLAP System CUBESTAR

This section focuses on the architecture and the principles for query processing and resource management of the prototypical distributed OLAP system CUBESTAR. In order to deal with the complexity of the issues, we borrowed many ideas from the Mariposa system ([SDK+94]) and applied them to our special application domain.

4.1 The Architecture

The general architecture of CUBESTAR is a three-tier architecture as shown in figure 6. At the client layer are the end user tools with reporting facilities. Queries are issued in the Cube-Query-Language (CQL, [BaLe97], figure 7), which specifically supports the multidimensional data model.

The heart of the architecture is the middleware layer, which hides the details of data distribution. Its main task is to translate a user query and generate an optimized distributed query execution plan. For distributed query processing and aggregate management we apply a microeconomic paradigm: a site tries to maximize the revenues earned for taking part in query processing. In order to compete, a site has to maintain a set of redundant data according to the current market needs. The middleware layer also implements a global classification information service providing a globally consistent view of the multidimensional data model.

[Fig. 6: Architecture of CUBESTAR — clients send CQL queries to the middleware (parser with syntax and semantic checks, query restructurer and generator, broker, partition selector, partition directory cache, global classification information, partition delivery), which coordinates a federation of data mart servers, each with a bidder, a resource manager (RM) and aggregation engines (AE); control flow and data flow are shown separately]

The third layer (the server layer) consists of a federation of data mart servers. Each data mart server controls the locally available set of data partitions and the parallel query execution at its machine.
4.2 The Data Model

CUBESTAR consistently applies the multidimensional data model at all levels of query processing and aggregate management. Only issues of physical data access are managed by the underlying relational database engines.

Since the main operation in OLAP is the application of aggregation operations, classification hierarchies can be used to define the groups over which aggregation operations are applied (figure 7). For example, the typical OLAP query of figure 7, specified in CQL, asks for the sales figures at the granularity of product families subsumed by video equipment (= camcorders and recorders).

[Fig. 7: A Sample Classification Hierarchy and a Sample Query in CQL — the product hierarchy runs from the top level (ALL) over Area (Consumer Electronics), Group (Audio, Video) and Family (Camcorder, Recorder) down to individual articles (TR-75, TS-78, A200, V-201, ClassicI); the sample query is:]

SELECT SUM(SALES)
FROM Product P, Geography G, Time T
WHERE P.Group = 'Video',
      G.Country = 'Germany',
      T.Year = '1997'
UPTO P.Family

Multidimensional Objects. In analogy to the 'cubette' approach of [Dyre96], a query may be seen as a multidimensional subcube, called a Multidimensional Object (MO), basically described by three components [Lehn98] (a small illustrative sketch follows the list):

• A specification of the original cell and the applied aggregation operator (SUM(Sales))
• A multidimensional scope specification consisting of a tuple of classification nodes (<P.Group = 'Video', G.Country = 'Germany', T.Year = '1997'>)
• A data granularity specification consisting of a tuple of categories (<P.Family, G.Top, T.Top>)
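As a rough illustration of these three components (the field names and the dictionary encoding are ours, not the paper's), the MO corresponding to the sample CQL query of figure 7 could be written as:

from dataclasses import dataclass

@dataclass
class MultidimensionalObject:
    # A sub-cube described by its measure and aggregation operator, its scope
    # (one classification node per dimension) and its granularity (one category per dimension).
    measure: str
    operator: str
    scope: dict
    granularity: dict

video_sales_1997 = MultidimensionalObject(
    measure="Sales",
    operator="SUM",
    scope={"Product": "Group = 'Video'",
           "Geography": "Country = 'Germany'",
           "Time": "Year = '1997'"},
    granularity={"Product": "Family", "Geography": "Top", "Time": "Top"},
)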
Within the distributed multidimensional OLAP environment, multidimensional objects are the basic units for distributed storage management and query processing. Therefore, different types of MOs are distinguished with regard to their generation in the distributed environment.

The raw or base data partitions at each data mart are called Base Multidimensional Objects (BMOs). Thus, BMOs represent data of the finest granularity. BMOs are not allowed to migrate but are assigned a unique home site, where they are uploaded by the data mart loading tools. Consequently, they are by definition up to date.

From the BMOs any number of MOs can be derived. In a relational sense these derived MOs are materialized views. However, because of the definition of MOs, the questions whether one MO is contained in, intersected by, or derivable from another MO are easily decidable. According to the requirements of the system, MOs are allowed to migrate or be replicated freely.

4.3 Microeconomics

An elegant solution to cope with the complexity of dynamic aggregate management and load balancing is to put all issues related to shared resources into a microeconomic framework ([SDK+94]). In an economy where every site tries to selfishly maximize its utility there is no need for a central coordinator; the decision process is inherently decentralized. Moreover, the competition for orders and profits allows the system to dynamically adapt to market demands, i.e. to query behavior as well as system load.

In contrast to Mariposa, where demand and supply regulate the prices, for the sake of simplicity a uniform billing scheme with fixed rates for the aggregation effort is applied to each executed query. The value of a query result is computed as a weighted combination of the following factors (a toy pricing formula follows the list):

• Actual size of the raw data MOs required for the computation (number of tuples). Taking the size of the raw data is reasonable, since requests for aggregates of huge fact tables should be more expensive than those of smaller ones. Moreover, this decision favors the storage of higher aggregates, since they need little space and yield high profits.
• Actual size of the query result (number of tuples). This term simply values the additional overhead for storing large aggregates.
• Aggregation time. This is the time spent from issuing the query to delivering the result, i.e. the local query processing time. Higher aggregation time results in a lower price, thus favoring fast data mart servers.
• Transfer time. The computation may require the shipping of intermediate MOs from one data mart server to another.
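The toy formula below shows one plausible way to read this billing scheme; the rates, the linear combination and the sign conventions are assumptions made for illustration only.

def query_result_price(raw_tuples, result_tuples, aggregation_seconds, transfer_seconds,
                       rates=(0.001, 0.01, 2.0, 1.0)):
    # Price paid to the participating data mart servers for one executed query.
    # Larger raw inputs and results raise the price, while time spent aggregating
    # and shipping intermediate MOs lowers it, so fast sites offering small,
    # high-level aggregates earn the best revenue per effort.
    r_raw, r_result, r_agg, r_transfer = rates
    price = r_raw * raw_tuples + r_result * result_tuples
    price -= r_agg * aggregation_seconds + r_transfer * transfer_seconds
    return max(price, 0.0)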
4.4 Query Processing

A query in CUBESTAR is basically a multidimensional object specified by a CQL query like the one in figure 7. Affiliated with each query is a user-defined quality of service specification consisting of a limit on the query processing time and an indicator for the requested accuracy of the data. Thus, the user has the choice to trade speed for actuality.

The query is issued to a broker, which tries to perform the query in the best way possible on behalf of the user. Since all users are treated equally and prices are regulated, a query is not assigned a budget. The broker's objective is simply to get the best service available for the user's request, i.e. to minimize query execution time under the quality of service constraints. After query processing, the broker pays the price according to the billing scheme to the participating data mart servers. Note that in order to support user priorities it would make sense to limit the user's budget, resulting in a second objective, price, for query optimization.

Query processing basically proceeds in three steps: query preparation, query optimization, and query execution.

In the query preparation phase, the query string is handed over to the parser, where it is checked for syntactic and semantic correctness. The result of this step is a tree-like representation of the query with the queried MO at the top, intermediate results at the inner nodes, and raw data at the leaves.
For the generation of a distributed query execution plan, the current distribution of the aggregates must be taken into account. Note that due to the dynamic aggregation management the existence and location of aggregates and replicas change frequently. Therefore, the broker may issue a broadcast request to the data servers in order to obtain the necessary information. This kind of metadata is cached in the partition directory cache in order to avoid too many broadcasts. Using this method, it is possible that invalid metadata is used. In this case, there are two possibilities: either force a data mart server to recompute dropped aggregates or submit a new broadcast.

At the data mart server side, the bidder component receiving the request checks for those locally available MOs which can contribute to the query. Each bid from a data mart server includes the following data entries (a small structural sketch follows the list):

• the identifier of the data mart server,
• a set of tuples { (Mi, Ti) }, each of which contains the description of an offered MO in the repository and the time the site expects to need to aggregate that MO to the granularity of the queried MO M,
• an indicator of the processing power the site is willing to reserve for the query, depending on the system load,
• the free space the site is willing to reserve for intermediate results, possibly from other data mart servers.
The query optimizer’s task is now to select from the set of offered.
multidimensional objects offered by the bidding sites
In fact, using the intermediate results of the query process-
those, which require the least aggregation and transporta-
ing is the most convenient way to get new MOs also in the
tion efforts using a patchworking algorithm. To estimate
local case. Each of these MOs is checked by the resource
communication costs, average network throughputs
manager for usefulness. Basically, storing MOs that were
between the sites are used.
already needed is working like a cache, with a microecono-
The result of this step is a distributed query execution plan, mic heuristics for replacement.
which is basically a tree with physically existing MOs at
the leaves, the target MO at the top, and intermediate
5 Summary and Future Work
results at the inner nodes. Attached to each node is an oper-
ation, an estimated processing time for the operation, and In the first part of this paper we made clear the draw-
the location for the execution of the operation denoting the backs of the centralized data warehouse and the non-inte-
bidders that won the competition. grated data mart approach. In order to circumvent these
In the query execution step, the bidders are informed drawbacks and combine the benefits, integrated distributed
whether their offer was accepted or not. The notification enterprise data warehouses were presented as a possible
includes the complete query execution plan allowing the solution. The on-line exploration of distributed data ware-
sites to understand the decision of the broker and draw con- houses or distributed OLAP is related to many complex
clusions for their future resource management. issues like distributed query optimization, meta data man-
agement and security policy handling, each of which
If all MOs necessary for processing have arrived at the site,
demanding innovative solutions. We characterized some
the local query execution is initiated by an aggregation
possibilities for achieving query performance by dynamic
engine. If subqueries can locally be executed in parallel, and partial aggregation, data based load balancing and the
the aggregation engine may spawn new aggregation specification of the quality of service. Finally, we discussed
engines running in parallel threads. After the aggregation the architectural framework of the prototypical distributed
engine completed its task, in the last step the resulting MO OLAP system CUBESTAR.
is sent to the target location.
We consider sophisticated aggregation management a major research issue. This covers algorithms for using distributed partial aggregations and for deciding which aggregate combinations yield the highest benefit and are the best to materialize. A reasonable solution offering the needed flexibility is the application of a microeconomic framework as in the Mariposa system ([SDK+94]). We admit that this last statement still needs to be proven by the implementation.

References

BaLe97 Bauer, A.; Lehner, W.: The Cube-Query-Language for Multidimensional Statistical and Scientific Database Systems, in: 5th International Conference on Database Systems for Advanced Applications (DASFAA'97, Melbourne, Australia, April 1-4, 1997), pp. 263-272

BaPT97 Baralis, E.; Paraboschi, S.; Teniente, E.: Materialized View Selection in a Multidimensional Database, in: 23rd International Conference on Very Large Data Bases (VLDB'97, Athens, Greece, 1997)

CoCS93 Codd, E.F.; Codd, S.B.; Salley, C.T.: Providing OLAP (On-line Analytical Processing) to User Analysts: An IT Mandate, White Paper, Arbor Software Corporation, 1993

Cogn97 Cognos Corporation: Distributed OLAP, White Paper, http://www.cognos.com, 1997

Cube97 The CUBESTAR Project: http://www6.informatik.uni-erlangen.de/research/cubestar.html

Dyre96 Dyreson, C.: Information Retrieval from an Incomplete Data Cube, in: 22nd International Conference on Very Large Data Bases (VLDB'96, Mumbai, India, 1996)

Ecke97 Eckerson, W.: Building and Managing Data Marts, Patricia Seybold Group Report for Informatica, http://www.informatica.com, 1997

GHRU97 Gupta, H.; Harinarayan, V.; Rajaraman, A.; Ullman, J.D.: Index Selection for OLAP, in: 13th International Conference on Data Engineering (ICDE'97, Birmingham, UK, April 7-11, 1997)

Gupt97 Gupta, H.: Selection of Views to Materialize in a Data Warehouse, in: 6th International Conference on Database Theory (ICDT'97, Delphi, Greece, Jan 8-10, 1997), pp. 98-112

HaRU96 Harinarayan, V.; Rajaraman, A.; Ullman, J.D.: Implementing Data Cubes Efficiently, in: 25th International Conference on Management of Data (SIGMOD'96, Montreal, Quebec, Canada, June 4-6, 1996), pp. 205-216

HP97 N.N.: HP Intelligent Warehouse, White Paper, Hewlett Packard, http://www.hp.com, 1997

Inmo92 Inmon, W.H.: Building the Data Warehouse, John Wiley, 1992

Info97 N.N.: Enterprise-Scalable Data Marts: A New Strategy for Building and Deploying Fast, Scalable Data Warehousing Systems, White Paper, Informatica, http://www.informatica.com, 1997

Lehn98 Lehner, W.: Modeling Large Scale OLAP Scenarios, in: 6th International Conference on Extending Database Technology (EDBT'98, Valencia, Spain, March 23-27, 1998)

Lenz96 Lenz, R.: Adaptive Distributed Data Management with Weak Consistent Replicated Data, in: Proceedings of the 11th Annual Symposium on Applied Computing (SAC'96), Philadelphia, 1996

LeRT96 Lehner, W.; Ruf, T.; Teschke, M.: Improving Query Response Time in Scientific Databases Using Data Aggregation, in: 7th International Conference and Workshop on Database and Expert Systems Applications (DEXA'96, Zurich, Switzerland, Sept. 9-10, 1996)

SDK+94 Stonebraker, M.; Devine, R.; Kornacker, M.; Litwin, W.; Pfeffer, A.; Sah, A.; Staelin, C.: An Economic Paradigm for Query Processing and Data Migration in Mariposa, in: Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems (PDIS'94, Austin, TX, Sept. 28-30, 1994), pp. 58-67

ThSe97 Theodoratos, D.; Sellis, T.: Data Warehouse Configuration, in: 23rd International Conference on Very Large Data Bases (VLDB'97, Athens, Greece, 1997)

WeCa96 Wells, D.; Carnelley, P.: Ovum Evaluates: The Data Warehouse, Ovum Ltd., London, 1996