The presentation gives basic information about data warehouses. It explains the concept of a data warehouse and its properties, the principles of data modeling, the relational vs. dimensional data model, OLAP vs. OLTP data processing, data warehouse architecture, the ETL process, the OLAP cube and its properties, and more.
2. A manufacturer needs to know...
Who is our least and most profitable customer?
Who are my customers, and which products do they buy?
Which customer is willing to buy more?
What impact will a new product have on revenue and earnings?
How did a specific campaign contribute to the growth in sales?
Which is the most profitable distribution channel?
3. What is a data warehouse
A decision-support database that is maintained separately from the production database.
Supports information processing by providing a consolidated database of historical data.
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." - W. H. Inmon
Data warehousing – the process of building and using a data warehouse
4. DWH – Subject-oriented
Organized around the major subjects (customer, product, sales)
Focused on modeling and analysis of data for decision makers, based on transactional data
Provides a simple and concise view of particular subjects by excluding data that is not useful for decision support
5. DWH - Integrated
Constructed as an integrated space bringing together heterogeneous sources
Relational DBs, text files, on-line records
Data integration and data quality techniques are applied
Ensuring consistency in names and attributes across the different data sources
Converting source data according to the definitions
6. DWH – Time-variant
The time horizon of a data warehouse is significantly longer than that of the data kept in operational systems
Operational databases: current data values
Data warehouse data: provides information from a historical perspective (e.g., the last 5-10 years)
Every key structure in the DW contains a time dimension, but not all data has a time dimension
7. DWH – Nonvolatile
A physically separate store of data transformed from the operational environment
Operational updates of data do not occur in the data warehouse
The DW does not require transaction processing or recovery
It uses two operations:
Initial loading of data
Access to data
8. Data modeling
The process of creating a data model of an information system using formal modeling techniques
Phases of DB design:
Conceptual (conceptual model, Chen)
Logical (normalization)
Physical (dependent on the implementation environment)
10. Relational model in 3NF
Simple data loading
Transfer of data from source files and their integration
Complex queries
Many JOIN operations
Harder for ordinary users to understand
The model for the central data store according to Inmon
11. Dimensional data model
Recommended for the DWH
Complex ETL
Data transformations
Integration, ...
Simple reporting
Easier to understand
Faster analytical queries
The model for data marts – according to both Inmon and Kimball
12. Relational vs. dimensional model
Relational data model in 3NF
Elimination of duplicate data – fewer records
Larger number of tables
Linked via foreign keys and relation tables
Efficient insert/update, less efficient querying
Dimensional data model (not in 3NF)
An adaptation of the relational model
Fact and dimension tables
Denormalized, duplicate data
Smaller number of tables
Efficient querying
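The contrast above can be sketched with a minimal star schema in SQLite (a sketch only; the table and column names are invented for illustration, not taken from the slides): one denormalized dimension table joined to a fact table answers an analytical question with a single JOIN, where a 3NF model would need several.

```python
import sqlite3

# In-memory database with a tiny star schema: one fact table, one dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT, brand TEXT, type TEXT       -- denormalized: brand/type repeat
);
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    time_id     INTEGER,                   -- would reference a dim_time table
    amount_sold REAL
);
""")
con.executemany("INSERT INTO dim_product VALUES (?,?,?,?)",
                [(1, "Cola", "Acme", "drink"),
                 (2, "Chips", "Acme", "snack"),
                 (3, "Water", "Pure", "drink")])
con.executemany("INSERT INTO fact_sales VALUES (?,?,?)",
                [(1, 1, 10.0), (2, 1, 5.0), (1, 2, 7.5), (3, 2, 2.5)])

# Analytical query: revenue per product type -- one JOIN plus GROUP BY.
rows = con.execute("""
    SELECT p.type, SUM(f.amount_sold)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.type ORDER BY p.type
""").fetchall()
print(rows)   # [('drink', 20.0), ('snack', 5.0)]
```

In a 3NF design, brand and type would sit in their own tables, and the same question would need two or three extra joins.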
15. DWH vs. DBMS
OLTP (on-line transaction processing)
The main task of traditional DBMS
Day-to-day operations: purchasing, inventory, banking transactions, manufacturing, payroll, accounting, etc.
OLAP (on-line analytical processing)
Used mainly in data warehouses
Data analysis and decision support
Distinct functions (OLTP vs. OLAP):
User and system orientation: customer vs. market
Content: current, detailed vs. historical, consolidated
Design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: updates vs. read-only but complex queries
20. ETL (Extract, Transform, Load)
Runs automatically at regular time intervals
Daily (at night)
Weekly
Extraction
Extraction of data from various sources and formats
Data validation (correct form/value)
21. ETL (Extract, Transform, Load)
Transformation
A series of functions and rules is applied to the data to prepare it for loading into the DWH
Data cleansing
Only correct data may be loaded
Selecting only certain columns
Encoding ("Male" to "M")
Deriving new values (sales_value = price * number_of_products_sold)
Aggregation (summarization)
Splitting columns (e.g., into date and time)
22. ETL (Extract, Transform, Load)
Loading
Loads the extracted and transformed data into the target system (DWH)
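The three ETL steps can be sketched end-to-end in plain Python (a minimal sketch; the record layout and field names are invented for illustration): extract validates raw records, transform applies the encoding ("Male" to "M") and the derivation sales_value = price * quantity from the slides, and load writes into the target store.

```python
# Minimal ETL sketch over in-memory records (stand-in for real sources).
raw = [  # rows as they might arrive from a source file
    {"gender": "Male",   "price": "2.5", "qty": "4"},
    {"gender": "Female", "price": "1.0", "qty": "3"},
    {"gender": "Male",   "price": "bad", "qty": "1"},   # invalid value
]

def extract(rows):
    """Extraction with validation: only rows whose price and quantity
    parse as numbers pass through."""
    for r in rows:
        try:
            yield {"gender": r["gender"],
                   "price": float(r["price"]), "qty": int(r["qty"])}
        except ValueError:
            pass  # reject malformed records

def transform(rows):
    """Encoding ('Male' -> 'M') and derivation (sales_value = price * qty)."""
    for r in rows:
        yield {"gender": r["gender"][0],          # encode to 'M'/'F'
               "sales_value": r["price"] * r["qty"]}

warehouse = []   # load target: stands in for the DWH

def load(rows):
    warehouse.extend(rows)

load(transform(extract(raw)))
print(warehouse)
# [{'gender': 'M', 'sales_value': 10.0}, {'gender': 'F', 'sales_value': 3.0}]
```

The invalid third record is dropped during extraction, which mirrors the rule on slide 21 that only correct data may be loaded.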
25. Multidimensional database
The DW is built on a multidimensional data model, which views data in the form of data cubes
A data cube, such as sales, allows data to be modeled along several dimensions
Dimension tables, e.g. item (item_name, brand, type) or time (day, week, month, quarter, year)
The fact table contains measures (e.g., amount_sold) and keys that relate it to the dimensions
27. Fact tables
A fact table contains two kinds of attributes:
key attributes – foreign keys from the corresponding dimension tables. The primary key of the fact table is composed of all of its key attributes
non-key attributes – the facts themselves, which are tracked for each combination of the key attributes
28. Fact tables
Describes a particular subject of the business
A fact stored in the table is tracked for each combination of dimensions
The fact table holds large volumes of data
The maximum size of the table is given by the Cartesian product of the primary-key values of the dimension tables
The source of the data is base data from enterprise information sources
During transformation into the fact table, the data is summarized and aggregated at the required level
Each fact table is characterized by its granularity, which expresses the degree of aggregation of the facts relative to the source data
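The Cartesian-product bound above is easy to quantify (the dimension cardinalities below are invented for illustration): with three dimensions, the fact table can never hold more rows than the product of their key counts; coarser granularity keeps the actual row count far below this bound.

```python
from math import prod

# Hypothetical dimension cardinalities: products, stores, days in a year.
dimension_sizes = {"product": 1_000, "store": 50, "time": 365}

# Upper bound on fact-table rows = Cartesian product of the dimension keys.
max_rows = prod(dimension_sizes.values())
print(max_rows)  # 18250000
```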
29. Dimension tables
Constrain the selection of a particular fact from the fact table.
Contain attributes that describe the given dimension and characterize the meaning of the individual facts in the fact table.
Examples of dimensions:
product
time
geographic location
43. STAR vs. SNOWFLAKE
Ease of maintenance/change
SNOWFLAKE: contains no redundancy, so it is easier to modify
STAR: contains redundant data
Ease of use
SNOWFLAKE: more complex queries, and therefore harder to understand
STAR: less complex queries, easier to understand
Query execution speed
SNOWFLAKE: many foreign keys, which leads to longer query execution times
STAR: fewer foreign keys, and therefore faster query execution
DWH type
SNOWFLAKE: complex relationships (many:many)
STAR: simple relationships (1:1 or 1:many)
Joins
SNOWFLAKE: large number
STAR: small number
Number of dimension tables
SNOWFLAKE: may contain more than one table per dimension
STAR: contains only one table per dimension
When to use
SNOWFLAKE: when the dimension table is large; it saves space
STAR: when the dimension table contains a smaller number of rows
Schema normalization
SNOWFLAKE: dimension tables normalized, fact table denormalized
STAR: both dimension and fact tables are denormalized
47. Uses of the DWH
Information processing
Supports querying and basic statistical analysis, reporting, and charts and tables
Analytical processing
Multidimensional analysis in the DW
Supports the basic OLAP operations: slice-and-dice, drilling, pivoting
Data mining
Knowledge discovery from hidden patterns of behavior
Supports associations, building analytical models, performs classification of information, visualization, ...
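The basic OLAP operations above can be demonstrated on a toy cube held as a dict (a sketch with invented data; real OLAP servers run these operations over the multidimensional store): cells map (product, city, year) to a sales measure, roll-up aggregates a dimension away, and slice fixes one dimension to a single value.

```python
from collections import defaultdict

# Toy data cube: cells keyed by (product, city, year) -> sales.
cube = {
    ("Cola",  "Kosice",     2023): 10,
    ("Cola",  "Bratislava", 2023): 20,
    ("Cola",  "Kosice",     2024): 15,
    ("Chips", "Kosice",     2023):  5,
}

def roll_up(cube, keep):
    """Roll-up: aggregate away dimensions, keeping only the listed axes
    (0 = product, 1 = city, 2 = year)."""
    out = defaultdict(int)
    for cell, value in cube.items():
        out[tuple(cell[i] for i in keep)] += value
    return dict(out)

def slice_(cube, axis, value):
    """Slice: fix one dimension to a single value, yielding a sub-cube."""
    return {c: v for c, v in cube.items() if c[axis] == value}

# Roll city up (away): sales by (product, year).
print(roll_up(cube, keep=(0, 2)))
# {('Cola', 2023): 30, ('Cola', 2024): 15, ('Chips', 2023): 5}

# Slice on year = 2023: three cells remain.
print(len(slice_(cube, axis=2, value=2023)))  # 3
```

Dice is the same idea as slice but with a condition on several dimensions at once, and pivot merely reorders the axes for presentation.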
48. Conclusions
Why data warehousing?
Data modeling (E-R vs. dimensional)
The multidimensional model of the data warehouse
Star schema, snowflake schema, facts
Data cube – dimensions & measures
ETL process (Extract, Transform, Load)
OLAP operations: drilling, rolling, slicing, dicing and pivoting