Emerging Trends in Database Systems
Introduction
Dramatic advances in data capture, processing power, data
transmission, and storage capabilities are enabling organisations to
integrate their various databases into data warehouses.
Data warehousing is defined as a process of centralised data
management and retrieval.
Data warehousing represents an ideal vision of maintaining a central
repository of all organisational data. Centralisation of data is needed
to maximize user access and analysis.
As knowledge becomes the new currency of organizations,
information now is viewed in an entirely new way - as a strategic
source of opportunity.
With this new focus on information delivery, government and
industry are looking to data warehousing as a valuable construct for
converting data into information.
Data Mining & Data Warehouse
Data mining is a broad technology that can potentially benefit any
functional area in a business where there is a major need or
opportunity for improved performance and where data analysis can
impact that improvement.
Part of the power of data mining is that it not only solves difficult
business problems, but it does so in ways that are repeatable.
The data mining process involves developing models that can be used
to solve the business problem at hand. Since they are models, they
can be reused on new data.
As the data in the warehouse is refreshed, the models can be re-run
on new data and new results obtained.
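As a minimal sketch of the "build once, re-run on refreshed data" idea, the snippet below fits a deliberately trivial model (average spend per customer segment) on historical rows and then applies the same model to a later batch. The function names, the segments, and the 1.5x threshold are hypothetical and chosen only for illustration; real data mining models are far richer.

```python
def fit_mean_model(rows):
    """Fit a trivial model: the average spend for each customer segment."""
    sums, counts = {}, {}
    for segment, spend in rows:
        sums[segment] = sums.get(segment, 0.0) + spend
        counts[segment] = counts.get(segment, 0) + 1
    return {segment: sums[segment] / counts[segment] for segment in sums}


def flag_high_spenders(model, rows, factor=1.5):
    """Apply the fitted model: keep rows whose spend exceeds factor x segment average."""
    return [
        (segment, spend)
        for segment, spend in rows
        if segment in model and spend > factor * model[segment]
    ]


# Historical warehouse data (made-up values) used to build the model once.
historical = [
    ("retail", 100.0), ("retail", 120.0),
    ("corporate", 1000.0), ("corporate", 1400.0),
]
model = fit_mean_model(historical)

# After a warehouse refresh, the same model is simply re-run on the new rows.
refreshed = [("retail", 300.0), ("corporate", 1500.0), ("retail", 90.0)]
flags = flag_high_spenders(model, refreshed)
print(flags)
```

The model is an ordinary value, so it can be stored and applied to each new load without being rebuilt.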
Data Mining: A KDD Process
[Figure: the KDD process as a pipeline. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Characteristics of a Data Warehouse
A common way of introducing data warehousing is to refer to the
characteristics of a data warehouse as set forth by William Inmon:
Subject Oriented
Integrated
Non-volatile *
Time Variant *
The major characteristics of data warehouse are:
Organisation
Consistency
Non-volatile *
Time Variant *
Relational
Client/server
Web-based
(Turban, McLean, Wetherbe)
Characteristics of a Data Warehouse
Organisation: Data are organised by subject (e.g. by customer, vendor,
product, price level, and region) and contain information relevant for
decision support only.
Consistency: Data in different operational databases may be encoded
differently. In the warehouse they will be coded in a consistent manner.
Relational: Typically the data warehouse uses a relational structure.
Client/server: The data warehouse uses the client/server architecture
mainly to give end users easy access to its data.
Web-based: Today’s data warehouses are designed to provide an
efficient computing environment for Web-based applications
(Rundensteiner et al., 2000).
CISM01 Intelligent Systems for Management Unit 9
Subject Oriented
Data warehouses are designed to help you analyse data.
For example, to learn more about your company's sales data, you can
build a warehouse that concentrates on sales.
Using this warehouse, you can answer questions like "Who was our
best customer for this item last year?" This ability to define a data
warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.
Integrated
Integration is closely related to subject orientation.
Data warehouses must put data from disparate sources into a
consistent format.
They must resolve such problems as naming conflicts and
inconsistencies among units of measure.
When they achieve this, they are said to be integrated.
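The integration step can be sketched as a pair of mapping functions that bring records from two sources onto one set of field names and units. Everything here is an illustrative assumption: the two source layouts, the target field names, and the fixed currency rate are invented for the example.

```python
# Hypothetical extracts from two operational systems that describe the same
# kind of record with different field names and different units.
sales_eu = [{"cust_id": 17, "weight_kg": 2.5, "amount_eur": 40.0}]
sales_us = [{"CustomerNo": 42, "weight_lb": 11.0, "amount_usd": 55.0}]

LB_TO_KG = 0.45359237  # one pound expressed in kilograms
USD_TO_EUR = 0.9       # assumed fixed rate, for illustration only


def from_eu(row):
    """Resolve the naming conflict only: units already match the warehouse."""
    return {
        "customer_id": row["cust_id"],
        "weight_kg": row["weight_kg"],
        "amount_eur": row["amount_eur"],
        "source": "eu",
    }


def from_us(row):
    """Resolve both the naming conflict and the unit-of-measure conflict."""
    return {
        "customer_id": row["CustomerNo"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
        "amount_eur": round(row["amount_usd"] * USD_TO_EUR, 2),
        "source": "us",
    }


integrated = [from_eu(r) for r in sales_eu] + [from_us(r) for r in sales_us]
for row in integrated:
    print(row)
```

After the mapping, every row has the same keys and the same units, which is what allows the warehouse to query the two sources as one.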
Nonvolatile
Non-volatile means that, once entered into the warehouse, data are not
changed/updated.
This is logical because the purpose of a warehouse is to enable you to
analyse what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of
data.
This is very much in contrast to online transaction processing (OLTP)
systems, where performance requirements demand that historical data
be moved to an archive.
The data are kept for many years so they can be used for trends,
forecasting, and comparisons over time.
A data warehouse's focus on change over time is what is
meant by the term time variant.
Data Marts
The high cost of data warehouses confines their use to large
companies.
An alternative used by many other firms is creation of a lower cost,
scaled-down version of a data warehouse called a data mart.
A data mart is a small warehouse designed for a strategic business
unit (SBU) or a department.
The advantages of data marts include:
Low cost
Significantly shorter lead time for implementation
Local rather than central control, conferring power on the using group
Data Marts
From a statistical viewpoint, a data mart should be organised according
to two principles:
The statistical units, the elements in the reference population that are
considered important for the aims of the analysis (e.g. the supply
companies, the customers, the people who visit the site).
The statistical variables, the important characteristics, measured for
each statistical unit (e.g. the amounts customers buy, the payment
methods they use, the socio-demographic profile of each customer).
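One way to picture the two principles is a table with one row per statistical unit and one column per statistical variable. The sketch below uses customers as units; the field names and values are made up for illustration, and `total_by` is a hypothetical helper.

```python
# Each dictionary is one statistical unit (a customer); each key is a
# statistical variable measured for that unit. All values are invented.
customers = [
    {"customer_id": 1, "amount_bought": 120.0, "payment_method": "card", "age_band": "25-34"},
    {"customer_id": 2, "amount_bought": 80.0, "payment_method": "cash", "age_band": "35-44"},
    {"customer_id": 3, "amount_bought": 200.0, "payment_method": "card", "age_band": "25-34"},
]


def total_by(rows, variable, measure):
    """Sum one numeric variable over the units, grouped by another variable."""
    totals = {}
    for row in rows:
        key = row[variable]
        totals[key] = totals.get(key, 0.0) + row[measure]
    return totals


by_payment = total_by(customers, "payment_method", "amount_bought")
print(by_payment)
```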
Operational vs. Informational
                Operational Data       Data Warehouse
Application     OLTP                   OLAP
Use             Precise queries        Ad hoc
Temporal        Snapshot               Historical
Modification    Dynamic                Static
Orientation     Application            Business
Data            Operational values     Integrated
Size            Gigabytes              Terabytes
Level           Detailed               Summarized
Access          Often                  Less often
Response        Few seconds            Minutes
Data schema     Relational             Star/snowflake
OLTP vs. OLAP
                     OLTP                           OLAP
users                clerk, IT professional         knowledge worker
function             day-to-day operations          decision support
DB design            application-oriented           subject-oriented
data                 current, up-to-date;           historical;
                     detailed, flat relational;     summarized, multidimensional;
                     isolated                       integrated, consolidated
usage                repetitive                     ad hoc
access               read/write;                    lots of scans
                     index/hash on primary key
unit of work         short, simple transaction      complex query
# records accessed   tens                           millions
# users              thousands                      hundreds
DB size              100 MB-GB                      100 GB-TB
metric               transaction throughput         query throughput, response
Contrasting OLTP & Data Warehousing Environments
                              OLTP                       Data Warehouse
Data structures               Complex data structures    Multidimensional data structures
Indexes                       Few                        Many
Joins                         Many                       Some
Duplicated data               Normalised DBMS            Denormalised DBMS
Derived data & aggregates     Rare                       Common
Contrasting OLTP & Data Warehousing Environments
One major difference between the types of system is that data
warehouses are not usually in third normal form (3NF), a type of data
normalisation common in OLTP environments.
Data warehouses and OLTP systems have very different requirements.
Here are some examples of differences between typical data
warehouses and OLTP systems:
Workload
Data warehouses are designed to accommodate ad hoc queries. You might
not know the workload of your data warehouse in advance, so a data
warehouse should be optimised to perform well for a wide variety of
possible query operations.
OLTP systems support only predefined operations. Your applications might
be specifically tuned or designed to support only these operations.
Contrasting OLTP & Data Warehousing Environments
Data modifications
A data warehouse is updated on a regular basis using bulk data
modification techniques. The end users of a data warehouse do not
directly update the data warehouse.
In OLTP systems, end users routinely issue individual data modification
statements to the database. The OLTP database is always up to date, and
reflects the current state of each business transaction.
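The contrast between the two modification styles can be sketched with Python's built-in sqlite3 module. The table names (`orders`, `wh_orders`) and the load date are hypothetical; the point is only that the operational table receives individual statements, while the warehouse table is filled by a periodic bulk load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE wh_orders (order_id INTEGER, status TEXT, load_date TEXT)")

# OLTP style: end users issue individual statements, so the operational
# table always reflects the current state of each order.
conn.execute("INSERT INTO orders VALUES (1, 'new')")
conn.execute("INSERT INTO orders VALUES (2, 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")

# Warehouse style: a scheduled job copies a snapshot in one bulk operation.
# End users never update wh_orders directly.
snapshot = conn.execute(
    "SELECT order_id, status FROM orders ORDER BY order_id"
).fetchall()
conn.executemany(
    "INSERT INTO wh_orders VALUES (?, ?, ?)",
    [(order_id, status, "2024-01-31") for order_id, status in snapshot],
)
conn.commit()

wh_rows = conn.execute("SELECT * FROM wh_orders ORDER BY order_id").fetchall()
print(wh_rows)
```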
Contrasting OLTP & Data Warehousing Environments
Schema design
Data warehouses often use denormalised or partially denormalised
schemas (such as a star schema) to optimise query performance.
OLTP systems often use fully normalized schemas to optimise
update/insert/delete performance, and to guarantee data consistency.
Typical operations
A typical data warehouse query scans thousands or millions of rows. For
example, "Find the total sales for all customers last month."
A typical OLTP operation accesses only a handful of records. For
example, "Retrieve the current order for this customer."
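A small star schema and the two kinds of operation described above can be sketched with sqlite3. The table and column names (`fact_sales`, `dim_customer`, `dim_date`) and the sample rows are assumptions made for the example, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who" and "when" of each sale.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds the measures and points at the dimensions.
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'North'), (2, 'Bolt', 'South');
INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO fact_sales VALUES (1, 20240115, 100.0), (2, 20240115, 50.0),
                              (1, 20240210, 70.0);
""")

# Warehouse-style query: scan the fact table and aggregate,
# here "total sales for all customers in January 2024".
monthly_totals = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    JOIN dim_date d ON d.date_id = f.date_id
    WHERE d.year = 2024 AND d.month = 1
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# OLTP-style operation for contrast: fetch a single row by its key.
one_customer = conn.execute(
    "SELECT name, region FROM dim_customer WHERE customer_id = ?", (2,)
).fetchone()

print(monthly_totals)
print(one_customer)
```

On three rows the difference is invisible, but the first query's cost grows with the size of the fact table, whereas the keyed lookup touches one row however large the table becomes.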
Contrasting OLTP & Data Warehousing Environments
Historical data
Data warehouses usually store many months or years of data. This is to
support historical analysis.
OLTP systems usually store data from only a few weeks or months. The
OLTP system stores historical data only as needed to successfully meet
the requirements of the current transaction.
Levels of the Data Warehouse Architecture
Organisationally structured: structured to meet the informational requirements
of the entire organisation.
Departmentally structured: structured to meet the focused informational
requirements of the distinct group identified by a specific business function.
Individually structured: structured to meet an even more focused set of
informational requirements as defined by a specific management function.
Metadata
Metadata describes types of information that are stored in the database.
For a data warehouse, metadata provides discipline, since changes to
the warehouse must be reflected in the metadata to be communicated to
users.
A good metadata system helps ensure the success of a data warehouse
by making users more aware of and comfortable with the contents. It
provides valuable assistance in understanding data.
The metadata repository is an often overlooked component of the data
warehousing environment.
(from Berry & Linoff, 2004)
Without metadata, the data warehouse and its associated components
in the architected environment are merely disjoined components
working independently and with separate goals.
Data Warehouse Architectures: Basic
[Figure: Data sources (operational systems, flat files) feed the warehouse (metadata, summary data, raw data), which serves users (analysis, reporting, data mining).]
A General Architecture for Data Warehousing
[Figure: three-tier architecture. Information sources (operational DBs, other sources) are extracted, transformed, loaded, and refreshed into data storage: the data warehouse and data marts, served by the data warehouse server (Tier 1). OLAP servers, e.g. MOLAP or ROLAP, form the OLAP engine (Tier 2). Front-end tools, i.e. client analysis tools (Tier 3), provide analysis, query, reporting, and data mining.]
A General Architecture for Data Warehousing
The major components of data warehouse architecture are:
Source systems are where the data comes from.
Extraction, transformation, and load (ETL) move data between different
data stores.
The central repository is the main store for the data warehouse.
The metadata repository describes what is available and where.
Data marts provide fast, specialised access for end users and
applications.
Operational feedback integrates decision support back into the
operational systems.
End users are the reason for developing the warehouse in the first place.
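The extraction, transformation, and load component listed above can be sketched as three small functions. This is a toy pipeline under stated assumptions: the CSV text stands in for a source system, the `sales` table stands in for the central repository, and the cleaning rules (trim and upper-case customer names, skip rows with a non-numeric amount) are invented for the example.

```python
import csv
import io
import sqlite3

# Stand-in for a source system export; the second row has a bad amount.
SOURCE_CSV = """order_id,customer,amount
1, acme ,100.50
2,Bolt,not-a-number
3,ACME,20
"""


def extract(text):
    """Read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert types and normalise names; skip rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append((int(row["order_id"]), row["customer"].strip().upper(), amount))
    return clean


def load(conn, rows):
    """Bulk-insert the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM sales ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the component list: each stage can be replaced (a different source, different cleaning rules, a different target) without touching the others.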
MOLAP: Multi-Dimensional On-Line Analytical Processing
ROLAP: Relational On-Line Analytical Processing
Cloud Database
A cloud database is a database that typically runs on a cloud
computing platform, with access to the database provided as a service.
Methods to run a database in a cloud
• There are two primary methods to run a database in a cloud:
• Virtual machine image
• Cloud platforms allow users to purchase virtual-machine instances for a
limited time, and one can run a database on such virtual machines. Users
can either upload their own machine image with a database installed on it,
or use ready-made machine images that already include an optimized
installation of a database.
• Database-as-a-service (DBaaS)
• With a database as a service model, application owners do not have to install
and maintain the database themselves. Instead, the database service
provider takes responsibility for installing and maintaining the database, and
application owners are charged according to their usage of the service. This
is a type of software as a service (SaaS).
Architecture and common characteristics
• Most database services offer web-based consoles, which the end user can
use to provision and configure database instances.
• Database services consist of a database-manager component, which controls
the underlying database instances using a service API. The service API is
exposed to the end user, and permits users to perform maintenance and
scaling operations on their database instances.
• The underlying software stack typically includes the operating system, the
database and third-party software used to manage the database. The service
provider is responsible for installing, patching and updating the underlying
software stack and ensuring the overall health and performance of the
database.
• Scalability features differ between vendors – some offer auto-scaling, others
enable the user to scale up using an API, but do not scale automatically.
• There is typically a commitment for a certain level of high availability (e.g.
99.9% or 99.99%). This is achieved by replicating data and failing instances
over to other database instances.
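To make the availability figures concrete, the short calculation below converts a percentage into the downtime it permits per year. It assumes a 365-day year and treats availability as a simple fraction of total time; actual service agreements may define the measurement window differently.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a service may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for level in (99.9, 99.99):
    print(f"{level}% -> {allowed_downtime_minutes(level):.1f} minutes/year")
```

At 99.9% this works out to roughly 525.6 minutes (under nine hours) a year, and at 99.99% to roughly 52.6 minutes, so each extra "nine" cuts the permitted downtime by a factor of ten.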
Data model
• Advanced queries expressed in SQL work well with the strict relationships
that are imposed on information by relational databases. However, relational
database technology was not initially designed or developed for use over
distributed systems. This issue has been addressed with the addition of
clustering enhancements to the relational databases, although some basic
tasks require complex and expensive protocols, such as with data
synchronization.
• Modern relational databases have shown poor performance on
data-intensive systems; therefore, the idea of NoSQL has been utilized within
database management systems for cloud-based systems. Within NoSQL
implemented storage, there are no requirements for fixed table schemas,
and the use of join operations is avoided. "The NoSQL databases have
proven to provide efficient horizontal scalability, good performance, and
ease of assembly into cloud applications." Data models relying on simplified
relay algorithms have also been employed in data-intensive cloud mapping
applications unique to virtual frameworks.
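The two NoSQL properties mentioned above, no fixed table schema and no join operations, can be illustrated with an in-memory stand-in for a document store. This is plain Python, not the API of any particular NoSQL product; `put`, `get`, and the key format are hypothetical.

```python
# In-memory stand-in for a document store: each key maps to one document.
store = {}


def put(key, document):
    """Store a document under a key; no schema is checked or enforced."""
    store[key] = document


def get(key):
    """Fetch one document by key."""
    return store[key]


# Orders are embedded inside the customer document, so reading a customer
# together with its orders needs no join.
put("customer:1", {
    "name": "Acme",
    "orders": [{"item": "bolt", "qty": 10}, {"item": "nut", "qty": 5}],
})

# A second document carries an extra field; nothing has to be altered first.
put("customer:2", {"name": "Bolt", "vat_id": "X123", "orders": []})

total_qty = sum(order["qty"] for order in get("customer:1")["orders"])
print(total_qty)
```

The trade-off is that consistency across documents becomes the application's job, which a relational schema with constraints would otherwise enforce.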
Difference between cloud databases which are
relational as opposed to non-relational or
NoSQL:
• SQL databases
• Are one type of database which can run in the cloud, either in a virtual
machine or as a service, depending on the vendor. While SQL databases are
easily vertically scalable, horizontal scalability poses a challenge that cloud
database services based on SQL have started to address.
• EDB Postgres Advanced Server
• IBM Db2
• Ingres (database)
• MariaDB
• MySQL
• NuoDB
• Oracle Database
• PostgreSQL
• SAP HANA
• YugabyteDB
• NoSQL databases
• Are another type of database which can run in the cloud. NoSQL databases
are built to service heavy read/write loads and can scale up and down easily,
and therefore they are more natively suited to running in the cloud.
However, most contemporary applications are built around an SQL data
model, so working with NoSQL databases often requires a complete rewrite
of application code.
• Some SQL databases have developed NoSQL capabilities including JSON,
binary JSON (e.g. BSON or similar variants), and key-value store data types.
• A multi-model database with relational and non-relational capabilities
provides a standard SQL interface to users and applications and thus
facilitates the usage of such databases for contemporary applications built
around an SQL data model. Native multi-model databases support multiple
data models with one core and a unified query language to access all data
models.
• Examples
• Apache Cassandra on Amazon EC2 or Google Compute Engine
• ArangoDB on Amazon EC2, Google Compute or Microsoft Azure
• Clusterpoint Database Virtual Box VM
• CouchDB on Amazon EC2 or Google Cloud Platform
• EDB Postgres Advanced Server
• Hadoop on Amazon EC2, Google Cloud Platform, or Rackspace
• MarkLogic on Amazon EC2 or Google Cloud Platform
• MongoDB on Amazon EC2, Google Compute Engine, Microsoft Azure, or
Rackspace
• Neo4J on Amazon EC2 or Microsoft Azure
• ScyllaDB on Amazon EC2 or Google Cloud Platform
• YugabyteDB

TOPIC 9 data warehousing and data mining.pdf

  • 1. Emerging trends In Database Systems
  • 2. Introduction 2 Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organisations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralised data management and retrieval. Data warehousing represents an ideal vision of maintaining a central repository of all organisational data. Centralisation of data is needed to maximize user access and analysis. As knowledge becomes the new currency of organizations, information now is viewed in an entirely new way - as a strategic source of opportunity. With this new focus on the information delivery, government and industry are looking to Data Warehousing as valuable construct to convert data to information.
  • 3. Data Mining & Data Warehouse 3 Data mining is a broad technology that can potentially benefit any functional area in a business where there is a major need or opportunity for improved performance and where data analysis can impact that improvement. Part of the power of data mining is that it not only solves difficult business problems, but it does so in ways that are repeatable. The data mining process involves developing models that can be used to solve the business problem at hand. Since they are models, they can be reused on new data. As the data in the warehouse is refreshed, the models can be re-run on new data and new results obtained.
  • 4. Data Mining: A KDD Process 4 Data Cleaning Data Integration Data Warehouse Task-relevant Data Selection Data Mining Pattern Evaluation Databases
  • 5. Characteristics of a Data Warehouse 5 A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon: Subject Oriented Integrated Non-volatile * Time Variant * The major characteristics of data warehouse are: Organisation Consistency Non-volatile * Time Variant * Relational Client/server Web-based (Turban, McLean, Wetherbe)
  • 6. Characteristics of a Data Warehouse 6 Organisation: Data are organised by subject (e.g. by customer, vendor, product, price level, and region) and contain information relevant for decision support only. Consistency: Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner. Relational: Typically the data warehouse uses a relational structure. Client/server: The data warehouse uses the client/server architecture mainly to provide the end user an easy access to its data. Web-based: Today’s data warehouses are designed to provide an efficient computing environment for Web-based applications (Rundensteiner et.al., 2000) CISM01 Intelligent Systems for Management Unit 9
  • 7. Subject Oriented
    Data warehouses are designed to help you analyse data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
  • 8. Integrated
    Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
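To make the two integration problems concrete, here is a small Python sketch that resolves a naming conflict and a unit-of-measure mismatch between two sources. The field names (`cust_id`, `customerNo`, `weight_kg`, `weight_lb`) and the sample records are hypothetical, chosen only for illustration.

```python
# Hypothetical records from two operational systems.
# System A calls the key "cust_id" and stores weight in kilograms;
# system B calls it "customerNo" and stores weight in pounds.
system_a = [{"cust_id": 1, "weight_kg": 2.0}]
system_b = [{"customerNo": 2, "weight_lb": 10.0}]

LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound

def from_a(row):
    """Map a system A record onto the warehouse's field names."""
    return {"customer_id": row["cust_id"], "weight_kg": row["weight_kg"]}

def from_b(row):
    """Map a system B record onto the same names and convert the unit."""
    return {"customer_id": row["customerNo"],
            "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3)}

# After integration every record has the same fields and the same unit.
integrated = [from_a(r) for r in system_a] + [from_b(r) for r in system_b]
print(integrated)
```

Keeping one mapping function per source makes it clear where each naming or unit decision was taken when a new source is added later.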
  • 9. Non-volatile
    Non-volatile means that, once entered into the warehouse, data are not changed or updated. This is logical because the purpose of a warehouse is to enable you to analyse what has occurred.
  • 10. Time Variant
    In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. In a warehouse, the data are kept for many years so they can be used for trends, forecasting, and comparisons over time. A data warehouse's focus on change over time is what is meant by the term time variant.
  • 11. Data Marts
    The high cost of data warehouses confines their use to large companies. An alternative used by many other firms is the creation of a lower-cost, scaled-down version of a data warehouse called a data mart. A data mart is a small warehouse designed for a strategic business unit (SBU) or a department. The advantages of data marts include:
    • Low cost
    • Significantly shorter lead time for implementation
    • Local rather than central control, conferring power on the using group
  • 12. Data Marts
    From a statistical viewpoint, a data mart should be organised according to two principles:
    • The statistical units: the elements in the reference population that are considered important for the aims of the analysis (e.g. the supply companies, the customers, the people who visit the site).
    • The statistical variables: the important characteristics measured for each statistical unit (e.g. the amounts customers buy, the payment methods they use, the socio-demographic profile of each customer).
  • 13. Operational vs. Informational
                 | Operational Data   | Data Warehouse
    Application  | OLTP               | OLAP
    Use          | Precise queries    | Ad hoc
    Temporal     | Snapshot           | Historical
    Modification | Dynamic            | Static
    Orientation  | Application        | Business
    Data         | Operational values | Integrated
    Size         | Gigabytes          | Terabytes
    Level        | Detailed           | Summarized
    Access       | Often              | Less often
    Response     | Few seconds        | Minutes
    Data schema  | Relational         | Star/Snowflake
  • 14. OLTP vs. OLAP
                       | OLTP                                                      | OLAP
    users              | clerk, IT professional                                    | knowledge worker
    function           | day-to-day operations                                     | decision support
    DB design          | application-oriented                                      | subject-oriented
    data               | current, up-to-date, detailed, flat relational, isolated  | historical, summarized, multidimensional, integrated, consolidated
    usage              | repetitive                                                | ad hoc
    access             | read/write, index/hash on primary key                     | lots of scans
    unit of work       | short, simple transaction                                 | complex query
    # records accessed | tens                                                      | millions
    # users            | thousands                                                 | hundreds
    DB size            | 100MB-GB                                                  | 100GB-TB
    metric             | transaction throughput                                    | query throughput, response
  • 15. Contrasting OLTP & Data Warehousing Environments
                              | OLTP                    | Data Warehouse
    Data structures           | Complex data structures | Multidimensional data structures
    Indexes                   | Few                     | Many
    Joins                     | Many                    | Some
    Duplicated data           | Normalised DBMS         | Denormalised DBMS
    Derived data & aggregates | Rare                    | Common
  • 16. Contrasting OLTP & Data Warehousing Environments
    One major difference between the two types of system is that data warehouses are not usually in third normal form (3NF), a type of data normalisation common in OLTP environments. Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouses and OLTP systems:
    • Workload: Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimised to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.
  • 17. Contrasting OLTP & Data Warehousing Environments
    • Data modifications: A data warehouse is updated on a regular basis using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse. In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.
  • 18. Contrasting OLTP & Data Warehousing Environments
    • Schema design: Data warehouses often use denormalised or partially denormalised schemas (such as a star schema) to optimise query performance. OLTP systems often use fully normalised schemas to optimise update/insert/delete performance, and to guarantee data consistency.
    • Typical operations: A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
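The star schema and the two kinds of operation can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_customer`, `dim_date`), the columns, and the sample rows below are assumptions made for illustration; a real warehouse would have far more rows and dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who" and "when" of each sale.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds one row per sale and points at the dimensions.
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    amount      REAL
);
INSERT INTO dim_customer VALUES (1, 'Ann', 'North'), (2, 'Bob', 'South');
INSERT INTO dim_date VALUES (10, 2024, 5), (11, 2024, 6);
INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 10, 50.0), (2, 10, 30.0), (2, 11, 70.0);
""")

# Warehouse-style query: scan the fact table and aggregate per customer
# ("find the total sales for all customers in a given month").
totals = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.year = 2024 AND d.month = 5
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# OLTP-style operation: fetch a handful of rows for one known customer.
one_customer = conn.execute(
    "SELECT amount FROM fact_sales WHERE customer_id = ? AND date_id = ?",
    (2, 11),
).fetchall()

print(totals)
print(one_customer)
```

The aggregate query touches every matching fact row, while the lookup touches only the rows for one key, which is the contrast the slide draws between the two workloads.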
  • 19. Contrasting OLTP & Data Warehousing Environments
    • Historical data: Data warehouses usually store many months or years of data. This is to support historical analysis. OLTP systems usually store data from only a few weeks or months. The OLTP system stores only as much historical data as is needed to meet the requirements of the current transaction.
  • 20. Levels of the Data Warehouse Architecture
    • Organisationally structured: structured to meet the informational requirements of the entire organisation.
    • Departmentally structured: structured to meet the focused informational requirements of the distinct group identified by a specific business function.
    • Individually structured: structured to meet an even more focused set of informational requirements as defined by a specific management function.
  • 21. Metadata
    Metadata describes the types of information that are stored in the database. For a data warehouse, metadata provides discipline, since changes to the warehouse must be reflected in the metadata to be communicated to users. A good metadata system helps ensure the success of a data warehouse by making users more aware of and comfortable with the contents. It provides valuable assistance in understanding data. The metadata repository is an often overlooked component of the data warehousing environment. (from Berry & Linoff, 2004)
    Without metadata, the data warehouse and its associated components in the architected environment are merely disjointed components working independently and with separate goals.
  • 22. Data Warehouse Architectures: Basic
    [Figure: basic data warehouse architecture]
    • Data Sources: Operational Systems, Flat Files
    • Warehouse: Metadata, Summary Data, Raw Data
    • Users: Analysis, Reporting, Data Mining
  • 23. A General Architecture for Data Warehousing
    [Figure: three-tier data warehousing architecture]
    • Information Sources: operational DBs, other sources
    • Extract, Transform, Load, Refresh
    • Data Warehouse Server (Tier 1), data storage: Data Warehouse, Data Marts
    • OLAP Servers (Tier 2), OLAP engine: e.g., MOLAP, e.g., ROLAP
    • Client analysis tools (Tier 3), front-end tools: Analysis, Query, Reporting, Data mining
  • 24. A General Architecture for Data Warehousing
    The major components of data warehouse architecture are:
    • Source systems are where the data comes from.
    • Extraction, transformation, and load (ETL) move data between different data stores.
    • The central repository is the main store for the data warehouse.
    • The metadata repository describes what is available and where.
    • Data marts provide fast, specialised access for end users and applications.
    • Operational feedback integrates decision support back into the operational systems.
    • End users are the reason for developing the warehouse in the first place.
    MOLAP: Multi-Dimensional On-Line Analytical Processing
    ROLAP: Relational On-Line Analytical Processing
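The ETL component listed above can be illustrated with a short Python sketch that uses only the standard library. The flat-file contents, the `orders` table, and the rule of rejecting rows with unparseable amounts are hypothetical choices for the example, not something the slides prescribe.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical flat-file export from a source system.
raw = io.StringIO(
    "order_id,region,amount\n"
    "1, north ,10.50\n"
    "2,SOUTH,abc\n"      # amount cannot be parsed
    "3,South,4.50\n"
)
rows = list(csv.DictReader(raw))

def transform(row):
    """Transform: normalise region names, parse amounts, reject bad rows."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return (int(row["order_id"]), row["region"].strip().lower(), amount)

clean_rows = [t for t in map(transform, rows) if t is not None]

# Load: bulk insert into the central repository (here, in-memory SQLite).
repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
repo.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
repo.commit()

summary = repo.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```

Loading with `executemany` in one batch mirrors the bulk data modification style described earlier for warehouses, as opposed to the row-at-a-time updates of an OLTP system.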
  • 25. Cloud Database
    A cloud database is a database that typically runs on a cloud computing platform, with access to the database provided as a service.
  • 26. Methods to run a database in a cloud
    There are two primary methods to run a database in a cloud:
    • Virtual machine image: Cloud platforms allow users to purchase virtual-machine instances for a limited time, and one can run a database on such virtual machines. Users can either upload their own machine image with a database installed on it, or use ready-made machine images that already include an optimized installation of a database.
    • Database-as-a-service (DBaaS): With a database-as-a-service model, application owners do not have to install and maintain the database themselves. Instead, the database service provider takes responsibility for installing and maintaining the database, and application owners are charged according to their usage of the service. This is a type of software as a service (SaaS).
  • 27. Architecture and common characteristics
    • Most database services offer web-based consoles, which the end user can use to provision and configure database instances.
    • Database services consist of a database-manager component, which controls the underlying database instances using a service API. The service API is exposed to the end user, and permits users to perform maintenance and scaling operations on their database instances.
    • The underlying software stack typically includes the operating system, the database, and third-party software used to manage the database. The service provider is responsible for installing, patching, and updating the underlying software stack and ensuring the overall health and performance of the database.
    • Scalability features differ between vendors: some offer auto-scaling, while others enable the user to scale up using an API but do not scale automatically.
    • There is typically a commitment to a certain level of high availability (e.g. 99.9% or 99.99%). This is achieved by replicating data and failing instances over to other database instances.
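The availability figures quoted above translate directly into an allowed amount of downtime. A small Python sketch of that arithmetic follows; it assumes a 365-day year and that availability is measured over the whole year, which individual service agreements may define differently.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def max_downtime_minutes(availability_percent):
    """Downtime per year permitted by a given availability commitment."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for level in (99.9, 99.99):
    print(f"{level}% -> {max_downtime_minutes(level):.1f} minutes per year")
```

Under these assumptions, 99.9% allows roughly 526 minutes (under nine hours) of downtime per year and 99.99% roughly 53 minutes, so each extra "nine" cuts the allowance by a factor of ten.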
  • 28. Data model
    • Advanced queries expressed in SQL work well with the strict relationships that are imposed on information by relational databases. However, relational database technology was not initially designed or developed for use over distributed systems. This issue has been addressed with the addition of clustering enhancements to relational databases, although some basic tasks, such as data synchronization, require complex and expensive protocols.
    • Modern relational databases have shown poor performance on data-intensive systems; therefore, the idea of NoSQL has been adopted within database management systems for cloud-based systems. Within NoSQL storage, there are no requirements for fixed table schemas, and the use of join operations is avoided. "The NoSQL databases have proven to provide efficient horizontal scalability, good performance, and ease of assembly into cloud applications." Data models relying on simplified relay algorithms have also been employed in data-intensive cloud mapping applications unique to virtual frameworks.
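The two properties named above, no fixed table schema and no joins, can be shown without any database at all. The sketch below uses plain Python dictionaries as a stand-in for a document store; the key format (`customer:1`) and the field names are hypothetical and do not correspond to any particular NoSQL product.

```python
# Relational shape: two "tables" with fixed columns, linked by a key.
customers = [{"id": 1, "name": "Ann"}]
orders = [{"id": 100, "customer_id": 1, "total": 25.0},
          {"id": 101, "customer_id": 1, "total": 40.0}]

def orders_for(name):
    """Join customers to orders on customer_id to find a customer's totals."""
    ids = {c["id"] for c in customers if c["name"] == name}
    return [o["total"] for o in orders if o["customer_id"] in ids]

# Document shape: one self-contained record per customer, looked up by key.
# Documents need not share the same fields, so there is no fixed schema.
documents = {
    "customer:1": {"name": "Ann", "orders": [{"total": 25.0}, {"total": 40.0}]},
    "customer:2": {"name": "Bob", "loyalty_tier": "gold"},  # no "orders" field
}

def order_totals(key):
    """Single key lookup; the orders are embedded, so no join is needed."""
    return [o["total"] for o in documents[key].get("orders", [])]

print(orders_for("Ann"))
print(order_totals("customer:1"))
print(order_totals("customer:2"))
```

Because each document is reachable by one key, records can be spread across machines by key, which is one reason this shape lends itself to horizontal scaling. The cost is that data embedded in several documents has to be kept consistent by the application.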
  • 29. Difference between cloud databases which are relational as opposed to non-relational or NoSQL
    • SQL databases are one type of database which can run in the cloud, either in a virtual machine or as a service, depending on the vendor. While SQL databases are easily vertically scalable, horizontal scalability poses a challenge that cloud database services based on SQL have started to address. Examples:
      • EDB Postgres Advanced Server
      • IBM Db2
      • Ingres (database)
      • MariaDB
      • MySQL
      • NuoDB
      • Oracle Database
      • PostgreSQL
      • SAP HANA
      • YugabyteDB
  • 30. NoSQL databases
    • NoSQL databases are another type of database which can run in the cloud. NoSQL databases are built to service heavy read/write loads and can scale up and down easily, and therefore they are more natively suited to running in the cloud. However, most contemporary applications are built around an SQL data model, so working with NoSQL databases often requires a complete rewrite of application code.
    • Some SQL databases have developed NoSQL capabilities including JSON, binary JSON (e.g. BSON or similar variants), and key-value store data types.
    • A multi-model database with relational and non-relational capabilities provides a standard SQL interface to users and applications and thus facilitates the usage of such databases for contemporary applications built around an SQL data model. Native multi-model databases support multiple data models with one core and a unified query language to access all data models.
  • 31. Examples
    • Apache Cassandra on Amazon EC2 or Google Compute Engine
    • ArangoDB on Amazon EC2, Google Compute or Microsoft Azure
    • Clusterpoint Database Virtual Box VM
    • CouchDB on Amazon EC2 or Google Cloud Platform
    • EDB Postgres Advanced Server
    • Hadoop on Amazon EC2, Google Cloud Platform, or Rackspace
    • MarkLogic on Amazon EC2 or Google Cloud Platform
    • MongoDB on Amazon EC2, Google Compute Engine, Microsoft Azure, or Rackspace
    • Neo4J on Amazon EC2 or Microsoft Azure
    • ScyllaDB on Amazon EC2 or Google Cloud Platform
    • YugabyteDB