Unit III
Database Administration
What is Data Warehousing?
▸ Data warehousing (DW) is the process of collecting and
managing data from varied sources to provide meaningful
business insights.
▸ A data warehouse is typically used to connect and analyze
business data from heterogeneous sources.
▸ The data warehouse is the core of a BI (Business
Intelligence) system, which is built for data analysis and
reporting.
▸ It is a blend of technologies and components which aids
the strategic use of data.
▸ It is electronic storage of a large amount of information
by a business which is designed for query and analysis
instead of transaction processing.
▸ It is a process of transforming data into information and
making it available to users in a timely manner to make a
difference.
▸ The decision support database (Data Warehouse) is
maintained separately from the organization's operational
database.
▸ However, the data warehouse is not a product but an
environment.
▸ It is an architectural construct of an information system
which provides users with current and historical decision
support information which is difficult to access or present in
the traditional operational data store.
▸ You may know that a 3NF-designed database for an
inventory system may have many tables related to each other.
▸ For example, a report on current inventory information
can include more than 12 join conditions.
▹ This can quickly slow down the response time of the query and report.
▸ A data warehouse provides a new design which can help
to reduce the response time and enhance the
performance of queries for reports and analytics.
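The point above can be sketched in code. This is a minimal illustration using Python's built-in sqlite3 module, not a real warehouse design: the inventory tables, column names, and rows are all hypothetical. The same report needs three joins against the normalized tables and none against a table that was pre-joined at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized operational tables (hypothetical names and data).
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category_id INTEGER REFERENCES category(category_id));
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE stock    (product_id INTEGER REFERENCES product(product_id),
                           location_id INTEGER REFERENCES location(location_id),
                           quantity INTEGER);

    INSERT INTO category VALUES (1, 'Tools'), (2, 'Paint');
    INSERT INTO product  VALUES (10, 'Hammer', 1), (11, 'Brush', 2);
    INSERT INTO location VALUES (100, 'Pune'), (101, 'Delhi');
    INSERT INTO stock    VALUES (10, 100, 5), (10, 101, 7), (11, 100, 3);
""")

# The join chain that every report on the normalized tables has to repeat.
JOINS = """
    FROM stock s
    JOIN product  p ON p.product_id  = s.product_id
    JOIN category c ON c.category_id = p.category_id
    JOIN location l ON l.location_id = s.location_id
"""

# Report on the normalized tables: three joins at query time.
normalized_report = conn.execute(
    "SELECT c.category_name, l.city, SUM(s.quantity)" + JOINS +
    "GROUP BY c.category_name, l.city ORDER BY c.category_name, l.city"
).fetchall()

# Warehouse-style table: the joins are paid once, when the table is loaded.
conn.execute(
    "CREATE TABLE inventory_flat AS "
    "SELECT c.category_name AS category_name, l.city AS city, s.quantity AS quantity"
    + JOINS
)

# The same report on the pre-joined table needs no joins at all.
flat_report = conn.execute(
    "SELECT category_name, city, SUM(quantity) FROM inventory_flat "
    "GROUP BY category_name, city ORDER BY category_name, city"
).fetchall()

print(normalized_report == flat_report)
```

With only four small tables the difference is not measurable; the sketch only shows where the join work moves, from every report query to a single load step.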
▸ A data warehouse system is also known by the following
names:
▹ Decision Support System (DSS)
▹ Executive Information System
▹ Management Information System
▹ Business Intelligence Solution
▹ Analytic Application
▹ Data Warehouse
History of Data
Warehousing
▸ A data warehouse helps users understand
and enhance their organization's performance.
▸ The need to warehouse data evolved as
computer systems became more complex and
needed to handle increasing amounts of
information.
▸ Here are some key events in the evolution of the data
warehouse:
▹ 1960s: Dartmouth and General Mills, in a joint research project,
develop the terms dimensions and facts.
▹ 1970s: A.C. Nielsen and IRI introduce dimensional data marts for
retail sales.
▹ 1983: Teradata Corporation introduces a database management
system specifically designed for decision support.
▹ Data warehousing started in the late 1980s, when
IBM researchers Paul Murphy and Barry Devlin
developed the Business Data Warehouse.
▹ However, the real concept was given by Bill
Inmon.
▹ He is considered the father of the data
warehouse.
▹ He had written about a variety of topics for
How does a Datawarehouse
work?
▸ A Data Warehouse works as a central
repository where information arrives from
one or more data sources.
▸ Data flows into a data warehouse from the
transactional system and other relational
databases.
▸ Data may be:
Structured
Semi-structured
Unstructured data
▸ The data is processed, transformed, and
ingested so that users can access the
processed data in the Data Warehouse
through Business Intelligence tools, SQL
clients, and spreadsheets.
▸ A data warehouse merges information
coming from different sources into one
▸ By merging all of this information in one place, an
organization can analyze its customers more holistically.
▸ This helps to ensure that it has considered all the
information available.
▸ Data warehousing makes data mining possible.
▹ Data mining is looking for patterns in the data that may lead to higher
sales and profits.
Types of Data Warehouse
▸ The three main types of data warehouses
(DWH) are:
1. Enterprise Data Warehouse (EDW)
2. Operational Data Store (ODS)
3. Data Mart
1. Enterprise Data Warehouse
(EDW):
▸ Enterprise Data Warehouse (EDW) is a centralized
warehouse.
▸ It provides decision support service across the enterprise.
▸ It offers a unified approach for organizing and representing
data.
▸ It also provides the ability to classify data according to
subject and give access according to those divisions.
2. Operational Data Store
(ODS):
▸ An Operational Data Store (ODS) is a data store used when
neither the data warehouse nor the OLTP systems support an
organization's reporting needs.
▸ An ODS is refreshed in real time.
▸ Hence, it is widely preferred for routine activities like
storing employee records.
3. Data Mart:
▸ A data mart is a subset of the data
warehouse.
▸ It is specially designed for a particular line of
business, such as sales or finance.
▸ In an independent data mart, data can collect
General stages of Data
Warehouse
▸ Initially, organizations made relatively simple use of data
warehousing.
▸ Over time, more sophisticated uses of data warehousing
began.
▸ The following are the general stages of use of the data warehouse
(DWH):
▹ Offline Operational Database
▹ Offline Data Warehouse
▹ Real-time Data Warehouse
▹ Integrated Data Warehouse
Offline Operational Database:
▸ In this stage, data is just copied from an
operational system to another server.
▸ In this way, loading, processing, and
reporting of the copied data do not impact
the operational system's performance.
Offline Data Warehouse:
▸ Data in the Datawarehouse is regularly
updated from the Operational Database.
▸ The data in Datawarehouse is mapped and
transformed to meet the Datawarehouse
objectives.
Real time Data Warehouse:
▸ In this stage, Data warehouses are
updated whenever any transaction takes
place in operational database.
▸ For example, Airline or railway booking
system.
Integrated Data Warehouse:
▸ In this stage, Data Warehouses are updated
continuously when the operational system
performs a transaction.
▸ The Datawarehouse then generates
transactions which are passed back to the
operational system.
Components of Data
Warehouse
▸ Four components of Data Warehouses are:
a)Load manager
b)Warehouse Manager
c)Query Manager
d)End-user access tools
Load manager:
▸ Load manager is also called the front component.
▸ It performs all the operations associated with
the extraction and load of data into the warehouse.
▸ These operations include transformations to
prepare the data for entering into the Data
warehouse.
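The load manager's extract, transform, and load operations can be sketched as three small functions. This is only an illustration under stated assumptions: the source records, the `sales` table, and the cleaning rules are hypothetical, and Python's built-in sqlite3 module stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from source systems.
SOURCE_ROWS = [
    {"order_id": "A-1", "amount": "120.50", "region": " north "},
    {"order_id": "A-2", "amount": "80", "region": "SOUTH"},
    {"order_id": "A-3", "amount": "", "region": "south"},  # amount is missing
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: drop incomplete rows, convert types, normalize region names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows that cannot be used for reporting
        cleaned.append(
            (row["order_id"], float(row["amount"]), row["region"].strip().lower())
        )
    return cleaned


def load(conn, rows):
    """Load: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id TEXT PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, amount, region FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to swap the in-memory list for a real source without touching the transformation rules.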
Warehouse Manager:
▸ Warehouse manager performs operations associated
with the management of the data in the warehouse.
▸ It performs operations like analysis of data to ensure
consistency, creation of indexes and views,
generation of denormalizations and aggregations,
transformation and merging of source data, and
archiving and backing up data.
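A few of the warehouse manager's operations (an index, a view, and a precomputed aggregation) can be sketched as follows. The `sales` table, its rows, and all object names are hypothetical, and Python's built-in sqlite3 module is used only as a convenient stand-in for a warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse table.
    CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2024-01-05', 'north', 100.0),
        ('2024-01-20', 'north', 50.0),
        ('2024-02-03', 'south', 70.0);

    -- Index: speeds up queries that filter by region.
    CREATE INDEX idx_sales_region ON sales (region);

    -- View: hides the date handling from report writers.
    CREATE VIEW monthly_sales AS
        SELECT substr(sale_date, 1, 7) AS month, region, amount FROM sales;

    -- Aggregation: totals are computed once instead of in every report query.
    CREATE TABLE sales_summary AS
        SELECT month, region, SUM(amount) AS total
        FROM monthly_sales
        GROUP BY month, region;
""")

summary = conn.execute(
    "SELECT month, region, total FROM sales_summary ORDER BY month, region"
).fetchall()
print(summary)
```

The summary table trades freshness for speed: it has to be rebuilt or updated whenever new rows are loaded into `sales`.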
Query Manager:
▸ The query manager is also known as the backend component.
▸ It performs all the operations related to the
management of user queries.
▸ Its operations include directing queries to the
appropriate tables and scheduling
the execution of queries.
End-user access tools:
▸ These tools are categorized into five different
groups:
1) Data reporting tools
2) Query tools
3) Application development tools
4) EIS tools
5) OLAP tools and data mining tools
Who needs Data
Warehouse
▸ A data warehouse (DWH) is needed by all types of users, such as:
▹ Decision makers who rely on massive amounts of data.
▹ Users who use customized, complex processes to obtain information from multiple
data sources.
▹ People who want simple technology to access the data.
▹ People who want a systematic approach to making
decisions.
▹ Users who want fast performance on a huge amount of data, which is a necessity for
reports, grids, or charts.
▹ Users who want to discover 'hidden patterns' of data flows
and groupings; for them, a data warehouse is the first step.
What is a Data Warehouse used
for?
▸ Here are the most common sectors where data warehouses are used:
▹ Airline
▹ Banking
▹ Healthcare
▹ Public sector
▹ Investment and Insurance sector
▹ Retail chain
▹ Telecommunication
▹ Hospitality Industry
Airline:
▸ In the airline system, it is used for operational purposes like crew
assignment, analysis of route profitability, frequent flyer program
promotions, etc.
Banking:
▸ It is widely used in the banking sector to manage the resources
available on the desk effectively.
▸ A few banks also use it for market research and performance analysis
of products and operations.
Healthcare:
▸ The healthcare sector also uses data warehouses to strategize and
predict outcomes, generate patients' treatment reports, share
data with tie-in insurance companies, medical aid services, etc.
Public sector:
▸ In the public sector, the data warehouse is used for intelligence
gathering.
▸ It helps government agencies maintain and analyze tax
records and health policy records for every individual.
Investment and Insurance sector:
▸ In this sector, the warehouses are primarily used to analyze
data patterns, customer trends, and to track market
movements.
Retail chain:
▸ In retail chains, the data warehouse is widely used for distribution
and marketing.
▸ It also helps to track items, customer buying patterns, and
promotions, and is used for determining pricing policy.
Telecommunication:
▸ A data warehouse is used in this sector for product
promotions, sales decisions and to make distribution
decisions.
Hospitality Industry:
▸ This Industry utilizes warehouse services to design as well
as estimate their advertising and promotion campaigns
where they want to target clients based on their feedback
and travel patterns.
Steps to implement Data
Warehouse
▸ The best way to address the business risk associated with
a data warehouse implementation is to employ a three-prong
strategy, as below:
1. Enterprise strategy
2. Phased delivery
3. Iterative prototyping
1. Enterprise strategy: Here we identify the technical aspects, including the current
architecture and tools.
▹ We also identify facts, dimensions, and attributes. Data mapping and transformation are also
covered at this stage.
2. Phased delivery: Data warehouse implementation should be phased
based on subject areas.
▹ Related business entities like booking and billing should first be implemented and then
integrated with each other.
3. Iterative prototyping: Rather than a big bang approach to
implementation, the data warehouse should be developed and
tested iteratively.
▸ Here are the key steps in data warehouse implementation along with their
deliverables.
Step | Tasks | Deliverables
1 | Define project scope | Scope Definition
2 | Determine business needs | Logical Data Model
3 | Define Operational Data Store requirements | Operational Data Store Model
4 | Acquire or develop extraction tools | Extract Tools and Software
5 | Define Data Warehouse data requirements | Transition Data Model
6 | Document missing data | To Do Project List
7 | Map Operational Data Store to Data Warehouse | D/W Data Integration Map
8 | Develop Data Warehouse database design | D/W Database Design
9 | Extract data from Operational Data Store | Integrated D/W Data Extracts
10 | Load Data Warehouse | Initial Data Load
11 | Maintain Data Warehouse | On-going Data Access and Subsequent Loads
Best practices to implement a Data
Warehouse
▸ Decide a plan to test the consistency, accuracy, and integrity of the
data.
▸ The data warehouse must be well integrated, well defined and time
stamped.
▸ While designing a data warehouse, make sure you use the right tools, stick
to the life cycle, take care of data conflicts, and are ready to learn from
your mistakes.
▸ Never replace operational systems and reports
▸ Don't spend too much time on extracting, cleaning
and loading data.
▸ Be sure to involve all stakeholders, including business
personnel, in the data warehouse implementation
process.
▹ Establish that Data warehousing is a joint/ team project.
▹ You don't want to create Data warehouse that is not useful to the
end users.
Advantages of Data Warehouse
(DWH)
▸ A data warehouse allows business users to quickly access
critical data from several sources all in one place.
▸ A data warehouse provides consistent information on
various cross-functional activities.
▸ It also supports ad-hoc reporting and queries.
▸ A data warehouse helps to integrate many sources of data
to reduce stress on the production system.
▸ A data warehouse helps to reduce total turnaround time
for analysis and reporting.
▸ Restructuring and integration make the data easier for the user
to use for reporting and analysis.
▸ A data warehouse allows users to access critical data from
a number of sources in a single place.
▹ Therefore, it saves the user time retrieving data from multiple
sources.
▸ A data warehouse stores a large amount of historical data.
▹ This helps users to analyze different time periods and trends to
make future predictions.
Disadvantages of Data Warehouse
(DWH)
▸ Not an ideal option for unstructured data.
▸ Creating and implementing a data warehouse is a time-consuming
affair.
▸ A data warehouse can become outdated relatively quickly.
▸ It is difficult to make changes in data types and ranges, data source schema,
indexes, and queries.
▸ The data warehouse may seem easy, but it is actually too complex for
average users.
▸ Despite best efforts at project management, data warehousing project scope
tends to increase.
▸ Sometimes warehouse users will develop different business rules.
▸ Organizations need to spend a lot of their resources on training and
implementation.
The Future of Data
Warehousing
o Changes in regulatory constraints may limit the ability to combine sources of
disparate data.
o These disparate sources may include unstructured data, which is difficult to store.
o As the size of databases grows, the estimates of what constitutes a very
large database continue to grow.
o It is complex to build and run data warehouse systems that are always increasing in size.
o The hardware and software resources available today do not allow a large
amount of data to be kept online.
o Multimedia data cannot be manipulated as easily as text data, whereas textual
information can be retrieved by the relational software available today.
o This could be a research subject.
Data Warehouse
Tools
▸ There are many data warehousing tools
available in the market.
▸ Here are some of the most prominent ones:
1. MarkLogic
2. Oracle
3. Amazon Redshift
1. MarkLogic:
▸ MarkLogic is a useful data warehousing solution that
makes data integration easier and faster using an array
of enterprise features.
▸ This tool helps to perform very complex search
operations.
▸ It can query different types of data like documents,
relationships, and metadata.
https://www.marklogic.com/product/getting-started/
2. Oracle:
▸ Oracle is a widely used commercial database.
▸ It offers a wide range of data
warehouse solutions, both on-premises and in
the cloud.
▸ It helps to optimize customer experiences by
increasing operational efficiency.
3. Amazon Redshift:
▸ Amazon Redshift is a data warehouse tool.
▸ It is a simple and cost-effective tool to analyze all types
of data using standard SQL and existing BI tools.
▸ It also allows running complex queries against
petabytes of structured data, using the technique of
query optimization.
https://aws.amazon.com/redshift/?nc2=h_m1
KEY LEARNING
▸ A Data Warehouse (DWH) is also known as an Enterprise
Data Warehouse (EDW).
▸ A Data Warehouse is defined as a central repository
where information is coming from one or more data
sources.
▸ Three main types of Data warehouses are Enterprise Data
Warehouse (EDW), Operational Data Store, and Data Mart.
▸ The general stages of a data warehouse are Offline Operational
Database, Offline Data Warehouse, Real-time Data Warehouse,
and Integrated Data Warehouse.
▸ The four main components of a data warehouse are Load
Manager, Warehouse Manager, Query Manager, and End-user
access tools.
▸ Data warehouses are used in diverse industries like Airline,
Banking, Healthcare, Insurance, Retail, etc.
▸ Implementing a data warehouse is a three-prong
strategy, viz. Enterprise strategy, Phased
delivery, and Iterative prototyping.
▸ A data warehouse allows business users to
quickly access critical data from several
sources all in one place.
Database (DB) vs Data Warehouse
(DWH)
What is Database?
▸ A database is a collection of related data which
represents some elements of the real world.
▸ It is designed to be built and populated with data
for a specific task.
▸ It is also a building block of your data solution.
What is a Data Warehouse?
▸ A data warehouse is an information system which stores historical
and cumulative data from single or multiple sources.
▸ It is designed to analyze, report on, and integrate transaction data from
different sources.
▸ A data warehouse eases the analysis and reporting process of an
organization.
▹ It is also a single version of truth for the organization's decision-making and
forecasting processes.
KEY DIFFERENCE
▸ A database is a collection of related data that represents some
elements of the real world, whereas a data warehouse is an
information system that stores historical and cumulative data from
single or multiple sources.
▸ A database is designed to record data, whereas a data warehouse is
designed to analyze data.
▸ A database is an application-oriented collection of data, whereas a data
warehouse is a subject-oriented collection of data.
▸ Database uses Online Transactional Processing (OLTP) whereas
Data warehouse uses Online Analytical Processing (OLAP).
▸ Database tables and joins are complicated because they are
normalized whereas Data Warehouse tables and joins are easy
because they are denormalized.
▸ ER modeling techniques are used for designing Database
whereas data modeling techniques are used for designing
Data Warehouse.
Why use a Database?
▸ Here are the prime reasons for using a database system:
▹ It offers security of data and its access.
▹ A database offers a variety of techniques to store and retrieve data.
▹ A database acts as an efficient handler to balance the requirements of multiple
applications using the same data.
▹ A DBMS offers integrity constraints to provide a high level of protection and prevent access
to prohibited data.
Why Use Data Warehouse?
▸ Here are important reasons for using a data warehouse:
▹ A data warehouse helps business users access critical data from several
sources all in one place.
▹ It provides consistent information on various cross-functional activities.
▹ Helps you to integrate many sources of data to reduce stress on the
production system.
▹ Data warehouse helps you to reduce TAT (total turnaround time) for
▹ A data warehouse helps users access critical data from different sources in a single
place, so it saves the time users would spend retrieving data from multiple sources.
▹ You can also access data from the cloud easily.
▹ A data warehouse allows you to store a large amount of historical data to analyze
different periods and trends to make future predictions.
▹ Enhances the value of operational business applications and customer relationship
management systems.
▹ Separates analytics processing from transactional databases, improving the
performance of both systems.
Characteristics of Database
▸ Offers security and removes redundancy
▸ Allows multiple views of the data
▸ Database systems follow ACID compliance (Atomicity,
Consistency, Isolation, and Durability)
▸ Allows insulation between programs and data
▸ Supports sharing of data and multiuser transaction processing
▸ Relational databases support a multi-user environment
Characteristics of Data Warehouse
▸ A data warehouse is subject-oriented, as it offers
information related to a theme instead of a company's
ongoing operations.
▸ The data also needs to be stored in the data warehouse in a
common and universally acceptable manner.
▸ The time horizon for the data warehouse is relatively
extensive compared with other operational systems.
▸ A data warehouse is non-volatile which means the
previous data is not erased when new information is
Difference between Database and Data Warehouse
Parameter | Database | Data Warehouse
Purpose | Is designed to record. | Is designed to analyze.
Processing method | Uses Online Transactional Processing (OLTP). | Uses Online Analytical Processing (OLAP).
Usage | Helps to perform fundamental operations for your business. | Allows you to analyze your business.
Tables and joins | Tables and joins are complex, as they are normalized. | Tables and joins are simple, because they are denormalized.
Orientation | Is an application-oriented collection of data. | Is a subject-oriented collection of data.
Storage limit | Generally limited to a single application. | Stores data from any number of applications.
Availability | Data is available in real time. | Data is refreshed from source systems as and when needed.
Modeling | ER modeling techniques are used for designing. | Data modeling techniques are used for designing.
Technique | Capture data. | Analyze data.
Data type | Data stored in the database is up to date. | Current and historical data is stored; it may not be up to date.
Storage of data | Flat relational approach is used for data storage. | Dimensional and normalized approach is used for the data structure. Example: star and snowflake schema.
Query type | Simple transaction queries are used. | Complex queries are used for analysis purposes.
Data summary | Detailed data is stored. | Highly summarized data is stored.
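The "Query type" row of the comparison can be illustrated with a small sketch. It assumes a hypothetical `orders` table and uses Python's built-in sqlite3 module for both styles of query; in practice the two workloads would usually run on separate systems, as the slides describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows, for illustration only.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT,
                         order_year INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 'asha', 2023, 40.0),
        (2, 'ravi', 2023, 60.0),
        (3, 'asha', 2024, 90.0);
""")

# OLTP-style work: a short transaction that records one new order ...
with conn:
    conn.execute("INSERT INTO orders VALUES (4, 'ravi', 2024, 30.0)")

# ... and a simple query that fetches a single row by its key.
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (4,)
).fetchone()

# OLAP-style work: scan many rows and summarize them for analysis.
yearly_totals = conn.execute(
    "SELECT order_year, SUM(amount) FROM orders "
    "GROUP BY order_year ORDER BY order_year"
).fetchall()

print(one_order, yearly_totals)
```

The first two statements touch one row each; the last one reads the whole table, which is the kind of query a warehouse is designed to serve.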
Applications of Database
Sector | Usage
Banking | Used in the banking sector for customer information, account-related activities, payments, deposits, loans, credit cards, etc.
Airlines | Used for reservations and schedule information.
Universities | Used to store student information, course registrations, colleges, and results.
Telecommunication | Helps to store call records, monthly bills, balance maintenance, etc.
Finance | Helps to store information related to stocks, sales, and purchases of stocks and bonds.
Sales & Production | Used for storing customer, product, and sales details.
Manufacturing | Used for data management of the supply chain and for tracking production of items and inventory status.
HR Management | Details of employees' salaries, deductions, generation of paychecks, etc.
Applications of Data Warehousing
Sector | Usage
Airline | Used for airline system management operations like crew assignment, route analysis, frequent flyer program discount schemes for passengers, etc.
Banking | Used in the banking sector to manage the resources available on the desk effectively.
Healthcare sector | Data warehouses are used to strategize and predict outcomes, create patients' treatment reports, etc. Advanced machine learning and big data enable data warehouse systems to predict ailments.
Insurance sector | Data warehouses are widely used to analyze data patterns and customer trends, and to track market movements quickly.
Retail chain | Helps to track items, identify customer buying patterns and promotions, and determine pricing policy.
Telecommunication | Data warehouses are used for product promotions, sales decisions, and distribution decisions.
Disadvantages of Database
▸ The cost of hardware and software for implementing a database
system is high, which can increase the budget of your
organization.
▸ Many DBMSs are complex systems, so
training is required for users.
▸ A DBMS can't perform sophisticated calculations.
▸ There can be compatibility issues with systems that are already in
place.
▸ Data owners may lose control over their data, raising security,
ownership, and privacy issues.
Disadvantages of Data Warehouse
▸ Adding new data sources takes time, and it is associated with high
cost.
▸ Sometimes problems associated with the data warehouse may be
undetected for many years.
▸ Data warehouses are high maintenance systems. Extracting, loading,
and cleaning data could be time-consuming.
▸ The data warehouse may look simple, but actually, it is too
complicated for the average users.
▹ You need to provide training to end-users, who end up not using the data mining and
warehouse.
▸ Despite best efforts at project management, the scope of data
Data Mining
▸ Data mining is one of the most useful techniques that help
entrepreneurs, researchers, and individuals to extract valuable
information from huge sets of data.
▸ Data mining is also called Knowledge Discovery in Databases
(KDD).
▸ The knowledge discovery process includes Data cleaning, Data
integration, Data selection, Data transformation, Data mining,
Pattern evaluation, and Knowledge presentation.
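The steps of the knowledge discovery process can be sketched as a chain of small functions, one per step. Everything here is a hypothetical stand-in: the purchase records are made up, and a simple frequency count takes the place of a real data mining algorithm.

```python
from collections import Counter

# Hypothetical purchase records from two sources.
STORE_A = [{"customer": "c1", "item": "Milk "}, {"customer": "c2", "item": None}]
STORE_B = [{"customer": "c3", "item": "milk"}, {"customer": "c4", "item": "Bread"}]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if r["item"] is not None]


def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [r for source in sources for r in source]


def select(records):
    """Data selection: keep only the field relevant to the task."""
    return [r["item"] for r in records]


def transform(items):
    """Data transformation: bring values into one consistent form."""
    return [item.strip().lower() for item in items]


def mine(items):
    """Data mining: a frequency count stands in for a real algorithm."""
    return Counter(items)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns that meet a threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}


def present(patterns):
    """Knowledge presentation: turn the patterns into readable lines."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]


records = integrate(clean(STORE_A), clean(STORE_B))
report = present(evaluate(mine(transform(select(records)))))
print(report)
```

Only the `mine` step would change if a real algorithm were substituted; the surrounding steps prepare its input and filter its output.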
What is Data
Mining?
▸ The process of extracting information to identify patterns,
trends, and useful data that allow a business to make
data-driven decisions from huge sets of data is called Data
Mining.
▸ In other words, we can say that Data Mining is the process of
investigating hidden patterns of information from various
perspectives for categorization into useful data, which is
collected and assembled in particular areas such as data
warehouses, efficient analysis, data mining algorithm, helping
decision making and other data requirement to eventually
▸ Data mining is the act of automatically searching large
stores of information to find trends and patterns that go
beyond simple analysis procedures.
▸ Data mining utilizes complex mathematical algorithms to
segment the data and evaluate the probability of future
events.
▸ Data Mining is also called Knowledge Discovery of Data
(KDD).
▸ Data Mining is a process used by organizations to extract
specific data from huge databases to solve business problems.
▹ It primarily turns raw data into useful information.
▸ Data Mining is similar to Data Science carried out by a person,
in a specific situation, on a particular data set, with an
objective.
▸ This process includes various types of services such as text
mining, web mining, audio and video mining, pictorial data
mining, and social media mining.
▸ By outsourcing data mining, all the work can be done faster with low
operation costs.
▸ Specialized firms can also use new technologies to collect data that is
impossible to locate manually.
▹ There are tons of information available on various platforms, but very little knowledge is
accessible.
▸ The biggest challenge is to analyze the data to extract important
information that can be used to solve a problem or for company
development.
▸ There are many powerful instruments and techniques available to
mine data and find better insight from it.
Types of Data
Mining
▸ Data mining can be performed on the following types of
data:
1)Relational Database
2)Data warehouses
3)Data Repositories
4)Object-Relational Database
5)Transactional Database
1) Relational Database:
▸ A relational database is a collection of multiple data
sets formally organized by tables, records, and
columns, from which data can be accessed in various
ways without having to reorganize the database
tables.
▸ Tables convey and share information, which
facilitates data searchability, reporting, and
2) Data warehouses:
▸ A Data Warehouse is the technology that collects the data from
various sources within the organization to provide meaningful
business insights.
▸ The huge amount of data comes from multiple places such as
Marketing and Finance.
▸ The extracted data is utilized for analytical purposes and helps in
decision-making for a business organization.
▸ The data warehouse is designed for the analysis of data rather than
transaction processing.
3) Data Repositories:
▸ The Data Repository generally refers to a destination for
data storage.
▸ However, many IT professionals use the term more
specifically to refer to a particular kind of setup within an IT
structure.
▸ For example, a group of databases, where an
organization has kept various kinds of information.
4) Object-Relational Database:
▸ A combination of an object-oriented database model and relational
database model is called an object-relational model.
▸ It supports Classes, Objects, Inheritance, etc.
▸ One of the primary objectives of the Object-relational data model is
to close the gap between the Relational database and the object-
oriented model practices frequently utilized in many programming
languages, for example, C++, Java, C#, and so on.
5) Transactional Database:
▸ A transactional database refers to a database
management system (DBMS) that has the potential
to undo a database transaction if it is not performed
appropriately.
▸ Even though this was a unique capability a very long
while back, today, most of the relational database
systems support transactional database activities.
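Undoing a transaction that is not performed appropriately can be sketched with Python's built-in sqlite3 module. The `account` table, the balances, and the `transfer` helper are hypothetical; the CHECK constraint is just one way to make a transaction fail partway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account "
    "(name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money between accounts; undo both updates if either one fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the receiver's balance is credited first and then rolled back, so neither account keeps a partial change.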
Advantages of Data
Mining
▸ The data mining technique enables organizations to
obtain knowledge-based data.
▸ Data mining enables organizations to make lucrative
modifications in operation and production.
▸ Compared with other statistical data applications, data
mining is cost-efficient.
▸ Data Mining helps the decision-making process of an
organization.
▸ It facilitates the automated discovery of hidden patterns as
well as the prediction of trends and behaviors.
▸ It can be introduced in new systems as well as existing
platforms.
▸ It is a quick process that makes it easy for new users to analyze
Disadvantages of Data
Mining
▸ There is a probability that the organizations may
sell useful data of customers to other
organizations for money.
▹ As per reports, American Express has sold credit card purchases of
their customers to other organizations.
▸ Many data mining analytics tools are difficult to
operate and need advanced training to work with.
▸ Different data mining instruments operate in
distinct ways due to the different algorithms
used in their design.
▹ Therefore, the selection of the right data mining tools is a very
challenging task.
▸ Data mining techniques are not precise, so
they may lead to severe consequences in
certain conditions.
Applications of Data
Mining
▸ Data mining is primarily used by organizations with
intense consumer demands, such as retail, communication,
financial, and marketing companies, to determine prices,
consumer preferences, product positioning, and the impact
on sales, customer satisfaction, and corporate profits.
▸ Data mining enables a retailer to use point-of-sale
records of customer purchases to develop products and
promotions that help the organization to attract the
customer.
84
Applications of Data
Mining …
Data Mining in Healthcare:
▸ Data mining in healthcare has excellent potential to improve the
health system.
▸ It uses data and analytics for better insights and to identify best
practices that will enhance health care services and reduce costs.
▹ Analysts use data mining approaches such as Machine learning, Multi-dimensional database,
Data visualization, Soft computing, and statistics.
▸ Data mining can be used to forecast the number of patients in each category.
▸ These procedures help ensure that patients receive intensive care at the
right place and at the right time.
▸ Data mining also enables healthcare insurers to recognize fraud and
abuse.
86
Applications of Data
Mining …
Data Mining in Market Basket Analysis:
▸ Market basket analysis is a modeling method based on a
hypothesis.
▹ If you buy a specific group of products, then you are more likely to buy another
group of products.
▹ This technique may enable the retailer to understand the purchase behavior of a
buyer.
▹ This data may assist the retailer in understanding the requirements of the buyer
and altering the store's layout accordingly.
▸ Analytical comparisons can also be made between the
results of different stores and between customers in
different demographic groups.
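The hypothesis behind market basket analysis can be explored with a simple co-occurrence count. The sketch below uses made-up baskets and a hypothetical pair_counts helper; it only counts how often two products appear in the same purchase.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets; each set is one customer's purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
```

Pairs that co-occur often, such as bread and butter in this toy data, are candidates for placing together in the store layout.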
87
Applications of Data
Mining …
Data mining in Education:
▸ Educational data mining (EDM) is a newly emerging field, concerned with
developing techniques that extract knowledge from the data
generated in educational environments.
▸ EDM objectives include predicting students' future learning
behavior, studying the impact of educational support, and promoting
learning science.
▸ An organization can use data mining to make precise decisions and
also to predict the results of the student.
▹ With the results, the institution can concentrate on what to teach and
how to teach.
88
Applications of Data
Mining …
Data Mining in Manufacturing Engineering:
▸ Knowledge is the best asset possessed by a manufacturing
company.
▸ Data mining tools can be beneficial to find patterns in a
complex manufacturing process. Data mining can be used in
system-level designing to obtain the relationships between
product architecture, product portfolio, and data needs of the
customers.
▸ It can also be used to forecast the product development
period, cost, and expectations among the other tasks.
89
Applications of Data
Mining …
Data Mining in CRM (Customer Relationship
Management):
▸ Customer Relationship Management (CRM) is all about
acquiring and retaining customers, enhancing
customer loyalty, and implementing customer-oriented
strategies.
▸ To maintain a good relationship with its customers, a
business organization needs to collect data and
analyze it.
▸ With data mining technologies, the collected data can
90
Applications of Data
Mining …
Data Mining in Fraud detection:
▸ Billions of dollars are lost to fraud.
▸ Traditional methods of fraud detection are time-consuming
and complex. Data mining finds meaningful patterns and
turns data into information.
▸ An ideal fraud detection system should protect the data of all the
users.
▹ Supervised methods start from a collection of sample records, each
labeled as fraudulent or non-fraudulent.
▸ A model is constructed from this data and then used to
identify whether a new record is fraudulent or not.
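As a minimal sketch of the supervised approach described above, the following builds a nearest-centroid model from labeled sample records. The features (amount, transactions in the last hour), the sample values, and the helper names are all assumptions for illustration; a real system would use scaled features and a far richer model.

```python
# Hypothetical labeled records: (amount, transactions_in_last_hour) -> label.
training = [
    ((20.0, 1), "legit"),
    ((35.0, 2), "legit"),
    ((15.0, 1), "legit"),
    ((900.0, 8), "fraud"),
    ((1200.0, 10), "fraud"),
]

def fit_centroids(records):
    """Average the feature vectors of each class (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist(centroids[label]))

model = fit_centroids(training)
```

Calling predict(model, (1000.0, 9)) assigns the new record to whichever class it most resembles, here the fraudulent one.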
91
Applications of Data
Mining …
Data Mining in Lie Detection:
▸ Apprehending a criminal may be comparatively easy, but getting the
truth out of them is a very challenging task.
▸ Law enforcement may use data mining techniques to
investigate offenses, monitor suspected terrorist
communications, etc.
▸ This also includes text mining, which seeks
meaningful patterns in data that is usually unstructured
text.
▸ The information collected from the previous investigations is
compared, and a model for lie detection is constructed.
92
Applications of Data
Mining …
Data Mining in Financial Banking:
▸ The digitalization of the banking system is expected to generate
an enormous amount of data with every new transaction.
▸ Data mining can help bankers solve business-related
problems in banking and finance by identifying trends,
causalities, and correlations in business information and market
costs that are not immediately evident to managers or executives
because the data volume is too large or is produced too rapidly
for experts to screen.
▸ Managers may use this information to better target, acquire,
retain, segment, and maintain profitable customers.
93
Challenges of Implementation in Data
Mining
▸ Although data mining is very powerful, it faces many
challenges during its execution.
▸ These challenges may relate to performance, data,
methods, techniques, and so on.
▸ The process of data mining becomes effective when the
challenges or problems are correctly recognized and
adequately resolved.
94
Challenges of Implementation in Data
Mining …
Incomplete and noisy data:
▸ Data mining is the process of extracting useful information from
large volumes of data.
▸ Real-world data is heterogeneous, incomplete, and
noisy. Data in huge quantities will usually contain inaccurate or
unreliable values.
▸ These problems may occur due to faulty measuring instruments or
because of human error.
▸ Suppose a retail chain collects phone numbers of customers who
spend more than $500, and the accounting employees put the
information into their system.
96
Challenges of Implementation in Data
Mining …
Incomplete and noisy data:
▸ An employee may mistype a digit when entering
the phone number, which results in incorrect data.
▸ Some customers may not be willing to disclose
their phone numbers, which results in incomplete
data.
▸ The data could also be changed by human or system
error.
▸ All these consequences (noisy and incomplete
97
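The phone number scenario above can be turned into a small data-quality check. This sketch assumes a 10-digit phone format and uses made-up records; classify_phone is a hypothetical helper that separates missing entries (incomplete data) from malformed ones (noisy data).

```python
import re

# Hypothetical customer records as entered by staff; None marks a refusal.
records = [
    {"customer": "A", "phone": "555-010-1234"},
    {"customer": "B", "phone": "555-010-12"},   # digits dropped: noisy
    {"customer": "C", "phone": None},           # not disclosed: incomplete
]

def classify_phone(phone, expected_digits=10):
    """Label a phone entry as 'ok', 'noisy', or 'missing'.

    expected_digits=10 is an assumption about the local number format.
    """
    if phone is None or not phone.strip():
        return "missing"
    digits = re.sub(r"\D", "", phone)  # keep digits only
    return "ok" if len(digits) == expected_digits else "noisy"

report = {r["customer"]: classify_phone(r["phone"]) for r in records}
```

A check like this cannot detect a wrong but well-formed number, so it catches only part of the noise the slides describe.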
Challenges of Implementation in Data
Mining …
Data Distribution:
▸ Real-world data is usually stored on various platforms in a
distributed computing environment.
▹ It might be in a database, individual systems, or even on the internet.
▸ In practice, it is quite difficult to bring all the data into a
centralized data repository, mainly due to organizational and
technical concerns.
▹ For example, various regional offices may have their own servers to store
their data.
▹ It may not be feasible to store all the data from all the offices on a central
server.
▸ Therefore, data mining requires the development of tools and
algorithms that allow the mining of distributed data.
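One common way to mine distributed data is to summarize locally and merge only the summaries. The sketch below assumes three hypothetical regional servers and simple item counts; the function names are illustrative, not from any particular tool.

```python
from collections import Counter

# Hypothetical sales logs held on separate regional servers.
regional_sales = {
    "north": ["milk", "bread", "milk"],
    "south": ["bread", "eggs"],
    "east": ["milk", "eggs", "eggs"],
}

def local_summary(items):
    """Runs at each office: reduce raw rows to per-item counts."""
    return Counter(items)

def merge_summaries(summaries):
    """Runs centrally: combine the small summaries, not the raw data."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

summaries = [local_summary(rows) for rows in regional_sales.values()]
global_counts = merge_summaries(summaries)
```

Only the compact counts cross the network, so the raw records can stay on the regional servers.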
98
Challenges of Implementation in Data
Mining …
Complex Data:
▸ Real-world data is heterogeneous and may include
multimedia data (audio, video, and images),
spatial data, time series, and other complex data.
▸ Managing these various types of data and extracting
useful information is a tough task.
▸ Most of the time, new technologies, new tools, and
methodologies would have to be refined to obtain
specific information.
99
Challenges of Implementation in Data
Mining …
Performance:
▸ The data mining system's performance relies
primarily on the efficiency of algorithms and
techniques used.
▸ If the designed algorithm and techniques are not
up to the mark, then the efficiency of the data
mining process will be affected adversely.
100
Challenges of Implementation in Data
Mining …
Data Privacy and Security:
▸ Data mining can lead to serious issues in
terms of data security, governance, and privacy.
▸ For example, if a retailer analyzes the details of
the purchased items, then it reveals data about
buying habits and preferences of the customers
without their permission.
101
Challenges of Implementation in Data
Mining …
Data Visualization:
▸ In data mining, data visualization is a very important process
because it is the primary method that shows the output to the user in
a presentable way.
▸ The extracted data should convey the exact meaning of what it
intends to express.
▸ But many times, representing the information to the end-user in a
precise and easy way is difficult.
▸ Because both the input data and the output information are complicated,
efficient and effective data visualization processes need to be
implemented.
102
Data Mining
Techniques
▸ Data mining includes the utilization of refined data
analysis tools to find previously unknown, valid patterns
and relationships in huge data sets.
▸ These tools can incorporate statistical models, machine
learning techniques, and mathematical algorithms, such
as neural networks or decision trees.
▸ Thus, data mining incorporates analysis and prediction.
103
Data Mining
Techniques …
▸ Drawing on methods and technologies from the
intersection of machine learning, database management, and
statistics, data mining professionals have devoted their
careers to better understanding how to process and draw
conclusions from huge amounts of data. But what
methods do they use to make it happen?
▸ In recent data mining projects, various major data mining
techniques have been developed and used, including
association, classification, clustering, prediction, sequential
patterns, and regression.
104
Data Mining
Techniques …
1. Classification:
▸ This technique is used to obtain important and relevant information about data and
metadata.
▸ This data mining technique helps to classify data in different classes.
▸ Data mining techniques can be classified by different criteria, as follows:
i. Classification of Data mining frameworks as per the type of data sources
mined:
▹ This classification is as per the type of data handled.
▹ For example, multimedia, spatial data, text data, time-series data, World Wide
Web, and so on.
ii. Classification of data mining frameworks as per the database involved:
▹ This classification is based on the data model involved.
▹ For example, object-oriented database, transactional database, relational
database, and so on.
106
Data Mining
Techniques …
1. Classification:
iii. Classification of data mining frameworks as per the kind of knowledge
discovered:
▹ This classification depends on the types of knowledge discovered or data mining
functionalities.
▹ For example, discrimination, classification, clustering, characterization, etc. Some
frameworks are comprehensive, offering several data mining
functionalities together.
iv. Classification of data mining frameworks according to data mining
techniques used:
▹ This classification is as per the data analysis approach utilized, such as neural
networks, machine learning, genetic algorithms, visualization, statistics, data
warehouse-oriented or database-oriented, etc.
▹ The classification can also take into account the level of user interaction involved
in the data mining procedure, such as query-driven systems, autonomous
systems, or interactive exploratory systems.
107
Data Mining
Techniques …
2. Clustering:
▸ Clustering is a division of data into groups of similar
objects.
▸ Describing the data by a few clusters inevitably loses certain fine
details, but achieves simplification.
▸ It models data by its clusters. Data modeling puts clustering in
a historical perspective rooted in statistics, mathematics, and
numerical analysis.
▸ From a machine learning point of view, clusters relate to hidden
patterns, the search for clusters is unsupervised learning, and the
resulting framework represents a data concept.
108
Data Mining
Techniques …
2. Clustering:
▸ From a practical point of view, clustering plays an outstanding role in
data mining applications.
▹ For example, scientific data exploration, text mining, information retrieval, spatial
database applications, CRM, Web analysis, computational biology, medical
diagnostics, and much more.
▸ In other words, we can say that clustering analysis is a data mining
technique to identify similar data.
▹ This technique helps to recognize the differences and similarities between the
data.
▸ Clustering is similar to classification, but it groups chunks of data
together based on their similarities rather than assigning them to
predefined classes.
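To make the idea of grouping by similarity concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on single numbers. The spend values and the starting centers are assumptions chosen for illustration; k-means is one of many clustering methods and is not named in the slides.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalars: assign to nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer spend amounts with two visible groups.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
```

No labels are supplied: the two groups emerge from the data alone, which is what makes clustering unsupervised.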
109
Data Mining
Techniques …
3. Regression:
▸ Regression analysis is a data mining process used to identify and
analyze the relationship between variables in the presence
of other factors.
▸ It is used to estimate the likely value of a specific variable.
▸ Regression is primarily a form of planning and modeling.
▹ For example, we might use it to project certain costs, depending on
other factors such as availability, consumer demand, and competition.
▸ It describes the relationship between two or more variables in the
given data set.
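The cost projection example above can be sketched with ordinary least squares on one predictor. The demand and cost figures are invented for illustration, and fit_line is a hypothetical helper; real projections would involve several factors at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: consumer demand (units) versus observed cost.
demand = [1, 2, 3, 4]
cost = [3, 5, 7, 9]
slope, intercept = fit_line(demand, cost)

# Project the cost at a demand level not yet observed.
projected = slope * 5 + intercept
```

With this toy data the fitted line is cost = 2 * demand + 1, so the projection for a demand of 5 is 11.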
110
Data Mining
Techniques …
4. Association Rules:
▸ This data mining technique helps to discover a link between two or
more items.
▹ It finds a hidden pattern in the data set.
▸ Association rules are if-then statements that help to show the
probability of relationships between data items within large data sets
in different types of databases.
▹ Association rule mining has several applications and is commonly used to
find sales correlations in transactional data or in medical data sets.
▸ The algorithm works on a collection of data,
▹ For example, a list of grocery items that you have been buying for the last six
months.
▹ It calculates the percentage of items being purchased together.
111
Data Mining
Techniques …
4. Association Rules:
▸ There are three major measurement techniques:
▸ Lift: This measurement technique measures the accuracy of the confidence
relative to how often item B is purchased.
(Confidence) / ((Item B) / (Entire dataset))
▸ Support: This measurement technique measures how often multiple items
are purchased together, compared to the overall dataset.
(Item A + Item B) / (Entire dataset)
▸ Confidence: This measurement technique measures how often item B is
purchased when item A is purchased as well.
(Item A + Item B) / (Item A)
▹ Here (Item A + Item B) is the number of transactions containing both items.
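The three measures can be computed directly from a list of transactions. The sketch below uses made-up grocery data and hypothetical helper names, and it reads lift as confidence divided by the share of transactions that contain item B.

```python
# Hypothetical grocery transactions; each set is one purchase.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(a, b, transactions):
    """How often b is bought when a is bought: support(a and b) / support(a)."""
    return support(a | b, transactions) / support(a, transactions)

def lift(a, b, transactions):
    """Confidence of a -> b relative to how often b is bought overall."""
    return confidence(a, b, transactions) / support(b, transactions)

a, b = {"bread"}, {"butter"}
rule_support = support(a | b, transactions)
rule_confidence = confidence(a, b, transactions)
rule_lift = lift(a, b, transactions)
```

A lift above 1, as in this toy data, suggests that buying bread makes buying butter more likely than it is on average.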
112
Data Mining
Techniques …
5. Outlier detection:
▸ This type of data mining technique relates to the observation of data
items in the data set that do not match an expected pattern or
expected behavior.
▹ This technique may be used in various domains like intrusion detection, fraud
detection, etc.
▹ It is also known as Outlier Analysis or Outlier mining.
▸ An outlier is a data point that diverges too much from the rest of the
dataset. Most real-world datasets contain outliers.
▹ Outlier detection plays a significant role in the data mining field.
▹ Outlier detection is valuable in numerous fields like network intrusion
detection, credit or debit card fraud detection, detecting outlying readings in wireless
sensor network data, etc.
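One simple way to flag points that diverge from the rest of a dataset is a z-score test. The sketch below uses made-up card transaction amounts and an assumed threshold of 2 standard deviations; the slides do not prescribe a specific method, and more robust techniques exist.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds threshold.

    threshold=2.0 is an assumption; suitable cutoffs depend on the data.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing diverges
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts with one suspicious charge.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
outliers = find_outliers(amounts)
```

Because the mean and standard deviation are themselves pulled by extreme values, this test works best when outliers are rare.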
113
Data Mining
Techniques …
6. Sequential Patterns:
▸ The sequential pattern is a data mining technique specialized for
evaluating sequential data to discover sequential patterns.
▸ It consists of finding interesting subsequences in a set of
sequences, where the value of a subsequence can be measured in
terms of different criteria like length, occurrence frequency, etc.
▸ In other words, this technique of data mining helps to discover or
recognize similar patterns in transaction data over a period of time.
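Occurrence frequency, one of the criteria mentioned above, can be sketched by counting ordered pairs of items across customer histories. The purchase sequences and the min_support value are assumptions, and this toy version handles only subsequences of length two.

```python
from collections import Counter

# Hypothetical purchase histories, one ordered list per customer.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger", "case"],
]

def frequent_pairs(sequences, min_support=2):
    """Find ordered pairs (a, then later b) seen in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                seen.add((a, b))
        counts.update(seen)  # each sequence counts a pair at most once
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(sequences)
```

Unlike the association measures, order matters here: ("phone", "case") is frequent, while ("case", "phone") never occurs.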
114
Data Mining
Techniques …
7. Prediction:
▸ Prediction uses a combination of other
data mining techniques such as trend analysis,
clustering, classification, etc.
▸ It analyzes past events or instances in the
right sequence to predict a future event.
115

Database Administration (Database Administrator (DBA) is a professional responsible for managing and maintaining computer databases. )pptx

  • 1.
  • 2.
    What is Data Warehousing? ▸A Data Warehousing (DW) is process for collecting and managing data from varied sources to provide meaningful business insights. ▸ A Data warehouse is typically used to connect and analyze business data from heterogeneous sources. ▸ The data warehouse is the core of the BI (Business Intelligence) system which is built for data analysis and reporting. 2
  • 3.
    What is Data Warehousing?… ▸ It is a blend of technologies and components which aids the strategic use of data. ▸ It is electronic storage of a large amount of information by a business which is designed for query and analysis instead of transaction processing. ▸ It is a process of transforming data into information and making it available to users in a timely manner to make a difference. 3
  • 4.
    What is Data Warehousing?… ▸ The decision support database (Data Warehouse) is maintained separately from the organization's operational database. ▸ However, the data warehouse is not a product but an environment. ▸ It is an architectural construct of an information system which provides users with current and historical decision support information which is difficult to access or present in the traditional operational data store. 4
  • 5.
    What is Data Warehousing?… ▸ You many know that a 3NF-designed database for an inventory system many have tables related to each other. ▸ For example, a report on current inventory information can include more than 12 joined conditions. ▹ This can quickly slow down the response time of the query and report. ▸ A data warehouse provides a new design which can help to reduce the response time and helps to enhance the performance of queries for reports and analytics. 5
  • 6.
    What is Data Warehousing?… ▸ Data warehouse system is also known by the following name: ▹ Decision Support System (DSS) ▹ Executive Information System ▹ Management Information System ▹ Business Intelligence Solution ▹ Analytic Application ▹ Data Warehouse 6
  • 7.
  • 8.
    History of Data Warehousing ▸The Datawarehouse benefits users to understand and enhance their organization's performance. ▸ The need to warehouse data evolved as computer systems became more complex and needed to handle increasing amounts of Information. 8
  • 9.
    History of Data Warehousing… ▸ Here are some key events in evolution of Data Warehouse- ▹ 1960- Dartmouth and General Mills in a joint research project, develop the terms dimensions and facts. ▹ 1970- A Nielsen and IRI introduces dimensional data marts for retail sales. ▹ 1983- Tera Data Corporation introduces a database management system which is specifically designed for decision support 9
  • 10.
    History of Data Warehousing… ▹ Data warehousing started in the late 1980s when IBM worker Paul Murphy and Barry Devlin developed the Business Data Warehouse. ▹ However, the real concept was given by Inmon Bill. ▹ He was considered as a father of data warehouse. ▹ He had written about a variety of topics for 10
  • 11.
    How does aDatawarehouse work? ▸ A Data Warehouse works as a central repository where information arrives from one or more data sources. ▸ Data flows into a data warehouse from the transactional system and other relational databases. 11
  • 12.
    How does aDatawarehouse work?… ▸ Data may be: Structured Semi-structured Unstructured data 12
  • 13.
    How does aDatawarehouse work?… ▸ The data is processed, transformed, and ingested so that users can access the processed data in the Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. ▸ A data warehouse merges information coming from different sources into one 13
  • 14.
    How does aDatawarehouse work?… ▸ By merging all of this information in one place, an organization can analyze its customers more holistically. ▸ This helps to ensure that it has considered all the information available. ▸ Data warehousing makes data mining possible. ▹ Data mining is looking for patterns in the data that may lead to higher sales and profits. 14
  • 15.
    Types of Datawarehouse ▸ Threemain types of Data Warehouses (DWH) are: 1. Enterprise Data Warehouse (EDW) 2. Operational Data Store 3. Data Mart 15
  • 16.
    1. Enterprise DataWarehouse (EDW): ▸ Enterprise Data Warehouse (EDW) is a centralized warehouse. ▸ It provides decision support service across the enterprise. ▸ It offers a unified approach for organizing and representing data. ▸ It also provide the ability to classify data according to the subject and give access according to those divisions. 16
  • 17.
    2. Operational DataStore (ODS): ▸ Operational Data Store, which is also called ODS, are nothing but data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. ▸ In ODS, Data warehouse is refreshed in real time. ▸ Hence, it is widely preferred for routine activities like storing records of the Employees. 17
  • 18.
    3. Data Mart: ▸A data mart is a subset of the data warehouse. ▸ It specially designed for a particular line of business, such as sales, finance, sales or finance. ▸ In an independent data mart, data can collect 18
  • 19.
    General stages ofData Warehouse ▸ Earlier, organizations started relatively simple use of data warehousing. ▸ However, over time, more sophisticated use of data warehousing begun. ▸ The following are general stages of use of the data warehouse (DWH):  Offline Operational Database  Offline Data Warehouse  Real time Data Warehouse 19
  • 20.
    General stages ofData Warehouse … Offline Operational Database: ▸ In this stage, data is just copied from an operational system to another server. ▸ In this way, loading, processing, and reporting of the copied data do not impact the operational system's performance. 20
  • 21.
    General stages ofData Warehouse … Offline Data Warehouse: ▸ Data in the Datawarehouse is regularly updated from the Operational Database. ▸ The data in Datawarehouse is mapped and transformed to meet the Datawarehouse objectives. 21
  • 22.
    General stages ofData Warehouse … Real time Data Warehouse: ▸ In this stage, Data warehouses are updated whenever any transaction takes place in operational database. ▸ For example, Airline or railway booking system. 22
  • 23.
    General stages ofData Warehouse … Integrated Data Warehouse: ▸ In this stage, Data Warehouses are updated continuously when the operational system performs a transaction. ▸ The Datawarehouse then generates transactions which are passed back to the operational system. 23
  • 24.
    Components of Data Warehouse ▸Four components of Data Warehouses are: a)Load manager b)Warehouse Manager c)Query Manager d)End-user access tools 24
  • 25.
    Components of Data Warehouse… Load manager: ▸ Load manager is also called the front component. ▸ It performs with all the operations associated with the extraction and load of data into the warehouse. ▸ These operations include transformations to prepare the data for entering into the Data warehouse. 25
  • 26.
    Components of Data Warehouse… Warehouse Manager: ▸ Warehouse manager performs operations associated with the management of the data in the warehouse. ▸ It performs operations like analysis of data to ensure consistency, creation of indexes and views, generation of denormalization and aggregations, transformation and merging of source data and archiving and baking-up data. 26
  • 27.
    Components of Data Warehouse… Query Manager: ▸ Query manager is also known as backend component. ▸ It performs all the operation operations related to the management of user queries. ▸ The operations of this Data warehouse components are direct queries to the appropriate tables for scheduling the execution of queries. 27
  • 28.
    Components of Data Warehouse… End-user access tools: ▸ This is categorized into five different groups like 1) Data Reporting 2) Query Tools 3) Application development tools 4) EIS tools, 5) OLAP tools and data mining tools. 28
  • 29.
    Who needs Data Warehouse ▸DWH (Data warehouse) is needed for all types of users like:  Decision makers who rely on mass amount of data  Users who use customized, complex processes to obtain information from multiple data sources.  It is also used by the people who want simple technology to access the data  It also essential for those people who want a systematic approach for making decisions.  If the user wants fast performance on a huge amount of data which is a necessity for reports, grids or charts, then Data warehouse proves useful.  Data warehouse is a first step If you want to discover 'hidden patterns' of data-flows and groupings. 29
  • 30.
    What is aData Warehouse used for? ▸ Here, are most common sectors where Data warehouse is used:  Airline  Banking  Healthcare  Public sector  Investment and Insurance sector  Retain chain  Telecommunication  Hospitality Industry 30
  • 31.
    What is aData Warehouse used for? … Airline: ▸ In the Airline system, it is used for operation purpose like crew assignment, analyses of route profitability, frequent flyer program promotions, etc. Banking: ▸ It is widely used in the banking sector to manage the resources available on desk effectively. ▸ Few banks also used for the market research, performance analysis of the product and operations. 31
  • 32.
    What is aData Warehouse used for? … Healthcare: ▸ Healthcare sector also used Data warehouse to strategize and predict outcomes, generate patient's treatment reports, share data with tie-in insurance companies, medical aid services, etc. Public sector: ▸ In the public sector, data warehouse is used for intelligence gathering. ▸ It helps government agencies to maintain and analyze tax records, health policy records, for every individual. 32
  • 33.
    What is aData Warehouse used for? … Investment and Insurance sector: ▸ In this sector, the warehouses are primarily used to analyze data patterns, customer trends, and to track market movements. Retain chain: ▸ In retail chains, Data warehouse is widely used for distribution and marketing. ▸ It also helps to track items, customer buying pattern, promotions and also used for determining pricing policy. 33
  • 34.
    What is aData Warehouse used for? … Telecommunication: ▸ A data warehouse is used in this sector for product promotions, sales decisions and to make distribution decisions. Hospitality Industry: ▸ This Industry utilizes warehouse services to design as well as estimate their advertising and promotion campaigns where they want to target clients based on their feedback and travel patterns. 34
  • 35.
    Steps to implementData Warehouse ▸ The best way to address the business risk associated with a Datawarehouse implementation is to employ a three- prong strategy as below 1.Enterprise strategy 2.Phased delivery 3.Iterative Prototyping 35
  • 36.
    Steps to implementData Warehouse … 1. Enterprise strategy: Here we identify technical including current architecture and tools. ▹ We also identify facts, dimensions, and attributes. Data mapping and transformation is also passed. 2. Phased delivery: Datawarehouse implementation should be phased based on subject areas. ▹ Related business entities like booking and billing should be first implemented and then integrated with each other. 3. Iterative Prototyping: Rather than a big bang approach to implementation, the Datawarehouse should be developed and tested iteratively. 36
  • 37.
    Steps to implementData Warehouse … ▸ Here, are key steps in Datawarehouse implementation along with its deliverables. 37 Step Tasks Deliverables 1 Need to define project scope Scope Definition 2 Need to determine business needs Logical Data Model 3 Define Operational Datastore requirements Operational Data Store Model 4 Acquire or develop Extraction tools Extract tools and Software 5 Define Data Warehouse Data requirements Transition Data Model 6 Document missing data To Do Project List 7 Maps Operational Data Store to Data Warehouse D/W Data Integration Map 8 Develop Data Warehouse Database design D/W Database Design 9 Extract Data from Operational Data Store Integrated D/W Data Extracts 10 Load Data Warehouse Initial Data Load 11 Maintain Data Warehouse On-going Data Access and Subsequent Loads
  • 38.
    Best practices toimplement a Data Warehouse ▸ Decide a plan to test the consistency, accuracy, and integrity of the data. ▸ The data warehouse must be well integrated, well defined and time stamped. ▸ While designing Datawarehouse make sure you use right tool, stick to life cycle, take care about data conflicts and ready to learn you're your mistakes. ▸ Never replace operational systems and reports 38
  • 39.
    Best practices toimplement a Data Warehouse … ▸ Don't spend too much time on extracting, cleaning and loading data. ▸ Ensure to involve all stakeholders including business personnel in Datawarehouse implementation process. ▹ Establish that Data warehousing is a joint/ team project. ▹ You don't want to create Data warehouse that is not useful to the end users. 39
  • 40.
    Advantages of DataWarehouse (DWH)  Data warehouse allows business users to quickly access critical data from some sources all in one place.  Data warehouse provides consistent information on various cross-functional activities.  It is also supporting ad-hoc reporting and query.  Data Warehouse helps to integrate many sources of data to reduce stress on the production system.  Data warehouse helps to reduce total turnaround time for analysis and reporting. 40
  • 41.
    Advantages of DataWarehouse (DWH) …  Restructuring and Integration make it easier for the user to use for reporting and analysis.  Data warehouse allows users to access critical data from the number of sources in a single place.  Therefore, it saves user's time of retrieving data from multiple sources.  Data warehouse stores a large amount of historical data.  This helps users to analyze different time periods and trends to make future predictions. 41
  • 42.
    Disadvantages of DataWarehouse (DWH)  Not an ideal option for unstructured data.  Creation and Implementation of Data Warehouse is surely time confusing affair.  Data Warehouse can be outdated relatively quickly  Difficult to make changes in data types and ranges, data source schema, indexes, and queries.  The data warehouse may seem easy, but actually, it is too complex for the average users.  Despite best efforts at project management, data warehousing project scope will always increase.  Sometime warehouse users will develop different business rules.  Organizations need to spend lots of their resources for training and Implementation purpose. 42
  • 43.
    The Future ofData Warehousing … o Change in Regulatory constrains may limit the ability to combine source of disparate data. o These disparate sources may include unstructured data which is difficult to store. o As the size of the databases grows, the estimates of what constitutes a very large database continue to grow. o It is complex to build and run data warehouse systems which are always increasing in size. o The hardware and software resources are available today do not allow to keep a large amount of data online. o Multimedia data cannot be easily manipulated as text data, whereas textual information can be retrieved by the relational software available today. o This could be a research subject. 43
  • 44.
    Data Warehouse Tools ▸ Thereare many Data Warehousing tools are available in the market. ▸ Here, are some most prominent one: 1.MarkLogic 2.Oracle 3.Amazon RedShift 44
  • 45.
    Data Warehouse Tools … 1.MarkLogic: ▸ MarkLogic is useful data warehousing solution that makes data integration easier and faster using an array of enterprise features. ▸ This tool helps to perform very complex search operations. ▸ It can query different types of data like documents, relationships, and metadata. https://www.marklogic.com/product/getting-started/ 45
  • 46.
    Data Warehouse Tools …
    2. Oracle:
    ▸ Oracle is a widely used commercial database.
    ▸ It offers a wide range of data warehouse solutions, both on-premises and in the cloud.
    ▸ It helps to optimize customer experiences by increasing operational efficiency.
  • 47.
    Data Warehouse Tools …
    3. Amazon Redshift:
    ▸ Amazon Redshift is a cloud data warehouse tool.
    ▸ It is a simple and cost-effective tool for analyzing all types of data using standard SQL and existing BI tools.
    ▸ It also allows running complex queries against petabytes of structured data, using query optimization techniques.
    https://aws.amazon.com/redshift/?nc2=h_m1
  • 48.
    KEY LEARNING
    ▸ A Data Warehouse (DWH) is also known as an Enterprise Data Warehouse (EDW).
    ▸ A data warehouse is defined as a central repository that receives information from one or more data sources.
    ▸ The three main types of data warehouses are Enterprise Data Warehouse (EDW), Operational Data Store, and Data Mart.
  • 49.
    KEY LEARNING …
    ▸ The general states of a data warehouse are Offline Operational Database, Offline Data Warehouse, Real-time Data Warehouse, and Integrated Data Warehouse.
    ▸ The four main components of a data warehouse are the Load Manager, Warehouse Manager, Query Manager, and end-user access tools.
    ▸ Data warehouses are used in diverse industries such as airline, banking, healthcare, insurance, and retail.
  • 50.
    KEY LEARNING …
    ▸ Implementing a data warehouse is a three-pronged strategy: enterprise strategy, phased delivery, and iterative prototyping.
    ▸ A data warehouse allows business users to quickly access critical data from several sources, all in one place.
  • 51.
    Database (DB) vs Data Warehouse (DWH)
    What is a Database?
    ▸ A database is a collection of related data which represents some elements of the real world.
    ▸ It is designed to be built and populated with data for a specific task.
    ▸ It is also a building block of your data solution.
  • 52.
    Database (DB) vs Data Warehouse (DWH) …
    What is a Data Warehouse?
    ▸ A data warehouse is an information system which stores historical and cumulative data from single or multiple sources.
    ▸ It is designed to analyze, report on, and integrate transaction data from different sources.
    ▸ A data warehouse eases the analysis and reporting process of an organization.
      ▹ It also serves as a single version of truth for the organization's decision-making and forecasting processes.
  • 53.
    Database (DB) vs Data Warehouse (DWH) …
    KEY DIFFERENCE
    ▸ A database is a collection of related data that represents some elements of the real world, whereas a data warehouse is an information system that stores historical and cumulative data from single or multiple sources.
    ▸ A database is designed to record data, whereas a data warehouse is designed to analyze data.
    ▸ A database is an application-oriented collection of data, whereas a data warehouse is a subject-oriented collection of data.
  • 54.
    Database (DB) vs Data Warehouse (DWH) …
    KEY DIFFERENCE
    ▸ A database uses Online Transactional Processing (OLTP), whereas a data warehouse uses Online Analytical Processing (OLAP).
    ▸ Database tables and joins are complicated because they are normalized, whereas data warehouse tables and joins are easy because they are denormalized.
    ▸ ER modeling techniques are used for designing a database, whereas data modeling techniques are used for designing a data warehouse.
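The normalized versus denormalized point can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module; the table names (`customer`, `product`, `orders`, `sales_flat`) and the sample rows are hypothetical and chosen only for illustration. The same report needs a join on the normalized tables but a single-table query on the denormalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and rows, for illustration only.
cur.executescript("""
-- Normalized (OLTP-style) tables: each fact is stored once.
CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product  (id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       product_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO product  VALUES (10, 'Books'), (11, 'Toys');
INSERT INTO orders   VALUES (100, 1, 10, 20.0), (101, 1, 11, 5.0),
                            (102, 2, 10, 12.5);

-- Denormalized (warehouse-style) table: descriptive columns are copied in.
CREATE TABLE sales_flat AS
SELECT o.id AS order_id, c.city, p.category, o.amount
FROM orders o
JOIN customer c ON c.id = o.customer_id
JOIN product  p ON p.id = o.product_id;
""")

# Report on the normalized tables: needs a join.
normalized_sql = """
SELECT c.city, SUM(o.amount)
FROM orders o JOIN customer c ON c.id = o.customer_id
GROUP BY c.city ORDER BY c.city
"""
# Same report on the denormalized table: no join.
denormalized_sql = (
    "SELECT city, SUM(amount) FROM sales_flat GROUP BY city ORDER BY city"
)

normalized_result = cur.execute(normalized_sql).fetchall()
denormalized_result = cur.execute(denormalized_sql).fetchall()
print(normalized_result)
print(denormalized_result)
```

Both queries return the same totals per city; the trade-off is that the flat table repeats descriptive values and must be refreshed from the source tables.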
  • 55.
    Database (DB) vs Data Warehouse (DWH) …
    Why use a Database?
    ▸ Here are the prime reasons for using a database system:
      ▹ It offers security of data and of access to it.
      ▹ A database offers a variety of techniques to store and retrieve data.
      ▹ A database acts as an efficient handler to balance the requirements of multiple applications using the same data.
      ▹ A DBMS offers integrity constraints that provide a high level of protection and prevent access to prohibited data.
  • 56.
    Database (DB) vs Data Warehouse (DWH) …
    Why Use Data Warehouse?
    ▸ Here are important reasons for using a data warehouse:
      ▹ A data warehouse helps business users access critical data from several sources, all in one place.
      ▹ It provides consistent information on various cross-functional activities.
      ▹ It helps you integrate many sources of data to reduce stress on the production system.
      ▹ A data warehouse helps you reduce TAT (total turnaround time) for
  • 57.
    Database (DB) vs Data Warehouse (DWH) …
    Why Use Data Warehouse?
      ▹ A data warehouse helps users access critical data from different sources in a single place, which saves the time of retrieving information from multiple sources.
      ▹ You can also access data from the cloud easily.
      ▹ A data warehouse allows you to store a large amount of historical data, so you can analyze different periods and trends to make future predictions.
      ▹ It enhances the value of operational business applications and customer relationship management systems.
      ▹ It separates analytics processing from transactional databases, improving the performance of both systems.
  • 58.
    Database (DB) vs Data Warehouse (DWH) …
    Characteristics of Database
    ▸ Offers security and removes redundancy.
    ▸ Allows multiple views of the data.
    ▸ Database systems follow the ACID properties (Atomicity, Consistency, Isolation, and Durability).
    ▸ Allows insulation between programs and data.
    ▸ Supports sharing of data and multiuser transaction processing.
    ▸ Relational databases support a multi-user environment.
  • 59.
    Database (DB) vs Data Warehouse (DWH) …
    Characteristics of Data Warehouse
    ▸ A data warehouse is subject-oriented: it offers information related to a theme rather than the company's ongoing operations.
    ▸ The data also needs to be stored in the data warehouse in a common and universally acceptable manner.
    ▸ The time horizon for the data warehouse is relatively extensive compared with other operational systems.
    ▸ A data warehouse is non-volatile, which means the previous data is not erased when new information is
  • 60.
    Database (DB) vs Data Warehouse (DWH) …
    Difference between Database and Data Warehouse
  • 61.
    Database (DB) vs Data Warehouse (DWH) …
    Parameter | Database | Data Warehouse
    Purpose | Is designed to record. | Is designed to analyze.
    Processing Method | Uses Online Transactional Processing (OLTP). | Uses Online Analytical Processing (OLAP).
    Usage | Helps to perform fundamental operations for your business. | Allows you to analyze your business.
    Tables and Joins | Tables and joins are complex because they are normalized. | Tables and joins are simple because they are denormalized.
    Orientation | Is an application-oriented collection of data. | Is a subject-oriented collection of data.
    Storage limit | Generally limited to a single application. | Stores data from any number of applications.
  • 62.
    Database (DB) vs Data Warehouse (DWH) …
    Parameter | Database | Data Warehouse
    Availability | Data is available in real time. | Data is refreshed from source systems as and when needed.
    Modeling | ER modeling techniques are used for designing. | Data modeling techniques are used for designing.
    Technique | Capture data. | Analyze data.
    Data Type | Data stored in the database is up to date. | Current and historical data is stored; it may not be up to date.
    Storage of data | A flat relational approach is used for data storage. | A dimensional and normalized approach is used for the data structure, for example star and snowflake schemas.
    Query Type | Simple transaction queries are used. | Complex queries are used for analysis.
    Data Summary | Detailed data is stored. | Highly summarized data is stored.
  • 63.
    Database (DB) vs Data Warehouse (DWH) …
    Applications of Database
    Sector | Usage
    Banking | Used for customer information, account-related activities, payments, deposits, loans, credit cards, etc.
    Airlines | Used for reservations and schedule information.
    Universities | Used to store student information, course registrations, colleges, and results.
    Telecommunication | Helps to store call records, monthly bills, balance maintenance, etc.
    Finance | Helps you store information related to stock, sales, and purchases of stocks and bonds.
    Sales & Production | Used for storing customer, product, and sales details.
    Manufacturing | Used for supply chain data management and for tracking the production of items and inventory status.
    HR Management | Stores details about employees' salaries, deductions, generation of paychecks, etc.
  • 64.
    Database (DB) vs Data Warehouse (DWH) …
    Applications of Data Warehousing
    Sector | Usage
    Airline | Used for airline system management operations such as crew assignment, route analysis, and frequent flyer program discount schemes for passengers.
    Banking | Used to manage the resources available on the desk effectively.
    Healthcare sector | Used to strategize and predict outcomes, create patients' treatment reports, etc. With advanced machine learning and big data, data warehouse systems can help predict ailments.
    Insurance sector | Widely used to analyze data patterns and customer trends, and to track market movements quickly.
    Retail chain | Helps you track items, identify customers' buying patterns and promotions, and determine pricing policy.
    Telecommunication | Used for product promotions, sales decisions, and distribution decisions.
  • 65.
    Database (DB) vs Data Warehouse (DWH) …
    Disadvantages of Database
    ▸ The cost of the hardware and software needed to implement a database system is high, which can increase the budget of your organization.
    ▸ Many DBMSs are complex systems, so users need training to use them.
    ▸ A DBMS is not well suited to sophisticated calculations.
    ▸ There can be compatibility issues with systems that are already in place.
    ▸ Data owners may lose control over their data, raising security, ownership, and privacy issues.
  • 66.
    Database (DB) vs Data Warehouse (DWH) …
    Disadvantages of Data Warehouse
    ▸ Adding new data sources takes time and is associated with high cost.
    ▸ Sometimes problems associated with the data warehouse may go undetected for many years.
    ▸ Data warehouses are high-maintenance systems. Extracting, loading, and cleaning data can be time-consuming.
    ▸ The data warehouse may look simple, but it is often too complicated for average users.
      ▹ You need to provide training to end users, who may still end up not using the data mining and warehouse.
    ▸ Despite best efforts at project management, the scope of data
  • 67.
  • 68.
    Data Mining
    ▸ Data mining is one of the most useful techniques that help entrepreneurs, researchers, and individuals extract valuable information from huge sets of data.
    ▸ Data mining is also called Knowledge Discovery in Databases (KDD).
    ▸ The knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
  • 69.
    What is Data Mining?
    ▸ Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions.
    ▸ In other words, data mining is the process of investigating hidden patterns of information from various perspectives for categorization into useful data, which is collected and assembled in particular areas such as data warehouses, efficient analysis, data mining algorithm, helping decision making and other data requirement to eventually
  • 70.
    What is Data Mining? …
    ▸ Data mining is the act of automatically searching large stores of information to find trends and patterns that go beyond simple analysis procedures.
    ▸ Data mining uses complex mathematical algorithms to segment data and evaluate the probability of future events.
    ▸ Data mining is also called Knowledge Discovery in Databases (KDD).
  • 71.
    What is Data Mining? …
    ▸ Data mining is a process used by organizations to extract specific data from huge databases to solve business problems.
      ▹ It primarily turns raw data into useful information.
    ▸ Data mining is similar to data science carried out by a person, in a specific situation, on a particular data set, with an objective.
    ▸ This process includes various types of services such as text mining, web mining, audio and video mining, pictorial data mining, and social media mining.
  • 72.
    What is Data Mining? …
    ▸ By outsourcing data mining, the work can be done faster and with low operating costs.
    ▸ Specialized firms can also use new technologies to collect data that is impossible to locate manually.
      ▹ There are tons of information available on various platforms, but very little knowledge is accessible.
    ▸ The biggest challenge is to analyze the data to extract important information that can be used to solve a problem or for company development.
    ▸ There are many powerful instruments and techniques available to mine data and gain better insight from it.
  • 73.
  • 74.
    Types of Data Mining
    ▸ Data mining can be performed on the following types of data:
      1) Relational Database
      2) Data Warehouses
      3) Data Repositories
      4) Object-Relational Database
      5) Transactional Database
  • 75.
    Types of Data Mining …
    1) Relational Database:
    ▸ A relational database is a collection of multiple data sets formally organized into tables, records, and columns, from which data can be accessed in various ways without having to reorganize the database tables.
    ▸ Tables convey and share information, which facilitates data searchability, reporting, and
  • 76.
    Types of Data Mining …
    2) Data Warehouses:
    ▸ A data warehouse is the technology that collects data from various sources within the organization to provide meaningful business insights.
    ▸ The huge amount of data comes from multiple places such as Marketing and Finance.
    ▸ The extracted data is used for analytical purposes and helps in decision-making for a business organization.
    ▸ The data warehouse is designed for the analysis of data rather than transaction processing.
  • 77.
    Types of Data Mining …
    3) Data Repositories:
    ▸ A data repository generally refers to a destination for data storage.
    ▸ However, many IT professionals use the term more specifically to refer to a particular kind of setup within an IT structure.
    ▸ An example is a group of databases in which an organization keeps various kinds of information.
  • 78.
    Types of Data Mining …
    4) Object-Relational Database:
    ▸ A combination of an object-oriented database model and a relational database model is called an object-relational model.
    ▸ It supports classes, objects, inheritance, etc.
    ▸ One of the primary objectives of the object-relational data model is to close the gap between the relational database and the object-oriented practices frequently used in many programming languages, for example C++, Java, and C#.
  • 79.
    Types of Data Mining …
    5) Transactional Database:
    ▸ A transactional database refers to a database management system (DBMS) that can undo (roll back) a database transaction if it does not complete properly.
    ▸ Although this was a unique capability a long time ago, today most relational database systems support transactional database activities.
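The ability to undo a transaction can be sketched with Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the balances below are hypothetical and exist only for illustration. The `with conn:` block commits if every statement succeeds and rolls back all of them if any statement raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo both updates if either fails."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The credit to dst has been rolled back together with the failed debit.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `bob` is applied first and then undone, so the table never keeps a half-finished transfer.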
  • 80.
    Advantages of Data Mining
    ▸ The data mining technique enables organizations to obtain knowledge-based data.
    ▸ Data mining enables organizations to make lucrative modifications in operation and production.
    ▸ Compared with other statistical data applications, data mining is cost-efficient.
  • 81.
    Advantages of Data Mining …
    ▸ Data mining helps the decision-making process of an organization.
    ▸ It facilitates the automated discovery of hidden patterns as well as the prediction of trends and behaviors.
    ▸ It can be introduced in new systems as well as existing platforms.
    ▸ It is a quick process that makes it easy for new users to analyze
  • 82.
    Disadvantages of Data Mining
    ▸ There is a risk that organizations may sell useful customer data to other organizations for money.
      ▹ According to a report, American Express has sold its customers' credit card purchase data to other organizations.
    ▸ Much data mining analytics software is difficult to operate and needs advanced training to work with.
  • 83.
    Disadvantages of Data Mining …
    ▸ Different data mining instruments operate in distinct ways because of the different algorithms used in their design.
      ▹ Therefore, selecting the right data mining tool is a very challenging task.
    ▸ Data mining techniques are not perfectly precise, which may lead to severe consequences in certain conditions.
  • 84.
    Applications of Data Mining
    ▸ Data mining is primarily used by organizations with intense consumer demands, such as retail, communication, financial, and marketing companies, to determine prices, consumer preferences, product positioning, and the impact on sales, customer satisfaction, and corporate profits.
    ▸ Data mining enables a retailer to use point-of-sale records of customer purchases to develop products and promotions that help the organization attract customers.
  • 85.
  • 86.
    Applications of Data Mining …
    Data Mining in Healthcare:
    ▸ Data mining in healthcare has excellent potential to improve the health system.
    ▸ It uses data and analytics for better insights and to identify best practices that will enhance health care services and reduce costs.
      ▹ Analysts use data mining approaches such as machine learning, multi-dimensional databases, data visualization, soft computing, and statistics.
    ▸ Data mining can be used to forecast the number of patients in each category.
    ▸ The procedures help ensure that patients get intensive care at the right place and at the right time.
    ▸ Data mining also enables healthcare insurers to recognize fraud and abuse.
  • 87.
    Applications of Data Mining …
    Data Mining in Market Basket Analysis:
    ▸ Market basket analysis is a modeling method based on a hypothesis.
      ▹ If you buy a specific group of products, then you are more likely to buy another group of products.
      ▹ This technique may enable the retailer to understand the purchase behavior of a buyer.
      ▹ This data may assist the retailer in understanding the requirements of the buyer and altering the store's layout accordingly.
    ▸ Analytical comparisons of results can also be made between different stores and between customers in different demographic groups.
  • 88.
    Applications of Data Mining …
    Data Mining in Education:
    ▸ Educational data mining (EDM) is a newly emerging field concerned with developing techniques that extract knowledge from the data generated in educational environments.
    ▸ EDM objectives include predicting students' future learning behavior, studying the impact of educational support, and advancing learning science.
    ▸ An organization can use data mining to make precise decisions and also to predict students' results.
      ▹ With the results, the institution can concentrate on what to teach and how to teach.
  • 89.
    Applications of Data Mining …
    Data Mining in Manufacturing Engineering:
    ▸ Knowledge is the best asset possessed by a manufacturing company.
    ▸ Data mining tools can help find patterns in a complex manufacturing process.
    ▸ Data mining can be used in system-level design to obtain the relationships between product architecture, product portfolio, and the data needs of customers.
    ▸ It can also be used to forecast the product development period, cost, and expectations, among other tasks.
  • 90.
    Applications of Data Mining …
    Data Mining in CRM (Customer Relationship Management):
    ▸ Customer Relationship Management (CRM) is all about obtaining and retaining customers, enhancing customer loyalty, and implementing customer-oriented strategies.
    ▸ To build a good relationship with customers, a business organization needs to collect data and analyze it.
    ▸ With data mining technologies, the collected data can
  • 91.
    Applications of Data Mining …
    Data Mining in Fraud Detection:
    ▸ Billions of dollars are lost to fraud.
    ▸ Traditional methods of fraud detection are time-consuming and complex.
    ▸ Data mining provides meaningful patterns and turns data into information.
    ▸ An ideal fraud detection system should protect the data of all users.
      ▹ Supervised methods start from a collection of sample records, each classified as fraudulent or non-fraudulent.
    ▸ A model is constructed using this data, and it is then used to identify whether a new record is fraudulent or not.
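The supervised approach described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production fraud model: the two features (amount and transactions in the last hour), the sample records, and the nearest-centroid rule are all assumptions made for this example.

```python
# Made-up labeled sample records: (amount, transactions_in_last_hour) -> label.
training = [
    ((20.0, 1), "ok"), ((35.0, 2), "ok"), ((15.0, 1), "ok"),
    ((900.0, 9), "fraud"), ((750.0, 7), "fraud"), ((1200.0, 12), "fraud"),
]


def fit_centroids(samples):
    """Build the model: one mean feature vector (centroid) per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Classify a new record by its nearest centroid (squared Euclidean distance).

    Features are left unscaled to keep the sketch short; real data would
    normally be normalized first.
    """
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


centroids = fit_centroids(training)
print(predict(centroids, (25.0, 1)))
print(predict(centroids, (1000.0, 10)))
```

A small record close to the "ok" samples is labeled "ok", and a large, high-frequency record is labeled "fraud".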
  • 92.
    Applications of Data Mining …
    Data Mining in Lie Detection:
    ▸ Apprehending a criminal is comparatively easy, but bringing out the truth is a very challenging task.
    ▸ Law enforcement may use data mining techniques to investigate offenses, monitor suspected terrorist communications, etc.
    ▸ This technique also includes text mining, which seeks meaningful patterns in data that is usually unstructured text.
    ▸ The information collected from previous investigations is compared, and a model for lie detection is constructed.
  • 93.
    Applications of Data Mining …
    Data Mining in Financial Banking:
    ▸ The digitalization of the banking system generates an enormous amount of data with every new transaction.
    ▸ Data mining can help bankers solve business-related problems in banking and finance by identifying trends, causalities, and correlations in business information and market costs. These are not instantly evident to managers or executives because the data volume is too large or is produced too rapidly to be screened by experts.
    ▸ Managers may use this information to better target, acquire, retain, segment, and maintain profitable customers.
  • 94.
    Challenges of Implementation in Data Mining
    ▸ Although data mining is very powerful, it faces many challenges during its execution.
    ▸ The challenges can be related to performance, data, methods, techniques, etc.
    ▸ The process of data mining becomes effective when the challenges or problems are correctly recognized and adequately resolved.
  • 95.
    Challenges of Implementation in Data Mining …
  • 96.
    Challenges of Implementation in Data Mining …
    Incomplete and noisy data:
    ▸ Data mining is the process of extracting useful data from large volumes of data.
    ▸ Real-world data is heterogeneous, incomplete, and noisy. Data in huge quantities will often be inaccurate or unreliable.
    ▸ These problems may occur because of faulty measuring instruments or human error.
    ▸ Suppose a retail chain collects phone numbers of customers who spend more than $500, and the accounting employees enter the information into their system.
  • 97.
    Challenges of Implementation in Data Mining …
    Incomplete and noisy data:
    ▸ The person may mistype a digit when entering the phone number, which results in incorrect data.
    ▸ Some customers may not be willing to disclose their phone numbers, which results in incomplete data.
    ▸ The data could also get changed because of human or system error.
    ▸ All these consequences (noisy and incomplete
  • 98.
    Challenges of Implementation in Data Mining …
    Data Distribution:
    ▸ Real-world data is usually stored on various platforms in a distributed computing environment.
      ▹ It might be in a database, on individual systems, or even on the internet.
    ▸ In practice, it is quite a tough task to bring all the data into a centralized data repository, mainly because of organizational and technical concerns.
      ▹ For example, various regional offices may have their own servers to store their data.
      ▹ It is not feasible to store all the data from all the offices on a central server.
    ▸ Therefore, data mining requires the development of tools and algorithms that allow the mining of distributed data.
  • 99.
    Challenges of Implementation in Data Mining …
    Complex Data:
    ▸ Real-world data is heterogeneous, and it could be multimedia data, including audio and video, images, complex data, spatial data, time series, and so on.
    ▸ Managing these various types of data and extracting useful information is a tough task.
    ▸ Most of the time, new technologies, tools, and methodologies have to be refined to obtain specific information.
  • 100.
    Challenges of Implementation in Data Mining …
    Performance:
    ▸ The data mining system's performance relies primarily on the efficiency of the algorithms and techniques used.
    ▸ If the designed algorithms and techniques are not up to the mark, the efficiency of the data mining process will be adversely affected.
  • 101.
    Challenges of Implementation in Data Mining …
    Data Privacy and Security:
    ▸ Data mining can lead to serious issues in terms of data security, governance, and privacy.
    ▸ For example, if a retailer analyzes the details of purchased items, it reveals data about the buying habits and preferences of customers without their permission.
  • 102.
    Challenges of Implementation in Data Mining …
    Data Visualization:
    ▸ In data mining, data visualization is a very important process because it is the primary method of showing the output to the user in a presentable way.
    ▸ The extracted data should convey the exact meaning of what it intends to express.
    ▸ But many times, representing the information to the end user in a precise and easy way is difficult.
    ▸ Because both the input data and the output information are complicated, very efficient data visualization processes need to be implemented.
  • 103.
    Data Mining Techniques
    ▸ Data mining involves the use of sophisticated data analysis tools to find previously unknown, valid patterns and relationships in huge data sets.
    ▸ These tools can incorporate statistical models, machine learning techniques, and mathematical algorithms, such as neural networks or decision trees.
    ▸ Thus, data mining incorporates analysis and prediction.
  • 104.
    Data Mining Techniques …
    ▸ Data mining professionals draw on methods and technologies from the intersection of machine learning, database management, and statistics to process huge amounts of data and draw conclusions from them.
    ▸ In recent data mining projects, several major data mining techniques have been developed and used, including association, classification, clustering, prediction, sequential patterns, and regression.
  • 105.
  • 106.
    Data Mining Techniques …
    1. Classification:
    ▸ This technique is used to obtain important and relevant information about data and metadata.
    ▸ This data mining technique helps to classify data into different classes.
    ▸ Data mining techniques can be classified by different criteria, as follows:
      i. Classification of data mining frameworks as per the type of data sources mined:
        ▹ This classification is as per the type of data handled.
        ▹ For example, multimedia, spatial data, text data, time-series data, World Wide Web, and so on.
      ii. Classification of data mining frameworks as per the database involved:
        ▹ This classification is based on the data model involved.
        ▹ For example, object-oriented database, transactional database, relational database, and so on.
  • 107.
    Data Mining Techniques …
    1. Classification:
      iii. Classification of data mining frameworks as per the kind of knowledge discovered:
        ▹ This classification depends on the types of knowledge discovered or data mining functionalities.
        ▹ For example, discrimination, classification, clustering, characterization, etc. Some frameworks tend to be extensive frameworks offering several data mining functionalities together.
      iv. Classification of data mining frameworks according to the data mining techniques used:
        ▹ This classification is as per the data analysis approach used, such as neural networks, machine learning, genetic algorithms, visualization, statistics, data warehouse-oriented or database-oriented, etc.
        ▹ The classification can also take into account the level of user interaction involved in the data mining procedure, such as query-driven systems, autonomous systems, or interactive exploratory systems.
  • 108.
    Data Mining Techniques …
    2. Clustering:
    ▸ Clustering is a division of information into groups of connected objects.
    ▸ Describing the data by a few clusters loses certain fine details but achieves simplification.
    ▸ It models data by its clusters. Data modeling places clustering in a historical perspective rooted in statistics, mathematics, and numerical analysis.
    ▸ From a machine learning point of view, clusters relate to hidden patterns, the search for clusters is unsupervised learning, and the resulting framework represents a data concept.
  • 109.
    Data Mining Techniques …
    2. Clustering:
    ▸ From a practical point of view, clustering plays an outstanding role in data mining applications.
      ▹ For example, scientific data exploration, text mining, information retrieval, spatial database applications, CRM, Web analysis, computational biology, medical diagnostics, and much more.
    ▸ In other words, clustering analysis is a data mining technique for identifying similar data.
      ▹ This technique helps to recognize the differences and similarities between the data.
    ▸ Clustering is very similar to classification, but it involves grouping chunks of data together based on their similarities
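Grouping data by similarity can be sketched with a very small k-means routine. This is one common clustering algorithm chosen for illustration, not necessarily the method the slides have in mind; the function name `kmeans_1d`, the one-dimensional sample values, and the starting centers are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data.

    Repeats two steps: assign each value to its nearest center,
    then move each center to the mean of the values assigned to it.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied: the two groups emerge only from the distances between values, which is what makes clustering unsupervised.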
  • 110.
    Data Mining Techniques …
    3. Regression:
    ▸ Regression analysis is a data mining process used to identify and analyze the relationship between variables in the presence of other factors.
    ▸ It is used to estimate the likely value of a specific variable.
    ▸ Regression is primarily a form of planning and modeling.
      ▹ For example, we might use it to project certain costs, depending on other factors such as availability, consumer demand, and competition.
    ▸ It estimates the relationship between two or more variables in the given data set.
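As a minimal sketch of the cost-projection example, the snippet below fits a straight line by ordinary least squares, one simple form of regression. The `demand` and `cost` figures are invented for illustration and lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: cost happens to equal 2 * demand + 5.
demand = [10, 20, 30, 40]
cost = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(demand, cost)
projected = slope * 50 + intercept  # projected cost at demand = 50
print(slope, intercept, projected)
```

With real, noisy data the fitted line is only an estimate of the relationship, not an exact rule.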
  • 111.
    Data Mining Techniques …
    4. Association Rules:
    ▸ This data mining technique helps to discover a link between two or more items.
      ▹ It finds a hidden pattern in the data set.
    ▸ Association rules are if-then statements that help to show the probability of interactions between data items within large data sets in different types of databases.
      ▹ Association rule mining has several applications and is commonly used to find sales correlations in transactional data or in medical data sets.
    ▸ The algorithm works on a collection of data.
      ▹ For example, a list of grocery items that you have been buying for the last six months.
      ▹ It calculates a percentage of items being purchased tog
  • 112.
    Data Mining Techniques …
    4. Association Rules:
    ▸ There are three major measurement techniques:
    ▸ Support: measures how often multiple items are purchased together, compared with the overall dataset.
      Support = (Item A + Item B) / (Entire dataset)
    ▸ Confidence: measures how often item B is purchased when item A is purchased as well.
      Confidence = (Item A + Item B) / (Item A)
    ▸ Lift: measures the confidence relative to how often item B is purchased overall.
      Lift = (Confidence) / ((Item B) / (Entire dataset))
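The three measures can be worked through on a toy dataset. The four transactions and the item names below are invented for illustration; each function mirrors one of the formulas above, with "Item A + Item B" read as "transactions containing both A and B".

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]


def support(items):
    """Fraction of all transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(a, b):
    """Of the transactions containing A, the fraction that also contain B."""
    return support(a | b) / support(a)


def lift(a, b):
    """Confidence of A -> B divided by how often B is bought overall."""
    return confidence(a, b) / support(b)


a, b = {"bread"}, {"butter"}
print(support(a | b))    # 2 of 4 transactions contain both
print(confidence(a, b))  # 2 of the 3 bread transactions contain butter
print(lift(a, b))        # above 1: butter is more likely when bread is bought
```

A lift greater than 1 suggests the items are bought together more often than chance would predict; a lift near 1 suggests no association.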
  • 113.
    Data Mining Techniques …
    5. Outlier Detection:
    ▸ This type of data mining technique relates to the observation of data items in the data set that do not match an expected pattern or expected behavior.
      ▹ This technique may be used in various domains such as intrusion detection, fraud detection, etc.
      ▹ It is also known as outlier analysis or outlier mining.
    ▸ An outlier is a data point that diverges too much from the rest of the dataset. The majority of real-world datasets contain outliers.
      ▹ Outlier detection plays a significant role in the data mining field.
      ▹ Outlier detection is valuable in numerous fields such as network intrusion identification, credit or debit card fraud detection, and detecting outlying values in wireless sensor network data.
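One simple way to flag points that diverge from the rest is a z-score rule, sketched below with Python's standard `statistics` module. The z-score method, the threshold of 2.0, and the sample amounts are assumptions for illustration; other outlier detection methods exist.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts with one suspicious value.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

On small samples a single extreme value inflates the standard deviation, so the threshold needs to be chosen with the sample size in mind.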
  • 114.
    Data Mining Techniques …
    6. Sequential Patterns:
    ▸ Sequential pattern mining is a data mining technique specialized for evaluating sequential data to discover sequential patterns.
    ▸ It consists of finding interesting subsequences in a set of sequences, where the value of a sequence can be measured in terms of different criteria such as length and occurrence frequency.
    ▸ In other words, this technique helps to discover or recognize similar patterns in transaction data over some period of time.
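Finding subsequences by occurrence frequency can be sketched as follows. This simplified example only counts ordered pairs of items (A bought at some point before B) and keeps those seen in at least `min_support` sequences; the function name, the purchase sequences, and the restriction to pairs are assumptions for illustration, and full sequential pattern mining algorithms handle longer patterns.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Count (a, b) pairs where a occurs before b in a sequence,
    and keep the pairs found in at least `min_support` sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so (a, b) means a came first.
        # set() ensures each pair counts once per sequence.
        counts.update(set(combinations(seq, 2)))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase histories, in time order, one list per customer.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone", "charger"],
    ["laptop", "mouse"],
]
patterns = frequent_ordered_pairs(sequences, min_support=2)
print(patterns)
```

Here "phone, then charger" appears in three histories and "case, then charger" in two, while every other ordered pair appears only once and is dropped.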
  • 115.
    Data Mining Techniques …
    7. Prediction:
    ▸ Prediction uses a combination of other data mining techniques such as trends, clustering, classification, etc.
    ▸ It analyzes past events or instances in the right sequence to predict a future event.