Integration of data mining results into multi-dimensional data models

ENTER 2015 Research Track
Volker Meyer (a)
Wolfram Höpken (a)
Matthias Fuchs (b)
Maria Lexhagen (b)
(a) University of Applied Sciences Ravensburg-Weingarten
Weingarten, Germany
{name.surname}@hs-weingarten.de
(b) Mid-Sweden University
Östersund, Sweden
{name.surname}@miun.se

Content
• Introduction
• State of the art
• Concepts for integrating DM results into MDM
• Conclusion

Motivation
• Business intelligence and data mining in tourism
– Amount of available information has increased dramatically
• e.g. web servers store tourists’ website navigation, databases store transaction and
survey data, etc.
– Methods of BI and DM used to mine information about tourists’ travel
motives, service expectations, channel use, conversion rates or booking trends
(Pyo, et al., 2002; Wong, et al., 2006)
• Business-IT gap
– DM tools demand extensive knowledge of the DM process and the individual
techniques (e.g. decision trees, association rules)
– Results can be unintelligible without the technical knowledge needed to
read them
→ Crucial business-relevant information is available, but the user who needs the
information is not able to decode it

Objective
• Objective
– Present DM results in a way that is understandable and manageable for
business users
• Approach
– Integrate knowledge generated by DM techniques directly into the
data warehouse structures from which the underlying data stem
– Make DM results (e.g. decision trees or association rules) accessible through
well-established analysis techniques, like online analytical processing (OLAP)
→ Integration concepts and data warehouse structures are presented for
major data mining techniques, like frequent itemsets, decision trees,
and clustering

Integrating DM results into databases
• Extending the database standard SQL
(by new data types and database operations)
– Inductive query language by SINDBAD project (Kramer, et al., 2006)
– Mining Association Rule Extension (Meo, et al., 1998)
– Mining Structured Query Language (MSQL) (Imielinski, 1999)
– Data Mining Query Language (DMQL) (Han, et al., 1996)
• Integrating DM results without extending database standard
– ADReM-Group (http://adrem.ua.ac.be/adrem) or Fromont et al. (2007)
→ Standard conformance (standard tools and analysis approaches)
→ Suitable for integrating DM results into existing data warehouse
structures

Multi-dimensional data models (MDM)
• Fundamental concept of MDM
– Separation between
• Performance indicators (facts),
e.g. turnover or number of persons
• Context/dimensions, e.g. time, date,
customer, or product
– Typically represented as star schema
• MDM has become well established in data warehousing
– Effective support of complex queries and OLAP analyses
– Better understandability for end users
– Crucial in tourism due to complex data structures (Höpken et al. 2013)
Star schema (figure):
Booking (fact table): BookingNo (DD), Turnover (F), NoPersons (F)
DimProduct: ProdDescription, ProdCategory
DimCustomer: CusName, CusAge, CusGender, CusOrigin
DimTime: DayTime, Minutes, Hours
DimDate: DayInWeek, Weekend, Week, Month, Year, Season
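
A minimal sketch of such a star schema, using Python's built-in sqlite3 module. Table and attribute names follow the figure; the surrogate keys (ProductID, CustomerID, DateID), the FK columns in the fact table and all sample rows are assumptions made for illustration only. The final query is a typical OLAP-style aggregation of a fact (Turnover) along two dimensions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimProduct  (ProductID  INTEGER PRIMARY KEY, ProdCategory TEXT);
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, CusAge INTEGER, CusOrigin TEXT);
CREATE TABLE DimDate     (DateID     INTEGER PRIMARY KEY, Season TEXT, Year INTEGER);
CREATE TABLE Booking (
    BookingNo  INTEGER PRIMARY KEY,                        -- degenerate dimension (DD)
    FKProduct  INTEGER REFERENCES DimProduct(ProductID),   -- assumed key columns
    FKCustomer INTEGER REFERENCES DimCustomer(CustomerID),
    FKDate     INTEGER REFERENCES DimDate(DateID),
    Turnover   REAL,                                       -- fact (F)
    NoPersons  INTEGER                                     -- fact (F)
);
""")

# Made-up sample data, not taken from the presentation.
con.executemany("INSERT INTO DimProduct VALUES (?, ?)", [(1, "Hotel"), (2, "Apartment")])
con.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?)", [(1, 67, "Sweden"), (2, 31, "Germany")])
con.executemany("INSERT INTO DimDate VALUES (?, ?, ?)", [(1, "Winter", 2014), (2, "Summer", 2014)])
con.executemany("INSERT INTO Booking VALUES (?, ?, ?, ?, ?, ?)", [
    (100, 1, 1, 1, 500.0, 2),
    (101, 2, 2, 1, 300.0, 4),
    (102, 1, 1, 2, 200.0, 1),
])

# Aggregate the Turnover fact by two dimension attributes.
turnover_by_season_category = con.execute("""
    SELECT d.Season, p.ProdCategory, SUM(b.Turnover)
    FROM Booking b
    JOIN DimDate d    ON b.FKDate = d.DateID
    JOIN DimProduct p ON b.FKProduct = p.ProductID
    GROUP BY d.Season, p.ProdCategory
    ORDER BY d.Season, p.ProdCategory
""").fetchall()
print(turnover_by_season_category)
```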

Integrating DM results into MDM
• Extending the MDM by additional facts and
dimensions/attributes
– Complexity strongly depends on
concrete DM model
– Cluster membership can simply be
represented as an additional attribute
– Decision trees or association rules need a
more complex fact/dimension structure
• Current status
– Simple approaches for market baskets (i.e. frequent itemsets) exist
(Kimball & Ross, 2002)
– Comprehensive approach for all DM models still missing

Frequent itemsets
• Frequent itemset = attribute values that often co-occur
• Approach to store frequent itemsets
– Reuse original data structures
• Store co-occurring attribute values in
artificial entries within original star schema
– Add frequent itemset table referencing artificial
entries in original star schema

Frequent itemsets
• Example: "old" and "Swedish" customers
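
The storage idea for this example could be sketched as follows. This is only one possible reading of the approach: the IsArtificial flag, the convention that attributes not belonging to the itemset stay NULL, the key values 900/9000, the Support column and all sample rows are assumptions, not the schema shown on the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (
    CustomerID INTEGER PRIMARY KEY,
    CusAge TEXT, CusOrigin TEXT, CusGender TEXT,
    IsArtificial INTEGER NOT NULL DEFAULT 0      -- assumed marker for artificial entries
);
CREATE TABLE Booking (
    BookingNo INTEGER PRIMARY KEY,
    FKCustomer INTEGER REFERENCES DimCustomer(CustomerID),
    Turnover REAL,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE FrequentItemset (
    ItemsetID INTEGER PRIMARY KEY,
    Support REAL,
    FKBooking INTEGER REFERENCES Booking(BookingNo)   -- points to the artificial entry
);
""")
con.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?, ?, 0)", [
    (1, "old", "Sweden", "f"), (2, "old", "Sweden", "m"),
    (3, "young", "Germany", "f"), (4, "old", "Germany", "m"),
])
con.executemany("INSERT INTO Booking VALUES (?, ?, ?, 0)",
                [(1, 1, 500.0), (2, 2, 300.0), (3, 3, 200.0), (4, 4, 400.0)])

# Artificial entries: only the attributes of the itemset are set, the rest stay NULL.
con.execute("INSERT INTO DimCustomer VALUES (900, 'old', 'Sweden', NULL, 1)")
con.execute("INSERT INTO Booking VALUES (9000, 900, NULL, 1)")

# A real booking supports the itemset if it matches every non-NULL attribute.
matching = con.execute("""
    SELECT COUNT(*)
    FROM Booking b
    JOIN DimCustomer c ON c.CustomerID = b.FKCustomer
    JOIN DimCustomer a ON a.CustomerID = 900
    WHERE b.IsArtificial = 0
      AND (a.CusAge    IS NULL OR c.CusAge    = a.CusAge)
      AND (a.CusOrigin IS NULL OR c.CusOrigin = a.CusOrigin)
      AND (a.CusGender IS NULL OR c.CusGender = a.CusGender)
""").fetchone()[0]
total = con.execute("SELECT COUNT(*) FROM Booking WHERE IsArtificial = 0").fetchone()[0]
support = matching / total

con.execute("INSERT INTO FrequentItemset VALUES (1, ?, 9000)", (support,))
stored = con.execute("SELECT ItemsetID, Support, FKBooking FROM FrequentItemset").fetchone()
print(stored)
```

Keeping the pattern inside the original tables means the itemset is described with the same attributes users already know from the star schema; the price is that every ordinary query has to exclude the artificial rows.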

Frequent itemsets and OLAP analyses
• Overall revenue per frequent itemset
– Frequent itemsets used as new analysis dimension
• Identifying most valuable frequent itemsets
(which is not possible in typical data mining tools)

Frequent itemsets and OLAP analyses
• Drill-through by frequent itemsets
– Looking at single bookings belonging to
(i.e. supporting) a frequent itemset
– Example:
Frequent itemset "old customers booking a hotel" with detailed
information on booking data, season, customer age, origin, sex and
booking price
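
Both analyses, revenue per frequent itemset and drill-through to the supporting bookings, can be sketched with ordinary SQL. The helper view BookingItemset, the IsArtificial flag, the labels and the sample data are assumptions; for brevity the itemsets here only involve customer attributes rather than the hotel example above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, CusAge TEXT, CusOrigin TEXT);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                      FKCustomer INTEGER REFERENCES DimCustomer(CustomerID),
                      Turnover REAL, IsArtificial INTEGER NOT NULL DEFAULT 0);
CREATE TABLE FrequentItemset (ItemsetID INTEGER PRIMARY KEY, Label TEXT,
                              FKBooking INTEGER REFERENCES Booking(BookingNo));

-- Customers 900 and 901 are artificial pattern entries (NULL = attribute not in itemset).
INSERT INTO DimCustomer VALUES (1,'old','Sweden'), (2,'old','Sweden'),
                               (3,'young','Germany'), (4,'old','Germany'),
                               (900,'old','Sweden'), (901,NULL,'Germany');
INSERT INTO Booking VALUES (1,1,500.0,0), (2,2,300.0,0), (3,3,200.0,0), (4,4,400.0,0),
                           (9000,900,NULL,1), (9001,901,NULL,1);
INSERT INTO FrequentItemset VALUES (1,'old Swedish customers',9000),
                                   (2,'German customers',9001);

-- Assumed helper view: one row per (itemset, supporting real booking).
CREATE VIEW BookingItemset AS
SELECT fi.ItemsetID, fi.Label, b.BookingNo, b.Turnover, c.CusAge, c.CusOrigin
FROM FrequentItemset fi
JOIN Booking ab    ON ab.BookingNo = fi.FKBooking
JOIN DimCustomer a ON a.CustomerID = ab.FKCustomer
JOIN Booking b     ON b.IsArtificial = 0
JOIN DimCustomer c ON c.CustomerID = b.FKCustomer
WHERE (a.CusAge    IS NULL OR c.CusAge    = a.CusAge)
  AND (a.CusOrigin IS NULL OR c.CusOrigin = a.CusOrigin);
""")

# Frequent itemsets used as an analysis dimension: overall revenue per itemset.
revenue_per_itemset = con.execute("""
    SELECT Label, SUM(Turnover) FROM BookingItemset
    GROUP BY ItemsetID, Label ORDER BY ItemsetID
""").fetchall()

# Drill-through: the single bookings supporting itemset 1.
drill_through = con.execute("""
    SELECT BookingNo, CusAge, CusOrigin, Turnover FROM BookingItemset
    WHERE ItemsetID = 1 ORDER BY BookingNo
""").fetchall()
print(revenue_per_itemset)
print(drill_through)
```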

Clustering
• Clustering = grouping similar records into homogeneous clusters
• Approach for storing clusters within a multi-dimensional structure
– Cluster centroids (i.e. calculated cluster centers) stored as artificial
entries in original star schema
– Cluster table stores characteristics of each cluster of a cluster model
and points to the cluster centroid as an artificial entry in the star schema
– In the original fact table, the cluster membership is stored for each
original data entry (attribute FKCluster pointing to the cluster table)

Clustering
• Example: Customer clusters
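
A small sketch of how cluster membership and centroids might be written back into the star schema. FKCluster is the attribute named on the slide; everything else (the Cluster table columns, the IsArtificial flag, the 9000+ key range, the cluster assignment itself and the sample data) is assumed for illustration. The clustering run is not reproduced here, only its result is stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, CusAge REAL,
                          IsArtificial INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Cluster (ClusterID INTEGER PRIMARY KEY, ModelName TEXT, Size INTEGER,
                      FKCentroid INTEGER);      -- BookingNo of the artificial centroid entry
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                      FKCustomer INTEGER REFERENCES DimCustomer(CustomerID),
                      Turnover REAL,
                      FKCluster INTEGER REFERENCES Cluster(ClusterID),
                      IsArtificial INTEGER NOT NULL DEFAULT 0);
INSERT INTO DimCustomer VALUES (1,25,0), (2,35,0), (3,65,0), (4,75,0);
INSERT INTO Booking VALUES (1,1,200.0,NULL,0), (2,2,400.0,NULL,0),
                           (3,3,1000.0,NULL,0), (4,4,1200.0,NULL,0);
""")

# Assumed output of an external clustering run: BookingNo -> cluster id.
assignment = {1: 1, 2: 1, 3: 2, 4: 2}
con.executemany("UPDATE Booking SET FKCluster = ? WHERE BookingNo = ?",
                [(cluster, booking) for booking, cluster in assignment.items()])

# Centroid = per-cluster mean of the numeric attributes of the member bookings.
rows = con.execute("""
    SELECT b.FKCluster, AVG(c.CusAge), AVG(b.Turnover), COUNT(*)
    FROM Booking b JOIN DimCustomer c ON c.CustomerID = b.FKCustomer
    WHERE b.IsArtificial = 0
    GROUP BY b.FKCluster ORDER BY b.FKCluster
""").fetchall()
for cluster_id, mean_age, mean_turnover, size in rows:
    art_id = 9000 + cluster_id                    # hypothetical key range
    con.execute("INSERT INTO DimCustomer VALUES (?, ?, 1)", (art_id, mean_age))
    con.execute("INSERT INTO Booking VALUES (?, ?, ?, ?, 1)",
                (art_id, art_id, mean_turnover, cluster_id))
    con.execute("INSERT INTO Cluster VALUES (?, 'customer-clusters', ?, ?)",
                (cluster_id, size, art_id))

# Read the centroids back through the cluster table.
centroids = con.execute("""
    SELECT cl.ClusterID, cl.Size, c.CusAge, b.Turnover
    FROM Cluster cl
    JOIN Booking b     ON b.BookingNo = cl.FKCentroid
    JOIN DimCustomer c ON c.CustomerID = b.FKCustomer
    ORDER BY cl.ClusterID
""").fetchall()
print(centroids)
```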

Cluster models and OLAP analyses
• Revenue per customer cluster
– Clusters are used as a new dimension for data analyses
– New characteristics of clusters can be calculated, e.g. sum of
booking price, grouped by any other dimension characteristic, e.g. season
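
With membership stored in the fact table, revenue per cluster and season becomes an ordinary GROUP BY. The sketch below assumes a reduced schema (only DimDate and a Cluster table), hypothetical cluster labels, an IsArtificial flag to exclude the centroid entry, and made-up data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimDate (DateID INTEGER PRIMARY KEY, Season TEXT);
CREATE TABLE Cluster (ClusterID INTEGER PRIMARY KEY, Label TEXT);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                      FKDate INTEGER REFERENCES DimDate(DateID),
                      FKCluster INTEGER REFERENCES Cluster(ClusterID),
                      Turnover REAL,
                      IsArtificial INTEGER NOT NULL DEFAULT 0);
INSERT INTO DimDate VALUES (1,'Winter'), (2,'Summer');
INSERT INTO Cluster VALUES (1,'cluster 1'), (2,'cluster 2');
-- Booking 9001 stands for an artificial centroid entry and must not be summed.
INSERT INTO Booking VALUES (1,1,1,200.0,0), (2,2,1,400.0,0),
                           (3,1,2,1000.0,0), (4,1,2,1200.0,0),
                           (9001,NULL,1,300.0,1);
""")

# Clusters as a dimension, combined with the Season attribute of DimDate.
revenue = con.execute("""
    SELECT cl.Label, d.Season, SUM(b.Turnover)
    FROM Booking b
    JOIN Cluster cl ON cl.ClusterID = b.FKCluster
    JOIN DimDate d  ON d.DateID = b.FKDate
    WHERE b.IsArtificial = 0
    GROUP BY cl.ClusterID, cl.Label, d.Season
    ORDER BY cl.ClusterID, d.Season
""").fetchall()
print(revenue)
```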

Decision trees
• Decision tree
– Separating data records into predefined classes based on a
series of decisions

Decision trees
• Storing a decision tree in a multi-dimensional structure
– Each node is represented by a decision rule
• booking = short-term -> valuable = yes
• booking = long-term & type apartment = yes -> valuable = yes
– Decision rules stored by
• Reusing original star schema to specify attribute values of the rule
• Specific table specifying rule characteristics and referencing an artificial entry in
original structure

Decision trees
• Example decision rule
booking = long-term & type apartment = yes -> valuable = yes
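
One way the two rules from the slides might be stored is sketched below. The slides do not show where the rule attributes live, so the dimension DimBookingType with LeadTime and TypeApartment, the Valuable column, the DecisionRule columns (Coverage, Accuracy), the IsArtificial flag and all data are assumptions. Rule conditions are the non-NULL attributes of an artificial entry; accuracy is computed here as the share of covered bookings whose class equals the predicted class.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimBookingType (TypeID INTEGER PRIMARY KEY, LeadTime TEXT, TypeApartment TEXT);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY,
                      FKType INTEGER REFERENCES DimBookingType(TypeID),
                      Turnover REAL, Valuable TEXT,
                      IsArtificial INTEGER NOT NULL DEFAULT 0);
CREATE TABLE DecisionRule (RuleID INTEGER PRIMARY KEY, PredictedClass TEXT,
                           Coverage INTEGER, Accuracy REAL,
                           FKBooking INTEGER REFERENCES Booking(BookingNo));
-- Types 901/902 and bookings 9001/9002 are the artificial entries describing the rules.
INSERT INTO DimBookingType VALUES (1,'short-term','yes'), (2,'short-term','no'),
                                  (3,'long-term','yes'),  (4,'long-term','no'),
                                  (901,'short-term',NULL), (902,'long-term','yes');
INSERT INTO Booking VALUES (1,1,500.0,'yes',0), (2,2,450.0,'yes',0), (3,3,600.0,'yes',0),
                           (4,3,100.0,'no',0),  (5,4,80.0,'no',0),
                           (9001,901,NULL,NULL,1), (9002,902,NULL,NULL,1);
""")

# (RuleID, artificial BookingNo, predicted class); both rules predict valuable = yes.
rules = [(1, 9001, "yes"), (2, 9002, "yes")]
for rule_id, art_booking, predicted in rules:
    covered, correct = con.execute("""
        SELECT COUNT(*), SUM(b.Valuable = ?)
        FROM Booking ab
        JOIN DimBookingType a ON a.TypeID = ab.FKType
        JOIN Booking b        ON b.IsArtificial = 0
        JOIN DimBookingType t ON t.TypeID = b.FKType
        WHERE ab.BookingNo = ?
          AND (a.LeadTime      IS NULL OR t.LeadTime      = a.LeadTime)
          AND (a.TypeApartment IS NULL OR t.TypeApartment = a.TypeApartment)
    """, (predicted, art_booking)).fetchone()
    # The sample data guarantees covered > 0; a real loader would have to guard this.
    con.execute("INSERT INTO DecisionRule VALUES (?, ?, ?, ?, ?)",
                (rule_id, predicted, covered, correct / covered, art_booking))

stored_rules = con.execute(
    "SELECT RuleID, Coverage, Accuracy FROM DecisionRule ORDER BY RuleID").fetchall()
print(stored_rules)
```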

Decision trees and OLAP analyses
– Decision tree nodes are used as a new dimension for data analyses
– New characteristics of decision tree nodes can be calculated, e.g. sum
of booking price (based on any fact of the fact table)

Decision trees and OLAP analyses
– Decision trees are used to narrow down the analysis to interesting
subgroups (i.e. nodes with a high accuracy)
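
Both uses of decision tree nodes in OLAP queries can be sketched as follows. The bridge table BookingRule (assumed to have been derived by matching each booking against the rule conditions), the simplified Booking table, the 0.9 accuracy threshold and the data are all hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DecisionRule (RuleID INTEGER PRIMARY KEY, Label TEXT, Accuracy REAL);
CREATE TABLE Booking (BookingNo INTEGER PRIMARY KEY, Turnover REAL);
-- Assumed bridge table: which booking falls into which decision tree node.
CREATE TABLE BookingRule (BookingNo INTEGER REFERENCES Booking(BookingNo),
                          RuleID INTEGER REFERENCES DecisionRule(RuleID));
INSERT INTO DecisionRule VALUES
    (1, 'booking = short-term', 1.0),
    (2, 'booking = long-term & type apartment = yes', 0.5);
INSERT INTO Booking VALUES (1,500.0), (2,450.0), (3,600.0), (4,100.0), (5,80.0);
INSERT INTO BookingRule VALUES (1,1), (2,1), (3,2), (4,2);
""")

# Nodes as a dimension: sum of a fact per decision tree node.
turnover_per_node = con.execute("""
    SELECT r.Label, SUM(b.Turnover)
    FROM BookingRule br
    JOIN DecisionRule r ON r.RuleID = br.RuleID
    JOIN Booking b      ON b.BookingNo = br.BookingNo
    GROUP BY r.RuleID, r.Label ORDER BY r.RuleID
""").fetchall()

# Narrowing the analysis to bookings in nodes with high accuracy (threshold assumed).
high_accuracy_bookings = con.execute("""
    SELECT b.BookingNo, b.Turnover
    FROM BookingRule br
    JOIN DecisionRule r ON r.RuleID = br.RuleID
    JOIN Booking b      ON b.BookingNo = br.BookingNo
    WHERE r.Accuracy >= 0.9
    ORDER BY b.BookingNo
""").fetchall()
print(turnover_per_node)
print(high_accuracy_bookings)
```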

Benefit of presented approach
• Advantages of integrating data mining results into
original multi-dimensional data structure
– Ordinary OLAP queries can be used to analyse data mining
results (like frequent itemsets, cluster models or decision
trees)
– Data mining results complement existing information and
enhance the explanatory power of analyses by constituting a
new dimension
• E.g. calculate overall turnover of frequent itemsets, decision tree
nodes or clusters
• E.g. filter bookings by a specific frequent itemset (only looking at
bookings from old and Swedish customers)

Conclusion & Outlook
• BI & data mining in tourism
– Multi-dimensional data warehouse structures are an important concept for
tourism (destinations)
– Data mining techniques are heavily used in tourism
• Novel approach for integrating data mining models into
underlying multi-dimensional data structures
– Frequent itemsets, association rules, clustering, decision trees
– Complement existing information and enrich OLAP analyses
• Future activities
– Automatic transformation of data mining results into multi-
dimensional structures to support broader evaluation
– Evaluate user acceptance of new analysis possibilities
