ENTER 2015 Research Track Slide Number 1
Volker Meyer (a)
Wolfram Höpken (a)
Matthias Fuchs (b)
Maria Lexhagen (b)
(a) University of Applied Sciences Ravensburg-Weingarten
Weingarten, Germany
{name.surname}@hs-weingarten.de
(b) Mid-Sweden University
Östersund, Sweden
{name.surname}@miun.se
Integration of data mining results into
multi-dimensional data models
ENTER 2015 Research Track Slide Number 2
Content
• Introduction
• State of the art
• Concepts for integrating DM results into MDM
• Conclusion
ENTER 2015 Research Track Slide Number 3
Motivation
• Business intelligence and data mining in tourism
– Amount of available information dramatically increased
• e.g. web servers store tourists’ website navigation, databases store transaction and
survey data, etc.
– Methods of BI and DM used to mine information about tourists’ travel
motives, service expectations, channel use, conversion rates or booking trends
(Pyo, et al., 2002; Wong, et al., 2006)
• Business-IT gap
– DM tools demand extensive knowledge of the DM process and the individual
techniques (e.g. decision trees, association rules)
– Results can be unintelligible without the technical knowledge needed to
read them
→ Crucial business-relevant information is available, but the user who needs the
information is not able to decode it
ENTER 2015 Research Track Slide Number 4
Objective
• Objective
– Present DM results in a way that is understandable and manageable for
business users
• Approach
– Integrate knowledge generated by DM techniques directly into the
data warehouse structures from which the underlying data stem
– Make DM results (e.g. decision trees or association rules) accessible through
well-established analysis techniques, like online analytical processing (OLAP)
→ Integration concepts and data warehouse structures are presented for
major data mining techniques, like frequent itemsets, decision trees,
and clustering
ENTER 2015 Research Track Slide Number 5
Content
• Introduction
• State of the art
• Concepts for integrating DM results into MDM
• Conclusion
ENTER 2015 Research Track Slide Number 6
Integrating DM results into databases
• Extending the database standard SQL
(by new data types and database operations)
– Inductive query language by SINDBAD project (Kramer, et al., 2006)
– Mining Association Rule Extension (Meo, et al., 1998)
– Mining Structured Query Language (MSQL) (Imielinski, 1999)
– Data Mining Query Language (DMQL) (Han, et al., 1996)
• Integrating DM results without extending database standard
– ADReM-Group (http://adrem.ua.ac.be/adrem) or Fromont et al. (2007)
→ Standard conformance (standard tools and analysis approaches);
suitable for integrating DM results into existing data warehouse
structures
ENTER 2015 Research Track Slide Number 7
Multi-dimensional data models (MDM)
• Fundamental concept of MDM
– Separation between
• Performance indicators (facts),
e.g. turnover or number of persons
• Context/dimensions, e.g. time, date,
customer, or product
– Typically represented as star schema
• MDM became established in data warehousing
– Effective support of complex queries and OLAP analyses
– Better understandability for end users
– Crucial in tourism due to complex data structures (Höpken et al., 2013)
[Figure: star schema]
Fact table Booking: BookingNo (DD), Turnover (F), NoPersons (F)
DimProduct: ProdDesription, ProdCategory
DimCustomer: CusName, CusAge, CusGender, CusOrigin
DimTime: DayTime, Minutes, Hours
DimDate: DayInWeek, Weekend, Week, Month, Year, Season
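The separation of facts and dimensions can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the authors' warehouse: only two of the dimensions are included, and the key columns (CustomerKey, DateKey, FKCustomer, FKDate) as well as all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY,   -- assumed surrogate key
    CusName TEXT, CusAge INTEGER, CusGender TEXT, CusOrigin TEXT
);
CREATE TABLE DimDate (
    DateKey INTEGER PRIMARY KEY,       -- assumed surrogate key
    Month INTEGER, Year INTEGER, Season TEXT
);
CREATE TABLE Booking (
    BookingNo  INTEGER PRIMARY KEY,    -- degenerate dimension (DD)
    FKCustomer INTEGER REFERENCES DimCustomer(CustomerKey),
    FKDate     INTEGER REFERENCES DimDate(DateKey),
    Turnover   REAL,                   -- fact (F)
    NoPersons  INTEGER                 -- fact (F)
);
""")

# Invented sample data, for illustration only.
conn.executemany("INSERT INTO DimCustomer VALUES (?,?,?,?,?)", [
    (1, "A. Berg", 67, "f", "Sweden"),
    (2, "B. Maier", 34, "m", "Germany"),
])
conn.executemany("INSERT INTO DimDate VALUES (?,?,?,?)", [
    (1, 1, 2014, "winter"),
    (2, 7, 2014, "summer"),
])
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?,?)", [
    (100, 1, 1, 800.0, 2),
    (101, 2, 1, 450.0, 1),
    (102, 1, 2, 300.0, 2),
])

# OLAP-style roll-up: aggregate the facts along one dimension attribute.
turnover_by_season = conn.execute("""
    SELECT d.Season, SUM(b.Turnover), SUM(b.NoPersons)
    FROM Booking b
    JOIN DimDate d ON d.DateKey = b.FKDate
    GROUP BY d.Season
    ORDER BY d.Season
""").fetchall()
print(turnover_by_season)
```

Swapping DimDate.Season for any attribute of DimCustomer gives a different roll-up over the same facts, which is the property the later slides rely on.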
ENTER 2015 Research Track Slide Number 8
Integrating DM results into MDM
• Extending the MDM by additional facts and
dimensions/attributes
– Complexity strongly depends on
concrete DM model
– Cluster membership can simply be
represented as an additional attribute
– Decision trees or association rules need a
more complex fact/dimension structure
• Current status
– Simple approaches for market baskets (i.e. frequent itemsets) exist
(Kimball & Ross, 2002)
– Comprehensive approach for all DM models still missing
ENTER 2015 Research Track Slide Number 9
Content
• Introduction
• State of the art
• Concepts for integrating DM results into MDM
• Conclusion
ENTER 2015 Research Track Slide Number 10
Frequent itemsets
• Frequent itemset = attribute values that often co-occur
• Approach to store frequent itemsets
– Reuse original data structures
• Store co-occurring attribute values in
artificial entries within original star schema
– Add frequent itemset table
referencing the artificial
entries in the original star
schema
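The storage approach above could look roughly as follows. This is a sketch under stated assumptions rather than the authors' actual table design: the IsArtificial flag, the discretised CusAgeGroup attribute, the FrequentItemset columns, and the rule that a NULL attribute in an artificial entry means "not part of the itemset" are all hypothetical choices made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey  INTEGER PRIMARY KEY,
    CusName      TEXT,
    CusAgeGroup  TEXT,                       -- assumed discretisation of CusAge
    CusOrigin    TEXT,
    IsArtificial INTEGER NOT NULL DEFAULT 0  -- assumed marker for DM entries
);
CREATE TABLE Booking (
    BookingNo    INTEGER PRIMARY KEY,
    FKCustomer   INTEGER REFERENCES DimCustomer(CustomerKey),
    Turnover     REAL,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE FrequentItemset (               -- hypothetical itemset table
    ItemsetKey INTEGER PRIMARY KEY,
    Support    REAL,
    FKBooking  INTEGER REFERENCES Booking(BookingNo)  -- artificial entry
);
""")

conn.executemany("INSERT INTO DimCustomer VALUES (?,?,?,?,?)", [
    (1, "A. Berg", "old", "Sweden", 0),
    (2, "B. Maier", "young", "Germany", 0),
    (3, "C. Holm", "old", "Sweden", 0),
    (4, "D. Rossi", "old", "Italy", 0),
    # Artificial entry: only the co-occurring values are filled in.
    (900, None, "old", "Sweden", 1),
])
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?)", [
    (100, 1, 800.0, 0),
    (101, 2, 450.0, 0),
    (102, 3, 300.0, 0),
    (103, 4, 500.0, 0),
    (9000, 900, None, 1),   # artificial fact row describing the itemset
])
conn.execute("INSERT INTO FrequentItemset VALUES (1, 0.5, 9000)")

def supporting_bookings(conn, itemset_key):
    """Real bookings whose attributes match every non-NULL itemset value."""
    rows = conn.execute("""
        SELECT b.BookingNo
        FROM FrequentItemset fi
        JOIN Booking ab     ON ab.BookingNo = fi.FKBooking
        JOIN DimCustomer ac ON ac.CustomerKey = ab.FKCustomer
        JOIN Booking b      ON b.IsArtificial = 0
        JOIN DimCustomer c  ON c.CustomerKey = b.FKCustomer
        WHERE fi.ItemsetKey = ?
          AND (ac.CusAgeGroup IS NULL OR c.CusAgeGroup = ac.CusAgeGroup)
          AND (ac.CusOrigin   IS NULL OR c.CusOrigin   = ac.CusOrigin)
        ORDER BY b.BookingNo
    """, (itemset_key,)).fetchall()
    return [r[0] for r in rows]

supporting = supporting_bookings(conn, 1)
total_real = conn.execute(
    "SELECT COUNT(*) FROM Booking WHERE IsArtificial = 0").fetchone()[0]
recomputed_support = len(supporting) / total_real
print(supporting, recomputed_support)
```

Because the itemset is described with the same columns as ordinary bookings, no new data types are needed, which matches the standard-conformance goal stated earlier in the deck.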
ENTER 2015 Research Track Slide Number 11
Frequent itemsets
• Example: "old" and "Swedish" customers
ENTER 2015 Research Track Slide Number 12
Frequent itemsets and OLAP analyses
• Overall revenue per frequent itemset
– Frequent itemsets used as new analysis dimension
• Identifying most valuable frequent itemsets
(which is not possible in typical data mining tools)
ENTER 2015 Research Track Slide Number 13
Frequent itemsets and OLAP analyses
• Drill-through by frequent itemsets
– Looking at single bookings belonging to
(i.e. supporting) a frequent itemset
– Example:
Frequent itemset "old customers
booking a hotel" with detailed
information on booking data, season,
customer age, origin, sex and
booking price
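Both analyses, revenue per frequent itemset and drill-through to the supporting bookings, can be expressed as ordinary SQL once the itemsets are stored in the star schema. The sketch below assumes the same hypothetical layout as before (IsArtificial flags, a CusAgeGroup attribute, NULL as wildcard) and adds an assumed helper view ItemsetBooking; none of these names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY, CusAgeGroup TEXT, CusOrigin TEXT,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Booking (
    BookingNo INTEGER PRIMARY KEY,
    FKCustomer INTEGER REFERENCES DimCustomer(CustomerKey),
    Turnover REAL, IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE FrequentItemset (
    ItemsetKey INTEGER PRIMARY KEY, Support REAL,
    FKBooking INTEGER REFERENCES Booking(BookingNo)
);
-- Assumed helper view: which real bookings support which itemset.
CREATE VIEW ItemsetBooking AS
SELECT fi.ItemsetKey, b.BookingNo
FROM FrequentItemset fi
JOIN Booking ab     ON ab.BookingNo = fi.FKBooking
JOIN DimCustomer ac ON ac.CustomerKey = ab.FKCustomer
JOIN Booking b      ON b.IsArtificial = 0
JOIN DimCustomer c  ON c.CustomerKey = b.FKCustomer
WHERE (ac.CusAgeGroup IS NULL OR c.CusAgeGroup = ac.CusAgeGroup)
  AND (ac.CusOrigin   IS NULL OR c.CusOrigin   = ac.CusOrigin);
""")

conn.executemany("INSERT INTO DimCustomer VALUES (?,?,?,?)", [
    (1, "old", "Sweden", 0), (2, "young", "Germany", 0),
    (3, "old", "Sweden", 0), (4, "old", "Italy", 0),
    (900, "old", "Sweden", 1),   # itemset 1: old AND Swedish
    (901, "old", None, 1),       # itemset 2: old
])
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?)", [
    (100, 1, 800.0, 0), (101, 2, 450.0, 0),
    (102, 3, 300.0, 0), (103, 4, 500.0, 0),
    (9000, 900, None, 1), (9001, 901, None, 1),
])
conn.executemany("INSERT INTO FrequentItemset VALUES (?,?,?)", [
    (1, 0.5, 9000), (2, 0.75, 9001),
])

# Frequent itemsets used as an analysis dimension: revenue per itemset.
revenue_per_itemset = conn.execute("""
    SELECT ib.ItemsetKey, SUM(b.Turnover) AS Revenue
    FROM ItemsetBooking ib
    JOIN Booking b ON b.BookingNo = ib.BookingNo
    GROUP BY ib.ItemsetKey
    ORDER BY Revenue DESC
""").fetchall()

# Drill-through: the single bookings behind one itemset.
drill_through = conn.execute("""
    SELECT b.BookingNo, c.CusAgeGroup, c.CusOrigin, b.Turnover
    FROM ItemsetBooking ib
    JOIN Booking b     ON b.BookingNo = ib.BookingNo
    JOIN DimCustomer c ON c.CustomerKey = b.FKCustomer
    WHERE ib.ItemsetKey = ?
    ORDER BY b.BookingNo
""", (1,)).fetchall()
print(revenue_per_itemset, drill_through)
```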
ENTER 2015 Research Track Slide Number 14
Clustering
• Clustering = grouping similar records into homogeneous clusters
• Approach for storing clusters within a multi-dimensional structure
– Cluster centroids (i.e. calculated
cluster centers) stored as artificial
entries in original star schema
– Cluster table stores characteristics
of each cluster of a cluster model
and points to cluster centroid as
artificial entry in star schema
– In the original fact table the
cluster membership is stored for
each original data entry (attribute
FKCluster pointing to the cluster
table)
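The three storage elements named above (centroids as artificial entries, a cluster table, and FKCluster in the fact table) might be laid out as in the following sketch. Only FKCluster is taken from the slide; the Cluster columns, the IsArtificial flag, the model name, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY, CusAge REAL,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Cluster (                       -- hypothetical cluster table
    ClusterKey INTEGER PRIMARY KEY,
    ModelName  TEXT,                         -- cluster model it belongs to
    Size       INTEGER,                      -- a cluster characteristic
    FKCentroid INTEGER REFERENCES Booking(BookingNo)  -- artificial entry
);
CREATE TABLE Booking (
    BookingNo  INTEGER PRIMARY KEY,
    FKCustomer INTEGER REFERENCES DimCustomer(CustomerKey),
    Turnover   REAL,
    FKCluster  INTEGER REFERENCES Cluster(ClusterKey),  -- membership
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
""")

conn.executemany("INSERT INTO DimCustomer VALUES (?,?,?)", [
    (1, 67, 0), (2, 34, 0), (3, 70, 0), (4, 29, 0),
    (900, 68.5, 1),   # centroid of cluster 1 (mean age of members)
    (901, 31.5, 1),   # centroid of cluster 2
])
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?,?)", [
    (100, 1, 800.0, 1, 0), (101, 2, 450.0, 2, 0),
    (102, 3, 300.0, 1, 0), (103, 4, 500.0, 2, 0),
    (9000, 900, 550.0, None, 1),   # centroid fact values, cluster 1
    (9001, 901, 475.0, None, 1),   # centroid fact values, cluster 2
])
conn.executemany("INSERT INTO Cluster VALUES (?,?,?,?)", [
    (1, "customer-clusters", 2, 9000),
    (2, "customer-clusters", 2, 9001),
])

# Read each cluster together with its centroid and its member count.
cluster_overview = conn.execute("""
    SELECT cl.ClusterKey, ac.CusAge, ab.Turnover, COUNT(b.BookingNo)
    FROM Cluster cl
    JOIN Booking ab     ON ab.BookingNo = cl.FKCentroid
    JOIN DimCustomer ac ON ac.CustomerKey = ab.FKCustomer
    LEFT JOIN Booking b ON b.FKCluster = cl.ClusterKey
    GROUP BY cl.ClusterKey, ac.CusAge, ab.Turnover
    ORDER BY cl.ClusterKey
""").fetchall()
print(cluster_overview)
```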
ENTER 2015 Research Track Slide Number 15
Clustering
• Example: Customer clusters
ENTER 2015 Research Track Slide Number 16
Cluster models and OLAP analyses
• Revenue per customer cluster
– Clusters are used as new
dimension for data
analyses
– New characteristics of
clusters can be
calculated, e.g. sum of
booking price, grouped
by any other dimension
characteristic e.g. season
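With FKCluster in the fact table, a cluster behaves like any other dimension in a roll-up. The following sketch shows revenue per cluster and season; it omits the centroid entries for brevity, and the cluster labels, key columns, and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, Season TEXT);
CREATE TABLE Cluster (ClusterKey INTEGER PRIMARY KEY, Label TEXT);
CREATE TABLE Booking (
    BookingNo INTEGER PRIMARY KEY,
    FKDate    INTEGER REFERENCES DimDate(DateKey),
    FKCluster INTEGER REFERENCES Cluster(ClusterKey),
    Turnover  REAL
);
""")
conn.executemany("INSERT INTO DimDate VALUES (?,?)", [
    (1, "winter"), (2, "summer"),
])
conn.executemany("INSERT INTO Cluster VALUES (?,?)", [
    (1, "older guests"), (2, "younger guests"),   # hypothetical labels
])
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?)", [
    (100, 1, 1, 800.0),
    (101, 1, 2, 450.0),
    (102, 2, 1, 300.0),
    (103, 1, 1, 200.0),
])

# Cluster as a dimension, combined with another dimension (season).
revenue_by_cluster_and_season = conn.execute("""
    SELECT cl.Label, d.Season, SUM(b.Turnover)
    FROM Booking b
    JOIN Cluster cl ON cl.ClusterKey = b.FKCluster
    JOIN DimDate d  ON d.DateKey = b.FKDate
    GROUP BY cl.ClusterKey, cl.Label, d.Season
    ORDER BY cl.ClusterKey, d.Season
""").fetchall()
print(revenue_by_cluster_and_season)
```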
ENTER 2015 Research Track Slide Number 17
Decision trees
• Decision tree
– Separating data records into predefined classes based on a
series of decisions
ENTER 2015 Research Track Slide Number 18
Decision trees
• Storing a decision tree in a multi-dimensional structure
– Each node is represented by a decision rule
• booking = short-term -> valuable = yes
• booking = long-term & type apartment = yes -> valuable = yes
– Decision rules stored by
• Reusing original star schema to specify attribute values of the rule
• Specific table specifying rule characteristics and referencing the artificial entry in
original structure
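One way to picture this storage scheme: the rule's conditions live in an artificial booking entry, and a separate rule table holds the predicted class and accuracy. The sketch below is an assumption-laden illustration. The LeadTime column, the mapping of "type apartment = yes" to ProdCategory = 'apartment', the DecisionRule columns, and the accuracy values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProdCategory TEXT,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Booking (
    BookingNo INTEGER PRIMARY KEY,
    FKProduct INTEGER REFERENCES DimProduct(ProductKey),
    LeadTime  TEXT,                  -- assumed: 'short-term' / 'long-term'
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE DecisionRule (          -- hypothetical rule table
    RuleKey        INTEGER PRIMARY KEY,
    PredictedClass TEXT,             -- value of the target 'valuable'
    Accuracy       REAL,
    FKBooking      INTEGER REFERENCES Booking(BookingNo)  -- artificial entry
);
""")
conn.execute("INSERT INTO DimProduct VALUES (900, 'apartment', 1)")
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?)", [
    (9000, None, "short-term", 1),   # rule 1: booking = short-term
    (9001, 900, "long-term", 1),     # rule 2: long-term AND apartment
])
conn.executemany("INSERT INTO DecisionRule VALUES (?,?,?,?)", [
    (1, "yes", 0.9, 9000),           # accuracies are invented
    (2, "yes", 0.7, 9001),
])

def rule_text(conn, rule_key):
    """Rebuild a readable rule from the artificial entry it points to."""
    lead_time, category, predicted = conn.execute("""
        SELECT ab.LeadTime, ap.ProdCategory, r.PredictedClass
        FROM DecisionRule r
        JOIN Booking ab         ON ab.BookingNo = r.FKBooking
        LEFT JOIN DimProduct ap ON ap.ProductKey = ab.FKProduct
        WHERE r.RuleKey = ?
    """, (rule_key,)).fetchone()
    conditions = []
    if lead_time is not None:        # NULL means "not tested by this rule"
        conditions.append(f"booking = {lead_time}")
    if category is not None:
        conditions.append(f"product category = {category}")
    return " & ".join(conditions) + f" -> valuable = {predicted}"

rule_1 = rule_text(conn, 1)
rule_2 = rule_text(conn, 2)
print(rule_1)
print(rule_2)
```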
ENTER 2015 Research Track Slide Number 19
Decision trees
• Example decision rule
booking = long-term & type apartment = yes -> valuable = yes
ENTER 2015 Research Track Slide Number 20
Decision trees and OLAP analyses
– Decision tree nodes
are used as new
dimension for data
analyses
– New characteristics
of decision tree
nodes can be
calculated, e.g. sum
of booking price
(based on any fact
of the fact table)
ENTER 2015 Research Track Slide Number 21
Decision trees and OLAP analyses
– Decision trees
are used to
narrow down
the analysis to
interesting
subgroups (i.e.
nodes with a
high accuracy)
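Both uses of decision tree nodes, as an aggregation dimension and as a filter on high-accuracy subgroups, reduce to a join plus a WHERE clause. This sketch reuses a hypothetical layout (LeadTime column, DecisionRule table, IsArtificial flags, NULL as wildcard) and invented data; the accuracy threshold is an arbitrary parameter of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY, ProdCategory TEXT,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Booking (
    BookingNo INTEGER PRIMARY KEY,
    FKProduct INTEGER REFERENCES DimProduct(ProductKey),
    LeadTime TEXT, Turnover REAL,
    IsArtificial INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE DecisionRule (
    RuleKey INTEGER PRIMARY KEY, PredictedClass TEXT, Accuracy REAL,
    FKBooking INTEGER REFERENCES Booking(BookingNo)
);
""")
conn.executemany("INSERT INTO DimProduct VALUES (?,?,?)", [
    (1, "hotel", 0), (2, "apartment", 0), (900, "apartment", 1),
])
conn.executemany("INSERT INTO Booking VALUES (?,?,?,?,?)", [
    (100, 1, "short-term", 800.0, 0),
    (101, 2, "long-term", 450.0, 0),
    (102, 1, "long-term", 300.0, 0),
    (103, 2, "short-term", 500.0, 0),
    (9000, None, "short-term", None, 1),   # rule 1 conditions
    (9001, 900, "long-term", None, 1),     # rule 2 conditions
])
conn.executemany("INSERT INTO DecisionRule VALUES (?,?,?,?)", [
    (1, "yes", 0.9, 9000), (2, "yes", 0.7, 9001),
])

def turnover_per_node(conn, min_accuracy=0.0):
    """Sum of turnover per decision rule, restricted to accurate rules."""
    return conn.execute("""
        SELECT r.RuleKey, r.Accuracy, COUNT(b.BookingNo), SUM(b.Turnover)
        FROM DecisionRule r
        JOIN Booking ab         ON ab.BookingNo = r.FKBooking
        LEFT JOIN DimProduct ap ON ap.ProductKey = ab.FKProduct
        JOIN Booking b          ON b.IsArtificial = 0
        JOIN DimProduct p       ON p.ProductKey = b.FKProduct
        WHERE r.Accuracy >= ?
          AND (ab.LeadTime     IS NULL OR b.LeadTime     = ab.LeadTime)
          AND (ap.ProdCategory IS NULL OR p.ProdCategory = ap.ProdCategory)
        GROUP BY r.RuleKey, r.Accuracy
        ORDER BY r.RuleKey
    """, (min_accuracy,)).fetchall()

all_nodes = turnover_per_node(conn)
accurate_nodes = turnover_per_node(conn, min_accuracy=0.8)
print(all_nodes, accurate_nodes)
```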
ENTER 2015 Research Track Slide Number 22
Benefit of presented approach
• Advantages of integrating data mining results into
original multi-dimensional data structure
– Ordinary OLAP queries can be used to analyse data mining
results (like frequent itemsets, cluster models or decision
trees)
– Data mining results complement existing information and
enhance the explanatory power of analyses by constituting a
new dimension
• E.g. calculate overall turnover of frequent itemsets, decision tree
nodes or clusters
• E.g. filter bookings by a specific frequent itemset (only looking at
bookings from old and Swedish customers)
ENTER 2015 Research Track Slide Number 23
Content
• Introduction
• State of the art
• Concepts for integrating DM results into MDM
• Conclusion
ENTER 2015 Research Track Slide Number 24
Conclusion & Outlook
• BI & data mining in tourism
– Multi-dimensional data warehouse structures important concept for
tourism (destinations)
– All data mining techniques heavily used in tourism
• Novel approach for integrating data mining models into
underlying multi-dimensional data structures
– Frequent itemsets, association rules, clustering, decision trees
– Complement existing information and enrich OLAP analyses
• Future activities
– Automatic transformation of data mining results into multi-
dimensional structures to support broader evaluation
– Evaluate user acceptance of new analysis possibilities
