AGILE DATA MINING 
WITH DATA VAULT 2.0 
Timo Cirkel, Michael Olschimke 
Dörffler & Partner GmbH
AGENDA
Introduction
Background
Example
Conclusion
12.02.2014 Agile Data Mining with Data Vault 2.0
INTRODUCTION 
Agile Data Mining with Data Vault 2.0
TIMO CIRKEL 
BI-Consultant 
Certified Data Vault 2.0 Practitioner 
Analysis of policyholders
Specialized in CRM, software development, DWH automation
Industries: Insurance, Energy
B.Sc. Business Informatics
MICHAEL OLSCHIMKE 
Senior BI-Consultant 
Certified Data Vault 2.0 Practitioner 
Official Data Vault 2.0 Trainer in Europe 
Associate Teacher, University of Hannover
Specializing in Data Vault 2.0, data mining, CRM, project management
Industries: Insurance, Automotive, Retail, Public Sector, Non-Profits
DÖRFFLER & PARTNER GMBH
• Medium-sized consulting firm
• Official partner of Dan Linstedt in Europe
• Consulting, training, implementation
• Industries:
  • Insurance
  • Automotive
  • Banks
  • Trade
  • Pharmaceuticals
  • Telecommunications
BACKGROUND 
Agile Data Mining with Data Vault 2.0
DATA MINING PROJECT AT VGH
Motor insurance
Customer segmentation
A first data mining pilot, therefore:
No specific requirements
Vision is developed during the project
Agile project methodology
Close cooperation with the business
DATA MINING PROJECTS
• Extracting information and patterns from existing data
• Four (large) categories:
  • Segmentation
  • Classification
  • Prediction
  • Association
• Wide range of available algorithms and methods
"The term Data Mining ... describes the extraction of implicitly existing, non-trivial and useful knowledge from large, dynamic, relatively complex structured data."
Figure labels: Database, Application, User, Data mining techniques, Statements, rules & information, Data dictionary, Domain knowledge
DATA VAULT 2.0 MODELING
Figure legend: Surrogate Key, Business Keys, Foreign Keys, Descriptors
Own representation based on Linstedt, 2014
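The four column roles in the figure legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the MD5-based `hash_key` helper, the `load_date` and `record_source` columns, and the sample customer key are assumptions that follow common Data Vault 2.0 practice.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from business keys (assumed: MD5 over a normalized, delimited string)."""
    normalized = "|".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubCustomer:
    customer_hk: str       # surrogate key (here: hash of the business key)
    customer_key: str      # business key
    load_date: datetime    # assumed metadata column
    record_source: str     # assumed metadata column


@dataclass(frozen=True)
class SatCustomer:
    customer_hk: str       # foreign key pointing at the hub
    load_date: datetime
    record_source: str
    gender: str            # descriptor
    marital_status: str    # descriptor


now = datetime.now(timezone.utc)
# "AW00011000" is a made-up sample business key.
hub = HubCustomer(hash_key("AW00011000"), "AW00011000", now, "AdventureWorks")
sat = SatCustomer(hub.customer_hk, now, "AdventureWorks", gender="M", marital_status="M")
```

The hub carries only the business key and its surrogate, while descriptive attributes live in the satellite and reference the hub through the same key.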
DATA VAULT 2.0 METHODOLOGY
Figure: Data Vault 2.0 Methodology, surrounded by Six Sigma, TQM, Scrum, CMMI, PMP, SDLC
DATA VAULT 2.0 METHODOLOGY FOR DATA MINING
Advantages
• Agile project management for DWH projects
• Automation and generation
• Rapid adaptation to changes in the model
• Incremental build-out = incremental cost control
• Targeted delivery = two-week sprints
• Predictable and measurable results
Disadvantages
• Focus on loading of raw data and the production of information
• Not many data mining references
• Many concepts in the methodology are not applicable to data mining projects
• Difficult scaling of team sizes in data mining projects
CRISP-DM
Own representation based on Chapman et al., 2000
PROCESS MODEL
Process model: VGH customer segmentation
Swimlanes: ivv, KTC, D & P
Activities: Start; Store data in Data Vault model; Extract data; Select algorithm; Run segmentation; Present result; Develop quality function; Create SQL query; Determine relevant policyholder (VN) attributes; Research algorithms; End
Decisions (Yes / No): Result achieved?; Result OK?; Formula OK?; Suitable algorithm found?
RAPIDMINER
• Java-based data mining software
• One of the most widely used data mining tools
• Offers
  • Environment for control flow
  • Large number of algorithms
  • Large choice of data sources
Chart categories: Overall, Corporate, Consultants, Academics, NGO / Gov't (© 2012 Rexer Analytics)
EXAMPLE 
Agile Data Mining with Data Vault 2.0
EXAMPLE
Simple example
• AdventureWorks database
• Scenario:
  • Advertising campaign for a new bike
  • Identification of the target group
• Solution:
  • Decision tree
  • Identify relevant attributes in several iterations
Lachev, 2005, p. 238ff
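One way to make "identify relevant attributes" concrete is information gain, a common split criterion for decision trees. The sketch below is an illustration under stated assumptions: the four rows are invented, the column names `gender`, `cars`, and `buyer` are hypothetical, and the deck does not say which criterion the RapidMiner process actually used.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction in `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Invented toy data: 'buyer' marks whether the customer bought a bike.
rows = [
    {"gender": "M", "cars": 0, "buyer": 1},
    {"gender": "F", "cars": 0, "buyer": 1},
    {"gender": "M", "cars": 2, "buyer": 0},
    {"gender": "F", "cars": 2, "buyer": 0},
]

gains = {a: information_gain(rows, a, "buyer") for a in ("gender", "cars")}
ranked = sorted(gains, key=gains.get, reverse=True)  # most informative attribute first
```

In this toy data `cars` separates buyers perfectly while `gender` carries no information, so `cars` ranks first. On real data the ranking would guide which attributes to load in the next iteration.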
10066 records
Attributes:
• Marital Status
• Gender
• Yearly Income
• Total Children
• Education
• Number Cars Owned
• Commute Distance
• Occupation
• House Owner Flag
• Age
ITERATION 1: DATA VAULT 2.0 MODEL
Hub Customer: Customer Key
Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
Sat Category: Product Category
ITERATION 1: RAPIDMINER PROCESS 
Data Gathering 
Data preparation 
Modeling 
ITERATION 1: DECISION TREE MODEL
ITERATION 1: RESULTS 
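The measures named in the speaker notes (accuracy, precision, recall) all derive from a confusion matrix. The sketch below shows the arithmetic for a binary classifier. The counts are invented for illustration and are not results from this project.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that were correct
        "precision": tp / (tp + fp),     # share of predicted buyers that really bought
        "recall": tp / (tp + fn),        # share of real buyers that were found
    }


# Invented counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
m = metrics(tp=40, fp=10, fn=20, tn=30)
```

A change in accuracy between two iterations is then simply the difference between two such values computed on the same test data.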
ITERATION 2: DATA VAULT 2.0 MODEL
Hub Customer: Customer Key
Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
Sat Customer Income: Yearly Income
Sat Customer Children: Total Children
Sat Category: Product Category
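The incremental step between the iterations can be sketched as follows: new satellites are attached to the existing hub, and the flat table handed to the mining tool simply gains columns. This is an illustration only. The dictionary-based tables, the `flatten` helper, and all sample values are hypothetical and stand in for the SQL query and database tables a real project would use.

```python
def flatten(hub, satellites):
    """Join each satellite to the hub on the hash key; return one flat row per customer."""
    table = []
    for hk, business_key in hub.items():
        row = {"customer_key": business_key}
        for sat in satellites:
            row.update(sat.get(hk, {}))  # customers missing from a satellite get no extra columns
        table.append(row)
    return table


# Iteration 1: hub plus one satellite (sample values are invented).
hub_customer = {"hk1": "C-001", "hk2": "C-002"}
sat_customer = {"hk1": {"gender": "M", "age": 41}, "hk2": {"gender": "F", "age": 35}}
iteration_1 = flatten(hub_customer, [sat_customer])

# Iteration 2: two new satellites are added; hub and existing satellite stay untouched.
sat_customer_income = {"hk1": {"yearly_income": 60000}, "hk2": {"yearly_income": 80000}}
sat_customer_children = {"hk1": {"total_children": 2}, "hk2": {"total_children": 0}}
iteration_2 = flatten(
    hub_customer, [sat_customer, sat_customer_income, sat_customer_children]
)
```

Because the earlier structures are not modified, the iteration 1 extract can still be reproduced unchanged after the model has grown.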
ITERATION 2: RAPIDMINER PROCESS
Data Gathering
Preparation
Modeling
ITERATION 2: RESULTS 
+4.01% 
ITERATION 3: DATA VAULT 2.0 MODEL
Hub Customer: Customer Key
Sat Customer: English Education, Number Cars Owned, Gender, Marital Status, Commute Distance, Age, House Owner Flag, English Occupation
Sat Customer Income: Yearly Income
Sat Customer Children: Total Children
CSat Customer Distance: Commute Distance Miles
Sat Category: Product Category
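A computed satellite such as CSat Customer Distance holds values derived from raw data rather than loaded from the source. The sketch below derives a numeric "Commute Distance Miles" from a banded commute distance. It is an assumption-laden illustration: the band labels, the midpoint mapping, and the `build_csat_customer_distance` helper are hypothetical, and the deck does not state how the authors derived the value.

```python
# Assumed mapping from a banded commute distance to a representative number of miles.
# Both the band labels and the chosen values are illustrative assumptions.
BAND_TO_MILES = {
    "0-1 Miles": 0.5,
    "1-2 Miles": 1.5,
    "2-5 Miles": 3.5,
    "5-10 Miles": 7.5,
    "10+ Miles": 10.0,
}


def build_csat_customer_distance(sat_customer):
    """Derive a computed satellite keyed by the same hash key as the raw satellite."""
    return {
        hk: {"commute_distance_miles": BAND_TO_MILES[row["commute_distance"]]}
        for hk, row in sat_customer.items()
    }


# Raw satellite rows (invented sample values); they are read, never modified.
sat_customer = {
    "hk1": {"commute_distance": "2-5 Miles"},
    "hk2": {"commute_distance": "10+ Miles"},
}
csat_customer_distance = build_csat_customer_distance(sat_customer)
```

Keeping the derived value in its own satellite leaves the raw data intact and makes the transformation rule visible in one place.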
ITERATION 3: RAPIDMINER PROCESS
Data Gathering
Preparation
Modeling
ITERATION 3: RESULTS 
+0.12% 
CONCLUSIONS 
Agile Data Mining with Data Vault 2.0
CONCLUSIONS
• Data Vault is a flexible data model, with good support for agile project methodology
• Data Vault is not an additional hurdle in data mining projects
• Additional attributes can be added at any time during the project, in an incremental fashion
• Business Vault: transparent data processing
FURTHER INFORMATION
Publications shown: two appearing in 2015, one already available
www.doerffler.com
www.datavault.de
www.learndatavault.com
Give us feedback
http://goo.gl/LGO4ze
Source: Vasilijonline.com


Editor's Notes

  • #2 On these slides, only the logos are replaced. We have no time to try out or discuss a new design.
  • #9 Comment briefly on the data mining project at VGH. Point to the BI Spectrum article. Objectives of the project, tools used, CRISP-DM was used, etc. If necessary, open further slides for this. Name the insurer? No specific requirements; attributes evolve over time. "Customer" was not exactly defined at first: Only private clients, or companies too? Policyholders or vehicle owners? What kinds of contracts? What are "good" customers?
  • #11 Briefly explain hubs, links, and satellites using VDV. Take a look in the "Sources" folder; there is material you can use.
  • #18 We cannot present data and findings from VGH, so we fall back on AdventureWorks. The setup was taken from the book.
  • #19 Comment briefly on AdventureWorks DW: background information, model of the relevant tables, 25 attributes, 500k records.
  • #20 Comment on the first DV model.
  • #21 Demo in RapidMiner. Also comment on the measures (accuracy, or precision/recall). Ideally show them graphically in RapidMiner.
  • #23 Scatter matrix, confusion matrix (performance matrix).
  • #24 Comment on the changes to the DV model. Show what it then looks like. Make the changes comprehensible (using animations).
  • #25 Demo in RapidMiner. Also comment on the measures (accuracy, or precision/recall). Ideally show them graphically in RapidMiner.
  • #27 Comment on the changes to the DV model. Show what it then looks like. Make the changes comprehensible (using animations).
  • #28 Demo in RapidMiner. Also comment on the measures (accuracy, or precision/recall). Ideally show them graphically in RapidMiner.
  • #31 What are the benefits of the approach? Refer to the VGH project, but also to the demo.
  • #33 TBC: revise the link (I will do it).