Founder and CTO of Information Frameworks, an author, speaker and world-renowned expert on emerging Information Architectures, Integration and Business Intelligence Technologies.
Author of the best-selling book SAP Business Information Warehouse for SAP, 2000.
SAP BW Certification Guide, authored by Catherine Roze, 2002
Contributing Author, SAP BW Handbook, 2002
Member of Intelligent ERP magazine's board of editors and a frequent speaker at IT industry conferences including SAP TechEd, ASUG, Oracle Open World, DCI, The ERP World, Data Mining and the Data Warehouse Institute.
25+ years of experience in emerging Information Technology research, development, and management; Information Architectures; Enterprise Application Integration; e-business; ERP applications; Data Warehousing; Data Mining; CRM; Internet, Object and Client/Server Technologies; and Strategic Consulting.
CRoss Industry Standard Process (CRISP) for Data Mining
Data Understanding
Data Preparation
Data Warehouse
Data Understanding and Data Preparation will initially take about 60% to 80% of the data mining project time.
Source: http://www.crisp-dm.org/
Data Mining - Tools and Data Formats
Domains
Data formats: 57% Flat files, 37% Proprietary, 27% DBMS
Source: http://www.kdnuggets.com/polls/
Data Mining Technology
TECHNIQUES
Visualization: Use human pattern recognition capabilities
Statistics: Applying statistical techniques to predict
Decision Trees: Building scripts based on historic data
Association Rules (Rule Induction): Reasoning from specific facts to reach a hypothesis
Clustering: Finding and visualizing groups of facts that were not previously known
Neural Networks: Learning how to solve problems based on examples
K-Nearest Neighbor: Classification by looking at similar data
Genetic Algorithms: Survival of the fittest
…
USAGE
Discover, Understand, Predict
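To make "classification by looking at similar data" concrete, here is a minimal K-Nearest Neighbor sketch in plain Python. The customer data, the feature choice (age, years as customer) and the function name `knn_predict` are all hypothetical and are not taken from the slides or from any SAP tool.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Take the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: (age, years_as_customer) -> churn status.
customers = [
    ((25, 1), "left"),
    ((27, 2), "left"),
    ((30, 1), "left"),
    ((45, 10), "stayed"),
    ((50, 12), "stayed"),
    ((48, 9), "stayed"),
]

prediction = knn_predict(customers, (28, 2), k=3)
print(prediction)  # the three closest customers all left, so this prints "left"
```

In practice the features would usually be scaled first, since an unscaled attribute with a large range can dominate the distance.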
Utilizing the mining results (on the operational side)
SAPGUI is the Interface to the Data Mining modeling and analysis
No Extensive Data Staging
Modeling a Decision Tree
Create a mining model
Model columns
Specifying the column parameters
Specifying the values in case the original values in the column are to be treated differently
Indicating the prediction column
Indicating the key column
The nature of the column content
Data type of the column
Source: SAP
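The column settings listed on this slide can be pictured as a small data structure. The following is only an illustrative Python sketch, not SAP's API: the class `ModelColumn`, its field names, the example column names, and the rule that a model has exactly one key and one prediction column are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class ModelColumn:
    """One column of a hypothetical mining model definition."""
    name: str
    data_type: str                 # data type of the column (assumed labels, e.g. "CHAR")
    content_type: str              # nature of the content, e.g. "discrete" or "continuous"
    is_key: bool = False           # marks the key column
    is_prediction: bool = False    # marks the column to be predicted
    # Values to be treated differently from the original column values.
    value_overrides: dict = field(default_factory=dict)

def validate(columns):
    """Return (key, prediction) names; assumes exactly one of each is required."""
    keys = [c.name for c in columns if c.is_key]
    targets = [c.name for c in columns if c.is_prediction]
    if len(keys) != 1 or len(targets) != 1:
        raise ValueError("model needs exactly one key and one prediction column")
    return keys[0], targets[0]

# Hypothetical churn model with made-up column names.
model = [
    ModelColumn("CUSTOMER_ID", "CHAR", "key", is_key=True),
    ModelColumn("PROFESSION", "CHAR", "discrete"),
    ModelColumn("CHURNED", "CHAR", "discrete", is_prediction=True,
                value_overrides={"": "UNKNOWN"}),
]

print(validate(model))
```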
Modeling a Decision Tree
Specify Model Parameters
Use a portion (%) of the data for training, or the whole data set
Size of the window (such as 10%)
The number of repeats with different samples
Stop training when the number of cases under the given node is less than or equal to the specified value
Stop training when the accuracy is greater than or equal to the expected accuracy
If the tree is too big, prune the tree without violating the expected accuracy
Use the information gain threshold to check the relevance
Source: SAP
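The stopping conditions and the information gain threshold above can be sketched with the textbook entropy-based definition of information gain. This is a generic illustration, not SAP's implementation: the function names, the threshold value, the toy labels, and the reading of "accuracy" as the majority-class share at a node are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into `children`."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

def should_stop(labels, min_cases, expected_accuracy):
    """Stop when the node is small enough or its majority class is dominant enough."""
    majority_share = Counter(labels).most_common(1)[0][1] / len(labels)
    return len(labels) <= min_cases or majority_share >= expected_accuracy

# Hypothetical node with 10 cases, split on a made-up attribute.
parent = ["left"] * 5 + ["stayed"] * 5
children = [["left"] * 4 + ["stayed"], ["left"] + ["stayed"] * 4]

GAIN_THRESHOLD = 0.1  # assumed value, not taken from the slide
gain = information_gain(parent, children)
print(round(gain, 3), gain >= GAIN_THRESHOLD)  # 0.278 True
print(should_stop(children[0], min_cases=2, expected_accuracy=0.75))  # True
```

A split whose gain falls below the threshold would be treated as irrelevant and skipped; a node that satisfies either stopping rule becomes a leaf.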
Modeling a Decision Tree
Create a training source and map the model columns
BW Query
Runtime parameters for query
Model columns
Selected source columns
Mapping between model column and source column
Source: SAP
Viewing Decision Tree Training Results
This decision tree predicts whether the customer has left or is still “on board”.
The chance of a customer leaving is 70.7% if the profession is “LABOURER”.
The chart shows the distribution at the selected node:
28/41 customers are likely to leave
13/41 customers are likely to stay
Out of a total of 705 cases, 41 cases are covered under this node.
Source: SAP
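The node figures can be checked with simple arithmetic. The sketch below uses only the counts quoted on the slide; the variable names are illustrative. The raw counts give a leaving share of about 68.3%. The slide does not say how the 70.7% figure for the “LABOURER” node is derived, so the sketch does not attempt to reproduce it.

```python
total_cases = 705   # all cases in the training set
node_cases = 41     # cases covered by the selected node
leaving = 28        # customers at this node likely to leave
staying = 13        # customers at this node likely to stay

# The two classes should account for every case at the node.
assert leaving + staying == node_cases

coverage = node_cases / total_cases
share_leaving = leaving / node_cases
share_staying = staying / node_cases

print(f"node covers {coverage:.1%} of all cases")                         # 5.8%
print(f"leaving: {share_leaving:.1%}, staying: {share_staying:.1%}")      # 68.3%, 31.7%
```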
Data Mining – Decision Trees
Uploaded in BW, then BEx for further analysis
Source: SAP
INFORMATION FRAMEWORKS
Technology/Solution Assessment; Product Strategy; Solution Strategy; Product Positioning; Competitive Analysis; Software product architecture; Marketing Strategy; Product Performance and Benchmarking Consulting; Hardware Configuration
Market Research; Market Assessment; Competitive Analysis; Technology due
Seminars; Webinars; Keynotes; Panel Moderator; Publications; Hands-on training; Conferences
Executive and Senior IT Management Consulting
Enterprise Information Architectures (EIA): Business Case Development; Information Architecture; Application Deployment Architectures implementation; Legacy Application Migration Strategies; ERP Application deployment strategies
Enterprise Applications Integration (EAI): Architectures, Service Modeling and design, EAI technology assessment; Tools and Technology Assessment; Vendor Selection and Assessment; Conference Room Pilot implementation
Business Intelligence and Portals: Architectures, Methodologies; Tool/technology/Vendor assessment and selection; Data Warehouse, Data Marts, Analytics, Information Delivery Deployment Architectures; Business Intelligence and eBusiness Integration architectures; Portals Strategies, Business case, Assessment, Architectures, Modeling, Planning and knowledge Transfer
KNOWLEDGE TRANSFER; INFORMATION TECHNOLOGY ORGANIZATION; SOFTWARE AND SOLUTION VENDORS; INFORMATION TECHNOLOGY INVESTORS
http://infoframeworks.com
Questions
Naeem Hashmi
Chief Technology Officer
September 10, 2002
Email: email@example.com
Web Site: http://infoframeworks.com
Tel: 603-432-4550