What is data mining ?


General description of data mining, its business context, the differences between data mining and statistics, and an example of an application

Published in: Business, Technology

  • 1. The Datamining Garden kick-off workshop, June 19th, 2007, Regus Pegasus, Diegem. What is data mining ? Johan Blomme, Circulation Manager, AMP
  • 2. 1. Introduction : “Competing on Analytics”
  • 3.
      • Thomas Davenport : organizations that have built their very business on the ability to collect, analyze and act on data are consistently the leaders in their industry.
      • The demands of business today are creating an increasing need for access to data and the use of it to maintain a sustainable competitive advantage :
          – the rapid construction of data-driven analytics :
              • descriptive statistics ;
              • predictive modeling and optimization techniques ;
          – the rapid deployment of knowledge derived from data ;
          – the need to give end users access to results in a form that helps them gain the insights they need to make critical business decisions.
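  • 4. Industrial Age vs. Information Age :
      – Processes : linear, sequential (Industrial Age) ; interwoven, collaborative (Information Age)
      – Tempo : periodic, slow (Industrial Age) ; continuous, rapid (Information Age)
      – Assets : tangibles (Industrial Age) ; intangibles (Information Age)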
  • 5. 5
  • 6. 2. Business drivers of data mining
  • 7. Time and information drive the information age, and competitiveness will be based on obtaining real-time information and acting on it promptly and effectively. The following changes indicate how to compete in the information age :
      • more complex business environments due to globalization and deregulation ;
      • greater impact of change from external causes ;
      • a power shift from sellers to buyers, rapidly shifting customer demands and subsequent reduced product life cycles ;
      • constant technology change ;
      • faster business cycles and temporary competitive advantage ;
      • the need to explore collaborative strategies ;
      • constant change at ever-increasing speeds and shrinking strategy time horizons.
  • 8.
      • Technology facilitates data gathering :
          – e.g. RFID ;
          – currently : applications mainly in production environments and logistics ;
          – future possibilities : narrowcasting ;
          – privacy issues !
  • 9.
      • Technology transforms the way we live and interact :
          – ubiquitous access to information is changing the economics of knowledge ;
          – consumer preferences are becoming more complex and are changing more rapidly ;
          – customers will increasingly choose how they would like to interact with organizations and will only do business with companies that meet their interaction needs ;
          – the customer takes the lead ;
          – technology changes the behaviour of consumers ; consequently, it is very important to track customer interactions and customer behaviour.
  • 10. 3. Data mining defined
  • 11.
      • Data mining is the extraction of actionable knowledge from large datasets to acquire and sustain a competitive advantage.
      • Data mining is about achieving the organization’s goals, not about the maths and the statistics.
  • 12.
      • The introduction of data warehousing in the 90’s resulted in a wider acceptance of data mining :
          – operational data stored in corporate data warehouses has the potential to be exploited as business intelligence ;
          – data warehouses are multidimensional structures used for online analytical processing (OLAP) ;
          – OLAP :
              • analyzes information about past performance on an aggregate level ;
              • verification-based approach : the user develops a hypothesis and then tests the data to prove or disprove the hypothesis ;
          – data mining :
              • prospective data analysis ;
              • predicting future trends, allowing businesses to make proactive, knowledge-driven decisions.
    Data mining and statistics/OLAP can complement each other : the inductively revealed relationships between variables can be used to formulate hypotheses and the insights gained
  • 13. 13
  • 14.
      • Statistics vs. data mining :
          – Statistical analysis is primarily concerned with confirmatory data analysis (model fitting) : testing whether a proposed model of hypothetical relationships between variables provides a good explanation of the observed data.
            => Statistical models are based on assumptions or some theory about relationships between variables and assume a deductive process.
          – Data mining : rather than verifying hypothetical patterns, data mining uses the data itself to detect such patterns.
            => In data mining, computational algorithms play a much greater role in building models through exploratory data analysis (EDA). The nature of the process is inductive.
  • 15. 15
  • 16. Figure : business value plotted against degree of intelligence. Steps from lowest to highest : standard reports, query / drill down, alerts, forecasting, predictive modeling, optimization.
  • 17. The CRISP-DM model is an industry- and application-neutral standard for fitting data mining into the general problem-solving strategy of a business.
  • 18. 4. An example of DM : the case of demand planning of magazines (AMP)
  • 19. Distribution of press products : 2.8 million copies every night
  • 20. Business problem : the market for printed magazines is declining. Key reasons :
      - advertising is migrating to e-media ;
      - publishers are not investing in the future of printed magazines at the same rate as they are in the future of e-media products ;
      - the young generation is brought up in an e-media world and will be less inclined to read printed products ;
      - publishers’ drive to reduce costs makes e-media publishing an attractive proposition, since paper, printing and distribution costs can be eliminated.
    The big issue in single copy sales is that of unsolds. If sales volumes go down, the distribution cost per copy increases, since the overhead of the distribution system has to be spread over fewer magazines, and returns as a proportion of delivered magazines increase (the fee earned by distributors is based on the cover prices of magazines and the number of copies sold, instead of a cost-to-serve model).
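The unsolds argument on slide 20 is simple arithmetic, sketched here with made-up figures (the overhead, the variable cost and the volumes are assumptions, not AMP data). A fixed overhead spread over fewer sold copies raises the cost per copy, and an unchanged delivery volume with falling sales raises the return rate.

```python
def cost_per_sold_copy(fixed_overhead, variable_cost_per_delivered, delivered, sold):
    """Distribution cost carried by each copy that is actually sold."""
    total = fixed_overhead + variable_cost_per_delivered * delivered
    return total / sold

def return_rate(delivered, sold):
    """Unsold copies as a share of delivered copies."""
    return (delivered - sold) / delivered

# Hypothetical figures: same delivery volume, declining sales.
for sold in (5000, 4000, 3000):
    cost = cost_per_sold_copy(1000.0, 0.10, 10000, sold)
    print(sold, round(cost, 3), round(return_rate(10000, sold), 2))
```

With these numbers the cost per sold copy rises as sales fall from 5000 to 3000 copies, while the return rate climbs from 50 % to 70 %.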
  • 21. Objective : how to build an intelligent supply chain to improve supply chain efficiency, reduce costs and increase profits ?
  • 22. Diagram labels (data sources and SAP Business Warehouse) : Internet, WWW, Sales Force, Retail, Catalog - Mail, Kiosks, Product Planning & Development, Suppliers. Step : Business Understanding
      • make-to-stock environment
      • lack of visibility of the supply chain, esp. day-to-day demand and stock positions
      • excessive inventory levels
      • return rates of 60 % and more are not uncommon in our industry
    => Information is key : integrate the internal SC activities of AMP with those of partners to gain efficiencies across the supply chain
  • 23. the traditional (linear) supply chain
  • 24. the intelligent supply chain : Publisher, Distributor, Newsstand, connected by a product flow and an information flow. Information & intelligence sharing for effectiveness :
      • POS data sharing
      • Inventory levels
      • Forecasts
      • Promotional activities
      • New product introduction
      • Production & delivery schedules
  • 25. Steps : Business Understanding, Data Preprocessing
      • data normalization
      • handling missing data
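Slide 25 names two preprocessing steps without detailing them. A minimal sketch, assuming mean imputation for missing values and min-max scaling for normalization; both are common choices picked here for illustration, and the slides do not say which methods AMP used. The sample series is hypothetical.

```python
def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical weekly sales for one outlet; None marks a missing report.
weekly_sales = [12, None, 8, 20, None, 16]
filled = fill_missing(weekly_sales)
print(filled)
print(min_max_normalize(filled))
```

Mean imputation is the simplest option; for sales series with trend or seasonality, interpolation or a seasonal average may be a better fit.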
  • 26. Steps : Business Understanding, Data Preprocessing, Develop Forecast Model
      • flat sales model
      • intermittent data modeling
      • discrete data : low volume model
      • apply business rules
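The slides do not say how intermittent demand was modeled. One widely used technique for such series is Croston's method, sketched below as an illustration rather than as AMP's actual model. The smoothing constant, the sample series and the rounding rule `to_draw` are all hypothetical.

```python
import math

def croston(demand, alpha=0.1):
    """Croston's method: forecast the demand rate per period for an intermittent series.

    Non-zero demand sizes and the intervals between them are smoothed
    separately, and both are updated only in periods where demand occurs.
    """
    size = None       # smoothed non-zero demand size
    interval = None   # smoothed number of periods between demands
    periods_since = 0
    for d in demand:
        periods_since += 1
        if d > 0:
            if size is None:  # the first demand initializes both levels
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 0
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval

def to_draw(rate, min_copies=1):
    """Hypothetical business rule: deliver whole copies, never fewer than min_copies."""
    return max(min_copies, math.ceil(rate))

# Hypothetical sales of a low-volume title at one newsstand, per issue.
sales = [0, 0, 3, 0, 0, 3, 0, 0, 3]
rate = croston(sales)
print(rate, to_draw(rate))
```

Here the business rule is applied after the statistical forecast, which mirrors the order of the bullets on slide 26.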
  • 27. Steps : Business Understanding, Data Preprocessing, Develop Forecast Model, Deploy Forecasts
      • interpret results : simulation
      • workflow integration (operations)
  • 28. service degree level : monthly titles
  • 29. Chart : % weighted oos (vertical axis, 0 to 100) against % unsolds (horizontal axis, 0 to 100). Series : reference period, with a linear trend line ; draw regulation, with a logarithmic trend line. Reported fits : R² = 0,0213 and R² = 0,5696.
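The R² values on slide 29 measure how well each trend line fits its series. A minimal sketch of how such values are computed for a linear and a logarithmic trend, using ordinary least squares; the data points are hypothetical and are not the chart's data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def linear_r2(xs, ys):
    """R² of the linear trend y = a + b*x."""
    a, b = fit_line(xs, ys)
    return r_squared(ys, [a + b * x for x in xs])

def log_r2(xs, ys):
    """R² of the logarithmic trend y = a + b*ln(x); requires x > 0."""
    lx = [math.log(x) for x in xs]
    a, b = fit_line(lx, ys)
    return r_squared(ys, [a + b * v for v in lx])

# Hypothetical points: % unsolds (x) and % weighted oos (y).
unsolds = [10, 20, 30, 40, 60, 80]
oos = [70, 45, 32, 25, 15, 10]
print(round(linear_r2(unsolds, oos), 3), round(log_r2(unsolds, oos), 3))
```

An R² close to 0 means the trend line explains almost none of the variation in the series, while a value close to 1 means it explains most of it.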
  • 30. Benefits :
      • Shared visibility across supply chain
      • Improved understanding, forecasting and analysis of consumer demand
      • Improved capability to respond and react to changes
      • Improved stability, predictability and efficiency of supply chain operations
          – Improved fill rates
          – Reduced lead times
          – Smoother SC execution
          – Improved on-shelf availability
          – Reduced inventories
          – More efficient processes
          – More effective demand generation
          – Reduction of costs for handling activities (returns)
    Outcomes : increased sales, reduced inventories, reduced costs