Society and everyone: news, digital cameras, YouTube, forums, blogs, Google & Co
We are drowning in data, but starving for knowledge!
Avoid data tombs
“Necessity is the mother of invention”: data mining is the automated analysis of massive data sets.
What is Data Mining?
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Are simple search engines data mining? Are queries data mining? Are expert systems data mining?
Knowledge Discovery (KDD) Process
Data sources -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge
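The stages of the KDD process can be sketched as a small pipeline. This is only a toy illustration: the records, the function names (clean, integrate, select, mine, evaluate) and the "pattern" being mined (average amount per city) are assumptions made for this example and do not come from the slides.

```python
from collections import defaultdict

# Two hypothetical data sources; one record has a missing value.
source_a = [
    {"city": "Berlin", "product": "wine", "amount": 12.0},
    {"city": "Berlin", "product": "wine", "amount": None},
]
source_b = [
    {"city": "Munich", "product": "wine", "amount": 20.0},
    {"city": "Munich", "product": "bread", "amount": 3.0},
    {"city": "Berlin", "product": "wine", "amount": 8.0},
]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: merge all sources into one 'warehouse' list."""
    return [r for source in sources for r in source]

def select(warehouse, product):
    """Selection: keep only the task-relevant data."""
    return [r for r in warehouse if r["product"] == product]

def mine(records):
    """Data mining (toy): average amount per city."""
    amounts = defaultdict(list)
    for r in records:
        amounts[r["city"]].append(r["amount"])
    return {city: sum(v) / len(v) for city, v in amounts.items()}

def evaluate(patterns, threshold):
    """Pattern evaluation: keep only patterns at or above a threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}

warehouse = integrate(clean(source_a), clean(source_b))
task_relevant = select(warehouse, "wine")
patterns = mine(task_relevant)
knowledge = evaluate(patterns, threshold=15.0)
print(knowledge)
```

In a real setting each stage is far more involved; the point of the sketch is only the order of the stages and the fact that mining operates on cleaned, integrated and selected data.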
Data Mining and Business Intelligence
Layers, from top to bottom (potential to support business decisions increases toward the top):
Decision Making
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Second axis: Quantity of data
Data Mining: confluence of multiple disciplines
Database Technology, Statistics, Machine Learning, Pattern Recognition, Algorithms, Visualization, Other Disciplines
Why Data Mining?
Why is Data Mining so complex? A matter of data dimensions
Tremendous amount of data
Walmart – Customer buying patterns – a 7.5-terabyte data warehouse in 1995
VISA – Detecting credit card interoperability issues – 6800 payment transactions per second
High-dimensionality of data
Many dimensions to be combined together
Data cube example: time, location, product sales
High complexity of data
Time-series data, temporal data, sequence data
Structured data, graphs, social networks and multi-linked data
Spatial, spatiotemporal, multimedia, text and Web data
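The data cube example (time, location, product sales) can be made concrete with a short sketch. The sales facts below and the helper name roll_up are hypothetical; the code only illustrates how one set of facts is aggregated along different combinations of dimensions.

```python
from collections import defaultdict

# Hypothetical sales facts: (time, location, product, units sold).
sales = [
    ("Q1", "Berlin", "wine", 10),
    ("Q1", "Berlin", "spaghetti", 5),
    ("Q1", "Munich", "wine", 7),
    ("Q2", "Berlin", "wine", 4),
    ("Q2", "Munich", "spaghetti", 6),
]

DIMENSIONS = ("time", "location", "product")

def roll_up(facts, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in idx)
        totals[key] += fact[-1]
    return dict(totals)

by_time = roll_up(sales, keep=("time",))
by_time_product = roll_up(sales, keep=("time", "product"))
grand_total = roll_up(sales, keep=())

print(by_time)
print(by_time_product)
print(grand_total)
```

With three dimensions there are already 2^3 = 8 possible groupings (every subset of the dimensions), which hints at why combining many dimensions quickly becomes expensive.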
What does Data Mining provide me with? (1)
Multidimensional concept description: Characterization and discrimination
Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Characterization describes things in the same class, discrimination describes how to separate different classes
Frequent patterns, association, correlation vs. causality
Wine, Spaghetti [0.3% of all baskets, 75% of the baskets in which tomato sauce is bought]
Is this correlation or not?
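The two bracketed figures in the basket example correspond to what are usually called support (share of all baskets) and confidence (share of the baskets containing the antecedent). A minimal sketch of how they are computed follows; the baskets are made up for illustration, and lift is added as one common, but not the only, check for correlation. None of these measures says anything about causality.

```python
# Hypothetical market baskets; the items are illustrative only.
baskets = [
    {"tomato sauce", "spaghetti", "wine"},
    {"tomato sauce", "spaghetti", "wine"},
    {"tomato sauce", "spaghetti", "wine"},
    {"tomato sauce", "bread"},
    {"wine", "cheese"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "cheese"},
]

def support(itemset, baskets):
    """Fraction of all baskets that contain every item of the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Fraction of baskets with the antecedent that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's overall support.

    A value above 1 suggests positive correlation, about 1 independence,
    and below 1 negative correlation.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

antecedent = {"tomato sauce"}
consequent = {"spaghetti", "wine"}

print(support(antecedent | consequent, baskets))
print(confidence(antecedent, consequent, baskets))
print(lift(antecedent, consequent, baskets))
```

A rule can have high confidence simply because its consequent is bought very often anyway, which is why confidence alone does not answer the correlation question.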
What does Data Mining provide me with? (2)
Classification and prediction
Construct models (functions) that describe and distinguish classes or concepts for future prediction
E.g., classify countries based on climate, or classify cars based on gas mileage
Predict some unknown or missing numerical values
Cluster analysis: the class label is unknown; group data to form new classes, e.g., cluster houses to find distribution patterns
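The difference between classifying with known class labels and grouping data when the class label is unknown can be sketched in a few lines. Everything concrete here is an assumption for illustration: the mileage values, the labels "low" and "high", the house prices, and the choice of a nearest-class-mean classifier and a one-dimensional k-means as stand-ins for the general techniques.

```python
# Hypothetical gas mileage values (mpg) with known class labels.
labeled_cars = [(15.0, "low"), (18.0, "low"), (33.0, "high"), (36.0, "high")]

def train_centroids(examples):
    """Classification (toy): learn one mean value per known class label."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(value, centroids):
    """Assign the label whose class mean is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

def kmeans_1d(values, centers, iterations=10):
    """Clustering (toy): no labels are given, groups emerge from the data."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centroids = train_centroids(labeled_cars)
predicted = classify(30.0, centroids)
print(predicted)

# Hypothetical house prices without any labels.
house_prices = [100.0, 110.0, 120.0, 400.0, 420.0]
centers, groups = kmeans_1d(house_prices, centers=[100.0, 420.0])
print(centers, groups)
```

In the first half the labels are part of the input and the model only learns to reproduce them for new cars; in the second half the two groups of houses are discovered from the values alone.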