What is Data Mining?
Data mining is the mining and discovery of new information, in the form of patterns or rules, from vast amounts of data. It is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.
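To make "patterns or rules" concrete, here is a minimal Python sketch of one common kind of rule, the association rule (for example, "customers who buy butter also buy bread"). It assumes each transaction is a set of item names. The function names `frequent_pairs` and `confidence` and the basket data are hypothetical, chosen only for illustration. Support is the fraction of transactions that contain an itemset; confidence of a rule A -> B is the fraction of transactions containing A that also contain B.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions
    containing both items) is at least min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # set() ignores repeated items; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)

# Hypothetical supermarket baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]
print(frequent_pairs(baskets, 0.5))  # {('bread', 'milk'): 0.5, ('bread', 'butter'): 0.5}
print(confidence(baskets, "butter", "bread"))  # 1.0
```

This sketch counts every pair for clarity; algorithms such as Apriori prune the search so that large databases remain tractable.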
Why do we mine data?
Commercial viewpoint:
• Lots of data is being collected and warehoused.
• Computers have become cheaper and more powerful.
• Competitive pressure is strong.
Scientific viewpoint:
• Data is collected and stored at enormous speeds (GB/hour).
• Traditional techniques are infeasible for raw data.
• Data mining may help scientists.
On what kinds of data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced database systems: object-relational, spatial and temporal, time-series, multimedia, text, and the WWW
What are the goals of data mining?
• Prediction, e.g. sales volume, earthquakes
• Identification, e.g. existence of genes, system intrusions
• Classification into different categories, e.g. discount-seeking shoppers or loyal regular shoppers in a supermarket
• Optimization of limited resources such as time, space, money or materials, and maximization of outputs such as sales or profits
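As an illustration of the prediction goal, the following minimal Python sketch forecasts sales volume, assuming sales follow a simple linear trend over time. It uses ordinary least-squares regression, which is only one of many prediction techniques. The function name `fit_line` and the monthly figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance-style and variance-style sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical monthly sales volumes for months 1 to 4
months = [1, 2, 3, 4]
sales = [98, 112, 118, 132]
slope, intercept = fit_line(months, sales)

# Extrapolate the fitted trend to month 5
print(round(slope * 5 + intercept, 1))  # 142.0
```

A straight-line trend is the simplest possible assumption; seasonal or nonlinear sales patterns would need a richer model.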
What are the applications of data mining?
● Marketing: analysis of consumer behavior; advertising campaigns; targeted mailings; segmentation of customers, stores, or products
● Finance: creditworthiness of clients; performance analysis of financial investments; fraud detection
● Manufacturing: optimization of resources; optimization of manufacturing processes; product design based on customer requirements
● Health care: discovering patterns in X-ray images; analyzing side effects of drugs; effectiveness of treatments
What are the present commercial tools for data mining?
• Data to Knowledge
• SAS
• Oracle Data Miner
• Intelligent Miner
• Clementine
How to build a data mining model?
An important concept is that building a mining model is part of a larger process.
1. Defining the problem. Clearly define the business problem.
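2. Preparing data. Consolidate and clean the data that was identified in the Defining the problem step.
3. Exploring data. Examine the prepared data (for example, its minimum and maximum values, means, standard deviations, and distributions) so that you understand it before building models.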
4. Building models. Before you build a model, you must randomly split the prepared data into training and testing datasets. Use the training dataset to build the model, and the testing dataset to test the model's accuracy by creating prediction queries.
5. Exploring and validating models. Explore the models that you have built and test their effectiveness.
6. Deploying and updating models. Deploy the best-performing models to a production environment.
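The preparation, model-building and validation steps above can be sketched end to end in Python under simplifying assumptions: records are dictionaries with a hypothetical `id` key, cleaning means dropping duplicates and records with missing fields, and the "model" is a toy nearest-centroid classifier on one numeric feature, standing in for whatever algorithm a real tool would use. The shopper segments echo the supermarket example from the goals of data mining; all function names, field names and values are made up for illustration.

```python
import random
from collections import defaultdict

def prepare(*sources, key="id"):
    """Preparing data: consolidate record lists from several sources,
    skipping records with a missing (None) field or an already-seen key."""
    seen, cleaned = set(), []
    for source in sources:
        for record in source:
            if any(v is None for v in record.values()) or record[key] in seen:
                continue
            seen.add(record[key])
            cleaned.append(record)
    return cleaned

def split(records, test_fraction=0.3, seed=None):
    """Building models, part 1: randomly separate records into (training, testing)."""
    shuffled = list(records)  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def build_model(training, feature, label):
    """Building models, part 2: a toy nearest-centroid model storing the
    mean of one numeric feature for each class seen in the training data."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in training:
        sums[r[label]] += r[feature]
        counts[r[label]] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}

def predict(model, value):
    """Predict the class whose mean feature value is closest to `value`."""
    return min(model, key=lambda cls: abs(model[cls] - value))

def accuracy(model, testing, feature, label):
    """Validating: fraction of test records whose prediction matches the actual label."""
    hits = sum(predict(model, r[feature]) == r[label] for r in testing)
    return hits / len(testing)

# Hypothetical records from two stores; feature = share of purchases made on discount
store_a = [
    {"id": 1, "discount_share": 0.9, "segment": "discount seeker"},
    {"id": 2, "discount_share": 0.1, "segment": "loyal regular"},
    {"id": 3, "discount_share": None, "segment": "loyal regular"},   # incomplete: dropped
    {"id": 4, "discount_share": 0.8, "segment": "discount seeker"},
    {"id": 5, "discount_share": 0.2, "segment": "loyal regular"},
]
store_b = [
    {"id": 1, "discount_share": 0.9, "segment": "discount seeker"},  # duplicate: dropped
    {"id": 6, "discount_share": 0.7, "segment": "discount seeker"},
    {"id": 7, "discount_share": 0.15, "segment": "loyal regular"},
    {"id": 8, "discount_share": 0.85, "segment": "discount seeker"},
    {"id": 9, "discount_share": 0.05, "segment": "loyal regular"},
    {"id": 10, "discount_share": 0.75, "segment": "discount seeker"},
]

data = prepare(store_a, store_b)
training, testing = split(data, test_fraction=0.3, seed=7)
model = build_model(training, "discount_share", "segment")
print(len(data), len(training), len(testing))  # 9 6 3
print(accuracy(model, testing, "discount_share", "segment"))  # 1.0
```

Because `split` shuffles at random, which records land in training and testing depends on `seed`; fixing the seed makes a run repeatable. The made-up segments here are cleanly separable, so this toy model scores perfectly; real data rarely behaves this way. Libraries such as scikit-learn provide a ready-made `train_test_split` for the same purpose.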
What are the major issues in data mining?
• Mining different kinds of knowledge in databases
• Interactive mining of knowledge at multiple levels of abstraction
• Incorporation of background knowledge
• Data mining query languages and ad hoc data mining
• Expression and visualization of data mining results
• Handling noise and incomplete data
• Pattern evaluation: the interestingness problem
• Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
• Protection of data security, integrity, and privacy
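For the issue of handling incomplete data, one simple and widely used remedy is mean imputation: a missing value is replaced by the mean of the observed values in the same column. The following Python sketch assumes rows are dictionaries and that `None` marks a missing value; the function name `impute_mean` and the customer data are hypothetical.

```python
def impute_mean(rows, column):
    """Return copies of `rows` with missing (None) values in `column`
    replaced by the mean of the observed values in that column."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        raise ValueError(f"no observed values in column {column!r}")
    mean = sum(observed) / len(observed)
    # Build new dicts so the caller's rows are left unchanged
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]

customers = [{"id": 1, "age": 30}, {"id": 2, "age": None}, {"id": 3, "age": 50}]
print(impute_mean(customers, "age"))
# [{'id': 1, 'age': 30}, {'id': 2, 'age': 40.0}, {'id': 3, 'age': 50}]
```

Mean imputation keeps every record but can understate variability; dropping incomplete records or using model-based imputation are common alternatives.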
What does the future hold for data mining?
● Active research is ongoing in neural networks, regression analysis, and genetic algorithms.
● Data mining is used in many areas today, and we can hardly imagine what the future holds for it!