WELCOME TO THE SEMINAR ON DATA MINING By Ujjwal Kumar MSC 635

Data mining

  1. WELCOME TO THE SEMINAR ON DATA MINING, by Ujjwal Kumar, MSC 635
  2. Overview
     • Introduction
     • Why Data Mining?
     • Goals of Data Mining
     • Architecture of Data Mining
     • Data Mining – On what kind of data?
     • Data Mining techniques
     • Advantages of Data Mining
     • Data mining tools/software
     • Conclusion
     • References
  3. Introduction
     • Data mining: extracting or “mining” knowledge from large amounts of data.
     • Also popularly referred to as KDD (Knowledge Discovery from Data), it is the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the Web, other massive information repositories, or data streams.
  4. Why Data Mining?
     • The explosive growth in data: from terabytes to petabytes
     • Data collection and data availability
     • Major sources of abundant data (business, science, society and everyone)
     • We need to find a more effective way to use these data in the decision support process than just using traditional query languages.
  5. Goals of Data Mining
     • Prediction: how certain attributes within the data will behave in the future.
     • Identification: identify the existence of an item, an event, or an activity.
     • Classification: partition the data into categories.
     • Optimization: optimize the use of limited resources.
  6. Architecture of Data Mining
     The architecture of a typical data mining system may have the following major components:
     • Database, data warehouse, WWW, or other information repository
     • Database or data warehouse server
     • Data mining engine
     • Pattern evaluation module
     • User interface
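  7. [Figure: architecture of a typical data mining system. Layers from top to bottom: User Interface; Pattern Evaluation; Data Mining Engine; Database or Data Warehouse Server; data cleaning, integration and selection; Database, Data Warehouse, World Wide Web, Other Info Repositories. A Knowledge Base sits alongside the upper layers.]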
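
The layers in the architecture above can be pictured as a chain of functions, each feeding the next. This is only a toy sketch: the function names, the sample records and the "count item frequency" mining step are all assumptions made for illustration, not part of the seminar.

```python
from collections import Counter

def clean_integrate_select(sources):
    """Data cleaning, integration and selection: merge sources, drop empty records."""
    return [record for source in sources for record in source if record]

def mining_engine(records):
    """Data mining engine: here it simply counts how often each item occurs."""
    return Counter(item for record in records for item in record)

def pattern_evaluation(patterns, min_count):
    """Pattern evaluation: keep only patterns judged interesting (frequent enough)."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def user_interface(patterns):
    """User interface: present the surviving patterns as readable lines."""
    return [f"{item}: {n}" for item, n in sorted(patterns.items())]

# Hypothetical repositories (a database and a data warehouse).
database = [["computer", "software"], [], ["computer"]]
warehouse = [["printer"], ["computer", "printer"]]

records = clean_integrate_select([database, warehouse])
report = user_interface(pattern_evaluation(mining_engine(records), min_count=2))
print(report)
```

A real system would also consult the knowledge base when judging which patterns are interesting; the `min_count` threshold stands in for that here.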
  8. Data Mining – On what kind of data?
     • Data mining should be applicable to any kind of data repository, as well as to transient data, such as data streams.
     • The challenges and techniques of mining may differ for each of the repository systems.
  9. Data Mining – On what kind of data?
     • Relational Databases
     • Transactional Databases
     • Temporal Databases
     • Object-Relational Databases
     • Spatial Databases
     • Text Databases and Multimedia Databases
     • Legacy Databases
  10. Data Mining techniques
     • Classification
     • Clustering
     • Association
  11. Classification
     • Classification is the process of finding a model that describes and distinguishes data classes or concepts.
     • The model is then used to classify a new item, that is, to identify which class it belongs to.
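
As a rough illustration of "find a model, then classify a new item", here is a minimal nearest-centroid classifier. The choice of technique, the function names and the customer data are assumptions made for this sketch; the seminar does not prescribe a particular classification algorithm.

```python
from collections import defaultdict
from math import dist

def fit_centroids(samples, labels):
    """Learn one mean point (centroid) per class from labelled training data."""
    grouped = defaultdict(list)
    for point, label in zip(samples, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }

def classify(centroids, item):
    """Assign a new item to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], item))

# Hypothetical training data: (age, purchases per year) with known class labels.
train_x = [(25, 2), (30, 3), (45, 20), (50, 25)]
train_y = ["occasional", "occasional", "frequent", "frequent"]

model = fit_centroids(train_x, train_y)
first = classify(model, (28, 4))
second = classify(model, (48, 22))
print(first, second)
```

The "model" here is just the dictionary of class centroids; the key point is that class labels are known during training, which is what separates classification from clustering.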
  12. Clustering
     • In general, class labels are not present in the training data, simply because they are not known to begin with.
     • The objects are clustered or grouped based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.
     • Example: an insurance company could use clustering to group clients by their age, location and types of insurance purchased.
  13. Clustering
     • Group data into clusters
     • Similar data is grouped in the same cluster
     • Dissimilar data is grouped in different clusters
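
The grouping idea can be sketched with a plain k-means loop. This is one possible clustering method, chosen here only for illustration; the client tuples (age, number of policies) are invented, and taking the first k points as starting centers is an arbitrary simplification.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Plain k-means; the initial centers are simply the first k points."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return clusters

# Hypothetical insurance clients: (age, number of policies). No class labels are given.
clients = [(22, 1), (25, 1), (24, 2), (58, 4), (61, 5), (60, 4)]
groups = k_means(clients, k=2)
print(groups)
```

No labels are supplied; the two groups (younger clients with few policies, older clients with several) emerge from the distances alone.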
  14. Association rules
     • “An association algorithm creates rules that describe how often events have occurred together.”
     • Example: buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%], where X is a variable representing a customer.
     • A support of 1% means that 1% of all of the transactions under analysis showed that computer and software were purchased together.
     • A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that they will buy software as well.
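
Support and confidence for a rule such as buys(X, “computer”) => buys(X, “software”) can be computed directly from a list of transactions. The sketch below uses a tiny invented basket list, so its numbers differ from the 1% / 50% figures in the slide; the function names are likewise assumptions.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)

# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
    {"computer", "printer"},
]

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(rule_support, rule_confidence)
```

Here two of the five transactions contain both items (support 0.4), and two of the four computer purchases also include software (confidence 0.5).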
  15. Advantages of Data Mining
     • Provides new knowledge from existing data:
       • Public databases
       • Government sources
       • Company databases
     • Old data can be used to develop new knowledge
     • New knowledge can be used to improve services or products
     • Improvements lead to:
       • Bigger profits
       • More efficient service
  16. Uses of Data Mining
     • Sales/Marketing analysis
     • Marketing strategies
     • Advertisements
     • Risk analysis and management
     • Finance and investment
     • Manufacturing and production
     • Text mining (newsgroups, email, documents) and web mining
  17. Uses of Data Mining
     • DNA and bio-data analysis
     • Effectiveness of treatments
     • Identify new drugs
     • Fraud detection
     • Identify people misusing the system
     • Financial transactions
     • Customer care
     • Identify customer needs
  18. Data Mining tools
     • Intelligent Miner, IBM SPSS Modeler
     • Enterprise Miner
     • ODM
     • GhostMiner
     • RapidMiner
  19. Conclusion
     • Data mining is a “decision support” process in which we search for patterns of information in data.
     • The technique can be applied to many types of data.
     • It overlaps with machine learning, statistics, artificial intelligence, databases and visualization.
  20. References
     • Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”
     • http://www.oracle.com/
     • http://www.datamininglab.com/
     • http://www.onlinelibrary.wiley.com
     • http://www.cs.sjsu.edu/
  21. Thank You