CS 594 1
CS583 – Data Mining and Text Mining
Course Web Page: http://www.cs.uic.edu/~liub/teach/cs583-spring-05/cs583.html

General Information
- Instructor: Bing Liu
- Email: liub@cs.uic.edu
- Tel: (312) 355 1318
- Office: SEO 931
- Course Call Number: 19696
- Lecture times:
  - 3:30pm – 4:45pm, Tuesday and Thursday
  - Room: 208 GH
- Office hours: 3:30pm – 5:00pm Monday (or by appointment)

Course structure
- The course has three parts:
  - Lectures: introduction to the main topics
  - Research paper presentation
    - Students read papers and present them in class
  - Programming projects
    - 2 programming assignments
    - To be demonstrated to me
- Lecture slides and other relevant information will be made available at the course web site

Paper presentation
- 2 people in a group.
- Each group reads one paper and gives an in-class presentation of the paper.
- Every member should actively participate in the presentation.
- Marks will be given individually.
- Presentation duration to be determined.

Programming projects
- Two programming projects
- To be done individually by each student
- You will demonstrate your programs to me to show that they work
- You will be given a sample dataset
- The data to be used in the demo will be different from the sample data

Grading
- Final Exam: 40%
- Midterm: 30%
  - 1 midterm
- Programming projects: 20%
  - 2 programming assignments
- Research paper presentation: 10%

Prerequisites
- Knowledge of probability and algorithms

Teaching materials
- Main Text
  - Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, ISBN 1-55860-489-8.
- References:
  - Machine Learning, by Tom M. Mitchell, McGraw-Hill, ISBN 0-07-042807-7
  - Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison Wesley, ISBN 0-201-39829-X
  - Other reading materials (the list will be given to you later)
- Data mining resource site: KDnuggets Directory

Topics
- Data pre-processing
- Association rule mining
- Classification (supervised learning)
- Clustering (unsupervised learning)
- Introduction to some other data mining tasks
- Post-processing of data mining results
- Text mining
- Partial/Semi-supervised learning
- Introduction to Web mining

Any questions and suggestions?
- Your feedback is most welcome!
  - I need it to adapt the course to your needs.
- Share your questions and concerns with the class – very likely others have the same ones.
- No pain no gain – no magic for data mining.
  - The more you put in, the more you get out.
  - Your grades are proportional to your efforts.

Rules and Policies
- Statute of limitations: No grading questions or complaints, no matter how justified, will be listened to more than one week after the item in question has been returned.
- Cheating: Cheating will not be tolerated. All work you submit must be entirely your own. Any suspicious similarities between students' work (this includes exams and programs) will be recorded and brought to the attention of the Dean. The MINIMUM penalty for any student found cheating will be a 0 for the item in question and a drop of one letter in the final course grade. The MAXIMUM penalty will be expulsion from the University.
- MOSS: Sharing code with your classmates is not acceptable!!! All programs will be screened using the Moss (Measure of Software Similarity) system.
- Late assignments: Late assignments will not, in general, be accepted. They will never be accepted if the student has not made special arrangements with me at least one day before the assignment is due. If a late assignment is accepted, it is subject to a reduction in score as a late penalty.

Introduction to Data Mining

What is data mining?
- Data mining is also called knowledge discovery in databases (KDD)
- Data mining is
  - the extraction of useful patterns from data sources, e.g., databases, texts, the web, images.
- Patterns must be:
  - valid, novel, potentially useful, understandable

Example of discovered patterns
- Association rules:
  “80% of customers who buy cheese and milk also buy bread, and 5% of customers buy all of them together”
  Cheese, Milk → Bread [sup = 5%, confid = 80%]
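
To make the two numbers in the rule concrete, here is a minimal sketch of how support and confidence could be computed from a list of shopping baskets. The basket data is hypothetical and was constructed only so that the rule Cheese, Milk → Bread comes out at roughly 5% support and 80% confidence; the function names are illustrative, not from the course materials.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    both = support(frozenset(antecedent) | frozenset(consequent), transactions)
    ante = support(antecedent, transactions)
    return both / ante if ante else 0.0


# Hypothetical toy data: 80 baskets in total.
transactions = (
    [frozenset({"cheese", "milk", "bread"})] * 4   # buy all three items
    + [frozenset({"cheese", "milk"})] * 1          # cheese and milk, no bread
    + [frozenset({"bread"})] * 40
    + [frozenset({"eggs"})] * 35
)

rule_sup = support({"cheese", "milk", "bread"}, transactions)
rule_conf = confidence({"cheese", "milk"}, {"bread"}, transactions)
print(f"sup = {rule_sup:.0%}, confid = {rule_conf:.0%}")
```

Support is measured over all baskets, while confidence is measured only over the baskets that already contain the left-hand side of the rule.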

Main data mining tasks
- Classification: mining patterns that can classify future data into known classes.
- Association rule mining: mining any rule of the form X → Y, where X and Y are sets of data items.
- Clustering: identifying a set of similarity groups in the data.
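
As a small illustration of classification, the sketch below assigns a new data point to one of two known classes. It uses a 1-nearest-neighbor rule, which is chosen here only because it is short; the slides do not prescribe any particular classifier, and the training points and labels are hypothetical.

```python
import math


def nearest_neighbor_classify(train, point):
    """Return the label of the training example closest to `point`.

    `train` is a list of (features, label) pairs with numeric feature tuples.
    """
    _, label = min(train, key=lambda example: math.dist(example[0], point))
    return label


# Hypothetical labeled data: two known classes, "low" and "high".
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((5.0, 5.2), "high"),
    ((4.8, 5.1), "high"),
]

# "Future" data points that were not seen during training.
print(nearest_neighbor_classify(train, (1.1, 0.9)))
print(nearest_neighbor_classify(train, (5.1, 4.9)))
```

Clustering differs in that the labels would not be given: the groups themselves have to be discovered from the similarity between points.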

Main data mining tasks (cont …)
- Sequential pattern mining: a sequential rule A → B says that event A will be immediately followed by event B with a certain confidence.
- Deviation detection: discovering the most significant changes in data.
- Data visualization: using graphical methods to show patterns in data.
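
The confidence of a sequential rule A → B can be estimated by counting. The sketch below assumes one particular counting convention that the slides do not spell out: among all occurrences of A that have a next event, it reports the fraction whose next event is B. The event sequences are hypothetical.

```python
def sequential_confidence(sequences, a, b):
    """Fraction of occurrences of `a` (that have a successor) immediately followed by `b`."""
    total = 0
    followed = 0
    for seq in sequences:
        # Pair each event with the event that comes right after it.
        for current, nxt in zip(seq, seq[1:]):
            if current == a:
                total += 1
                if nxt == b:
                    followed += 1
    return followed / total if total else 0.0


# Hypothetical event sequences, e.g. one per customer or session.
sequences = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "A", "B"],
    ["A", "B"],
]

print(sequential_confidence(sequences, "A", "B"))
```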

Why is data mining important?
- Rapid computerization of businesses produces huge amounts of data
- How to make the best use of data?
- A growing realization: knowledge discovered from data can be used for competitive advantage.

Why is data mining necessary?
- Make use of your data assets
- There is a big gap between stored data and knowledge, and the transition won’t occur automatically.
- Many interesting things you want to find cannot be found using database queries:
  “Find me people likely to buy my products”
  “Who is likely to respond to my promotion?”

Why data mining now?
- The data is abundant.
- The data is being warehoused.
- The computing power is affordable.
- The competitive pressure is strong.
- Data mining tools have become available.

Related fields
- Data mining is an emerging multi-disciplinary field:
  - Statistics
  - Machine learning
  - Databases
  - Information retrieval
  - Visualization
  - etc.

Data mining (KDD) process
- Understand the application domain
- Identify data sources and select target data
- Pre-process: cleaning, attribute selection
- Data mining to extract patterns or models
- Post-process: identifying interesting or useful patterns
- Incorporate patterns in real world tasks
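
The middle steps of the process above can be sketched as a small pipeline. This is only a toy illustration under stated assumptions: the records, the function names, the frequency-counting "mining" step, and the 0.5 support threshold are all hypothetical and stand in for whatever a real application would use.

```python
from collections import Counter

# Hypothetical raw records from a data source.
raw_records = [
    {"id": 1, "items": ["milk", "bread", None]},
    {"id": 2, "items": ["milk", "cheese"]},
    {"id": 3, "items": []},
    {"id": 4, "items": ["bread", "milk"]},
]


def select_target(records):
    """Select target data: keep only the attribute to be mined."""
    return [record["items"] for record in records]


def preprocess(baskets):
    """Cleaning: drop missing values, then drop baskets left empty."""
    cleaned = [[item for item in basket if item is not None] for basket in baskets]
    return [basket for basket in cleaned if basket]


def mine(baskets):
    """Mining: count in how many baskets each item appears."""
    return Counter(item for basket in baskets for item in set(basket))


def postprocess(counts, n_baskets, min_support=0.5):
    """Post-processing: keep only items whose support meets the threshold."""
    return {
        item: count / n_baskets
        for item, count in counts.items()
        if count / n_baskets >= min_support
    }


baskets = preprocess(select_target(raw_records))
patterns = postprocess(mine(baskets), len(baskets))
print(patterns)
```

The first and last steps (understanding the domain and using the patterns in real tasks) are deliberately left out, since they depend on the application rather than on code.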

Data mining applications
- Marketing: customer profiling and retention, identifying potential customers, market segmentation.
- Fraud detection: identifying credit card fraud, intrusion detection
- Text and web mining
- Scientific data analysis
- Any application that involves a large amount of data …
