8. What is data mining?
• Given lots of data
• Discover patterns and make predictions that are:
– Valid
– Useful
– Unexpected
– Understandable
Viet-Trung Tran
9. Data mining tasks
• Descriptive methods
– Find human-interpretable patterns that describe the data
– Example: clustering
• Predictive methods
– Use some variables to predict the unknown or future values of other variables
– Example: recommender systems
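The two task types can be contrasted with a minimal sketch. It is an illustration only, not taken from the slides: the function names (`kmeans_1d`, `predict_rating`), the toy data, and the choice of a one-dimensional k-means and an item-average predictor are all assumptions, picked because they are about the simplest instances of clustering and of a recommender.

```python
from statistics import mean


def kmeans_1d(points, centers, iterations=10):
    """Descriptive: group 1-D points around the nearest center (toy k-means)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the closest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


def predict_rating(ratings, user, item):
    """Predictive: estimate a missing rating as the mean of other users' ratings."""
    known = [r[item] for u, r in ratings.items() if u != user and item in r]
    return mean(known) if known else None


# Hypothetical toy data, chosen only for illustration.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, clusters = kmeans_1d(points, centers=[0.0, 10.0])
print("clusters:", clusters)

ratings = {
    "ann": {"film_a": 5, "film_b": 3},
    "bob": {"film_a": 4},
    "cat": {"film_a": 3, "film_b": 5},
}
print("predicted rating:", predict_rating(ratings, "bob", "film_b"))
```

The clustering call only summarizes the data it is given, while the rating call fills in a value that was never observed. That difference is the descriptive/predictive distinction.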
10. Meaningfulness of analytic answers
• A risk of "data mining" is that the patterns we "discover" are meaningless
• Bonferroni's principle
– If we look in more places for interesting patterns than the amount of data can support, a method we think is useful for finding a particular kind of event will return mostly false positives
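One way to make Bonferroni's principle concrete is to count how many "finds" pure chance produces. The sketch below is an illustration, not part of the slides. The scenario is an assumption: many candidates each guess a sequence of fair coin flips, and we "discover" those who guessed everything right. The function names and the numbers used are assumptions as well.

```python
import random


def expected_chance_hits(num_candidates: int, num_flips: int) -> float:
    """Expected number of candidates who get every flip right by pure luck."""
    return num_candidates * 0.5 ** num_flips


def simulate_chance_hits(num_candidates: int, num_flips: int, seed: int = 0) -> int:
    """Count candidates whose purely random guesses happen to be all correct."""
    rng = random.Random(seed)
    all_correct = (1 << num_flips) - 1  # bit pattern with every guess correct
    hits = 0
    for _ in range(num_candidates):
        # Each bit is one fair-coin guess: 1 means correct, 0 means wrong.
        if rng.getrandbits(num_flips) == all_correct:
            hits += 1
    return hits


expected = expected_chance_hits(100_000, 10)  # 100000 / 1024
observed = simulate_chance_hits(100_000, 10)
print(f"expected by chance: {expected:.1f}, observed in random data: {observed}")
```

If we hoped to find only a handful of genuinely skilled candidates, a chance count of this size means nearly everything the search returns is a false positive.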
12. Data mining cultures
• Overlaps with
– Databases: large-scale data, simple queries
– Machine learning: small data, complex models
– CS theory: (randomized) algorithms
• Different cultures
– To DB guys: an extreme form of analytic processing
– To ML guys: inference of models (inference: a conclusion reached on the basis of evidence and reasoning)
13. What we will learn
• Mine different types of data
– High dimensional
– Graph
– Infinite/never-ending
– Labeled
• Use different models of computation
– Batch processing
– Stream
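The difference between the two models of computation can be sketched with a single statistic, the mean. This is a minimal illustration under assumptions: the names `batch_mean` and `RunningMean` are hypothetical, and the mean is chosen only because it has a simple incremental form.

```python
from typing import List


def batch_mean(values: List[float]) -> float:
    """Batch model: the whole data set is available and is scanned at once."""
    return sum(values) / len(values)


class RunningMean:
    """Stream model: items arrive one at a time and are never stored."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


values = [2.0, 4.0, 6.0, 8.0]
print("batch:", batch_mean(values))

stream = RunningMean()
for x in values:  # in a real stream this loop would never end
    print("running mean so far:", stream.update(x))
```

The streaming version uses constant memory no matter how many items arrive, which is what makes it usable on infinite or never-ending data.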