1. Data Mining
2114.409: Creative Research Practice
HTTP://WWW.FLICKR.COM/PHOTOS/CPBILLS/2888144434/
2. Reflection
Homework 2
Status?
Auditors
Concerns
Programming
What can we build?
HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
3. Course Outline
1. Foundations
‣ Introduction
‣ Survey Methods / Data Mining
‣ Visualization and Analysis
‣ Social Mechanics
2. Methods
‣ Creativity and Brainstorming
‣ Prototyping
‣ Project Management
3. Prototyping
‣ Crawling
‣ Text Mining
‣ To be determined (TBD)
‣ Project Update
4. Refinement
‣ TBD x3
‣ Project Presentations
‣ Reflection
4. Data Mining Overview
‣ How do I see and communicate answers? (Lecture 2, HW2)
‣ What questions should I ask of the data? (Today, HW3)
‣ How do I clean and process the data? (on-demand)
‣ How do I gather meaningful data? (Later?)
5. THIS LECTURE BARELY SCRATCHES THE
SURFACE OF INFORMATION VISUALIZATION.
IT IS A JUMPING OFF POINT.
7. Data Exploration
Often the questions are not obvious and it’s
useful to look at the data for inspiration.
8. Exploration: Data Cubes
Basic operations:
‣ Group
(how to chunk data)
‣ Summarize
(sum, mean, etc.)
‣ Filter
(which rows to include)
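The three operations above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own tooling: the `rows` records, their field names, and the `cube` helper are all hypothetical and invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration: one row per photo upload.
rows = [
    {"user": "ann", "year": 2008, "views": 120},
    {"user": "ann", "year": 2009, "views": 80},
    {"user": "bob", "year": 2008, "views": 40},
    {"user": "bob", "year": 2009, "views": 60},
    {"user": "cat", "year": 2009, "views": 200},
]

def cube(rows, group_key, value_key, summarize, keep=lambda r: True):
    """Filter rows, group them by one field, then summarize each group."""
    groups = defaultdict(list)
    for r in rows:
        if keep(r):                                    # Filter: which rows to include
            groups[r[group_key]].append(r[value_key])  # Group: how to chunk data
    return {k: summarize(v) for k, v in groups.items()}  # Summarize: sum, mean, etc.

# Total views per user, mean views per year, and total views per user in 2009 only.
total_by_user = cube(rows, "user", "views", sum)
mean_by_year = cube(rows, "year", "views", mean)
total_2009 = cube(rows, "user", "views", sum, keep=lambda r: r["year"] == 2009)
```

Changing only the `group_key`, `summarize`, or `keep` argument gives a different view of the same data, which is what makes these operations useful for exploration.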
11. Objectives
DATA MINING
‣ What is it?
‣ How does it relate to collective intelligence?
EACH TECHNIQUE
‣ What is it doing?
‣ Why is it useful?
‣ How might you apply it?
12. Are there patterns in the data?
HUMAN VISUAL SYSTEM vs. COMPUTER ANALYSIS
13. Why might we prefer analysis?
LABOR
‣ Too many pictures to look at.
‣ Don’t know which are interesting.
ACCURACY
‣ Can test for statistical significance, etc.
‣ Some patterns don’t visualize easily.
HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
14. Common Techniques
‣ Clustering
‣ Classification & Regression
‣ Association Rules
‣ Anomaly Detection
HTTP://WWW.FLICKR.COM/PHOTOS/EXPLORATIVEAPPROACH/3866580875/
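As one concrete instance of the techniques listed, anomaly detection can be sketched with a simple z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. This is only one of many possible approaches and is not necessarily the one the lecture had in mind. The `daily_uploads` numbers and the threshold of 2.0 are assumptions made up for the example.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:          # all values identical: nothing stands out
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily upload counts, invented for illustration.
daily_uploads = [10, 12, 11, 9, 10, 13, 11, 50]
anomalies = zscore_anomalies(daily_uploads)
```

On this toy series only the final value is flagged. A rule like this is sensitive to the threshold and to the outliers themselves, which inflate the standard deviation, so it is best treated as a first pass.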
25. The Whole Process
Data Set → Featurization → Featurized → Random Split (e.g. 90/10) → Training Data and Test Data
Training Data → Training → Model
Model and Test Data → Evaluation → Results
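The pipeline above can be sketched end to end in plain Python. Everything specific here is an assumption for illustration: the toy text data set, the two features chosen in `featurize`, and the nearest-centroid model, which stands in for whatever classifier one would actually train.

```python
import random

def featurize(text):
    """Featurization: turn raw text into numbers (a toy, assumed choice of features)."""
    return [len(text), sum(ch == "!" for ch in text)]

def split(items, test_fraction=0.1, seed=0):
    """Random split into training data and test data (e.g. 90/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

def train(examples):
    """Training: a nearest-centroid model stores the mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in examples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(model, x):
    """Predict the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: dist2(model[y]))

def evaluate(model, examples):
    """Evaluation: fraction of held-out examples labeled correctly."""
    return sum(predict(model, x) == y for x, y in examples) / len(examples)

# Hypothetical data set, invented for illustration: short calm texts vs. long excited ones.
rng = random.Random(1)
data_set = []
for _ in range(100):
    if rng.random() < 0.5:
        data_set.append(("hi" * rng.randint(1, 3), "calm"))
    else:
        data_set.append(("wow" * rng.randint(8, 10) + "!" * rng.randint(2, 5), "excited"))

featurized = [(featurize(text), label) for text, label in data_set]
train_data, test_data = split(featurized, test_fraction=0.1)
model = train(train_data)
accuracy = evaluate(model, test_data)
```

The key design point is that the model never sees the test data during training, so `accuracy` estimates how well it generalizes. The toy classes here are deliberately easy to separate; real data rarely is.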
26. Real-World Classification
Observations X Y - 100’s of labels
X - 1000’s of features
Labels Y N - Millions of examples
? - Not all data is labeled
? - Some data is mis-labeled
f(x) = y Model spatial context
Model temporal context
29. Homework: Data Mining
1. Form groups!
2. Choose a Collective Intelligence topic from
Lecture 1, or propose a similar one.
3. Make a list of data sources that might
provide insights to that topic.
4. Propose a set of meaningful questions about
the data based on your intuition.
5. How would you have to clean/process your
data to start answering those questions?
6. Consider clustering, association rules,
anomaly detection, classification. For each
technique, how might you apply it to the
data and what would it show?
7. Document your work and be prepared to
present.
HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
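For item 6, it may help to see how small a first experiment can be. The sketch below applies association rules to a made-up data set of photo tags, using the standard definitions of support (fraction of records containing an itemset) and confidence (support of both sides divided by support of the left side). The `baskets` data, the thresholds, and the helper names are all hypothetical.

```python
from itertools import combinations

# Hypothetical tag sets, one per photo, invented for illustration.
baskets = [
    {"beach", "sunset", "ocean"},
    {"beach", "ocean"},
    {"beach", "sunset"},
    {"city", "night"},
    {"city", "sunset"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return single-item rules (lhs, rhs, support, confidence) that pass both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, baskets)
            if s >= min_support:
                c = confidence({lhs}, {rhs}, baskets)
                if c >= min_confidence:
                    rules.append((lhs, rhs, s, c))
    return rules

rules = pair_rules(baskets)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

A rule such as "ocean -> beach" reads as "photos tagged ocean also tend to be tagged beach". For the homework, the interesting part is deciding what counts as a basket and an item in your own data.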