# Data Mining

High-level overview of common data mining techniques.

Published in: Technology, Education

### Data Mining

1. Data Mining. 2114.409: Creative Research Practice. HTTP://WWW.FLICKR.COM/PHOTOS/CPBILLS/2888144434/
2. Reflection. Homework 2: Status? Auditors. Concerns: Programming; What can we build. HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
3. Course Outline. 1. Foundations: Introduction; Survey Methods / Data Mining; Visualization and Analysis; Social Mechanics. 2. Methods: Creativity and Brainstorming; Prototyping; Project Management. 3. Prototyping: Crawling; Text Mining; To be determined (TBD); Project Update. 4. Refinement: TBD x3; Project Presentations; Reflection.
4. Data Mining Overview. How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
5. THIS LECTURE BARELY SCRATCHES THE SURFACE OF INFORMATION VISUALIZATION. IT IS A JUMPING OFF POINT.
6. Data Mining Overview. How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
7. Data Exploration. Often the questions are not obvious and it's useful to look at the data for inspiration.
8. Exploration: Data Cubes. Basic operations: ‣ Group (how to chunk data) ‣ Summarize (sum, mean, etc.) ‣ Filter (which rows to include)
9. Pivot Table Tutorial
10. Data Mining Overview. How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
11. Objectives. DATA MINING: ‣ What is it? ‣ How does it relate to collective intelligence? EACH TECHNIQUE: ‣ What is it doing? ‣ Why is it useful? ‣ How might you apply it?
12. Are there patterns in the data? HUMAN VISUAL SYSTEM vs. COMPUTER ANALYSIS
13. Why might we prefer analysis? LABOR: Too many pictures to look at. Don't know which are interesting. ACCURACY: Can test for statistical significance, etc. Some patterns don't visualize easily. HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
14. Common Techniques: Clustering; Classification & Regression; Association Rules; Anomaly Detection. HTTP://WWW.FLICKR.COM/PHOTOS/EXPLORATIVEAPPROACH/3866580875/
15. Clustering. Find natural groupings in the data. Organize data into classes: ‣ high intra-class similarity ‣ low inter-class similarity
16. Clustering. Input Data: Points OR Similarities, [ # of clusters ]. Output Clusters: Hard OR Soft OR Hierarchical.
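17. K-Means (figure: scatter plot, both axes 0 to 5, with centroids k1, k2, k3)
18. K-Means (figure: scatter plot, both axes 0 to 5, with centroids k1, k2, k3)
19. K-Means (figure: scatter plot, both axes 0 to 5, with centroids k1, k3, k2)
20. K-Means (figure: scatter plot, both axes 0 to 5, with centroids k1, k3, k2)
21. K-Means (figure: scatter plot with centroids k1, k2, k3; x-axis: expression in condition 1, y-axis: expression in condition 2, both 0 to 5)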
22. Classification: Learn to map objects to categories. Regression: Learn to map objects to continuous variables.
23. Typical Applications: Speech; Handwriting; OCR
24. Classification. Observations X, Labels Y. Learn f(x) = y. Y = gender (Male, Female); X = height.
25. The Whole Process: Data Set → Featurization → Featurized → Random Split (e.g. 90/10) → Training Data, Test Data → Training → Model → Evaluation → Results
26. Real-World Classification. Observations X, Labels Y, f(x) = y. Y - 100s of labels. X - 1000s of features. N - Millions of examples. ? - Not all data is labeled. ? - Some data is mislabeled. Model spatial context. Model temporal context.
27. Association Rules. Learn interesting relations in the data. = proportion of events in which X occurs
28. Anomaly Detection. Detect strange events in the data.
29. Homework: Data Mining. 1. Form groups! 2. Choose a Collective Intelligence topic from Lecture 1, or propose similar. 3. Make a list of data sources that might provide insights to that topic. 4. Propose a set of meaningful questions about the data based on your intuition. 5. How would you have to clean/process your data to start answering those questions? 6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show? 7. Document your work and be prepared to present. HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
30. Data Mining Overview. How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
31. Guest Lecture
32. Feedback
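
The three data cube operations named on slide 8 (group, summarize, filter) can be sketched in a few lines of plain Python. This is only an illustration, not the tutorial from the slides: the `rows` records and the `pivot` helper are hypothetical names invented here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration only.
rows = [
    {"region": "north", "product": "tea", "sales": 10},
    {"region": "north", "product": "coffee", "sales": 30},
    {"region": "south", "product": "tea", "sales": 20},
    {"region": "south", "product": "coffee", "sales": 5},
    {"region": "south", "product": "tea", "sales": 40},
]


def pivot(rows, group_key, value_key, summarize, keep=lambda row: True):
    """Filter rows, group them by one column, and summarize another column."""
    groups = defaultdict(list)
    for row in rows:
        if keep(row):                                      # Filter: which rows to include
            groups[row[group_key]].append(row[value_key])  # Group: how to chunk data
    # Summarize: collapse each group to one number (sum, mean, etc.)
    return {key: summarize(values) for key, values in groups.items()}


total_by_region = pivot(rows, "region", "sales", sum)
mean_tea_by_region = pivot(
    rows, "region", "sales", mean, keep=lambda r: r["product"] == "tea"
)
print(total_by_region)
print(mean_tea_by_region)
```

Swapping the `summarize` function or the `keep` predicate changes the question being asked without touching the data, which is the appeal of a pivot table for exploration.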
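
The K-Means slides (17 to 21) show centroids moving between frames. A minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) might look like the following. The sample points and starting centroids are made up for this example, and it uses two clusters rather than the three in the figures.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds and return the result."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: label every point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave a centroid in place if it has no points
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Final labels are computed against the final centroid positions.
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical 2-D points forming two loose groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (4, 4), (5, 4.5), (4.5, 5)]
centroids, labels = kmeans(points, centroids=[(0, 0), (5, 5)])
print(centroids, labels)
```

This produces a hard clustering in the sense of slide 16: every point belongs to exactly one cluster, and the number of clusters is fixed up front by the number of starting centroids.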
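
The pipeline on slide 25 (random split, training, model, evaluation) can be sketched with the height example from slide 24. Everything specific here is an assumption made for illustration: the heights are invented, the feature is used as-is with no featurization step, and the nearest-class-mean model is just one simple choice of f(x) = y, not the method used in the lecture.

```python
import random


def split(examples, test_fraction=0.1, seed=0):
    """Shuffle a copy of the data and split it, e.g. 90/10, into train and test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(training_data):
    """Model = mean feature value per label (a nearest-class-mean classifier)."""
    by_label = {}
    for x, y in training_data:
        by_label.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in by_label.items()}


def predict(model, x):
    """f(x) = y: pick the label whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def evaluate(model, test_data):
    """Fraction of held-out examples that the model labels correctly."""
    return sum(predict(model, x) == y for x, y in test_data) / len(test_data)


# Invented (height in cm, label) pairs; real data would overlap far more.
data = [(h, "Female") for h in range(150, 170, 2)] + [
    (h, "Male") for h in range(172, 192, 2)
]
training_data, test_data = split(data)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

Evaluating only on the held-out test data is the point of the random split: accuracy measured on the training data would overstate how well the model generalizes.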
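
Slide 27 defines a quantity as the proportion of events in which X occurs; in association rule mining this is commonly called the support of X. The sketch below computes it, along with the usual confidence of a rule X → Y (support of X and Y together divided by support of X), which is a standard definition but does not appear in the transcript. The shopping-basket transactions are hypothetical.

```python
def support(itemset, transactions):
    """Proportion of transactions (events) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions containing X, the proportion that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical events, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_support = support({"bread"}, transactions)
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)
print(bread_support, bread_to_milk)
```

A rule is usually considered interesting only when both numbers clear some threshold, since a high-confidence rule built on a rare itemset may reflect just a handful of events.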