Data Mining

High-level overview of common data mining techniques.

Published in: Technology, Education

Transcript of "Data Mining"

  1. Data Mining
     2114.409: Creative Research Practice
     HTTP://WWW.FLICKR.COM/PHOTOS/CPBILLS/2888144434/
  2. Reflection
     Homework 2 Status? Auditors
     Concerns Programming What can we build
     HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
  3. Course Outline
     1. Foundations: Introduction; Survey Methods / Data Mining; Visualization and Analysis; Social Mechanics
     2. Methods: Creativity and Brainstorming; Prototyping; Project Management
     3. Prototyping: Crawling; Text Mining; To be determined (TBD); Project Update
     4. Refinement: TBD x3; Project Presentations; Reflection
  4. Data Mining Overview
     ‣ How do I see and communicate answers? (Lecture 2, HW2)
     ‣ What questions should I ask of the data? (Today, HW3)
     ‣ How do I clean and process the data? (on-demand)
     ‣ How do I gather meaningful data? (Later?)
  5. THIS LECTURE BARELY SCRATCHES THE SURFACE OF INFORMATION VISUALIZATION. IT IS A JUMPING OFF POINT.
  6. Data Mining Overview
     ‣ How do I see and communicate answers? (Lecture 2, HW2)
     ‣ What questions should I ask of the data? (Today, HW3)
     ‣ How do I clean and process the data? (on-demand)
     ‣ How do I gather meaningful data? (Later?)
  7. Data Exploration
     Often the questions are not obvious and it’s useful to look at the data for inspiration.
  8. Exploration: Data Cubes
     Basic operations:
     ‣ Group (how to chunk data)
     ‣ Summarize (sum, mean, etc.)
     ‣ Filter (which rows to include)
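The three data cube operations on the slide above can be sketched in a few lines of plain Python. This is only an illustration: the rows, the field names (`region`, `year`, `score`) and the `pivot` helper are made up here and do not come from the lecture.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; every field name and value is invented for illustration.
rows = [
    {"region": "north", "year": 2010, "score": 4},
    {"region": "north", "year": 2011, "score": 5},
    {"region": "south", "year": 2010, "score": 2},
    {"region": "south", "year": 2011, "score": 3},
    {"region": "south", "year": 2011, "score": 4},
]

def pivot(rows, group_key, value_key, summarize=mean, keep=lambda row: True):
    """Filter rows, group them by one field, then summarize another field."""
    groups = defaultdict(list)
    for row in rows:
        if keep(row):                                       # Filter
            groups[row[group_key]].append(row[value_key])   # Group
    # Summarize: collapse each group's values into a single number.
    return {key: summarize(values) for key, values in groups.items()}

# Mean score per region over all rows.
mean_by_region = pivot(rows, "region", "score")

# Total score per region, restricted to 2011.
total_2011 = pivot(rows, "region", "score", summarize=sum,
                   keep=lambda row: row["year"] == 2011)
```

A spreadsheet pivot table performs the same three steps; swapping `summarize` or `keep` corresponds to changing the summary function or the filter in the pivot table dialog.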
  9. Pivot Table Tutorial
  10. Data Mining Overview
      ‣ How do I see and communicate answers? (Lecture 2, HW2)
      ‣ What questions should I ask of the data? (Today, HW3)
      ‣ How do I clean and process the data? (on-demand)
      ‣ How do I gather meaningful data? (Later?)
  11. Objectives
      DATA MINING
      ‣ What is it?
      ‣ How does it relate to collective intelligence?
      EACH TECHNIQUE
      ‣ What is it doing?
      ‣ Why is it useful?
      ‣ How might you apply it?
  12. Are there patterns in the data?
      HUMAN VISUAL SYSTEM vs. COMPUTER ANALYSIS
  13. Why might we prefer analysis?
      LABOR
      ‣ Too many pictures to look at.
      ‣ Don’t know which are interesting.
      ACCURACY
      ‣ Can test for statistical significance, etc.
      ‣ Some patterns don’t visualize easily.
      HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
  14. Common Techniques
      ‣ Clustering
      ‣ Classification & Regression
      ‣ Association Rules
      ‣ Anomaly Detection
      HTTP://WWW.FLICKR.COM/PHOTOS/EXPLORATIVEAPPROACH/3866580875/
  15. Clustering
      Find natural groupings in the data
      Organize data into classes:
      ‣ high intra-class similarity
      ‣ low inter-class similarity
  16. Clustering
      Input Data: Points OR Similarities, [ # of clusters ]
      Output Clusters: Hard OR Soft OR Hierarchical
  17. K-Means [figure: plot with both axes from 0 to 5 and centroids k1, k2, k3]
  18. K-Means [figure: plot with both axes from 0 to 5 and centroids k1, k2, k3]
  19. K-Means [figure: plot with both axes from 0 to 5 and centroids k1, k2, k3]
  20. K-Means [figure: plot with both axes from 0 to 5 and centroids k1, k2, k3]
  21. K-Means [figure: plot with centroids k1, k2, k3; x-axis "expression in condition 1", y-axis "expression in condition 2", both from 0 to 5]
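The K-Means slides survive only as figure residue, so the following is a minimal sketch of the standard K-Means procedure (Lloyd's algorithm) rather than a reconstruction of what the slides showed. The points and the three starting centroids are invented; they merely sit in the same 0 to 5 range as the plots.

```python
from math import dist

def k_means(points, centroids, max_iters=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data and starting centroids (k = 3).
points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 4), (1, 4), (1, 5), (2, 5)]
centroids, clusters = k_means(points, [(0, 0), (5, 5), (0, 5)])
```

The result depends on the starting centroids, which is why K-Means is often run several times from different initial positions.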
  22. Classification: Learn to map objects to categories
      Regression: Learn to map objects to continuous variables
  23. Typical Applications
      Speech, Handwriting, OCR
  24. Classification
      Observations X, Labels Y
      Learn f(x) = y
      Y = gender (Male, Female)
      X = height
  25. The Whole Process
      Data Set → Featurization → Featurized data → Random Split (e.g. 90/10) → Training Data and Test Data → Training → Model → Evaluation → Results
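To make the pipeline on the slide above concrete, here is a small sketch that reuses the height and gender example from the Classification slide. Everything specific is an assumption for illustration: the heights are synthetic, the distributions are invented, and the threshold rule is just one very simple choice of model.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Data set: synthetic (height_cm, label) pairs; the distributions are invented.
data = ([(rng.gauss(176, 7), "male") for _ in range(100)]
        + [(rng.gauss(163, 7), "female") for _ in range(100)])

# Featurization is trivial here: the only feature is the height itself.

# Random split (90/10).
rng.shuffle(data)
cut = int(0.9 * len(data))
train, test = data[:cut], data[cut:]

# Training: learn a threshold halfway between the two class means.
mean_male = mean(h for h, label in train if label == "male")
mean_female = mean(h for h, label in train if label == "female")
threshold = (mean_male + mean_female) / 2

def model(height):
    """Predict a label from a single height using the learned threshold."""
    return "male" if height > threshold else "female"

# Evaluation: accuracy measured on the held-out test data only.
correct = sum(model(h) == label for h, label in test)
accuracy = correct / len(test)
```

Evaluating on data the model never saw during training is the point of the split; accuracy measured on the training data would tend to look better than it should.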
  26. Real-World Classification
      Observations X, Labels Y, f(x) = y
      ‣ Y - 100’s of labels
      ‣ X - 1000’s of features
      ‣ N - Millions of examples
      ‣ ? - Not all data is labeled
      ‣ ? - Some data is mis-labeled
      ‣ Model spatial context
      ‣ Model temporal context
  27. Association Rules
      Learn interesting relations in the data
      supp(X) = proportion of events in which X occurs
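The quantity defined on the Association Rules slide, the proportion of events in which X occurs, is usually called support. The sketch below computes it for a few invented shopping baskets. The `confidence` function is a standard companion measure that is not on the slide; it is included only to show how support is typically used to score a rule.

```python
# Hypothetical events: each one is the set of items seen together.
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, events):
    """Proportion of events that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= event for event in events) / len(events)

def confidence(antecedent, consequent, events):
    """Share of events containing `antecedent` that also contain `consequent`.

    Not defined on the slide; this is the usual score for a rule X -> Y.
    """
    both = set(antecedent) | set(consequent)
    return support(both, events) / support(antecedent, events)

bread_support = support({"bread"}, events)                     # 3 of 4 events
bread_milk_support = support({"bread", "milk"}, events)        # 2 of 4 events
bread_to_milk = confidence({"bread"}, {"milk"}, events)
```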
  28. Anomaly Detection
      Detect strange events in the data
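The slide does not name a specific method, so the following uses one of the simplest possibilities as an illustration: flag values that lie far from the mean, measured in standard deviations (a z-score rule). The data and the cutoff of 2 are assumptions chosen for this example.

```python
from statistics import mean, pstdev

def z_score_anomalies(values, cutoff=2.0):
    """Return the values more than `cutoff` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)          # population standard deviation
    if sigma == 0:
        return []                   # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > cutoff]

# Hypothetical daily event counts with one unusual day.
daily_counts = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = z_score_anomalies(daily_counts)
```

A single extreme value inflates both the mean and the standard deviation, so this rule can miss anomalies in small samples; it is meant as a starting point only.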
  29. Homework: Data Mining
      1. Form groups!
      2. Choose a Collective Intelligence topic from Lecture 1, or propose similar.
      3. Make a list of data sources that might provide insights to that topic.
      4. Propose a set of meaningful questions about the data based on your intuition.
      5. How would you have to clean/process your data to start answering those questions?
      6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show?
      7. Document your work and be prepared to present.
      HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
  30. Data Mining Overview
      ‣ How do I see and communicate answers? (Lecture 2, HW2)
      ‣ What questions should I ask of the data? (Today, HW3)
      ‣ How do I clean and process the data? (on-demand)
      ‣ How do I gather meaningful data? (Later?)
  31. Guest Lecture
  32. Feedback