Data Mining

High-level overview of common data mining techniques.

Transcript

  • 1. Data Mining. 2114.409: Creative Research Practice. HTTP://WWW.FLICKR.COM/PHOTOS/CPBILLS/2888144434/
  • 2. Reflection. Homework 2: Status? Auditors. Concerns: Programming; What can we build. HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
  • 3. Course Outline. 1. Foundations: Introduction; Survey Methods / Data Mining; Visualization and Analysis; Social Mechanics. 2. Methods: Creativity and Brainstorming; Prototyping; Project Management. 3. Prototyping: Crawling; Text Mining; To be determined (TBD); Project Update. 4. Refinement: TBD x3; Project Presentations; Reflection.
  • 4. Data Mining Overview: How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
  • 5. THIS LECTURE BARELY SCRATCHES THE SURFACE OF INFORMATION VISUALIZATION. IT IS A JUMPING OFF POINT.
  • 6. Data Mining Overview: How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
  • 7. Data Exploration: Often the questions are not obvious and it’s useful to look at the data for inspiration.
  • 8. Exploration: Data Cubes. Basic operations: ‣ Group (how to chunk data) ‣ Summarize (sum, mean, etc.) ‣ Filter (which rows to include)
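The three data cube operations can be sketched in a few lines of plain Python. This is only an illustration, not code from the lecture: the `rows` table, its column names, and the `pivot` helper are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; not data from the lecture.
rows = [
    {"region": "north", "product": "tea", "sales": 10},
    {"region": "north", "product": "coffee", "sales": 30},
    {"region": "south", "product": "tea", "sales": 20},
    {"region": "south", "product": "coffee", "sales": 40},
    {"region": "south", "product": "tea", "sales": 60},
]

def pivot(rows, group_key, value_key, summarize, keep=lambda row: True):
    """Filter rows, group them by one column, and summarize another column."""
    groups = defaultdict(list)
    for row in rows:
        if keep(row):                                      # Filter
            groups[row[group_key]].append(row[value_key])  # Group
    # Summarize: collapse each group's values to a single number.
    return {key: summarize(values) for key, values in groups.items()}

total_by_region = pivot(rows, "region", "sales", sum)
mean_tea_by_region = pivot(
    rows, "region", "sales", mean, keep=lambda row: row["product"] == "tea"
)
print(total_by_region)
print(mean_tea_by_region)
```

A spreadsheet pivot table performs the same three steps; changing `group_key`, `summarize`, or `keep` corresponds to rearranging the pivot.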
  • 9. Pivot Table Tutorial
  • 10. Data Mining Overview: How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
  • 11. Objectives. DATA MINING: ‣ What is it? ‣ How does it relate to collective intelligence? EACH TECHNIQUE: ‣ What is it doing? ‣ Why is it useful? ‣ How might you apply it?
  • 12. Are there patterns in the data? HUMAN VISUAL SYSTEM vs. COMPUTER ANALYSIS
  • 13. Why might we prefer analysis? LABOR: Too many pictures to look at. Don’t know which are interesting. ACCURACY: Can test for statistical significance, etc. Some patterns don’t visualize easily. HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
  • 14. Common Techniques: Clustering; Classification & Regression; Association Rules; Anomaly Detection. HTTP://WWW.FLICKR.COM/PHOTOS/EXPLORATIVEAPPROACH/3866580875/
  • 15. Clustering: Find natural groupings in the data. Organize data into classes: ‣ high intra-class similarity ‣ low inter-class similarity
  • 16. Clustering. Input Data: Points OR Similarities [ # of clusters ]. Output Clusters: Hard OR Soft OR Hierarchical.
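  • 17. K-Means [figure: scatter plot, both axes 0 to 5, with markers k1, k2, k3 from top to bottom]
  • 18. K-Means [figure: scatter plot, both axes 0 to 5, with markers k1, k2, k3 from top to bottom]
  • 19. K-Means [figure: scatter plot, both axes 0 to 5, with markers k1, k3, k2 from top to bottom]
  • 20. K-Means [figure: scatter plot, both axes 0 to 5, with markers k1, k3, k2 from top to bottom]
  • 21. K-Means [figure: scatter plot with markers k1, k2, k3; x-axis: expression in condition 1 (0 to 5); y-axis: expression in condition 2 (0 to 5)]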
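The transcript keeps only the plots, not the algorithm steps. A minimal sketch of the standard k-means iteration (Lloyd's algorithm) is given here, assuming 2-D points and squared Euclidean distance. The function name `kmeans`, its parameters, and the sample points are illustrative choices, not taken from the slides.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k data points chosen at random
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2,
            )
            for x, y in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Made-up points for illustration only.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

The result is a hard clustering: every point gets exactly one label. Because the starting centroids are random, different seeds can give different clusterings on less clearly separated data.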
  • 22. Classification: Learn to map objects to categories. Regression: Learn to map objects to continuous variables.
  • 23. Typical Applications: Speech; Handwriting; OCR
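To make the regression half concrete, here is a small sketch of ordinary least squares for a single input variable. The slides do not name a specific regression method, so this choice, the `fit_line` helper, and the sample numbers are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predict = lambda x: slope * x + intercept  # maps an object to a continuous value
print(slope, intercept, predict(10))
```

Classification differs only in the output: instead of a number, the learned function returns one of a fixed set of categories.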
  • 24. Classification. Observations X, Labels Y: Learn f(x) = y. [figure: Y = gender (Male, Female) against X = height]
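One very simple way to learn f(x) = y for the height example is a single cut point between the two class averages. This sketch is an assumption about how such a function could be learned, not the method from the lecture, and the heights and labels are invented for the example.

```python
from statistics import mean

# Made-up heights in cm with made-up labels, for illustration only.
heights = [155, 160, 163, 168, 172, 178, 181, 185]
labels = ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"]

def learn_threshold(xs, ys, low_label, high_label):
    """Learn f(x) = y as one cut point halfway between the two class means."""
    low_mean = mean(x for x, y in zip(xs, ys) if y == low_label)
    high_mean = mean(x for x, y in zip(xs, ys) if y == high_label)
    cut = (low_mean + high_mean) / 2
    return lambda x: high_label if x > cut else low_label

f = learn_threshold(heights, labels, "Female", "Male")
print(f(158), f(183))
```

Real height data overlaps between the classes, so a one-feature rule like this would make mistakes; that is the motivation for using more features.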
  • 25. The Whole Process: Data Set; Featurization; Featurized; Random Split (e.g. 90/10) into Training Data and Test Data; Training; Model; Evaluation; Results
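The pipeline on this slide can be sketched end to end. Everything specific here is an assumption for illustration: the record fields, the synthetic data, and the nearest-centroid model stand in for whatever featurization and learner a real project would use.

```python
import random

def featurize(record):
    """Turn a raw record into a numeric feature vector (hypothetical fields)."""
    return (record["height_cm"], record["shoe_size"])

def split(examples, test_fraction=0.1, seed=0):
    """Shuffle, then hold out a fraction of the examples for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(examples):
    """Nearest-centroid model: store the mean feature vector of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, model[label])),
    )

def evaluate(model, examples):
    """Accuracy: fraction of examples whose predicted label is correct."""
    correct = sum(predict(model, features) == label for features, label in examples)
    return correct / len(examples)

# Synthetic data set, generated here purely for illustration.
rng = random.Random(42)
data_set = [
    ({"height_cm": rng.gauss(165, 5), "shoe_size": rng.gauss(38, 1)}, "A")
    for _ in range(50)
] + [
    ({"height_cm": rng.gauss(180, 5), "shoe_size": rng.gauss(44, 1)}, "B")
    for _ in range(50)
]
featurized = [(featurize(record), label) for record, label in data_set]
training_data, test_data = split(featurized, test_fraction=0.1)  # 90/10 split
model = train(training_data)
print("test accuracy:", evaluate(model, test_data))
```

The test data is never shown to `train`, so the accuracy printed at the end estimates how the model behaves on examples it has not seen.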
  • 26. Real-World Classification. Observations X, Labels Y, f(x) = y. Y - 100’s of labels. X - 1000’s of features. N - Millions of examples. ? - Not all data is labeled. ? - Some data is mis-labeled. Model spatial context. Model temporal context.
  • 27. Association Rules: Learn interesting relations in the data. = proportion of events in which X occurs
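The left-hand side of the definition on this slide did not survive in the transcript. In standard association-rule terminology, the proportion of events in which X occurs is called the support of X, and the sketch below assumes that is what the slide defined. The `confidence` function is the usual companion measure and is not mentioned in the transcript; the baskets are made up.

```python
# Made-up shopping baskets for illustration; each basket is one event.
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, events):
    """Proportion of events that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= event for event in events) / len(events)

def confidence(antecedent, consequent, events):
    """Of the events containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, events) / support(antecedent, events)

print("support(bread):", support({"bread"}, events))
print("confidence(bread -> milk):", confidence({"bread"}, {"milk"}, events))
```

A rule such as "bread implies milk" is usually considered interesting only when both its support and its confidence are above thresholds chosen for the data set.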
  • 28. Anomaly Detection Detect strange events in the data
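The slide does not name a technique. One simple possibility, assumed here for illustration, is to flag values that sit many standard deviations from the mean (a z-score test). The `find_anomalies` helper, the threshold of 2.0, and the login counts are all made up for the example.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Made-up daily login counts with one strange day.
daily_logins = [101, 98, 103, 97, 100, 102, 99, 250, 100, 101]
print(find_anomalies(daily_logins))
```

A single extreme value inflates both the mean and the standard deviation, which is why the threshold is kept low for such a short series; more robust statistics such as the median would be a reasonable alternative.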
  • 29. Homework: Data Mining. 1. Form groups! 2. Choose a Collective Intelligence topic from Lecture 1, or propose similar. 3. Make a list of data sources that might provide insights to that topic. 4. Propose a set of meaningful questions about the data based on your intuition. 5. How would you have to clean/process your data to start answering those questions? 6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show? 7. Document your work and be prepared to present. HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
  • 30. Data Mining Overview: How do I see and communicate answers? (Lecture 2, HW2) What questions should I ask of the data? (Today, HW3 on-demand) How do I clean and process the data? How do I gather meaningful data? (Later?)
  • 31. Guest Lecture
  • 32. Feedback