Data Design

Combining data mining building blocks to build real systems.

Published in: Technology, Sports

  1. Data Design, 2114.409: Creative Research Practice. Photo: HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
  2. Reflection: Status check; Concerns; Programming; What can we build? Photo: HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
  3. Course Outline
       1. Foundations: Introduction; Survey Methods / Data Mining; Visualization and Analysis; Social Mechanics
       2. Methods: Creativity and Brainstorming; Prototyping; Project Management
       3. Prototyping: Crawling; Text Mining; To be determined (TBD); Project Update
       4. Refinement: TBD x3; Project Presentations; Reflection
  4. Last Week: Building Blocks: Clustering; Classification & Regression; Association Rules; Outlier Detection. Photo: HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
  5. This Week: Systems. Photo: HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
  6. Data Mining Overview
       How do I see and communicate answers? → Visualization, Storytelling
       What questions should I ask of the data? → Design, Data Exploration
       How do I clean and process the data? → Analysis Techniques
       How do I gather meaningful data? → Crawling, Surveys, UX Design
  7. Why might we prefer analysis?
       Labor: there are too many pictures to look at, and we don't know which ones are interesting.
       Accuracy: we can test for statistical significance, etc., and some patterns don't visualize easily.
       Photo: HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
  8. Clustering: Find natural groupings in the data. Organize data into classes with high intra-class similarity and low inter-class similarity.
  9. Clustering, inputs and outputs. Input: data as points OR similarities, plus [number of clusters]. Output: clusters that are hard OR soft OR hierarchical.
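As one concrete instance of the "points + number of clusters → hard clusters" case, here is a minimal sketch of k-means (Lloyd's algorithm). The slide does not name an algorithm, so the choice of k-means, the function name `kmeans`, and its parameters are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Hard clustering: assign each point to exactly one of k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids
```

Each point ends up in exactly one cluster, which is what "hard" means here; soft or hierarchical output would need a different method, such as a mixture model or agglomerative merging.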
  10. Classification vs. Regression: Classification learns to map objects to categories; regression learns to map objects to continuous variables.
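To make the regression half concrete, here is a hedged sketch of ordinary least squares with a single input variable, which maps each x to a continuous y. The helper name `fit_line` and the numbers in the usage line are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ intercept + slope * x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); both sums share the 1/n factor, so it cancels
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` returns `(1.0, 2.0)`, so the learned map is y = 1 + 2x, and any new x gets a continuous prediction rather than a category.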
  11. Classification: Given observations X and labels Y, learn f(x) = y. Example: X = height, Y = gender (Male or Female).
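The height → gender example can be sketched with one of the simplest possible classifiers, a nearest-mean rule: remember the average height for each label, then predict the label whose average is closest. This is a swapped-in illustrative technique, not necessarily the one used in the course, and the heights in the test are made-up toy numbers.

```python
def train_nearest_mean(heights, labels):
    """Learn f(x) = y by storing the mean observation for each label."""
    sums, counts = {}, {}
    for x, y in zip(heights, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}

    def f(x):
        # Predict the label whose mean height is closest to x
        return min(means, key=lambda y: abs(x - means[y]))

    return f
```

Training consumes labeled pairs (x, y); the returned function `f` is the learned mapping and needs only x.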
  12. The Whole Process: Data Set → Featurization → Featurized Data → Random Split (e.g. 90/10) into Training Data and Test Data. Training Data → Training → Model; Model + Test Data → Evaluation → Results.
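The pipeline above can be sketched as follows, assuming the data is already featurized into `(features, label)` pairs and that accuracy is the evaluation metric (the slide does not specify one). The names `random_split`, `accuracy`, and `run_pipeline` are hypothetical, and `train` stands in for whatever learning method is plugged in.

```python
import random


def random_split(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of the featurized rows and split them into (train, test), e.g. 90/10."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def accuracy(model, test_rows):
    """Evaluation: fraction of test rows whose predicted label matches the true label."""
    correct = sum(1 for x, y in test_rows if model(x) == y)
    return correct / len(test_rows)


def run_pipeline(rows, train, test_fraction=0.1, seed=42):
    """Random split -> train on training data -> evaluate the model on held-out test data."""
    train_rows, test_rows = random_split(rows, test_fraction, seed)
    model = train(train_rows)  # `train` maps training rows to a callable model
    return {
        "n_train": len(train_rows),
        "n_test": len(test_rows),
        "accuracy": accuracy(model, test_rows),
    }
```

Keeping the test rows out of training is the point of the split: the reported result estimates how the model does on data it has never seen.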
  13. Association Rules: Learn interesting relations in the data. Support: supp(X) = proportion of events in which X occurs.
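Support can be computed directly from its definition. In this sketch each event is assumed to be a tweet and each item a hashtag; the hashtags are made up. The `confidence` helper uses the standard definition conf(X ⇒ Y) = supp(X ∪ Y) / supp(X), which does not appear in the surviving slide text.

```python
def support(itemset, transactions):
    """supp(X): proportion of events (transactions) that contain every item in X."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """conf(X => Y) = supp(X ∪ Y) / supp(X); undefined (ZeroDivisionError) if X never occurs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)
```

With the four hypothetical tweets in the test, `#election` appears in 3 of 4 (support 0.75), and 2 of those 3 also carry `#economy`, so the rule `#election ⇒ #economy` has confidence 2/3.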
  14. Anomaly Detection: Detect strange events in the data. Simplest measure:
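The slide's "simplest measure" did not survive in this transcript. As a stand-in only (it may not be what the slide showed), one common simple measure is the z-score: how many standard deviations a value lies from the mean. The threshold of 3 and the idea of applying it to, say, hourly tweet counts are assumptions.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]
```

One caveat of this measure: a large outlier inflates the standard deviation itself, so in very small samples it can hide; robust variants use the median and median absolute deviation instead.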
  15. What Can We Build? Photo: HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
  16. Collective Intelligence: Clicks, Likes, Updates, Articles, Scrolls, Links, Reviews, Images, Time, Check-ins, Comments, Video → Collective Intelligence → Community. How can we harness the activities of the world's digital citizens to build new and useful consumer services?
  17. Politics: The Korean elections are coming. How does the Internet tell us more than traditional polling ever could?
  18. Politics: What issues are important? Who are the influencers? How can we segment and characterize support groups? How do we spread our opinions more widely? Who will win the election?
  19. How can we build this? "Can social media predict election outcomes?" Source: HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
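  20. From tweets to predictions. Inputs: each tweet (Author, Date, Body, Retweets, Hashtags, Location) and its author's profile (Tweets, Favorites, Following, Followers, Location). Output: a prediction (Candidate, Score, Confidence). In between: "Insert Magic Here?", built from the building blocks Classification & Regression, Clustering, Association Rules, and Outlier Detection.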
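Before any of the building blocks can run, each tweet and its author's profile have to be featurized. The sketch below assumes a hypothetical record layout mirroring the fields listed on the slide (these are not a real Twitter API schema) and a hypothetical candidate list; the specific features chosen are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    # Hypothetical record layout; field names are illustrative, not a real API schema
    author: str
    date: str
    body: str
    retweets: int
    hashtags: list = field(default_factory=list)
    location: str = ""


@dataclass
class AuthorProfile:
    # Hypothetical author-profile fields matching the slide's list
    tweets: int
    favorites: int
    following: int
    followers: int
    location: str = ""


def featurize(tweet, profile, candidates):
    """Turn one tweet plus its author's profile into a flat dict of numeric features."""
    body = tweet.body.lower()
    features = {
        "retweets": tweet.retweets,
        "n_hashtags": len(tweet.hashtags),
        "author_tweets": profile.tweets,
        "author_favorites": profile.favorites,
        "author_following": profile.following,
        "author_followers": profile.followers,
        # 1 if the tweet was posted from the author's profile location
        "same_location": int(bool(tweet.location) and tweet.location == profile.location),
    }
    # One indicator per candidate: does the tweet mention that candidate by name?
    for name in candidates:
        features[f"mentions_{name.lower()}"] = int(name.lower() in body)
    return features
```

A flat dict of numbers like this can feed any of the four building blocks; which features actually help is exactly what the evaluation step has to establish.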
  21. Workshop
  22. Sentiment + Candidate System Overview: Tweet inputs and author inputs feed sentiment + candidate scoring. Refinements: a correction based on past elections. Evaluation: RMSE.
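The evaluation step can be sketched with root-mean-square error between predicted and actual results (for example, vote shares). The `bias_correct` helper is one hypothetical reading of "correction based on past elections": shift new scores by the average error the system made on earlier elections. It is an assumption, not the course's method.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual values (e.g. vote shares)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))


def bias_correct(raw_scores, past_raw, past_actual):
    """Hypothetical correction: add the mean past error (actual - raw) to each new raw score."""
    offset = sum(a - r for r, a in zip(past_raw, past_actual)) / len(past_raw)
    return [s + offset for s in raw_scores]
```

RMSE squares each error before averaging, so one badly mispredicted race costs more than several small misses; that suits election forecasts, where a large miss is the failure that matters.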
  23. Sentiment Detail. Training process: each Tweet + Label passes through the feature extractor (N-gram features) to train the classifier. Prediction: input observation → feature extractor → classifier → output label. Evaluation: confusion matrix.
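The sentiment pipeline can be sketched end to end: an n-gram feature extractor, a classifier trained on (tweet, label) pairs, and a confusion matrix for evaluation. The slide does not name a classifier, so multinomial Naive Bayes with add-one smoothing is a swapped-in choice, and all names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Feature extractor: counts of word unigrams and bigrams (lowercased, whitespace-split)."""
    words = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # label -> n-gram -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = ngram_features(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram, c in feats.items():
                score += c * math.log((counts[gram] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_matrix(true_labels, predicted_labels):
    """Evaluation: counts[(true, predicted)] for each pair; missing pairs read as 0."""
    return Counter(zip(true_labels, predicted_labels))
```

Bigrams let the classifier see short phrases such as "not good" that single words miss, and the confusion matrix shows which labels get mistaken for which, not just the overall error rate.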
  24. Entertainment, Food, Movements; Collaboration, Shopping, Travel; Investing, Medicine, Trust.
       Photos: HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/ HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/ HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/ HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/ HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/ HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/ HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/ HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/ HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/ HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
  25. Homework: Data Mining
       1. Form groups!
       2. Choose a Collective Intelligence topic from Lecture 1, or propose a similar one.
       3. Make a list of data sources that might provide insights into that topic.
       4. Propose a set of meaningful questions about the data, based on your intuition.
       5. How would you have to clean and process your data to start answering those questions?
       6. Consider clustering, association rules, anomaly detection, and classification. For each technique, how might you apply it to the data, and what would it show?
       7. Document your work and be prepared to present.
       Photo: HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
  26. Feedback