Data Design

Combining data mining building blocks to build real systems.

Published in: Technology, Sports

Transcript of "Data Design"

  1. Data Design. 2114.409: Creative Research Practice. HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
  2. Reflection: Status Check; Concerns; Programming; What can we build. HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
  3. Course Outline. 1. Foundations: Introduction; Survey Methods / Data Mining; Visualization and Analysis; Social Mechanics. 2. Methods: Creativity and Brainstorming; Prototyping; Project Management. 3. Prototyping: Crawling; Text Mining; To be determined (TBD); Project Update. 4. Refinement: TBD x3; Project Presentations; Reflection.
  4. Last Week: Building Blocks. Clustering; Classification & Regression; Association Rules; Outlier Detection. HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
  5. This Week: Systems. HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
  6. Data Mining Overview. How do I see and communicate answers? Visualization, Storytelling. What questions should I ask of the data? Design, Data Exploration. How do I clean and process the data? Analysis Techniques. How do I gather meaningful data? Crawling, Surveys, UX Design.
  7. Why might we prefer analysis? LABOR: Too many pictures to look at. Don’t know which are interesting. ACCURACY: Can test for statistical significance, etc. Some patterns don’t visualize easily. HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
  8. Clustering: Find natural groupings in the data. Organize data into classes: ‣ high intra-class similarity ‣ low inter-class similarity
  9. Clustering. Input Data: Points OR Similarities, [ # of clusters ]. Output Clusters: Hard OR Soft OR Hierarchical.
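
Slides 8 and 9 can be illustrated with a minimal sketch of hard clustering from points and a given number of clusters. The slides do not name an algorithm; k-means is swapped in here as one common choice, and the sample points, the `kmeans` helper, and its naive initialization are assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Hard clustering: each point ends up in exactly one of k clusters."""
    # Naive initialization (an assumption): use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids

# Hypothetical 2D points forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Points within a cluster are close to each other (high intra-class similarity) and far from the other cluster's centroid (low inter-class similarity). A soft or hierarchical method would return membership weights or a tree instead of one label per point.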
  10. Classification: Learn to map objects to categories. Regression: Learn to map objects to continuous variables.
  11. Classification. Observations X, Labels Y. Learn f(x) = y. Y = gender (Male, Female), X = height.
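
As a rough sketch of slide 11, the function f(x) = y can be learned from labeled heights with a one-dimensional threshold. The slide does not say which learner is used; the midpoint-of-class-means rule, the `train` helper, and the height values below are all assumptions chosen to keep the example small.

```python
def train(heights, labels):
    """Learn f(x) = y for two classes by thresholding at the midpoint of class means."""
    classes = sorted(set(labels))
    means = {
        c: sum(h for h, y in zip(heights, labels) if y == c) / labels.count(c)
        for c in classes
    }
    # Order the two classes by their mean height.
    shorter, taller = sorted(classes, key=means.get)
    threshold = (means[shorter] + means[taller]) / 2

    def f(x):
        return taller if x >= threshold else shorter

    return f, threshold

# Hypothetical observations X (height in cm) and labels Y.
X = [160, 165, 170, 175, 180, 185]
Y = ["Female", "Female", "Female", "Male", "Male", "Male"]

f, threshold = train(X, Y)
print(threshold)
print(f(183), f(158))
```

A single feature like height separates the classes only approximately, which is why the evaluation step on the next slide matters.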
  12. The Whole Process: Data Set; Featurization; Featurized; Random Split (e.g. 90/10); Training Data; Test Data; Training; Model; Evaluation; Results.
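
The stages on slide 12 can be sketched end to end as follows. This is only an illustration under stated assumptions: the data set is synthetic, and `featurize`, `random_split`, `train`, and `evaluate` are hypothetical helper names, with a simple threshold model standing in for whatever model the course used.

```python
import random

def featurize(record):
    # Hypothetical featurizer: the raw number is the single feature.
    return float(record)

def random_split(rows, test_fraction=0.1, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train(rows):
    """Fit a threshold halfway between the two class means."""
    low = [x for x, y in rows if y == "low"]
    high = [x for x, y in rows if y == "high"]
    threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2
    return lambda x: "high" if x >= threshold else "low"

def evaluate(model, rows):
    """Accuracy: the share of held-out rows the model labels correctly."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)

# Synthetic data set (an assumption): numbers labeled by whether they reach 50.
data_set = [(n, "high" if n >= 50 else "low") for n in range(100)]
featurized = [(featurize(n), label) for n, label in data_set]
training_data, test_data = random_split(featurized, test_fraction=0.1, seed=0)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

Evaluating only on the held-out 10% keeps the reported result from rewarding a model that has merely memorized its training data.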
  13. Association Rules: Learn interesting relations in the data. = proportion of events in which X occurs
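
The quantity described on slide 13, the proportion of events in which X occurs, is conventionally called the support of X. A minimal sketch follows; the shopping baskets are made up, and the `confidence` helper goes slightly beyond the slide by applying the standard definition support(X and Y) / support(X).

```python
def support(itemset, events):
    """Proportion of events in which every item of the itemset occurs."""
    itemset = set(itemset)
    return sum(1 for event in events if itemset <= event) / len(events)

def confidence(x, y, events):
    """Of the events containing x, the share that also contain y (standard definition)."""
    return support(set(x) | set(y), events) / support(x, events)

# Hypothetical events: each one is the set of items in a shopping basket.
events = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

bread_support = support({"bread"}, events)
bread_milk_support = support({"bread", "milk"}, events)
bread_to_milk = confidence({"bread"}, {"milk"}, events)
print(bread_support, bread_milk_support, bread_to_milk)
```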
  14. Anomaly Detection: Detect strange events in the data. Simplest measure:
  15. What Can We Build? HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
  16. Collective Intelligence. Clicks, Likes, Updates, Articles, Scrolls, Links, Reviews, Images, Time, Checkins, Comments, Video. Collective Intelligence: How can we harness the activities of the world’s digital citizens to build new and useful consumer services? Community.
  17. Politics: The Korean elections are coming. How does the Internet tell us more than traditional polling ever could?
  18. Politics: What issues are important? Who are the influencers? How can we segment/characterize support groups? How do we spread our opinions more widely? Who will win the election?
  19. How can we build this? “Can social media predict election outcomes?” HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
  20. Insert Magic Here? Tweet: Author, Date, Body, Retweets, Hashtags, Location. Author: Profile, Tweets, Favorites, Following, Followers, Location. Clustering; Classification & Regression; Association Rules; Outlier Detection. Prediction: Candidate, Score, Confidence.
  21. Workshop
  22. System Overview: Tweet Inputs; Author Inputs; Sentiment + Candidate Scoring; Correction based on past elections; Refinements; RMSE Evaluation.
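
The RMSE evaluation named on slide 22 can be sketched as below. RMSE (root mean squared error) is the standard formula; the predicted and actual vote shares are invented numbers used only to show the arithmetic.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical vote shares in percent for three candidates.
predicted_share = [52.0, 45.0, 3.0]
actual_share = [50.0, 48.0, 2.0]

error = rmse(predicted_share, actual_share)
print(round(error, 3))
```

Because the errors are squared before averaging, one badly missed candidate raises the score more than several small misses do.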
  23. Sentiment Detail: Input Observation; Feature Extractor; Classifier; Output Label; Confusion Matrix Evaluation; N-Gram Features; Training Process; Tweet + Label.
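
Two pieces of slide 23, the n-gram feature extractor and the confusion matrix evaluation, can be sketched without committing to a particular classifier. The tweet text, the labels, and the helper names `ngram_features` and `confusion_matrix` are assumptions for illustration; the predicted labels are simply given rather than produced by a trained model.

```python
from collections import Counter

def ngram_features(text, n=2):
    """Count the word n-grams in a text (whitespace tokenization, lowercased)."""
    tokens = text.lower().split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

# Hypothetical tweet body.
features = ngram_features("I love this candidate", n=2)
print(features)

# Hypothetical true labels and classifier outputs for four tweets.
true_labels = ["pos", "pos", "neg", "neg"]
predicted_labels = ["pos", "neg", "neg", "neg"]
matrix = confusion_matrix(true_labels, predicted_labels)
print(matrix)
```

The off-diagonal entries, such as a positive tweet predicted as negative, show which kinds of mistakes the classifier makes, which a single accuracy number hides.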
  24. Entertainment, Food, Movements. HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/ HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/ HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/ Collaboration, Shopping, Travel. HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/ HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/ HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/ Investing, Medicine, Trust. HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/ HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/ HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/ HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
  25. Homework: Data Mining. 1. Form groups! 2. Choose a Collective Intelligence topic from Lecture 1, or propose something similar. 3. Make a list of data sources that might provide insights into that topic. 4. Propose a set of meaningful questions about the data based on your intuition. 5. How would you have to clean/process your data to start answering those questions? 6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show? 7. Document your work and be prepared to present. HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
  26. Feedback