Data Design 2114.409: Creative Research Practice
Image: HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
Reflection
‣ Status Check
‣ Concerns
‣ Programming
‣ What can we build?
Image: HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
Course Outline
1. Foundations
‣ Introduction
‣ Survey Methods / Data Mining
‣ Visualization and Analysis
‣ Social Mechanics
2. Methods
‣ Creativity and Brainstorming
‣ Prototyping
‣ Project Management
3. Prototyping
‣ Crawling
‣ Text Mining
‣ To be determined (TBD)
‣ Project Update
4. Refinement
‣ TBD x3
‣ Project Presentations
‣ Reflection
Last Week: Building Blocks
‣ Clustering
‣ Classification & Regression
‣ Association Rules
‣ Outlier Detection
Image: HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
This Week: Systems
Image: HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
Data Mining Overview
‣ How do I see and communicate answers? → Visualization, Storytelling
‣ What questions should I ask of the data? → Design, Data Exploration
‣ How do I clean and process the data? → Analysis Techniques
‣ How do I gather meaningful data? → Crawling, Surveys, UX Design
Why might we prefer analysis to visualization?
Labor
‣ There are too many pictures to look at.
‣ We don’t know which ones are interesting.
Accuracy
‣ We can test for statistical significance, etc.
‣ Some patterns don’t visualize easily.
Image: HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
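As one illustration of "testing for statistical significance", here is a minimal sketch of a two-sample permutation test. The choice of test is ours (the slide names none), and the page-design data are hypothetical. The idea: if the two groups really came from the same distribution, shuffling the group labels should often produce a difference in means as large as the one observed; the p-value estimates how often that happens.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute difference in means is at least as large as the observed one.
    """
    def mean_diff(xs, ys):
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    rng = random.Random(seed)
    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign which values belong to which group
        if mean_diff(pooled[:len(a)], pooled[len(a):]) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical minutes-on-page for two page designs
design_a = [12, 14, 11, 13, 15, 12, 14, 13]
design_b = [20, 22, 19, 21, 23, 20, 22, 21]
p_value = permutation_test(design_a, design_b)
```

A small p-value says the gap between the designs is unlikely to be a labelling accident, which is a claim a chart alone cannot make.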
Clustering
Find natural groupings in the data.
Organize data into classes with:
‣ high intra-class similarity
‣ low inter-class similarity
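A minimal sketch of one classic clustering algorithm, k-means (Lloyd's algorithm), assuming numeric points and a chosen number of clusters k. Each iteration assigns points to the nearest centroid (pulling similar points together) and moves centroids to the mean of their members. The farthest-first initialisation is our choice for a deterministic example; k-means++ is a common randomized alternative. The two point blobs are hypothetical.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means: points are equal-length tuples; returns (centroids, labels)."""
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point farthest from every centroid chosen so far
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance)
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(n_iter):
        # Assignment step: every point joins exactly one cluster (a hard clustering)
        labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(vals) / len(members) for vals in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]


# Hypothetical data: two well-separated groups of 2-D points
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(blob_a + blob_b, k=2)
```

Note that k-means only reaches a local optimum, so on real data the result can depend on initialisation.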
Clustering
Input Data: Points OR Similarities, plus [# of clusters]
Output Clusters: Hard OR Soft OR Hierarchical
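To make the "similarities in, hierarchy out" path concrete, here is a minimal sketch of agglomerative clustering with single linkage, assuming the input is a symmetric distance matrix (similarities would first need converting, for example d = 1 − s for similarities in [0, 1]; that conversion is an assumption, not from the slide). The output is a merge history, i.e. a hierarchy; the hypothetical helper `cut` replays it until k clusters remain, which turns the hierarchy into a hard clustering.

```python
def single_linkage(dist):
    """Agglomerative (hierarchical) clustering with single linkage.

    dist is a symmetric n x n matrix of pairwise distances. Returns the merge
    history as (cluster_a, cluster_b, distance) tuples, closest merges first.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        del clusters[j]  # delete the higher index first so i stays valid
        del clusters[i]
        clusters.append(merged)
    return merges


def cut(merges, n_points, k):
    """Replay the first n_points - k merges to get a hard clustering into k groups."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, _ in merges[: n_points - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters


# Hypothetical 1-D positions turned into a distance matrix
positions = [0.0, 1.0, 5.0, 6.0]
dist = [[abs(p - q) for q in positions] for p in positions]
merges = single_linkage(dist)
```

This brute-force version is cubic or worse in the number of points; it is meant for reading, not for large data sets.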
Classification vs. Regression
‣ Classification: learn to map objects to categories
‣ Regression: learn to map objects to continuous variables
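For the regression side, a minimal sketch of the simplest model we could pick: a straight line fitted by ordinary least squares (the slide names no specific model). The output is a continuous number rather than a category, which is the whole difference from classification. The data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: the target is a continuous number, not a category
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept
```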
Classification
Observations X, labels Y: learn f(x) = y
Example: X = height, Y = gender (Male / Female)
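A minimal sketch of learning f(x) = y for the slide's height example, using the simplest classifier we could think of: a single threshold on height (a "decision stump"), chosen to minimise training errors. The classifier choice and the six training examples are hypothetical illustrations, not data from the course.

```python
def fit_stump(xs, ys):
    """Learn f(x) = y as a single threshold on x (a "decision stump").

    Tries the midpoint between every pair of neighbouring distinct x values and
    both ways of assigning the two labels, keeping the fewest training errors.
    Assumes exactly two labels and at least two distinct x values.
    """
    values = sorted(set(xs))
    thresholds = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    first, second = sorted(set(ys))
    best = None
    for t in thresholds:
        for below, above in ((first, second), (second, first)):
            errors = sum((below if x < t else above) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, below, above)
    _, t, below, above = best

    def f(x):
        return below if x < t else above

    return f, t


# Hypothetical training set: observations X = height (cm), labels Y
heights = [155, 160, 162, 170, 175, 180]
genders = ["Female", "Female", "Female", "Male", "Male", "Male"]
f, threshold = fit_stump(heights, genders)
```

Real height distributions overlap heavily, so a single threshold would misclassify many people; measuring that error honestly is the job of the evaluation step.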
The Whole Process
Data Set → Featurization → Featurized Data → Random Split (e.g. 90/10) → Training Data + Test Data
Training Data → Training → Model
Model + Test Data → Evaluation → Results
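The whole process can be sketched end to end in a few functions. Everything here is a hypothetical stand-in: `featurize` reads a single made-up field, the model is a nearest-centroid classifier (our choice), and the 40 records are synthetic. The point is the shape of the pipeline: featurize, split at random (90/10 here), train only on the training data, and evaluate only on the held-out test data.

```python
import random
from collections import defaultdict


def featurize(record):
    # Hypothetical featurizer: raw record -> (feature tuple, label)
    return (record["height_cm"],), record["label"]


def random_split(rows, test_fraction=0.1, seed=0):
    """Shuffle, then hold out test_fraction of the rows (0.1 gives a 90/10 split)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (training data, test data)


def train(training_data):
    """Training: a nearest-centroid model, i.e. the mean feature vector per label."""
    sums, counts = {}, defaultdict(int)
    for x, y in training_data:
        prev = sums.get(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(prev, x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    # Choose the label whose centroid is closest (squared Euclidean distance)
    return min(model, key=lambda y: sum((c - v) ** 2 for c, v in zip(model[y], x)))


def evaluate(model, test_data):
    """Evaluation: accuracy, the share of held-out rows predicted correctly."""
    return sum(predict(model, x) == y for x, y in test_data) / len(test_data)


# Hypothetical data set: 40 records labelled by a simple height rule
data_set = [{"height_cm": h, "label": "short" if h < 170 else "tall"} for h in range(150, 190)]
featurized = [featurize(r) for r in data_set]
training_data, test_data = random_split(featurized, test_fraction=0.1)
model = train(training_data)
results = {"accuracy": evaluate(model, test_data)}
```

Keeping the test data out of training is what makes the reported accuracy an estimate of performance on new data rather than a measure of memorisation.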
Association Rules
Learn interesting relations in the data.
supp(X) = proportion of events in which X occurs
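A minimal sketch of association-rule measures, assuming each event is a set of items (here, hypothetical hashtags in posts). The support supp(X) is the proportion of events in which X occurs; the confidence of a rule X → Y, supp(X ∪ Y) / supp(X), is the standard companion measure of how often Y occurs when X does. The `pair_rules` helper is a hypothetical brute-force search over one-item rules, not the Apriori algorithm, and its thresholds are arbitrary.

```python
from itertools import combinations


def support(itemset, transactions):
    """supp(X): proportion of transactions (events) containing every item of X."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = supp(X ∪ Y) / supp(X): how often Y occurs when X does."""
    return support(set(antecedent) | set(consequent), transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Brute-force search for one-item rules X -> Y that clear both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        # min_support > 0 guarantees supp(x) > 0 below, so no division by zero
        if support({a, b}, transactions) >= min_support:
            for x, y in ((a, b), (b, a)):
                c = confidence({x}, {y}, transactions)
                if c >= min_confidence:
                    rules.append((x, y, c))
    return rules


# Hypothetical events: the hashtags used in four posts
posts = [
    {"#election", "#debate"},
    {"#election", "#turnout"},
    {"#election", "#debate", "#turnout"},
    {"#debate"},
]
rules = pair_rules(posts)
```

Here the rule #turnout → #election has confidence 1.0: every post tagged #turnout is also tagged #election.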
Anomaly Detection
Detect strange events in the data.
Simplest measure:
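One common simple measure (an assumption on our part, not necessarily the one intended on the slide) is the z-score: how many standard deviations a value lies from the mean. A minimal sketch, flagging values whose |z| exceeds a threshold; the daily check-in counts are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """How many (population) standard deviations each value lies from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # all values identical: nothing stands out
    return [(v - mu) / sigma for v in values]


def anomalies(values, threshold=3.0):
    """Flag values whose |z| exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical daily check-in counts with one unusual day
checkins = [20, 22, 19, 21, 20, 23, 18, 21, 20, 95]
flagged = anomalies(checkins, threshold=2.5)
```

A caveat worth knowing: the outlier inflates the standard deviation it is measured against. With the population standard deviation, |z| can never exceed √(n − 1), which is exactly 3 for n = 10, so the conventional threshold of 3 cannot fire on these ten values; the example uses 2.5 instead.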
What Can We Build?
Image: HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
Collective Intelligence
Community → Clicks, Likes, Updates, Articles, Scrolls, Links, Reviews, Images, Time, Check-ins, Comments, Video → Collective Intelligence
How can we harness the activities of the world’s digital citizens to build new and useful consumer services?
Politics
The Korean elections are coming. How can the Internet tell us more than traditional polling ever could?
Politics
‣ What issues are important?
‣ Who are the influencers?
‣ How can we segment and characterize support groups?
‣ How do we spread our opinions more widely?
‣ Who will win the election?
How can we build this?
“Can social media predict election outcomes?”
Source: HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
System Overview
‣ Inputs: Tweet Inputs, Author Inputs
‣ Sentiment + Candidate Scoring
‣ Correction based on past elections
‣ Refinements
‣ Evaluation: RMSE
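The evaluation step uses RMSE (root mean squared error) between predicted and actual results. A minimal sketch, assuming the system's output is one predicted vote share per candidate; the shares below are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error: sqrt(mean((predicted - actual)^2))."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical vote shares (%) for three candidates: model prediction vs. official result
predicted_share = [48.0, 35.0, 17.0]
actual_share = [51.0, 31.0, 18.0]
error = rmse(predicted_share, actual_share)  # in percentage points
```

Because RMSE is in the same units as the shares, it offers a direct way to check whether a refinement (such as a correction based on past elections) actually moves predictions closer to the real outcome: compute it before and after.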
Sentiment Detail
‣ Prediction: Input Observation → Feature Extractor → Classifier → Output Label
‣ Features: N-Gram Features
‣ Training Process: Tweet + Label
‣ Evaluation: Confusion Matrix
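A minimal sketch of the sentiment pipeline: an n-gram feature extractor, a classifier trained on (tweet, label) pairs, and a confusion matrix for evaluation. The classifier here is multinomial Naive Bayes with add-one smoothing, which is our choice (the slide does not name one); the tweets and labels are hypothetical, and real tweets would need better tokenisation than splitting on whitespace.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Feature extractor: lowercase word n-grams up to length n_max."""
    words = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts, with add-one smoothing."""

    def fit(self, texts, labels):
        # Training process: count features per label from (tweet, label) pairs
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        # Pick the label with the highest log prior + log likelihood
        total = sum(self.label_counts.values())
        feats = ngram_features(text)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            score += sum(math.log((self.feature_counts[label][f] + 1) / denom) for f in feats)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def confusion_matrix(true_labels, predicted_labels):
    """Count each (true label, predicted label) pair; off-diagonal pairs are errors."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical labelled tweets
train_tweets = [
    "I love this candidate", "great speech tonight", "love the new policy",
    "terrible debate performance", "I hate this plan", "awful speech tonight",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = NaiveBayes().fit(train_tweets, train_labels)

test_tweets = ["love this speech", "hate this debate"]
test_labels = ["pos", "neg"]
predictions = [model.predict(t) for t in test_tweets]
matrix = confusion_matrix(test_labels, predictions)
```

The confusion matrix shows not just how many predictions were wrong but which way they were wrong (for example, negative tweets read as positive), which matters when sentiment feeds a vote-share estimate.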
‣ Entertainment
‣ Food
‣ Movements
‣ Collaboration
‣ Shopping
‣ Travel
‣ Investing
‣ Medicine
‣ Trust
Images:
HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/
HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/
HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/
HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/
HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/
HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/
HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/
HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/
HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
Homework: Data Mining
1. Form groups!
2. Choose a Collective Intelligence topic from Lecture 1, or propose a similar one.
3. Make a list of data sources that might provide insights into that topic.
4. Based on your intuition, propose a set of meaningful questions about the data.
5. How would you have to clean and process your data to start answering those questions?
6. Consider clustering, association rules, anomaly detection, and classification. For each technique, how might you apply it to the data, and what would it show?
7. Document your work and be prepared to present.
Image: HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/