Data Design

Combining data mining building blocks to build real systems.

Published in: Technology, Sports

Transcript

  • 1. Data Design. 2114.409: Creative Research Practice. HTTP://WWW.FLICKR.COM/PHOTOS/SERGIU_BACIOIU/4370021957/
  • 2. Reflection: Status Check; Concerns; Programming; What can we build. HTTP://WWW.FLICKR.COM/PHOTOS/FLOWER87/76719859/
  • 3. Course Outline
      1. Foundations: Introduction; Survey Methods / Data Mining; Visualization and Analysis; Social Mechanics
      2. Methods: Creativity and Brainstorming; Prototyping; Project Management
      3. Prototyping: Crawling; Text Mining; To be determined (TBD); Project Update
      4. Refinement: TBD x3; Project Presentations; Reflection
  • 4. Last Week: Building Blocks. Clustering; Classification & Regression; Association Rules; Outlier Detection. HTTP://WWW.FLICKR.COM/PHOTOS/OGIMOGI/2253657555/
  • 5. This Week: Systems. HTTPS://WWW.FACEBOOK.COM/PHOTO.PHP?FBID=407391545956901&SET=A.407391429290246.110679.100000581776191&TYPE=3&THEATER
  • 6. Data Mining Overview
      ‣ How do I see and communicate answers? Visualization, Storytelling
      ‣ What questions should I ask of the data? Design, Data Exploration
      ‣ How do I clean and process the data? Analysis Techniques
      ‣ How do I gather meaningful data? Crawling, Surveys, UX Design
  • 7. Why might we prefer analysis?
      ‣ LABOR: Too many pictures to look at. Don’t know which are interesting.
      ‣ ACCURACY: Can test for statistical significance, etc. Some patterns don’t visualize easily.
      HTTP://WWW.FLICKR.COM/PHOTOS/STRIATIC/2144933705/
  • 8. Clustering: Find natural groupings in the data. Organize data into classes:
      ‣ high intra-class similarity
      ‣ low inter-class similarity
  • 9. Clustering
      ‣ Input Data: Points OR Similarities, [ # of clusters ]
      ‣ Output Clusters: Hard OR Soft OR Hierarchical
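
The input/output picture on the clustering slides can be made concrete with a small sketch. The slides do not name an algorithm, so k-means is an assumption here, chosen as a common way to turn points plus a number of clusters into a hard clustering. The toy points and the farthest-point initialization are illustrative choices, not taken from the deck.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=10):
    """Hard clustering: every point ends up in exactly one of k clusters."""
    # Deterministic initialization (an assumption of this sketch): start from
    # the first point, then repeatedly add the point farthest from all
    # centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Made-up toy points forming two visually obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Points within a cluster sit close together (high intra-class similarity) while the two centroids are far apart (low inter-class similarity). A soft or hierarchical output would need a different algorithm.
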
  • 10. Classification: Learn to map objects to categories. Regression: Learn to map objects to continuous variables.
  • 11. Classification: Observations X, Labels Y. Learn f(x) = y. Example: X = height, Y = gender (Male / Female).
  • 12. The Whole Process
      ‣ Data Set → Featurization → Featurized Data
      ‣ Featurized Data → Random Split (e.g. 90/10) → Training Data, Test Data
      ‣ Training Data → Training → Model
      ‣ Model + Test Data → Evaluation → Results
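
The pipeline on this slide can be sketched end to end using the height/gender example from the previous slide. Everything specific below is an assumption made for illustration: the data is synthetic (drawn from made-up normal distributions), the "model" is a simple nearest-class-mean rule, and the function names are hypothetical.

```python
import random


def featurize(record):
    """Featurization: reduce a raw record to the single feature we use."""
    return record["height_cm"]


def random_split(examples, train_fraction=0.9, seed=0):
    """Random split (e.g. 90/10) into training data and test data."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train(examples):
    """Training: learn f(x) = y as 'the label whose mean height is nearest'."""
    heights_by_label = {}
    for x, y in examples:
        heights_by_label.setdefault(y, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in heights_by_label.items()}

    def model(x):
        return min(means, key=lambda label: abs(x - means[label]))

    return model


def evaluate(model, examples):
    """Evaluation: fraction of held-out examples labeled correctly."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


# Synthetic data set (not real measurements): 50 records per label.
rng = random.Random(42)
data_set = (
    [{"height_cm": rng.gauss(176, 7), "gender": "male"} for _ in range(50)]
    + [{"height_cm": rng.gauss(163, 7), "gender": "female"} for _ in range(50)]
)

featurized = [(featurize(r), r["gender"]) for r in data_set]
training_data, test_data = random_split(featurized)
model = train(training_data)
accuracy = evaluate(model, test_data)
print(len(training_data), len(test_data), accuracy)
```

The test data is never seen during training, so the accuracy estimates how the model behaves on new observations rather than how well it memorized the training set.
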
  • 13. Association Rules: Learn interesting relations in the data. [formula not captured in transcript] = proportion of events in which X occurs
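
The formula on this slide did not survive in the transcript. The surviving text ("proportion of events in which X occurs") matches what is usually called the support of X, so the sketch below assumes that reading and adds the standard confidence of a rule X → Y. The hashtag sets are invented toy events.

```python
def support(events, itemset):
    """Proportion of events in which every item of `itemset` occurs."""
    itemset = set(itemset)
    return sum(1 for event in events if itemset <= set(event)) / len(events)


def confidence(events, antecedent, consequent):
    """Of the events containing the antecedent, the share that also contain
    the consequent: support(X and Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(events, both) / support(events, antecedent)


# Invented toy events: the set of hashtags used in each of four tweets.
events = [
    {"#election", "#economy"},
    {"#election", "#jobs"},
    {"#election", "#economy", "#jobs"},
    {"#economy"},
]

print(support(events, {"#election"}))                   # 0.75
print(support(events, {"#election", "#economy"}))       # 0.5
print(confidence(events, {"#election"}, {"#economy"}))  # 0.5 / 0.75
```

A rule is usually considered interesting only when both its support and its confidence clear thresholds chosen by the analyst.
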
  • 14. Anomaly Detection: Detect strange events in the data. Simplest measure: [formula not captured in transcript]
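
The slide's "simplest measure" was not captured, so the following is only a guess at the kind of measure meant: a z-score, which flags values that lie many standard deviations from the mean. The threshold of 2.0 and the daily counts are assumptions for illustration.

```python
import statistics


def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # All values identical: nothing can be called strange.
        return [0.0 for _ in values]
    return [(v - mean) / stdev for v in values]


def outliers(values, threshold=2.0):
    """Values whose absolute z-score exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Invented daily tweet counts with one obvious spike.
daily_tweet_counts = [102, 98, 110, 95, 105, 99, 101, 400]
print(outliers(daily_tweet_counts))  # [400]
```

A single extreme value inflates the mean and the standard deviation it is judged against, so more robust variants often use the median instead.
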
  • 15. What Can We Build? HTTP://WWW.FLICKR.COM/PHOTOS/BPENDE/6736531173/
  • 16. Collective Intelligence: How can we harness the activities of the world’s digital citizens to build new and useful consumer services?
      ‣ Community activities: Clicks, Scrolls, Time; Likes, Links, Checkins; Updates, Reviews, Comments; Articles, Images, Video
  • 17. Politics: The Korean elections are coming. How does the Internet tell us more than traditional polling ever could?
  • 18. Politics
      ‣ What issues are important?
      ‣ Who are the influencers?
      ‣ How can we segment/characterize support groups?
      ‣ How do we spread our opinions more widely?
      ‣ Who will win the election?
  • 19. How can we build this? “Can social media predict election outcomes?” HTTP://WWW.USATODAY.COM/TECH/NEWS/STORY/2012-03-05/SOCIAL-SUPER-TUESDAY-PREDICTION/53374536/1
  • 20. Insert Magic Here?
      ‣ Tweet: Author, Date, Body, Retweets, Hashtags, Location
      ‣ Author: Profile, Tweets, Favorites, Following, Followers, Location
      ‣ Techniques: Clustering; Classification & Regression; Association Rules; Outlier Detection
      ‣ Prediction: Candidate, Score, Confidence
  • 21. Workshop
  • 22. System Overview: Tweet Inputs, Author Inputs → Sentiment + Candidate Scoring → Refinements (correction based on past elections) → RMSE Evaluation
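
The final stage of this overview, RMSE evaluation, is a standard formula: the square root of the mean squared difference between predicted and actual values. A minimal sketch follows; the vote shares are hypothetical numbers, not results from the deck.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical vote shares (percent) for two candidates.
predicted_share = [52.0, 48.0]
actual_share = [50.0, 50.0]
print(rmse(predicted_share, actual_share))  # 2.0
```

Because errors are squared before averaging, one badly missed prediction raises the score more than several small misses of the same total size.
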
  • 23. Sentiment Detail
      ‣ Input Observation → Feature Extractor (N-Gram Features) → Classifier → Output Label → Confusion Matrix Evaluation
      ‣ Training Process: Tweet + Label
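
The boxes on this slide can be sketched as a tiny pipeline: an n-gram feature extractor, a classifier trained on (tweet, label) pairs, and a confusion matrix for evaluation. The slide does not say which classifier is used. Naive Bayes with add-one smoothing is an assumption here, and the tweets are invented toy examples.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, max_n=2):
    """Feature extractor: all word unigrams and bigrams in the text."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features


def train(labeled_tweets):
    """Training process: count n-grams per label from (tweet, label) pairs."""
    feature_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tweets:
        label_counts[label] += 1
        feature_counts[label].update(ngram_features(text))
    vocabulary = {f for counts in feature_counts.values() for f in counts}
    return feature_counts, label_counts, vocabulary


def classify(model, text):
    """Classifier: naive Bayes with add-one smoothing (assumed, see lead-in)."""
    feature_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        for feature in ngram_features(text):
            if feature in vocabulary:  # skip n-grams never seen in training
                score += math.log((feature_counts[label][feature] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(model, labeled_tweets):
    """Evaluation: counts keyed by (actual label, predicted label)."""
    return Counter((label, classify(model, text)) for text, label in labeled_tweets)


# Invented toy data.
training = [
    ("great speech today", "pos"),
    ("love this candidate", "pos"),
    ("terrible speech today", "neg"),
    ("hate this candidate", "neg"),
]
held_out = [
    ("great candidate", "pos"),
    ("love this speech", "pos"),
    ("terrible candidate", "neg"),
    ("not great", "neg"),
]

model = train(training)
matrix = confusion_matrix(model, held_out)
print(dict(matrix))
```

The last held-out tweet is misclassified because "not great" never appeared in training, so only the unigram "great" contributes. The off-diagonal cell of the confusion matrix is what surfaces that kind of error.
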
  • 24. Topics
      ‣ Entertainment, Food, Movements: HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/ HTTP://WWW.FLICKR.COM/PHOTOS/WILLIA4/2504379334/ HTTP://WWW.FLICKR.COM/PHOTOS/GILSONROME/6247208325/
      ‣ Collaboration, Shopping, Travel: HTTP://WWW.FLICKR.COM/PHOTOS/FIDELMAN/4640722483/ HTTP://WWW.FLICKR.COM/PHOTOS/ZOOBOING/4473219605/ HTTP://WWW.FLICKR.COM/PHOTOS/FELIPENEVES/5414239936/
      ‣ Investing, Medicine, Trust: HTTP://WWW.FLICKR.COM/PHOTOS/STUCKINCUSTOMS/2786154526/ HTTP://WWW.FLICKR.COM/PHOTOS/TRAVEL_AFICIONADO/2396819536/ HTTP://WWW.FLICKR.COM/PHOTOS/AGECOMBAHIA/6425101047/ HTTP://WWW.FLICKR.COM/PHOTOS/MARKETINGFACTS/6758968163/
  • 25. Homework: Data Mining
      1. Form groups!
      2. Choose a Collective Intelligence topic from Lecture 1, or propose similar.
      3. Make a list of data sources that might provide insights to that topic.
      4. Propose a set of meaningful questions about the data based on your intuition.
      5. How would you have to clean/process your data to start answering those questions?
      6. Consider clustering, association rules, anomaly detection, classification. For each technique, how might you apply it to the data and what would it show?
      7. Document your work and be prepared to present.
      HTTP://WWW.FLICKR.COM/PHOTOS/31907740@N00/4860840019/
  • 26. Feedback
