Mining Social Web APIs with IPython Notebook (PyCon 2014)

From the tutorial description at https://us.pycon.org/2014/schedule/presentation/134/ -

Description

Social websites such as Twitter, Facebook, LinkedIn, Google+, and GitHub have vast amounts of valuable insights lurking just beneath the surface, and this workshop minimizes the barriers to exploring and mining this valuable data by presenting turn-key examples from the thoroughly revised 2nd Edition of Mining the Social Web.

Abstract

This workshop teaches you fundamental data mining techniques as applied to popular social websites by adapting example code from Mining the Social Web (2nd Edition, O'Reilly 2013) in a tutorial-style step-by-step manner that is designed specifically to accommodate attendees with very little programming or domain experience. This workshop's extensive use of IPython Notebook facilitates interactive learning with turn-key examples against a Vagrant-based virtual machine that takes care of installing all 3rd party dependencies that are needed. The barriers to entry are truly minimal, which allows maximal use of the time to be spent on interactive learning.

The workshop is somewhat broadly designed and acclimates you to mining social data from Twitter, Facebook, LinkedIn, Google+, and GitHub APIs in five corresponding modules with the following memorable approach for each of them:

* Aspire - Set out to answer a question or test a hypothesis as part of a data science experiment
* Acquire - Collect and store the data that you need to answer the question or test the hypothesis
* Analyze - Use fundamental data mining techniques to explore and exploit the data
* Summarize - Present analytical findings in a compact and meaningful way

Each module consists of a brief period in which each attendee will customize the corresponding notebook for the module with their own account credentials with the remainder of the module devoted to learning what data is available from the API and exercises demonstrating analysis of the data—all from a pre-populated IPython Notebook. Time will be set aside at the end of each module for attendees to hack on the code, discuss examples, and ask any lingering questions.

  1. Mining Social Web APIs with IPython Notebook
     * Matthew A. Russell - @ptwobrussell - http://MiningTheSocialWeb.com
     * Montréal - 9 April 2014
  2. Intro
  3. Hello, My Name Is ... Matthew
     * Background in Computer Science
       * Data mining & machine learning
     * CTO @ Digital Reasoning Systems
       * Data mining; machine learning
     * Author @ O'Reilly Media
       * 5 published books on technology
     * Principal @ Zaffra
       * Selective boutique consulting
  4. The only easy day was yesterday -- Motto of the U.S. Navy SEALs
  5. It pays to be a winner -- Motto of the U.S. Navy SEALs
  6. Transforming Curiosity Into Insight
     * An open source software (OSS) project
       * http://bit.ly/MiningTheSocialWeb2E
     * A book
       * http://bit.ly/135dHfs
     * Accessible to (virtually) everyone
     * Virtual machine with turn-key coding templates for data science experiments
     * Think of the book as "premium" support for the OSS project
  7. Table of Contents (1/2)
     * Chapter 1 - Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
     * Chapter 2 - Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
     * Chapter 3 - Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
     * Chapter 4 - Mining Google+: Computing Document Similarity, Extracting Collocations, and More
     * Chapter 5 - Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
     * Chapter 6 - Mining Mailboxes: Analyzing Who's Talking to Whom About What, How Often, and More
  8. Table of Contents (2/2)
     * Chapter 7 - Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
     * Chapter 8 - Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
     * Chapter 9 - Twitter Cookbook
     * Appendix A - Information About This Machine's Virtual Machine Experience
     * Appendix B - OAuth Primer
     * Appendix C - Python and IPython Notebook Tips & Tricks
  9. Designed for Pedagogy
     * Brief Intro
     * Objectives
     * API Primer
     * Analysis Technique(s)
     * Data Visualization
     * Recap
     * Suggested Exercises
     * Recommended Resources
  10. The Social Web Is All the Rage
      * World population: ~7B people
      * Facebook: 1.15B users
      * Twitter: 500M users
      * Google+: 343M users
      * LinkedIn: 238M users
      * ~200M+ blogs (conservative estimate)
  11. Overview
      * Intro (5 mins)
      * Module 1 - Virtual Machine Setup (10 mins)
      * Module 2 - Mining Twitter (45 mins)
      * Module 3 - Mining Facebook (30 mins)
      * BREAK (20 mins)
      * Module 4 - Mining LinkedIn (30 mins)
      * Module 5 - Choice: Open Hack (30 mins)
      * Module 6 - Privacy & Ethics (20 mins)
      * Module 7 - Final Q&A; Surveys (10 mins)
  12. Module Format
      * ~10-15 minutes of exposition
        * I talk; you listen
      * ~15 minutes of independent (or collaborative) work
        * You hack while I walk around and help you
      * ~5 minutes of recap with Q&A
        * You ask; I try to answer
  13. Workshop Objective
      * To send you away as a social web hacker
        * Broad working knowledge of popular social web APIs
        * Hands-on experience hacking on social web data with a common toolkit
      * Not for me to talk to you for 3 straight hours
  14. Just a Few More Things
      * This workshop is...
        * An adaptation of Mining the Social Web, 2nd Edition
        * More of a guided hacking session where you follow along (vs a preso)
        * Wider than it is deep
          * There's only so much you can do in a few hours
      * I'm available 24/7 this week (and beyond) to help you be successful
  15. Assumptions
      * At some point in your life, you have
        * Programmed with Python
        * Worked with JSON
        * Made requests and processed responses to/from web servers
      * Or you want to learn to do these things now...
        * And you're a quick learner
  16. Module 1: Virtual Machine Setup
  17. Why do you need a VM?
      * To save time
        * Because installation and configuration management is harder than it first appears
      * So that you can focus on the task at hand instead
      * So that I can support you regardless of your hardware and operating system
  18. But I can do all of that myself...
      * True...
        * If you would rather troubleshoot unexpected installation/configuration issues instead of immediately focusing on the real task at hand
      * At least give it a shot before resorting to your own devices so that you don't have to install specific versions of ~40 Python packages
        * Including scientific computing tools that require underlying C/C++ code to be compiled
        * Which requires specific versions of developer libraries to be installed
      * You get the idea...
  19. The Virtual Machine Experience
      * Vagrant
        * A nice abstraction around virtual machine providers
        * One ring to rule them all
          * VirtualBox, VMware, AWS, ...
      * IPython Notebook
        * The easiest way to program with Python
        * A better REPL (interpreter)
        * Great for hacking
  20. What happens when you vagrant up?
      * Vagrant follows the instructions in your Vagrantfile
        * Starts up a VirtualBox instance
        * Uses Chef to provision it
          * Installs OS patches/updates
          * Installs MTSW software dependencies
          * Starts IPython Notebook server on port 8888
  21. Why Should I Use IPython Notebook?
      * Because it's great for hacking
        * And hacking is usually the first step
      * Because it's great for collaboration
        * Sharing/publishing results is trivial
      * Because the UX is as easy as working in a notepad
        * Think of it as "executable paper"
  22. [Slide has no transcribed text]
  23. [Slide has no transcribed text]
  24. VM Quick Start Instructions
      * Go to http://MiningTheSocialWeb.com/quick-start/
        * Follow the instructions
        * And watch the screencasts!
      * Basically:
        * Install VirtualBox & Vagrant
        * Run "vagrant up" in a terminal to start a guest VM
        * Then, go to http://localhost:8888 on your host machine's web browser
  25. What Could Be Easier?
      * A hosted version of the VM!
        * But only for a few hours during this workshop
        * Because it costs money to run these servers
      * Go to [See Live Slides for URL] and pick a machine
        * Do not share the URLs outside of this workshop!
        * Please don't try to hack the machines
      * Learn how I arrived at this setup at http://MiningTheSocialWeb.com
  26. Module 2: Mining Twitter
  27. Objectives
      * Be able to identify Twitter primitives
      * Understand tweet metadata and how to use it
      * Learn how to extract entities such as user mentions, hashtags, and URLs from tweets
      * Apply techniques for performing frequency analysis with Python
      * Be able to plot histograms of Twitter data with IPython Notebook
  28. Twitter Primitives
      * Account Types: "Anything"
      * "Following" Relationships
      * Favorites
      * Retweets
      * Replies
      * (Almost) No Privacy Controls
  29. API Requests
      * RESTful requests
        * Everything is a "resource"
        * You GET, PUT, POST, and DELETE resources
        * Standard HTTP "verbs"
        * Example: GET https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=SocialWebMining
      * Streaming API filters
      * JSON responses
      * Cursors (not quite pagination)
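To make the "everything is a resource" idea concrete, here is a minimal sketch that only builds the GET URL from the slide above; it sends nothing over the network. The helper name `build_timeline_url` is hypothetical, and a real request would also need OAuth credentials, which are not shown here.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.twitter.com/1.1"


def build_timeline_url(screen_name, count=None):
    """Return the GET URL for the user_timeline resource of one account.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    params = {"screen_name": screen_name}
    if count is not None:
        params["count"] = count  # optional cap on the number of tweets returned
    return "{0}/statuses/user_timeline.json?{1}".format(API_ROOT, urlencode(params))


# Reproduces the example URL from the slide.
print(build_timeline_url("SocialWebMining"))
```

The resource is named by the path, and the query string only narrows what is returned, which is why the same pattern carries over to other resources in the API.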
  30. Twitter is an Interest Graph
      * [Figure: graph diagram; node labels: Roberto Mercedes Jorge Ana Nina Johnny Araya Rodolfo Hernández]
  31. What's in a Tweet?
      * 140 Characters ...
      * ... Plus ~5KB of metadata!
        * Authorship
        * Time & location
        * Tweet "entities"
        * Replying, retweeting, favoriting, etc.
  32. What are Tweet Entities?
      * Essentially, the "easy to get at" data in the 140 characters
        * @usermentions
        * #hashtags
        * URLs
          * multiple variations
        * (financial) symbols
          * stock tickers
        * media
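As a rough illustration of why entities are "easy to get at", the sketch below pulls them out of a tweet's `entities` field. The `sample_tweet` dictionary is made up for this example and heavily trimmed; it assumes the v1.1 JSON layout in which `entities` holds `user_mentions`, `hashtags`, and `urls` lists. The helper name `extract_entities` is hypothetical.

```python
# A made-up, heavily trimmed tweet; real tweets carry far more metadata.
sample_tweet = {
    "text": "Mining #Twitter data with @SocialWebMining http://t.co/example",
    "entities": {
        "hashtags": [{"text": "Twitter"}],
        "user_mentions": [{"screen_name": "SocialWebMining"}],
        "urls": [{"url": "http://t.co/example",
                  "expanded_url": "http://MiningTheSocialWeb.com"}],
    },
}


def extract_entities(tweet):
    """Collect screen names, hashtags, and expanded URLs from one tweet dict."""
    entities = tweet.get("entities", {})
    return {
        "screen_names": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
    }


print(extract_entities(sample_tweet))
```

Because the API has already parsed the text, no regular expressions are needed to find mentions or hashtags.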
  33. Data Mining = Curiosity + Stats
      * Curiosity
        * Interests, desires, and intuitions
      * Statistics
        * Counting
        * Comparing
        * Filtering
        * Ranking
      * Hypothesis testing; knowledge discovery
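Counting, filtering, and ranking can all be sketched with the standard library's `collections.Counter`. The hashtag list below is invented for illustration; in the workshop it would come from tweet entities.

```python
from collections import Counter

# Invented sample data standing in for hashtags collected from tweets.
hashtags = ["python", "pycon", "python", "data", "pycon", "python"]

counts = Counter(hashtags)                                     # counting
frequent = sorted(t for t, n in counts.items() if n >= 2)      # filtering
ranked = counts.most_common(2)                                 # ranking

print(counts["python"], frequent, ranked)
```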
  34. Histograms
      * A chart that is handy for frequency analysis
      * They look like bar charts...except they're not bar charts
      * Each value on the x-axis is a range (or "bin") of values
        * Not categorical data
      * Each value on the y-axis is the combined frequency of values in each range
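The binning idea can be sketched without any plotting library. The function below groups values into fixed-width ranges and reports the combined frequency per range; the retweet counts are invented, and the helper name `histogram` and the bin width of 10 are arbitrary choices for this illustration.

```python
from collections import Counter


def histogram(values, bin_width):
    """Return (bin_start, frequency) pairs, including empty bins in between."""
    counts = Counter(v // bin_width for v in values)
    lo, hi = min(counts), max(counts)
    return [(i * bin_width, counts.get(i, 0)) for i in range(lo, hi + 1)]


# Invented retweet counts for a handful of tweets.
retweet_counts = [0, 3, 7, 12, 15, 18, 22, 41]

for start, freq in histogram(retweet_counts, 10):
    # One text "bar" per bin: the x-axis is a range, the y-axis a frequency.
    print("{0:>3}-{1:<3} {2}".format(start, start + 9, "#" * freq))
```

Keeping the empty 30-39 bin is what distinguishes this from a bar chart of categories: the x-axis is a continuous scale, so gaps in the data stay visible.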
  35. Example: Histogram of Retweets
      * [Figure: histogram of retweets]
  36. Social Media Analysis Framework
      * A memorable four step process to guide data science experiments:
        * Aspire
          * To test a hypothesis (answer a question)
        * Acquire
          * Get the data
        * Analyze
          * Count things
        * Summarize
          * Plot the results
  37. Exercises
      * Review Python idioms in the "Appendix C (Python Tips & Tricks)" notebook
      * Follow the setup instructions in the "Chapter 1 (Mining Twitter)" notebook
      * Fill in Example 1-1 with credentials and begin work
        * Execute each example sequentially
        * Customize queries
        * Explore tweet metadata; count tweet entities; plot histograms of results
      * Explore the "Chapter 9 (Twitter Cookbook)" notebook
        * Think of it as a collection of building blocks
  38. Module 3: Mining Facebook
  39. Objectives
      * Be able to identify Facebook primitives
      * Learn about Facebook’s Social Graph API and how to make API requests
      * Understand how Open Graph protocol extends Facebook's Social Graph API
      * Be able to analyze likes from Facebook pages and friends
  40. Facebook Primitives
      * Account Types: People & Pages
      * Mutual Connections
      * Likes
      * Shares
      * Comments
      * Extensive Privacy Controls
  41. API Requests
      * Social Graph API requests
        * Not RESTful but easy to learn and use
        * Special "field expansion" syntax
        * Example: GET http://graph.facebook.com/ptwobrussell/?fields=id,name,friends.fields(likes.limit(10))
      * JSON responses
      * Traditional pagination
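The "field expansion" syntax nests one field selection inside another. The sketch below composes the slide's example URL from small pieces; the helpers `expand` and `build_graph_url` are hypothetical names for this illustration, no request is sent, and a real call would also need an access token.

```python
from urllib.parse import urlencode

GRAPH_ROOT = "http://graph.facebook.com"


def expand(field, *subfields, limit=None):
    """Render one field, optionally with nested sub-fields and a limit."""
    rendered = field
    if subfields:
        rendered += ".fields({0})".format(",".join(subfields))
    if limit is not None:
        rendered += ".limit({0})".format(limit)
    return rendered


def build_graph_url(node, fields):
    """Join the field expressions into a single 'fields' query parameter."""
    # Keep commas, dots, and parentheses readable instead of percent-encoding.
    query = urlencode({"fields": ",".join(fields)}, safe=",.()")
    return "{0}/{1}/?{2}".format(GRAPH_ROOT, node, query)


likes = expand("likes", limit=10)      # likes.limit(10)
friends = expand("friends", likes)     # friends.fields(likes.limit(10))
url = build_graph_url("ptwobrussell", ["id", "name", friends])
print(url)
```

Reading the expression inside out mirrors the query: up to ten likes for each friend, alongside the node's own id and name.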
  42. Facebook is an Interest Graph
      * [Figure: graph diagram; node labels: Roberto Mercedes Jorge Ana Nina Johnny Araya Rodolfo Hernández]
  43. Facebook API Explorer
      * Go to https://developers.facebook.com/tools/explorer
      * Really, go there right now...
  44. Retrieve Your Likes
  45. Facebook Permissions
  46. Facebook Permissions
  47. Explore Facebook Pages
      * Names of pages
        * MiningTheSocialWeb
        * CrossFit
        * OReilly
      * Web URLs (OGP extensions to Facebook's Social Graph)
        * http://www.imdb.com/title/tt0117500
  48. Social Media Analysis Framework
      * Recall the same four step process to guide data science experiments:
        * Aspire
        * Acquire
        * Analyze
        * Summarize
  49. Social Network Diagram with D3
  50. Exercises
      * Copy/paste your access token from the Graph API Explorer into the "Chapter 2 (Mining Facebook)" notebook
        * Paste the value and execute the cell just before Example 2-1
      * Execute examples sequentially (try to at least make it to Example 2-10)
        * Analyze your likes, your friends and likes from pages of interest
      * If you have time...
        * Remaining examples
  51. Module 4: Mining LinkedIn
  52. Objectives
      * Learn about LinkedIn’s Developer Platform
      * Understand how clustering works
        * A fundamental type of machine learning
      * Be able to employ geocoding services to arrive at a set of coordinates from a textual reference to a location
      * Visualize geographic data with cartograms
  53. LinkedIn Primitives
      * Account Types: People, Companies
      * The data seems "more closely held" than Facebook or Twitter
        * No FOAF visibility
      * Richest data source
        * Profile descriptions from mutual connections
        * A little messier than it first appears
          * Not necessarily a bad thing
  54. API Requests
      * (Strangely) RESTful Requests
        * Not really RESTful
      * Field selector syntax
        * http://api.linkedin.com/v1/people/~:(first-name,last-name,headline,picture-url)
      * XML responses
      * CSV address book download
  55. Is LinkedIn an Interest Graph?
      * Fundamentally: yes. But not so much at the developer API level
        * Less trivial to find some of the "pivots"
        * No Skills API (yet?)
      * But the data is there (mostly in profile descriptions) for your direct connections
        * Companies, job titles, job descriptions
        * Lots of richness is tucked away in human language data
  56. Clustering
      * An unsupervised machine learning technique
      * Think: an algorithm that organizes the data into partitions
  57. Example: Clustered Job Titles
  58. 3 Steps to Clustering Your Data
      * Normalization
      * Compare (similarity/distance measurement)
        * n-grams, edit distance, and Jaccard are common, but your imagination is the limit
        * Why can't you just compare everything to everything?
      * Dimensionality Reduction
        * Ideally, your clustering algorithm will mitigate the pain
        * k-means is among the most common clustering techniques in use
  59. Jaccard Similarity
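The figure for this slide is not part of the transcript. As a reminder, the Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. A minimal sketch follows; the lowercase whitespace tokenization and the job titles are illustrative assumptions, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def title_similarity(title_a, title_b):
    """Compare two job titles by their lowercase word sets."""
    return jaccard(title_a.lower().split(), title_b.lower().split())


# Shared words: chief, officer (2); distinct words overall: 4.
print(title_similarity("Chief Technology Officer", "Chief Executive Officer"))  # 0.5
```

A score of 1.0 means the word sets are identical and 0.0 means they share nothing, so a clustering step can group titles whose score exceeds a chosen threshold.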
  60. k-Means Explained
      1. Randomly pick k points in the data space as initial values that will be used to compute the k clusters: K1, K2, ..., Kk.
      2. Assign each of the n points to a cluster by finding the nearest Kn, effectively creating k clusters and requiring k*n comparisons.
      3. For each of the k clusters, calculate the centroid, or the mean of the cluster, and reassign its Ki value to be that value. (Hence, you’re computing “k-means” during each iteration of the algorithm.)
      4. Repeat steps 2–3 until the members of the clusters do not change between iterations. Generally speaking, relatively few iterations are required for convergence.
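The four steps can be sketched in plain Python. This is an illustrative sketch, not the workshop's notebook code: it seeds the centroids by sampling k of the data points (one common way to pick starting values), uses squared Euclidean distance, and caps the number of iterations as a safety net. The names `kmeans` and `dist2` and the sample points are made up for the example.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step 1: pick k starting values
    assignments = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (k*n comparisons).
        new_assignments = [
            min(range(k), key=lambda i: dist2(p, centroids[i])) for p in points
        ]
        if new_assignments == assignments:
            break  # Step 4: stop once cluster membership no longer changes
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its current members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Two visibly separate groups of invented 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

Which group ends up labeled 0 or 1 depends on the random starting values, so only the grouping itself is meaningful, not the label numbers.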
  61. k-Means: Initialize
  62. k-Means: Step 1
  63. k-Means: Step 2
  64. k-Means: Step 3
  65. k-Means: (Fast-Forward) Step 9
  66. Geocoding
      * Transforming a location to a set of coordinates
        * Nashville, TN => (36.16783905029297, -86.77816009521484)
      * A harder problem than it first appears
      * The Bing API is especially generous
        * Requires an account sign up: http://bingmapsportal.com
        * Use the API key with the geopy package
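A hedged sketch of the geopy workflow: the helper below works with any object that has a `geocode(place)` method returning something with `latitude` and `longitude` attributes, which is how geopy's geocoders behave. `StubGeocoder` is a hypothetical stand-in so the example runs offline; it simply echoes the coordinates from the slide. `make_bing_geocoder` assumes the third-party geopy package is installed and that you supply your own Bing Maps API key.

```python
from collections import namedtuple

Location = namedtuple("Location", ["latitude", "longitude"])


def to_coordinates(geocoder, place):
    """Return (latitude, longitude) for a place name, or None if not found."""
    location = geocoder.geocode(place)
    if location is None:
        return None
    return (location.latitude, location.longitude)


def make_bing_geocoder(api_key):
    """Create a Bing geocoder; needs the geopy package and a real API key."""
    from geopy.geocoders import Bing  # imported lazily so the stub demo runs without geopy
    return Bing(api_key=api_key)


class StubGeocoder:
    """Hypothetical offline stand-in that only knows the slide's example."""

    known = {"Nashville, TN": Location(36.16783905029297, -86.77816009521484)}

    def geocode(self, place):
        return self.known.get(place)


print(to_coordinates(StubGeocoder(), "Nashville, TN"))
```

With a key in hand, `to_coordinates(make_bing_geocoder(api_key), "Nashville, TN")` would take the place of the stub call; the returned values depend on the live service.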
  67. Introducing: The Dorling Cartogram
  68. Social Media Analysis Framework
      * Remember: Use the same four step process to guide data science experiments:
        * Aspire
        * Acquire
        * Analyze
        * Summarize
  69. Exercises
      * Follow the instructions in the "Chapter 3 (Mining LinkedIn)" notebook to create an API connection and follow along with the first few examples
      * Download your connections as a CSV file from http://www.linkedin.com/people/export-settings and save them to your VM
        * A deviation from instructions in Example 3-6 is necessary for remote VMs
        * See http://bit.ly/mtsw-ch03-helper-code
      * Create a Bing Maps portal account and get your API key for Examples 3-8 and beyond
      * Try clustering your contacts in Example 3-12
      * Try Example 3-13 (visualizing data in Google Earth) at home...
  70. Module 5: Choice
  71. Objectives
      * To work on "loose ends" or areas of interest from previous modules
      * To hack on code in notebooks not yet encountered
      * To set up the virtual machine on your own box if you haven't yet
      * To collaborate/talk and otherwise make the most of our togetherness
  72. Social Media Analysis Framework
      * Remember:
        * Aspire
        * Acquire
        * Analyze
        * Summarize
  73. Recommendations
      * Set up your own development environment if you haven't already
        * Appendix A
      * Text Mining & Natural Language Processing
        * Chapter 4 (Mining Google+) & Chapter 5 (Mining Web Pages)
      * Graph Mining
        * Chapter 7 (Mining GitHub)
      * Analyzing Semantic Markup
        * Chapter 8 (Mining the Semantically Marked-Up Web)
  74. Module 6: Privacy & Ethics
  75. Know thy data, and know thyself --Matthew A. Russell
  76. If we have data, let’s look at data. If we have opinions, let’s go with mine --Jim Barksdale
  77. In God we trust. All others must bring data --W. Edwards Deming
  78. Communication => Data
      * Communication
        * Senders
          * humans & machines
        * Messages
          * natural language, images, videos, etc.
        * Recipients
          * humans & machines
  79. Data Alchemy
      * Data: Documents & document fragments (text messages, etc.)
      * Information: "Assertions", summaries, tags, etc.
      * Knowledge: Aggregated, queryable information
      * Wisdom: “Compressed” knowledge
      * Gold: Money
  80. Machine Learning
      * A program that learns (improves) from experience (data) according to some objective
        * Supervised learning
        * Unsupervised learning
        * Reinforcement learning
      * How to do it
        * Program mathematical models and hope for the best...
      * How to do it well
        * Program state-of-the-art mathematical models with sufficient representative data
  81. Knowledge is a process of piling up facts; wisdom lies in their simplification --Martin Fischer
  82. Any sufficiently advanced technology is indistinguishable from magic --Arthur C. Clarke
  83. Is Privacy Already an Illusion?
      * Digital happenings circa 2014
        * The Cloud
        * Social Media
        * Deep Learning
        * The Internet of Things
        * Internet.org
  84. Civilization is the progress toward a society of privacy... -- Ayn Rand
  85. If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place. -- Eric Schmidt, (former) CEO of Google
  86. Influences on Ethics
      * Capitalism, economics, & marketing
        * A for-profit corporation's fiduciary duty: To maximize the common stock's value
        * How to do it? By transacting commerce
        * How to do it well? By advertising more effectively than competitors
        * How to do it really well? With highly relevant personalized ads (recommenders)
      * Terms of Service (ToS) - The legal extent of ethical obligations?
  87. Module 7: Final Q&A; Survey
      * Survey Link: https://www.surveymonkey.com/s/pycon2014_tutorials
  88. Free Stuff
      * http://MiningTheSocialWeb.com
      * Mining the Social Web 2E Chapter 1 (Chimera)
        * http://bit.ly/13XgNWR
      * Source Code (GitHub)
        * http://bit.ly/MiningTheSocialWeb2E
        * http://bit.ly/1fVf5ej (numbered examples)
      * Screencasts (Vimeo)
        * http://bit.ly/mtsw2e-screencasts
