Making Sense of Cyberspace, keynote for Software Engineering Institute Cyber Intelligence Tradecraft, May 2015
- 11. ©2015 Carnegie Mellon University
How to Scale Up Our Ability to
Understand and Analyze?
• Research in mobility, privacy, security
– Anti-phishing
– Smartphone apps, Internet of Things
• Key themes today
– Information visualization
– Machine learning
– Wisdom of crowds
- 16.
How You Represent Info Matters
• What is VIII × XCI?
• Another example:
– Game with two players
– Players alternately choose numbers from 1 to 15
– First to collect three numbers that sum to 15 wins
– Each number can be taken only once
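The arithmetic question is easy once the numerals are re-represented in place-value form. A minimal sketch (the converter handles standard subtractive Roman notation; the multiplication that is painful in Roman numerals becomes trivial in Arabic ones):

```python
# Map each Roman symbol to its value.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s: str) -> int:
    """Convert a Roman numeral string to an integer."""
    total = 0
    for i, ch in enumerate(s):
        v = VALUES[ch]
        # Subtractive notation: a smaller value before a larger one
        # (as in IV or XC) is subtracted instead of added.
        if i + 1 < len(s) and VALUES[s[i + 1]] > v:
            total -= v
        else:
            total += v
    return total

print(roman_to_int("VIII") * roman_to_int("XCI"))  # 8 * 91 = 728
```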
- 18.
Other Design Principles?
1. Start out going Southwest on ELLSWORTH AVE
Towards BROADWAY by turning right.
2. Turn RIGHT onto BROADWAY.
3. Turn RIGHT onto QUINCY ST.
4. Turn LEFT onto CAMBRIDGE ST.
5. Turn SLIGHT RIGHT onto MASSACHUSETTS AVE.
6. Turn RIGHT onto RUSSELL ST.
- 43.
Can We Have Crowds of
People Help Analyze Data?
• What are better ways of getting
groups of people working together?
• One approach is to use many crowd workers
to improve both accuracy and turnaround time
– Common structure in your problem?
– Can you break it down into small tasks?
- 47.
Wisdom of Crowds Approach
• Mechanics of PhishTank
– Submissions require at least 4 votes
and 70% agreement
– Some votes weighted more
• Total stats (Oct 2006 – Feb 2011)
– 1.1M URL submissions from volunteers
– 4.3M votes
– resulting in about 646k identified phish
• Why so many votes for only 646k phish?
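The "at least 4 votes and 70% agreement" rule can be sketched as a simple weighted-threshold function. This is a minimal illustration of the mechanic, not PhishTank's actual code; the function name and vote representation are assumptions:

```python
def label_submission(votes, min_votes=4, agreement=0.70):
    """Label a submitted URL once enough weighted votes agree.

    votes: list of (is_phish, weight) pairs; weights model the
    practice of counting some voters' votes more heavily.
    Returns "phish", "not-phish", or None if still undecided.
    """
    if len(votes) < min_votes:
        return None  # still open: not enough votes yet
    total = sum(w for _, w in votes)
    phish = sum(w for is_phish, w in votes if is_phish)
    if phish / total >= agreement:
        return "phish"
    if (total - phish) / total >= agreement:
        return "not-phish"
    return None  # no consensus yet; more votes needed
```

The last branch answers the slide's question: split votes keep a submission open, so many URLs accumulate far more than 4 votes before reaching 70% agreement.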
- 49.
Ways of Smartening the Crowd
• Change the order URLs are shown
– Ex. most recent vs closest to completion
• Change how submissions are shown
– Ex. show one at a time or in groups
• Adjust threshold for labels
– PhishTank is 4 votes and 70%
– Ex. vote weights, algorithm also votes
• Motivating people / allocating work
– Filtering by brand, competitions,
teams of voters, leaderboards
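The first idea above, reordering the queue, can be sketched as sorting open submissions by a different key: newest-first versus closest to the 4-vote completion threshold. A hypothetical sketch; the field names are illustrative:

```python
# Open submissions awaiting votes (illustrative data).
submissions = [
    {"url": "http://a.example", "age_hours": 1, "votes": 0},
    {"url": "http://b.example", "age_hours": 30, "votes": 3},
    {"url": "http://c.example", "age_hours": 9, "votes": 2},
]

# "Most recent" ordering: smallest age first.
newest_first = sorted(submissions, key=lambda s: s["age_hours"])

# "Closest to completion": fewest votes still needed to hit the
# 4-vote threshold first, so near-done items get finished quickly.
closest_to_completion = sorted(submissions, key=lambda s: 4 - s["votes"])

print([s["url"] for s in closest_to_completion])
# b needs 1 more vote, c needs 2, a needs all 4
```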
- 60.
Which statement is more important?
Privacy and Security
CrowdVerify
A: We receive data whenever you visit a game,
application, or website that uses Facebook Platform …
B: When you sign up for Facebook, you are required
to provide information such as your name, email
address, birthday, and gender.
- 61.
Privacy and Security
CrowdVerify on Google’s Policies
[Bar chart: sum of crowd importance scores (y-axis "Scores", 0–140) for each statement in Google's privacy policy (x-axis "Statement ID"), ranked from Google 08 (highest) down through Google 13, 07, 30, 10, 12, 04, 16, 03, 11, 05, 09, 28, 02, 06, 23, to Google 14 (lowest).]
- 62.
Top Statement - Google
• Google 08: When you use our services
or view content provided by Google, we
may automatically collect and store
certain information in server logs. This
may include: telephony log information
like your phone number, calling-party
number, forwarding numbers, time
and date of calls, duration of calls, SMS
routing information and types of calls.
- 63.
Bottom 3 Statements - Google
• Google 29: Whenever you use our services,
we aim to provide you with access to your
personal information.
• Google 26: For example, you can: Take
information out of many of our services.
• Google 24: People have different privacy
concerns. Our goal is to be clear about what
information we collect, so that you can make
meaningful choices about how it is used.
- 64.
Designing Crowd Systems
• Motivation for contributing?
– Money, altruism, fun?
• Quality control?
– How to ensure good quality? Prevent bozos?
• How is crowd wisdom aggregated?
– How are people and computers organized?
– How are decisions made?
• Skill level required?
– Novice, intermediate, expert?
- 67.
DARPA Red Balloon
• 2009 challenge to find 10 red weather
balloons at unknown locations in the USA
• Winner gets $40k
• Surprise: top team wins
in < 9 hours
– Other teams did well too
– 9 balloons in 9 hours (1 team)
– 8 balloons (2 teams)
– 7 balloons (5 teams)
- 69.
DARPA Red Balloon
• MIT Team (10 balloons in less than 9 hours)
• Overall team strategy
– Get as many people as possible (over 5k people)
– Use multiple approaches to verify
• How people were organized
– Core team / recruit / spotters
• “Social engineering”
– Recursive financial structure to get more people
– Incentivizes even people who can't spot a balloon themselves (they profit by recruiting others)
• Technologies used
– Surprisingly little: one centralized web site + Google Maps
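The MIT team's recursive financial structure halved the reward at each step up the recruitment chain: $2,000 per balloon to the finder, $1,000 to whoever recruited the finder, $500 to that person's recruiter, and so on. A minimal sketch of that payout scheme (amounts as reported for the MIT team; the function is illustrative):

```python
def payouts(chain_length, finder_reward=2000.0):
    """Reward halves at each step up the recruitment chain:
    finder, finder's recruiter, that person's recruiter, ..."""
    return [finder_reward / 2**k for k in range(chain_length)]

chain = payouts(4)
print(chain, sum(chain))  # [2000.0, 1000.0, 500.0, 250.0] 3750.0
```

Because the rewards form a geometric series, the total payout per balloon stays under $4,000 no matter how long the recruitment chain grows, so arbitrarily deep recruiting is always affordable.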
- 74.
More Resources
• A Tour through the Visualization Zoo
– https://queue.acm.org/detail.cfm?id=1805128
• Interactive Dynamics for Visual Analysis
– https://queue.acm.org/detail.cfm?id=2146416
• Tableau Software
– http://www.tableau.com/
• Many Eyes
– http://www-01.ibm.com/software/analytics/many-eyes/
- 75.
More Resources
• DARPA Red Balloon Challenge
– http://cacm.acm.org/magazines/2011/4/106587-reflecting-on-the-darpa-red-balloon-challenge/fulltext
• Pirolli and Card, The Sensemaking Process
and Leverage Points for Analyst Technology
as Identified Through Cognitive Task Analysis
• Apolo, combining data mining with infoviz
– https://www.youtube.com/watch?v=EFxYVWkj1aE