We are meant to measure and manage data with more precision than ever before using Big Data. But companies are getting Hadoopy, often with little or no consideration of security. Are we taking on too much risk too fast? This session explains how best to handle the looming Big Data risk in any environment. Better predictions and more intelligent decisions are expected from our biggest data sets, yet do we really trust the systems we secure the least? And do we really know why "learning" machines continue to make amusing and sometimes tragic mistakes? Infosec is in this game, but with Big Data we appear to be waiting on the sidelines. What have we done about emerging vulnerabilities and threats to Hadoop as it leaves many of our traditional data paradigms behind? This presentation, based on the new book "Realities of Big Data Security", takes the audience through an overview of the hardest Big Data protection problem areas ahead and into our best solutions for the elephantine challenges here today.
6. Risk “Relativity” and Rules
“He says his tribe doesn’t have a written language!”
1. Math, Stats, Comp Sci
“A Bunch of Nodes”
2. Behavior
¤ Political
¤ Social
¤ Cultural
7. Risk Mode Examples
1. Simple (Theoretical)
¤ Two Opponents
¤ Engagement Rules
2. Complex (Real)
¤ ∞ Opponents, Related
¤ Ill-defined or Guerrilla Rules
Possibilities After First Move
¤ Chess 20 x 20 = 400
¤ Go 361 x 360 = 129,960
Branch Factors
¤ Chess 35
¤ Go 250
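The counts above can be checked with a short sketch. The move counts and branch factors come from the slide; the function names and the four-ply depth are illustrative assumptions, and raising the branch factor to the number of plies is only a rough estimate of game-tree size.

```python
def positions_after_first_moves(first_options: int, second_options: int) -> int:
    """Positions reachable once each side has made one move."""
    return first_options * second_options


def approx_tree_size(branch_factor: int, plies: int) -> int:
    """Rough game-tree size: branch factor raised to the number of plies."""
    return branch_factor ** plies


chess_first = positions_after_first_moves(20, 20)
go_first = positions_after_first_moves(361, 360)
print("Chess:", chess_first, "Go:", go_first)

# Assumed depth of four plies, chosen only to show how fast the gap widens.
for game, branch_factor in (("Chess", 35), ("Go", 250)):
    print(game, approx_tree_size(branch_factor, 4))
```

Even at this shallow depth the Go estimate is thousands of times larger than the chess one, which is the point of the "Simple" versus "Complex" contrast.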
10. Induction Fallacy and Probability
Knowledge for Actionable Insights to
Inform Priorities
The wise
proportion belief
to evidence.
11. Behavioral Risk Analysis
Detect Good, Detect Bad
Good
¤ Identity
¤ Location
¤ Velocity
¤ File Execution Spawns Process
¤ Binary Modification
¤ System Call Order
¤ Arguments
Bad
(See Good)
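One way to read "Bad (See Good)" is that bad behavior is whatever deviates from a recorded baseline of good behavior. The sketch below illustrates that idea under stated assumptions: the `Event` fields are a small subset of the attributes listed above, and every name and sample value is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    identity: str
    location: str
    spawned_process: str
    syscall_order: tuple


def build_baseline(known_good):
    """Collect the values seen in known-good events, per attribute."""
    return {
        "identity": {e.identity for e in known_good},
        "location": {e.location for e in known_good},
        "spawned_process": {e.spawned_process for e in known_good},
        "syscall_order": {e.syscall_order for e in known_good},
    }


def deviations(event, baseline):
    """Return the attributes of an event that were never seen in the baseline."""
    return [name for name, allowed in baseline.items()
            if getattr(event, name) not in allowed]


good = [Event("alice", "office", "report.exe", ("open", "read", "close"))]
baseline = build_baseline(good)

same = Event("alice", "office", "report.exe", ("open", "read", "close"))
odd = Event("alice", "cafe", "report.exe", ("open", "write", "close"))
print(deviations(same, baseline))
print(deviations(odd, baseline))
```

An empty list means the event matches known-good behavior; anything else names the attributes that need a closer look.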
12. Behavioral Risk Analysis
Detect Good, Detect Bad
[Diagram labels: 192.168.100.10; May 27, 2014; Davi Ottenheimer; @daviottenheimer; #13-452-353342; Galaxy 1; 10.10.10.1; Ubuntu/Firefox]
Good
¤ Identity
¤ Location
¤ …
14. Find target height (H), weight (W), position (P), from level (L), at time (T) with changed P to P’, P’’, P’’’ over T1, T2, T3…
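The changing position P over T1, T2, T3 can be turned into a speed estimate between successive observations. This is a minimal sketch, assuming two-dimensional coordinates and timestamps in seconds; the `speeds` function and the sample track are hypothetical.

```python
import math


def speeds(track):
    """Speed between successive (time, (x, y)) observations, sorted by time."""
    result = []
    for (t0, p0), (t1, p1) in zip(track, track[1:]):
        result.append(math.dist(p0, p1) / (t1 - t0))
    return result


# Hypothetical observations: P at T1, P' at T2, P'' at T3.
track = [(0, (0, 0)), (1, (3, 4)), (3, (3, 10))]
print(speeds(track))
```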
15. Thus…The Big Data Market
Infrastructure: NoSQL DB; NewSQL DB; MPP DB; Graph DB; Hadoop on Premise; Cloud; Clustering; Monitoring; Crowdsourcing; AppDev; Data Transformation; Storage; Security
Analytics: Analytic Platform; BI Platform; Machine Learning; Location, Ppl, Events; Search; Crowdsourcing; Business Analytics; Data Science; Unstructured Data; Data Viz; Social Analytics; Statistical Computing; Log Analytics; SMB
Applications: Advertising; Finance; Government; Health; Security; Education; Legal; HR; Publishing; Marketing; Science; Utilities
OSS: Framework; Query; Access; Workflow; Real-Time; Stats; ML; Deployment; Search
Data Sources: Markets / Warehouses; User Services; Devices/Things; “Research”
16. The Difference With Big Data…
Centralized Insights for Action
Rapid Large Varied DATA LAKE for Knowledge
¤ Data Archaeology
¤ Information Harvesting
¤ Information Discovery
¤ Knowledge Extraction
¤ Knowledge Discovery
¤ Multivariate Statistics
¤ Pattern Recognition
¤ Advanced Analysis
¤ Predictive Analysis
¤ Machine Learning
20. Using Landscape & Bioclimatic Features to Predict Lion,
Leopard & Spotted Hyaena Distribution in Tanzania…
http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0096261
21. Example: Web Threat Detection
Adversary Versus Customer
¤ Velocity
¤ Sequence
¤ Origin
¤ Context
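Velocity is the easiest of these four signals to illustrate. The sketch below computes requests per second for each session and flags the fast ones. The event shape, the function names, and the two-requests-per-second threshold are assumptions made for illustration, not values from the presentation.

```python
from collections import defaultdict


def requests_per_second(events):
    """events: iterable of (session_id, timestamp_seconds, path) tuples."""
    stamps_by_session = defaultdict(list)
    for session_id, timestamp, _path in events:
        stamps_by_session[session_id].append(timestamp)
    rates = {}
    for session_id, stamps in stamps_by_session.items():
        span = max(stamps) - min(stamps)
        # With no measurable time span, fall back to the raw request count.
        rates[session_id] = len(stamps) / span if span > 0 else float(len(stamps))
    return rates


def flag_fast_sessions(events, max_rate=2.0):
    """Session ids whose request rate exceeds the assumed threshold."""
    rates = requests_per_second(events)
    return sorted(s for s, rate in rates.items() if rate > max_rate)


events = [
    ("customer", 0, "/home"), ("customer", 15, "/cart"), ("customer", 30, "/checkout"),
    ("adversary", 0.0, "/login"), ("adversary", 0.25, "/login"),
    ("adversary", 0.5, "/login"), ("adversary", 0.75, "/login"),
    ("adversary", 1.0, "/login"),
]
print(flag_fast_sessions(events))
```

Sequence, origin, and context would need their own checks; a rate threshold alone is easy for a patient adversary to stay under.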
32. “No Waypoint Zones”
5 Mile Radii of Major Airports
Reducing Global Risks
DJI Drone Ground Station Blocks
33. Reducing Global Risks
60,000 Routes: Save Money, Save Lives…
¤ 400K Gal/Year Reduced by Paperless Pilot (-35lb)
¤ Data Per GE Engine: 1TB/Day
¤ Data Per Boeing 787 Flight: 500GB
http://www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes/
http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business
34. We have massive amounts of data.
We know who you are.
http://bigstory.ap.org/article/airlines-promise-return-civility-fee
“We know what your history has been on the airline.
We can customize our offerings.”
37. Vast Majority Think They Can
Control Risk
http://pewinternet.org/Reports/2013/Anonymity-online.aspx, http://www.connecture.com/the-connecture-difference/
of Internet users have
taken steps online to
remove or mask their
digital footprints
40. “ONE CLICK” Wrong?
GOOGLE Spooks On Your Tail!
https://twitter.com/jason_kint/status/451716219482025984/photo/1
41. Example: Simple Log Analysis
Meta, Ripples, Tails, Exhausts, Waste, Shadows, etc.
“…we know estimated numbers of people served by each waste water treatment plant, we can back-calculate daily [drug] loads…”
- Dr Kasprzyk-Hordern
1.5B gallons/day
Wastewater from
Chicago & Suburbs
¤ Environmental Risks
¤ Diseases
¤ Drugs
http://phys.org/news/2012-03-wastewater-clues-illicit-drug.html
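The back-calculation in the quote is, at its simplest, concentration times flow, normalized by the population served. The sketch below uses the 1.5B gallons/day figure from the slide; the concentration and population values are hypothetical, and the sketch ignores the excretion and stability correction factors a real study would apply.

```python
import math

LITERS_PER_US_GALLON = 3.785411784


def daily_load_grams(concentration_ng_per_l, flow_gallons_per_day):
    """Mass arriving at the plant per day: concentration times daily flow."""
    liters_per_day = flow_gallons_per_day * LITERS_PER_US_GALLON
    return concentration_ng_per_l * liters_per_day / 1e9  # nanograms to grams


def load_mg_per_1000_people(load_grams, population_served):
    """Normalize the daily load to milligrams per 1,000 people."""
    return load_grams * 1000 / population_served * 1000


# Flow from the slide; concentration (100 ng/L) and population (5,000,000)
# are made-up inputs for illustration only.
load = daily_load_grams(100, 1.5e9)
per_1000 = load_mg_per_1000_people(load, 5_000_000)
print(round(load, 1), "g/day;", round(per_1000, 1), "mg/day per 1,000 people")
```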
54. The Snow Den Lesson
http://www.flyingpenguin.com/?p=18259
Source
Observation
1854: CHOLERA VORONOI
1854: GHOST MAP OF LONDON
RSAC 2012:
BREACH DATA
Dr. John Snow
1813-1858
55. “Treat’em Like Cows Not Pets”
(__)
(xx)
/-------/
/ | ||
* ||----||
^^ ^^
Systematic Treatment of Illness
1. Identify Sick ASAP
2. Keep Adequate Records
3. Evaluate Daily Sick
4. Adapt Until Noted Improvement
Easily Identified
Routine Treatment
Minimum Judgment
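Applied to compute nodes, the four steps above amount to a routine health record per node and a fixed rule for when to replace one. This is a minimal sketch under stated assumptions: the `HerdMonitor` class, the node names, and the three-consecutive-failures rule are hypothetical.

```python
from collections import defaultdict


class HerdMonitor:
    """Keeps a health history per node and applies one routine rule to all."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.records = defaultdict(list)  # step 2: keep adequate records

    def record(self, node, healthy):
        self.records[node].append(healthy)

    def sick_nodes(self):
        """Step 1: nodes whose most recent check failed."""
        return sorted(n for n, history in self.records.items()
                      if history and not history[-1])

    def nodes_to_replace(self):
        """Steps 3 and 4: nodes that failed every one of the last N checks."""
        chosen = []
        for node, history in self.records.items():
            recent = history[-self.max_failures:]
            if len(recent) == self.max_failures and not any(recent):
                chosen.append(node)
        return sorted(chosen)


monitor = HerdMonitor()
for healthy in (True, True, True):
    monitor.record("node-a", healthy)
for healthy in (False, False, False):
    monitor.record("node-b", healthy)
for healthy in (True, False):
    monitor.record("node-c", healthy)
print(monitor.sick_nodes(), monitor.nodes_to_replace())
```

The rule is deliberately mechanical: no node gets individual diagnosis, which matches "Routine Treatment" and "Minimum Judgment".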
56. Signs of Hadoop Illness
¤ Kerberos (Randomness, Scalability)
¤ Job Ticket / Service Delegation
¤ Data Node Authority (non-ACL)
¤ API Lack of Multi-Tenancy Awareness
¤ Local Disk Map Output Access via HTTP Service
57. Compute as Cows
[Diagram labels: Client; Masters (Job Tracker, Name Node, secondary Name Node (checkpoint)); Slaves (six nodes, each a Data Node & Task Tracker); MapReduce; HDFS]
58. “We’re Not Cowputers, We’re Physical”
“Runaway Job! Kill -9”
[Diagram labels: Job Tracker; Name Node; Task Tracker]
59. Identify Sick ASAP
[Diagram labels: MAP; REDUCE; Job Tracker; Task; three Splits; five nodes, each a Data Node & Task Tracker with an HDFS Block; NameNode; JSON; RPC; Read Data; Output File]
60. Keep Adequate Records, Evaluate Daily…
[Diagram labels: Devices; Networks; Collect; Sort; Record; Real Time; Data Lake; Archive; Investigate & Analyze; Visualize; Alert & Report; Respond; GRC]
<NOUN>
• Users
• Apps
• Content
<ADJ>
• Time
• Alias
• Property
61. 1 Admin : 30,000+ Nodes
Adapt Until Noted Improvement
[Diagram labels: Rack 1, Rack 2, Rack 3, Rack 4, Rack n; switches; name node; job tracker; secondary name node; twenty nodes, each a Data Node & Task Tracker; client A; client B; blocks A1, A2, A3 and B1, B2, B3; 1/2 PETABYTE]
62. Adapt Until Noted Improvement
[Diagram labels: Ethernet; NameNode; 2nd NameNode; six hosts, each a Data Node + Compute Node; HAWQ; Zookeeper; HBase; Impala; Spark; DataNode; Task Tracker; Job Tracker]
63. Adapt Until Noted Improvement
[Diagram labels: Ethernet; six Compute Nodes; NameNode; four name nodes; datanode; DataNode; Task Tracker; Job Tracker; Zookeeper; HBase; Impala; Spark; HAWQ]