© Copyright 2014 EMC Corporation. All rights reserved.
Davi Ottenheimer
Senior Director of Trust, EMC
Protecting Big Data
at Scale
Agenda
1. Risk Context
2. Knowledge
3. Controls
RISK CONTEXT
Define “Risk”
Risk “Relativity” and Rules
“He says his tribe doesn’t have
a written language!”
1. Math, Stats, Comp Sci
“A Bunch of Nodes”
2. Behavior
¤ Political
¤ Social
¤ Cultural
Risk Mode Examples
1. Simple (Theoretical)
¤ Two Opponents
¤ Engagement Rules
2. Complex (Real)
¤ ∞ Opponents, Related
¤ Ill-defined or Guerilla Rules
Possibilities After First Move
¤ Chess 20 x 20 = 400
¤ Go 361 x 360 = 129,960
Branch Factors
¤ Chess 35
¤ Go 250
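The arithmetic behind these figures can be reproduced with a small sketch. The class and method names are hypothetical, and the depth of 4 plies is an arbitrary choice for illustration rather than a number from the talk; a constant branching factor is itself only a rough approximation.

```java
import java.math.BigInteger;

/** Back-of-envelope game-tree arithmetic for the chess and Go figures above. */
final class GameTreeSize {
    /** Positions after each side has moved once: first player's options times second player's options. */
    static long afterFirstMovePair(long firstOptions, long secondOptions) {
        return firstOptions * secondOptions;
    }

    /** Rough count of move sequences of a given depth (in plies) under a constant branching factor. */
    static BigInteger sequences(int branchingFactor, int plies) {
        return BigInteger.valueOf(branchingFactor).pow(plies);
    }

    public static void main(String[] args) {
        System.out.println("Chess: " + afterFirstMovePair(20, 20));   // 400
        System.out.println("Go:    " + afterFirstMovePair(361, 360)); // 129960
        // Depth 4 is an assumption chosen only to show how fast the gap widens.
        System.out.println("Chess, 4 plies: " + sequences(35, 4));
        System.out.println("Go, 4 plies:    " + sequences(250, 4));
    }
}
```

The point of the comparison is that a larger branching factor compounds with depth, which is the sense in which complexity can act as a defense.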
Complexity As Defense
http://arxiv.org/ftp/arxiv/papers/1401/1401.6444.pdf
Structure Leads to
Knowledge
(e.g. Why Wait at Stoplights If They
Rely On Basic Structure?)
Induction Fallacy and Probability
Knowledge for Actionable Insights to
Inform Priorities
The wise
proportion belief
to evidence.
Behavioral Risk Analysis
Detect Good, Detect Bad
Good
¤ Identity
¤ Location
¤ Velocity
¤ File Execution Spawns Process
¤ Binary Modification
¤ System Call Order
¤ Arguments
Bad
(See Good)
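One way to read "Bad (See Good)" is that bad behavior is whatever departs from a recorded known-good profile over the same attributes. The sketch below assumes that reading; the class name, attribute keys, and all values are hypothetical and not taken from any product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: "bad" is whatever departs from a recorded known-good profile. */
final class GoodBaseline {
    private final Map<String, String> expected;

    GoodBaseline(Map<String, String> expected) {
        // TreeMap keeps attribute names sorted so results are deterministic.
        this.expected = new TreeMap<>(expected);
    }

    /** Names of attributes whose observed value is missing or differs from the known-good value. */
    List<String> deviations(Map<String, String> observed) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (!e.getValue().equals(observed.get(e.getKey()))) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        GoodBaseline good = new GoodBaseline(Map.of(
                "identity", "alice",
                "location", "office-lan",
                "syscallOrder", "open,read,close"));
        Map<String, String> event = Map.of(
                "identity", "alice",
                "location", "unknown-vpn",
                "syscallOrder", "open,read,close");
        System.out.println(good.deviations(event)); // prints [location]
    }
}
```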
192.168.100.10
May 27, 2014
Behavioral Risk Analysis
Detect Good, Detect Bad
Davi Ottenheimer
@daviottenheimer
#13-452-353342
Galaxy 1
10.10.10.1
Ubuntu/Firefox
Good
¤ Identity
¤ Location
¤ …
Find target height (H),
weight (W), position
(P), from level (L), at
time (T) with changed P
to P’, P’’, P’’’ over T1,
T2, T3…
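The tracking problem above (positions P, P’, P’’ observed at times T1, T2, T3) reduces to computing speed between consecutive observations. A minimal sketch, assuming flat 2D coordinates and timestamps in seconds; the class and method names are hypothetical.

```java
/** Hypothetical sketch: speed between consecutive (x, y) observations at increasing times. */
final class TrackVelocity {
    /** Returns one speed per consecutive pair of observations, in distance units per second. */
    static double[] speeds(double[] x, double[] y, double[] t) {
        if (x.length != y.length || x.length != t.length) {
            throw new IllegalArgumentException("x, y and t must have the same length");
        }
        double[] v = new double[Math.max(0, t.length - 1)];
        for (int i = 1; i < t.length; i++) {
            double dt = t[i] - t[i - 1];
            if (dt <= 0) {
                throw new IllegalArgumentException("timestamps must strictly increase");
            }
            // Straight-line distance between P and P' divided by elapsed time.
            v[i - 1] = Math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) / dt;
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = speeds(new double[] {0, 3, 3}, new double[] {0, 4, 4}, new double[] {0, 1, 3});
        System.out.println(v[0] + " " + v[1]); // prints 5.0 0.0
    }
}
```

A speed that no legitimate user or device could reach between two observations is the kind of signal the "Velocity" attribute refers to.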
[Figure: big data market landscape]
Infrastructure: NoSQL DB, Hadoop on Premise, NewSQL DB, Cloud, Clustering, MPP DB, Monitoring, Graph DB, Crowdsourcing, AppDev, Data Transformation, Storage, Security
Analytics: Analytic Platform, BI Platform, Machine Learning, Location/Ppl/Events, Search, Crowdsourcing, Business Analytics, Data Science, Unstructured Data, Data Viz, Social Analytics, Statistical Computing, Log Analytics, SMB
Applications: Advertising, Finance, Government, Health, Security, Education, Legal, HR, Publishing, Marketing, Science, Utilities
OSS: Framework, Query, Access, Workflow, Real-Time, Stats, ML, Deployment, Search
Data Sources: Markets / Warehouses, User Services, Devices/Things, “Research”
Thus…The Big Data Market
The Difference With Big Data…
Centralized Insights for Action
Rapid Large Varied DATA LAKE for Knowledge
¤ Data Archaeology
¤ Information Harvesting
¤ Information Discovery
¤ Knowledge Extraction
¤ Knowledge Discovery
¤ Multivariate Statistics
¤ Pattern Recognition
¤ Advanced Analysis
¤ Predictive Analysis
¤ Machine Learning
KNOWLEDGE
Two Big Data Knowledge Stories
1. Finding Threats in Data Lakes…
“500PB/Month by 2016”
Using Landscape & Bioclimatic Features to Predict Lion,
Leopard & Spotted Hyaena Distribution in Tanzania…
http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0096261
Example: Web Threat Detection
Adversary Versus Customer
¤ Velocity
¤ Sequence
¤ Origin
¤ Context
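Of the four signals listed, velocity is the simplest to sketch: count how many requests a session makes inside a sliding time window and compare the peak against what a human customer plausibly produces. The names and the window length are assumptions for illustration; this is not how any particular web threat detection product works.

```java
/** Hypothetical sketch: peak request rate of one session over a sliding window. */
final class SessionVelocity {
    /** Largest number of requests that fall inside any window of the given length. */
    static int peakRequestsInWindow(long[] sortedTimestampsMillis, long windowMillis) {
        int peak = 0;
        int start = 0;
        for (int end = 0; end < sortedTimestampsMillis.length; end++) {
            // Slide the window start forward until it spans less than windowMillis.
            while (sortedTimestampsMillis[end] - sortedTimestampsMillis[start] >= windowMillis) {
                start++;
            }
            peak = Math.max(peak, end - start + 1);
        }
        return peak;
    }

    public static void main(String[] args) {
        long[] clicks = {0, 100, 200, 5000, 9000};
        System.out.println(peakRequestsInWindow(clicks, 1000)); // prints 3
    }
}
```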
Example: Web Threat Detection
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478593/
“Hyperspectral Remote Sensing Can Detect, Map and
Predict Spatial Spread of Invasive Species”
https://earthdata.nasa.gov/
The Floods
The Fires
The Lightning
http://www.blitzortung.org/
Aggregate Blitz Bomb Census
7th Oct 1940 to 6th Jun 1941
Reducing Global Risks
Minneapolis Pedestrian Deaths
Boats = iOS
Ferry Passengers = Android
Apple Store
Reducing Global Risks
SF and Beijing Commuters
“No Waypoint Zones”
5 Mile Radii of Major Airports
Reducing Global Risks
DJI Drone Ground Station Blocks
Reducing Global Risks
60,000 Routes: Save Money, Save Lives…
¤ 400K Gal/Year Reduced by Paperless Pilot (-35lb)
¤ Data Per GE Engine: 1TB/Day
¤ Data Per Boeing 787 Flight: 500GB
http://www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes/
http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business
We have massive amounts of data.
We know who you are.
http://bigstory.ap.org/article/airlines-promise-return-civility-fee
“We know what your history has been on the airline.
We can customize our offerings.”
Two Big Data Knowledge Stories
http://worldoceanreview.com/en/wor-2/fisheries/illegal-fishing/
http://www.un.org/apps/news/story.asp?NewsID=44250
2. Stopping Threats to Data Lakes…
Vast Majority Think They Can
Control Risk
http://pewinternet.org/Reports/2013/Anonymity-online.aspx, http://www.connecture.com/the-connecture-difference/
of Internet users have
taken steps online to
remove or mask their
digital footprints
Imaging
Molecular Diagnostics
Medications
Lab Tests
Family History
Genetics
Medical History
Mobile Sensors
Environment
Clinical Narratives
…How Wrong Are They?
TAO Wrong?
“ONE CLICK” Wrong?
GOOGLE Spooks On Your Tail!
https://twitter.com/jason_kint/status/451716219482025984/photo/1
Example: Simple Log Analysis
Meta, Ripples, Tails, Exhausts, Waste, Shadows, etc.
“…we know estimated numbers of people served by each waste water
treatment plant, we can back-calculate daily [drug] loads…”
- Dr Kasprzyk-Hordern
1.5B gallons/day
Wastewater from
Chicago & Suburbs
¤ Environmental Risks
¤ Diseases
¤ Drugs
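The back-calculation in the quote is essentially concentration times flow, normalized by the population served. The sketch below uses the 1.5B gallons/day figure from the slide; the concentration and population values are hypothetical, and the correction factors real studies apply (excretion rates, compound stability) are deliberately left out.

```java
/** Hypothetical sketch of wastewater back-calculation: load = concentration x flow. */
final class WastewaterLoad {
    static final double LITERS_PER_US_GALLON = 3.785411784;

    /** Daily mass load in grams, from a concentration in ng/L and a flow in US gallons per day. */
    static double dailyLoadGrams(double nanogramsPerLiter, double gallonsPerDay) {
        double litersPerDay = gallonsPerDay * LITERS_PER_US_GALLON;
        return nanogramsPerLiter * litersPerDay / 1e9; // 1e9 ng per gram
    }

    /** Normalizes a daily load to milligrams per day per 1,000 people served. */
    static double milligramsPerThousandPeople(double loadGrams, double populationServed) {
        double loadMilligrams = loadGrams * 1000.0;
        return loadMilligrams / populationServed * 1000.0;
    }

    public static void main(String[] args) {
        double flow = 1.5e9;            // gallons/day, from the slide
        double concentration = 100.0;   // ng/L, made up for illustration
        double population = 5_000_000;  // people served, made up for illustration
        double grams = dailyLoadGrams(concentration, flow);
        System.out.println("Load (g/day): " + grams);
        System.out.println("mg/day per 1,000 people: " + milligramsPerThousandPeople(grams, population));
    }
}
```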
http://phys.org/news/2012-03-wastewater-clues-illicit-drug.html
http://gizmodo.com/meth-in-london-heroin-in-zagreb-the-answer-is-found-i-1508209127
http://www.thesundaytimes.co.uk/sto/news/uk_news/Health/article1409450.ece
Italy, London, Finland, Croatia, Oregon, Canada
Example: Simple Log Analysis
“Cocaine so widely used it has contaminated
Britain’s drinking water”
“…infinite auto-transcription inevitably becomes
intrigue, drama, and…murder”
“World Memory” Italo Calvino, 1968
http://www.kdnuggets.com/2013/11/big-data-or-bad-data-mit-event-nov-15.html
http://www.scribd.com/doc/4930515/memory-of-the-world-calvino
Draw a Simple Rule?
¤ Knowledge v Surveillance
¤ Helpful v Creepy
Risk Intelligence Maturity Scale
“No Fish in Too Clear Water”
BINARY, RANKED, MEANING, ZERO POINT, EXACT
INTELLIGENCE, ERROR MARGIN
Two Big Data Knowledge Stories: 1, 2
CONTROLS
Scale-Out Compute Environment
“Controlled” Hadoop
Distributed OS (MR -> YARN)
Distributed Store (HDFS)
Script: Pig
SQL: Hive, Tez, HCatalog
Sync: Sqoop, ODBC, REST
NoSQL: HBase, Accumulo
GRC: Falcon
Stream: Storm
Security: Knox, Rhino, Sentry, XASec
Ops: Ambari, ZooKeeper, Oozie, Flume
Search: Solr
ML: Mahout
AAA, CIA
Authentication: Kerberos + “Tokens”
Authorization: Some ACLs
Audit: Scattered Logs
Confidentiality: Go Fish
…
Cloudera: Access, Data, Perimeter, Visibility
XA Secure: Policy, Audit, Access, Encryption
Scale-Out Compute Environment
“Controlled” Hadoop
Security: Knox, Rhino, Sentry, XASec
AAA, CIA
User     Process
hdfs     namenode, datanode, secondary namenode
mapred   jobtracker, tasktracker, child tasks

Group    Users
hadoop   hdfs, mapred
Daemons run as single user (hadoop)
sudo -u hdfs hadoop fs -rmr /
Authentication
Scale-Out Compute Environment
“Controlled” Hadoop
1. Data Shared
2. Networks Open
3. Nodes Distributed
Scale-Out Compute Environment
“Controlled” Hadoop (Realities)
4. Web Services Open
5. Access Controls Open
6. Clients Unauthenticated
Transactional Integrity Concerns?
¤ Tweet Errors
¤ Balance Sheet Errors
http://people.cs.clemson.edu/~steve/Spiro/arianesiam.htm, http://www.vuw.ac.nz/staff/stephen_marshall/SE/Failures/SE_Ariane.html
“concern software exception is allowed, or even required, to cause
processor halt on mission-critical equipment”
The 1996 Ariane 5 Lesson
“software should be assumed to be faulty”
One-Chance to Get it Right
The Snow Den Lesson
http://www.flyingpenguin.com/?p=18259
Source
Observation
1854: CHOLERA VORONOI
1854: GHOST MAP OF LONDON
RSAC 2012:
BREACH DATA
Dr. John Snow
1813-1858
“Treat’em Like Cows Not Pets”
(__)
(xx)
/-------/
/ | ||
* ||----||
^^ ^^
Systematic Treatment of Illness
1. Identify Sick ASAP
2. Keep Adequate Records
3. Evaluate Daily Sick
4. Adapt Until Noted Improvement
Easily Identified
Routine Treatment
Minimum Judgment
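Steps 1 and 2 of the routine above can be sketched as a heartbeat check that flags silent nodes and writes a record for each one. The class name, the heartbeat map, and the threshold are all hypothetical; real clusters would take these signals from their own monitoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: identify sick nodes early and keep a record of each finding. */
final class HerdTriage {
    private final long maxSilenceSeconds;
    private final List<String> records = new ArrayList<>();

    HerdTriage(long maxSilenceSeconds) {
        this.maxSilenceSeconds = maxSilenceSeconds;
    }

    /** Step 1: a node counts as sick when its last heartbeat is older than the threshold. */
    List<String> identifySick(Map<String, Long> lastHeartbeatSeconds, long nowSeconds) {
        List<String> sick = new ArrayList<>();
        // TreeMap gives a stable, sorted order of node names.
        for (Map.Entry<String, Long> e : new TreeMap<>(lastHeartbeatSeconds).entrySet()) {
            long silence = nowSeconds - e.getValue();
            if (silence > maxSilenceSeconds) {
                sick.add(e.getKey());
                // Step 2: keep an adequate record for the daily evaluation.
                records.add(nowSeconds + " " + e.getKey() + " silent " + silence + "s");
            }
        }
        return sick;
    }

    /** Read-only copy of everything recorded so far. */
    List<String> records() {
        return List.copyOf(records);
    }

    public static void main(String[] args) {
        HerdTriage triage = new HerdTriage(60);
        System.out.println(triage.identifySick(Map.of("node1", 100L, "node2", 10L), 120L)); // prints [node2]
        System.out.println(triage.records());
    }
}
```

The rule is intentionally mechanical: a fixed threshold is easy to identify, calls for a routine response, and needs minimal judgment per node.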
Signs of Hadoop Illness
¤ Kerberos (Randomness, Scalability)
¤ Job Ticket / Service Delegation
¤ Data Node Authority (non-ACL)
¤ API Lack of Multi-Tenancy Awareness
¤ Local Disk Map Output Access via HTTP Service
Compute as Cows
[Diagram: Hadoop cluster. Client; Masters: Job Tracker (MapReduce), Name Node and secondary Name Node (checkpoint) (HDFS); Slaves: six Data Node & Task Tracker hosts]
“We’re Not Cowputers, We’re
Physical”
“Runaway Job! Kill -9”
Identify Sick ASAP
[Diagram: MapReduce job flow. Job Tracker, Name Node, Task Trackers running MAP and REDUCE tasks; input Splits; Data Node & Task Tracker hosts holding HDFS Blocks; Output File; labels: Task, JSON, RPC, Read Data]
Keep Adequate Records,
Evaluate Daily…
Archive
Devices, Networks
Investigate & Analyze
Visualize
Respond
Alert & Report
Record, Sort, Collect
Real Time
Data Lake
<NOUN>: Users, Apps, Content
<ADJ>: Time, Alias, Property
GRC
1 Admin : 30,000+ Nodes
Adapt Until Noted Improvement
[Diagram: rack topology. Ethernet switches; name node, job tracker, secondary name node; client A and client B; Rack 1 to Rack n of Data Node & Task Tracker hosts; blocks A1, A2, A3 and B1, B2, B3; 1/2 PETABYTE]
Adapt Until Noted Improvement
[Diagram: NameNode and 2nd NameNode; Job Tracker and Task Tracker; HAWQ, Zookeeper, HBase, Impala, Spark; six Data Node + Compute Node hosts; DataNode; six Compute Node hosts; Ethernet]
Adapt Until Noted Improvement
[Diagram: four name nodes; datanode, DataNode, NameNode; Task Tracker, Job Tracker; Zookeeper, HBase, Impala, Spark, HAWQ; users Alice and Bob]
Adapt Until LAFS
[Diagram: four name nodes; datanode; Zookeeper, HBase, Impala, Spark, HAWQ; Task Tracker, Job Tracker; Work~, Worka, Workb; Homea, Homeb; S1341dfeqaas2ia1kjg3af]
¤ NoSQL (HBase) Performance
¤ Evolution
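import java.util.Map.Entry;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;
// specify which visibilities we are allowed to see
Authorizations auths = new Authorizations("public");
Scanner scan = conn.createScanner("table", auths);
scan.setRange(new Range("user100", "user200"));
scan.fetchColumnFamily(new Text("attributes"));
for (Entry<Key,Value> entry : scan) {
    String row = entry.getKey().getRow().toString();
    Value value = entry.getValue();
}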
Accumulo Cell-Level Access – Dist Key/Values
2006 – Google BigTable
2008 – NSA
2011 – accumulo.apache.org
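The idea behind cell-level access is that each cell carries a visibility label and a scan returns the cell only if the reader's authorizations satisfy it. The sketch below is a simplified stand-in, not Accumulo's implementation: it handles only flat labels such as "public", "a&b", or "a|b", with no parentheses, and the class and method names are hypothetical.

```java
import java.util.Set;

/** Simplified, hypothetical check of a flat visibility label against a user's authorizations. */
final class CellVisibilitySketch {
    /** True when the authorizations satisfy a label like "public", "a&b" (all) or "a|b" (any). */
    static boolean canSee(String label, Set<String> auths) {
        if (label.isEmpty()) {
            return true; // an unlabeled cell is visible to every reader
        }
        boolean hasAnd = label.contains("&");
        boolean hasOr = label.contains("|");
        if (hasAnd && hasOr) {
            // Mixed operators require parentheses, which this sketch does not parse.
            throw new IllegalArgumentException("mixed & and | are not handled in this sketch");
        }
        if (hasOr) {
            for (String term : label.split("\\|")) {
                if (auths.contains(term)) {
                    return true;
                }
            }
            return false;
        }
        for (String term : label.split("&")) {
            if (!auths.contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canSee("public", Set.of("public")));     // prints true
        System.out.println(canSee("admin&audit", Set.of("admin"))); // prints false
        System.out.println(canSee("admin|audit", Set.of("audit"))); // prints true
    }
}
```

Because the check runs per cell rather than per table or file, two readers scanning the same row range can legitimately receive different results.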
Adapt Until Classified
Scale-Out Control Environment
Layers: Client/App Layer, Network Layer, Intra-Communication, Operating Environment
Protocols: NFS, CIFS, FTP, HTTP, HDFS for Hadoop, REST for Object
Network: Gig-e, 10 Gig-e
Single FS/Volume
Scale-Out Controls
¤ Multi-Tenancy Aware
¤ Full-ACL File Systems
¤ Kerberos Authentication
¤ High-Resilience Architecture
¤ Name Node Continuous Availability
¤ Data Protection (BC/DR, Snapshots, etc.)
¤ SEC 17a-4 Compliant WORM
Trusted Hadoop

CONFidence 2014: Davi Ottenheimer, Protecting Big Data at Scale

Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

CONFidence 2014: Davi Ottenheimer Protecting big data at scale

  • 1. © Copyright 2014 EMC Corporation. All rights reserved. Davi Ottenheimer, Senior Director of Trust, EMC. Protecting Big Data at Scale
  • 3. Agenda 1. Risk Context 2. Knowledge 3. Controls
  • 4. RISK CONTEXT
  • 6. Risk “Relativity” and Rules “He says his tribe doesn’t have a written language!” 1. Math, Stats, Comp Sci “A Bunch of Nodes” 2. Behavior ¤ Political ¤ Social ¤ Cultural
  • 7. Risk Mode Examples 1. Simple (Theoretical) ¤ Two Opponents ¤ Engagement Rules 2. Complex (Real) ¤ ∞ Opponents, Related ¤ Ill-defined or Guerilla Rules Possibilities After First Move ¤ Chess 20 x 20 = 400 ¤ Go 361 x 360 = 129,960 Branch Factors ¤ Chess 35 ¤ Go 250
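The first-move counts on slide 7 are plain products, and the branch factors show how quickly the search space grows. The sketch below reproduces that arithmetic; the class name, method names, and the depth of 4 are illustrative assumptions, not anything from the talk.

```java
import java.math.BigInteger;

public class GameComplexity {
    /** Positions reachable after each side has made one move. */
    public static long afterFirstMoves(long firstPlayerChoices, long secondPlayerChoices) {
        return firstPlayerChoices * secondPlayerChoices;
    }

    /** Rough game-tree size for a constant branch factor b searched to depth d: b^d. */
    public static BigInteger treeSize(int branchFactor, int depth) {
        return BigInteger.valueOf(branchFactor).pow(depth);
    }

    public static void main(String[] args) {
        System.out.println("Chess after first moves: " + afterFirstMoves(20, 20));
        System.out.println("Go after first moves:    " + afterFirstMoves(361, 360));
        // Depth 4 is an arbitrary illustrative choice, not a figure from the slides.
        System.out.println("Chess, b=35, depth 4:    " + treeSize(35, 4));
        System.out.println("Go, b=250, depth 4:      " + treeSize(250, 4));
    }
}
```

Even at a shallow depth the Go tree is orders of magnitude larger than the chess tree, which is the sense in which the slide contrasts simple and complex risk modes.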
  • 9. Structure Leads to Knowledge (e.g. Why Wait at Stoplights If They Rely On Basic Structure?)
  • 10. Induction Fallacy and Probability Knowledge for Actionable Insights to Inform Priorities The wise proportion belief to evidence.
  • 11. Behavioral Risk Analysis Detect Good, Detect Bad Good ¤ Identity ¤ Location ¤ Velocity ¤ File Execution Spawns Process ¤ Binary Modification ¤ System Call Order ¤ Arguments Bad (See Good)
  • 12. 192.168.100.10 May 27, 2014 Behavioral Risk Analysis Detect Good, Detect Bad Davi Ottenheimer @daviottenheimer #13-452-353342 Galaxy 1 10.10.10.1 Ubuntu/Firefox Good ¤ Identity ¤ Location ¤ …
  • 14. Find target height (H), weight (W), position (P), from level (L), at time (T) with changed P to P’, P’’, P’’’ over T1, T2, T3…
  • 15. Thus…The Big Data Market. Infrastructure, Analytics, Applications: NoSQL DB; Hadoop on Premise; NewSQL DB; Cloud; Clustering; MPP DB; Monitoring; Graph DB; Crowdsourcing; AppDev; Data Transformation; Storage; Security; Analytic Platform; BI Platform; Machine Learning; Location, Ppl, Events; Search; Crowdsourcing; Business Analytics; Data Science; Unstructured Data; Data Viz; Social Analytics; Statistical Computing; Log Analytics; SMB; Advertising; Finance; Government; Health; Security; Education; Legal; HR; Publishing; Marketing; Science; Utilities. OSS: Framework; Query; Access; Workflow; Real-Time; Stats; ML; Deployment; Search. Data Sources: Markets / Warehouses; User Services; Devices/Things; “Research”
  • 16. The Difference With Big Data… Centralized Insights for Action Rapid Large Varied DATA LAKE for Knowledge ¤ Data Archaeology ¤ Information Harvesting ¤ Information Discovery ¤ Knowledge Extraction ¤ Knowledge Discovery ¤ Multivariate Statistics ¤ Pattern Recognition ¤ Advanced Analysis ¤ Predictive Analysis ¤ Machine Learning
  • 19. “500PB/Month by 2016” Finding Threats in Data Lakes…1
  • 20. Using Landscape & Bioclimatic Features to Predict Lion, Leopard & Spotted Hyaena Distribution in Tanzania… http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0096261
  • 21. Example: Web Threat Detection Adversary Versus Customer ¤ Velocity ¤ Sequence ¤ Origin ¤ Context
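Of the four signals on slide 21, velocity is the simplest to sketch: a session whose requests arrive faster than a person plausibly clicks is worth a closer look. The example below is a minimal illustration under stated assumptions; the class name, the timestamp format (sorted epoch milliseconds), and the threshold of 2 requests per second are all hypothetical and not taken from any product mentioned in the talk.

```java
import java.util.Arrays;
import java.util.List;

public class VelocityCheck {
    /** Average requests per second across a session's sorted timestamps (epoch millis). */
    public static double requestsPerSecond(List<Long> timestampsMillis) {
        if (timestampsMillis.size() < 2) {
            return 0.0; // a single request has no measurable velocity
        }
        long first = timestampsMillis.get(0);
        long last = timestampsMillis.get(timestampsMillis.size() - 1);
        long spanMillis = last - first;
        if (spanMillis <= 0) {
            return Double.POSITIVE_INFINITY; // several requests in the same millisecond
        }
        return (timestampsMillis.size() - 1) * 1000.0 / spanMillis;
    }

    /** True when the session's request rate is above the supplied threshold. */
    public static boolean exceedsVelocity(List<Long> timestampsMillis, double maxRequestsPerSecond) {
        return requestsPerSecond(timestampsMillis) > maxRequestsPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical sessions: one paced like a person, one paced like a script.
        List<Long> customer = Arrays.asList(0L, 4000L, 9000L, 15000L);
        List<Long> script = Arrays.asList(0L, 100L, 200L, 300L, 400L);
        double threshold = 2.0; // hypothetical cut-off, requests per second
        System.out.println("customer flagged: " + exceedsVelocity(customer, threshold));
        System.out.println("script flagged:   " + exceedsVelocity(script, threshold));
    }
}
```

A real detector would combine this with the other three signals (sequence, origin, context) rather than rely on rate alone.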
  • 22. Example: Web Threat Detection
  • 24. “Hyperspectral Remote Sensing Can Detect, Map and Predict Spatial Spread of Invasive Species” https://earthdata.nasa.gov/
  • 27. The Lightning http://www.blitzortung.org/ Aggregate Blitz Bomb Census 7th Oct 1940 to 6th Jun 1941
  • 29. Boats = iOS Ferry Passengers = Android
  • 31. Reducing Global Risks SF and Beijing Commuters
  • 32. “No Waypoint Zones” 5 Mile Radii of Major Airports Reducing Global Risks DJI Drone Ground Station Blocks
  • 33. Reducing Global Risks 60,000 Routes: Save Money, Save Lives… ¤ 400K Gal/Year Reduced by Paperless Pilot (-35lb) ¤ Data Per GE Engine: 1TB/Day ¤ Data Per Boeing 787 Flight: 500GB http://www.spatialanalysis.ca/2011/global-connectivity-mapping-out-flight-routes/ http://www.computerweekly.com/news/2240176248/GE-uses-big-data-to-power-machine-services-business
  • 34. “We have massive amounts of data. We know who you are. We know what your history has been on the airline. We can customize our offerings.” http://bigstory.ap.org/article/airlines-promise-return-civility-fee
  • 37. Vast Majority Think They Can Control Risk http://pewinternet.org/Reports/2013/Anonymity-online.aspx, http://www.connecture.com/the-connecture-difference/ of Internet users have taken steps online to remove or mask their digital footprints
  • 38. Imaging, Molecular Diagnostics, Medications, Lab Tests, Family History, Genetics, Medical History, Mobile Sensors, Environment, Clinical Narratives …How Wrong Are They?
  • 40. “ONE CLICK” Wrong? GOOGLE Spooks On Your Tail! https://twitter.com/jason_kint/status/451716219482025984/photo/1
  • 41. Example: Simple Log Analysis. Meta, Ripples, Tails, Exhausts, Waste, Shadows, etc. “…we know estimated numbers of people served by each waste water treatment plant, we can back-calculate daily [drug] loads…” - Dr Kasprzyk-Hordern. 1.5B gallons/day Wastewater from Chicago & Suburbs ¤ Environmental Risks ¤ Diseases ¤ Drugs http://phys.org/news/2012-03-wastewater-clues-illicit-drug.html
  • 43. “…infinite auto-transcription inevitably becomes intrigue, drama, and…murder” “World Memory” Italo Calvino, 1968 http://www.kdnuggets.com/2013/11/big-data-or-bad-data-mit-event-nov-15.html http://www.scribd.com/doc/4930515/memory-of-the-world-calvino
  • 44. Draw a Simple Rule? ¤ Knowledge v Surveillance ¤ Helpful v Creepy
  • 45. Risk Intelligence Maturity Scale “No Fish in Too Clear Water” BINARY RANKED MEANING ZERO POINT EXACT INTELLIGENCE ERROR MARGIN
  • 48. Scale-Out Compute Environment: “Controlled” Hadoop. Distributed OS (MR -> YARN); Distributed Store (HDFS). Script: Pig. SQL: Hive, Tez, HCatalog. Sync: Sqoop, ODBC, REST. NoSQL: HBase, Accumulo. GRC: Falcon. Stream: Storm. Security: Knox, Rhino, Sentry, XASec. Ops: Ambari, Zookeeper, Oozie, Flume. Search: Solr. ML: Mahout. AAA, CIA
  • 49. Scale-Out Compute Environment: “Controlled” Hadoop. Security: Knox, Rhino, Sentry, XASec. AAA, CIA. Authentication: Kerberos + “Tokens”. Authorization: Some ACLs. Audit: Scattered Logs. Confidentiality: Go Fish … Cloudera: Access, Data, Perimeter, Visibility. XA Secure: Policy, Audit, Access, Encryption
  • 50. Scale-Out Compute Environment: “Controlled” Hadoop. Authentication. User / process: hdfs (namenode, datanode, secondary namenode); mapred (jobtracker, tasktracker, child tasks). Group / users: hadoop (hdfs, mapred). Daemons run as single user (hadoop). sudo -u hdfs hadoop fs -rmr /
  • 51. Scale-Out Compute Environment: “Controlled” Hadoop (Realities) 1. Data Shared 2. Networks Open 3. Nodes Distributed 4. Web Services Open 5. Access Controls Open 6. Clients Unauthenticated
  • 52. Transactional Integrity Concerns? ¤ Tweet Errors ¤ Balance Sheet Errors
  • 53. The 1996 Ariane 5 Lesson: One Chance to Get It Right. “concern software exception is allowed, or even required, to cause processor halt on mission-critical equipment” “software should be assumed to be faulty” http://people.cs.clemson.edu/~steve/Spiro/arianesiam.htm, http://www.vuw.ac.nz/staff/stephen_marshall/SE/Failures/SE_Ariane.html
  • 54. The Snow Den Lesson http://www.flyingpenguin.com/?p=18259 Source Observation 1854: CHOLERA VORONOI 1854: GHOST MAP OF LONDON RSAC 2012: BREACH DATA Dr. John Snow 1813-1858
  • 55. “Treat ’em Like Cows Not Pets” [ASCII art: cow] Systematic Treatment of Illness 1. Identify Sick ASAP 2. Keep Adequate Records 3. Evaluate Daily Sick 4. Adapt Until Noted Improvement. Easily Identified, Routine Treatment, Minimum Judgment
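Step 1 on slide 55, "Identify Sick ASAP", can be read as a routine, low-judgment check applied to every node in the herd. The sketch below flags nodes whose last heartbeat is older than a cut-off so they can be replaced rather than nursed by hand. It is an illustration only: the class name, the heartbeat map, and the 30-second silence limit are hypothetical, and it does not model how Hadoop itself tracks node liveness.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HerdHealth {
    /**
     * Returns the names of nodes that have been silent for longer than
     * maxSilenceMillis, sorted so the report is stable from run to run.
     */
    public static List<String> sickNodes(Map<String, Long> lastHeartbeatMillis,
                                         long nowMillis,
                                         long maxSilenceMillis) {
        List<String> sick = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > maxSilenceMillis) {
                sick.add(entry.getKey());
            }
        }
        Collections.sort(sick);
        return sick;
    }

    public static void main(String[] args) {
        // Hypothetical heartbeat timestamps in milliseconds.
        Map<String, Long> heartbeats = new HashMap<>();
        heartbeats.put("datanode-01", 95_000L);
        heartbeats.put("datanode-02", 40_000L);
        heartbeats.put("datanode-03", 99_000L);
        long now = 100_000L;
        long maxSilence = 30_000L; // hypothetical limit: 30 seconds of silence
        System.out.println("Replace: " + sickNodes(heartbeats, now, maxSilence));
    }
}
```

Running the same check on a schedule and keeping its output covers steps 2 and 3 (records, daily evaluation) without per-node judgment calls.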
  • 56. Signs of Hadoop Illness ¤ Kerberos (Randomness, Scalability) ¤ Job Ticket / Service Delegation ¤ Data Node Authority (non-ACL) ¤ API Lack of Multi-Tenancy Awareness ¤ Local Disk Map Output Access via HTTP Service
  • 57. Compute as Cows. [Diagram: Clients; Masters: Job Tracker (MapReduce), Name Node (HDFS), secondary Name Node (checkpoint); Slaves: six Data Node & Task Tracker hosts]
  • 58. “We’re Not Cowputers, We’re Physical” “Runaway Job! Kill -9” Job Tracker Name Node Task Tracker
  • 59. Identify Sick ASAP. [Diagram: MAP and REDUCE; File, Split, Split, Split, Task, Output; Job Tracker; JSON RPC; Read Data; NameNode; five Data Node / Task Tracker hosts, each with an HDFS Block]
  • 60. Keep Adequate Records, Evaluate Daily… Archive Devices Networks Investigate & Analyze Visualize Respond Alert & Report Record Sort Collect Real Time Data Lake <NOUN> • Users • Apps • Content <ADJ> • Time • Alias • Property GRC
  • 61. 1 Admin : 30,000+ Nodes. Adapt Until Noted Improvement. [Diagram: client A and client B; switches; name node, secondary name node, job tracker; Data Node & Task Tracker hosts in Rack 1, Rack 2, Rack 3, Rack 4 … Rack n; blocks A1 A2 A3 B1 B2 B3; 1/2 PETABYTE]
  • 62. Adapt Until Noted Improvement. [Diagram: Ethernet; NameNode, 2nd NameNode, DataNode, Job Tracker, Task Tracker; six Data Node + Compute Node hosts; HAWQ, Zookeeper, HBase, Impala, Spark]
  • 63. Adapt Until Noted Improvement. [Diagram: Ethernet; six Compute Node hosts; NameNode (name node ×4, datanode); DataNode; Job Tracker, Task Tracker; Zookeeper, HBase, Impala, Spark, HAWQ]
  • 64. Adapt Until LAFS. [Diagram: Alice, Bob; NameNode (name node ×4, datanode); DataNode; Job Tracker, Task Tracker; Zookeeper, HBase, Impala, Spark, HAWQ; Work~, Workb, Worka; Homeb, Homea; S1341dfeqaas2ia1kjg3af]
  • 65. Adapt Until Classified. Accumulo: Cell-Level Access – Dist Key/Values ¤ NoSQL (HBase) Performance ¤ Evolution: 2006 – Google BigTable, 2008 – NSA, 2011 – accumulo.apache.org
        // conn is an existing org.apache.accumulo.core.client.Connector;
        // Text is org.apache.hadoop.io.Text, Entry is java.util.Map.Entry
        // specify which visibilities we are allowed to see
        Authorizations auths = new Authorizations("public");
        Scanner scan = conn.createScanner("table", auths);
        scan.setRange(new Range("user100", "user200"));
        scan.fetchColumnFamily(new Text("attributes"));
        for (Entry<Key,Value> entry : scan) {
            // getRow() returns a Text, so convert it explicitly
            String row = entry.getKey().getRow().toString();
            Value value = entry.getValue();
        }
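The idea behind the slide 65 scanner is that every cell carries its own visibility label and a reader only sees cells its authorizations satisfy. The sketch below models that filter in plain Java with no Accumulo dependency. It is a deliberate simplification: each cell lists labels that must all be held (AND only), whereas real Accumulo visibility expressions also allow `|` and parentheses. The class, field, and label names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CellVisibilitySketch {
    /** One key/value cell tagged with the labels a reader must hold to see it. */
    public static final class Cell {
        final String row;
        final String value;
        final Set<String> requiredLabels;

        public Cell(String row, String value, String... requiredLabels) {
            this.row = row;
            this.value = value;
            this.requiredLabels = new HashSet<>(Arrays.asList(requiredLabels));
        }
    }

    /** Returns "row=value" for every cell whose labels are all covered by the reader's authorizations. */
    public static List<String> scan(List<Cell> cells, Set<String> authorizations) {
        List<String> visible = new ArrayList<>();
        for (Cell cell : cells) {
            // A cell with no labels is visible to everyone; otherwise every label must be held.
            if (authorizations.containsAll(cell.requiredLabels)) {
                visible.add(cell.row + "=" + cell.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> table = Arrays.asList(
                new Cell("user100", "alice", "public"),
                new Cell("user101", "123-45-6789", "public", "pii"),
                new Cell("user102", "bob"));
        Set<String> auths = new HashSet<>(Arrays.asList("public"));
        // The "pii" cell is filtered out because the reader lacks that label.
        System.out.println(scan(table, auths));
    }
}
```

Filtering at the cell rather than the table or file level is what lets differently cleared readers share one store, which is the contrast the slide draws with the coarser controls listed on the earlier Hadoop slides.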
  • 66. Scale-Out Control Environment. Operating Environment, Intra-Communication, Client/App Layer, Network Layer, SingleFS/Volume, Gig-e, 10 Gig-e. Protocols: NFS, CIFS, FTP, HTTP, HDFS for Hadoop, REST for Object
  • 67. Scale-Out Controls ¤ Multi-Tenancy Aware ¤ Full-ACL File Systems ¤ Kerberos Authentication ¤ High-Resilience Architecture ¤ Name Node Continuous Availability ¤ Data Protection (BC/DR, Snapshots, etc.) ¤ SEC 17a-4 Compliant WORM Trusted Hadoop
  • 68. Davi Ottenheimer, Senior Director of Trust, EMC. Protecting Big Data at Scale