© 2014 MapR Technologies
Agenda
• Two kinds of security failure
– Buried treasure
• But what could go wrong?
– Horror stories
• Sharing into controlled environments
– Views, masking and fine-grained control
• Sharing without sharing
– When masking is not sufficient
• Summary
Locked Up Tight – The Cheapside Hoard
• Between 1640 and 1666 somebody hid
a cache of jewels under the floor
of 30-32 Cheapside, London
• They never came back for them …
• The hoard was found by workmen in 1912
• Did the owners forget where they were?
• Why didn’t their heirs or partners recover them?
The Other Kind of Security Failure
• Security can fail when there is a leak
– Enigma decryption
– Retail data compromise
– Klaus Fuchs
• Security also fails when data is not shared
– AKA siloing
– The many threads of 9/11
– The Cheapside hoard
– Invisible technological opportunity cost
Netflix
• Shared anonymized data
• Huge boost in state of the art for some kinds of
recommendations
• Anonymization shown to be weak barrier
• Lawsuit, security clamp-down everywhere
Reference Data Attack
• Netflix: opaque id → [{date,movie,rating}...]
• IMDB: opaque id → [{date,movie}...]
• Both feed a combined database
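The linkage idea on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function name `link_records`, the `min_overlap` threshold, and all records below are made up for the example, and exact matching on (date, movie) is a simplifying assumption rather than a description of how the real attack was carried out.

```python
def link_records(private, public, min_overlap=2):
    """Match each opaque id in `private` to the `public` id that shares
    the most (date, movie) pairs with it. Hypothetical helper."""
    public_sets = {pid: set(events) for pid, events in public.items()}
    matches = {}
    for opaque_id, ratings in private.items():
        # Drop the rating: only (date, movie) exists in both datasets.
        seen = {(date, movie) for date, movie, _rating in ratings}
        best_id, best_overlap = None, 0
        for pid, events in public_sets.items():
            overlap = len(seen & events)
            if overlap > best_overlap:
                best_id, best_overlap = pid, overlap
        if best_overlap >= min_overlap:
            matches[opaque_id] = best_id
    return matches


# Made-up records in the two shapes shown on the slide.
netflix = {
    "u1": [("2005-03-01", "Brazil", 5), ("2005-03-04", "Alien", 4),
           ("2005-03-09", "Heat", 3)],
    "u2": [("2005-03-02", "Alien", 2), ("2005-03-05", "Fargo", 5)],
}
imdb = {
    "a": [("2005-03-01", "Brazil"), ("2005-03-09", "Heat")],
    "b": [("2005-03-05", "Fargo"), ("2005-03-02", "Alien")],
}
links = link_records(netflix, imdb)
```

Once an opaque id is linked, every rating attached to it in the first dataset is linked as well, including ratings that never appeared in the second.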
The Moral
• If there is something to correlate, anonymization may fail
• When I say “may”, you should read “will”
NY Cab
• Hack license and medallion number hashed using MD5
• No correlation data to work with
• But cab (medallion) numbers have only a few forms
• So we can generate hashes for all 20 million (or so) possible medallion numbers
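A rough Python sketch of the enumeration: hash every value of one format and keep a reverse lookup table. The format used here (digit, letter, digit, digit, as in "5X55") is an assumption for illustration, and `build_lookup` and `md5_hex` are hypothetical helper names; the real identifiers have a few more forms, but the approach would be the same.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def md5_hex(text):
    """Unsalted MD5 of an ASCII string, as a hex digest."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def build_lookup():
    """Hash every value of the assumed format and map hash -> original."""
    table = {}
    # Assumed format: digit, letter, digit, digit (26,000 combinations).
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        plate = d1 + letter + d2 + d3
        table[md5_hex(plate)] = plate
    return table


lookup = build_lookup()
masked = md5_hex("5X55")    # what a "masked" release would contain
recovered = lookup[masked]  # a single dictionary lookup undoes the mask
```

The table is tiny by modern standards, which is the point: an unsalted hash over a small input space is a lookup key, not a mask.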
So What?
• What correlations are there?
• NYC medallions are public information anyway
• Taxis operate in the public realm
Paparazzo + Timestamp + Taxi = Who and Where
See http://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546
http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/
Extended Moral
• Correlations are more common than we thought
• Masking PII is not sufficient for public datasets
• Theoretically, no solution is possible
• Pragmatically, never bet against cleverness
• Must change the game
Alternative Strategies
Public disclosure + Simple masking
Key Elements of Masking
• Opaque or format preserving?
• Random or reversible or one-way?
• Simple omission?
• Right to be forgotten?
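The choices above can be made concrete with a small Python sketch. It contrasts a keyed one-way mask with a random token table that is reversible and can forget a value. The names `one_way_mask` and `RandomTokenMask` are hypothetical, both masks are opaque rather than format preserving, and neither does anything about the correlation attacks described earlier.

```python
import hashlib
import hmac
import secrets


def one_way_mask(value, key):
    """Keyed one-way mask: deterministic for a given key, and not
    enumerable by an outsider as long as the key stays secret."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class RandomTokenMask:
    """Random mask: tokens carry no information about the value, and the
    mapping is reversible only by whoever holds the table."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def mask(self, value):
        if value not in self._forward:
            token = secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def unmask(self, token):
        return self._reverse[token]

    def forget(self, value):
        """Drop the mapping so old tokens can no longer be reversed."""
        token = self._forward.pop(value, None)
        if token is not None:
            del self._reverse[token]
```

With the keyed mask, "forgetting" everything at once amounts to destroying the key; with the token table, individual values can be forgotten one at a time.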
Releasing Public Data
• Why?
– Required
– For research
– For support
• How?
– New technology based on KPI-preserving random data
• Three use cases
Secure Development is Hard
[Diagram: system knowledge and observed data feed a training algorithm that produces a model; after model deployment, the model turns new measurements into anomaly scores]
• Outside collaborators are outside the security perimeter
• They can’t see the data and they can’t tune new algorithms to fit reality
How To Make Realistic Data
[Diagram: live data run through the system under test produces failure signatures; fake data run through the system under test produces failure signatures]
Parametric Simulation
[Diagram: live data and fake data are each run through the system under test to produce failure signatures; the point marked "Match here" is the failure signatures]
• Parametric matching of failure signatures allows emulation of complex data properties
• Matching on KPIs and failure modes guarantees practical fidelity
The Method
• Pick realistic and important KPIs and failure measures
– False positive rate
– Scale invariant score distribution
– Internal performance metrics (# of candidates searched, similar)
• Build emulation roughly based on real system
• Tune data spec to match KPIs using real models
• Export data spec to alternative models
• Re-tune data spec to match on alternative models
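The tuning step might look roughly like the Python sketch below. Everything in it is a stand-in chosen for illustration: the "data spec" is a single hypothetical parameter (`spread`), the "model" is a fixed score threshold, the KPI is the false positive rate, and the search is a plain bisection. A real data spec would have many more knobs and KPIs.

```python
import random


def make_fake_data(spread, n=5000, seed=42):
    """Generate synthetic scores from a one-parameter data spec.
    The fixed seed makes the KPI a deterministic function of `spread`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, spread) for _ in range(n)]


def false_positive_rate(scores, threshold=3.0):
    """KPI: fraction of (all-normal) records the model would flag."""
    return sum(1 for s in scores if s > threshold) / len(scores)


def tune_spread(target_fpr, lo=0.1, hi=10.0, steps=30):
    """Bisect on the data-spec parameter until the KPI measured on fake
    data matches the KPI observed on the real system."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if false_positive_rate(make_fake_data(mid)) < target_fpr:
            lo = mid   # fake data too tame: widen the distribution
        else:
            hi = mid   # fake data too wild: narrow it
    return (lo + hi) / 2


# Suppose the real system reports a 5% false positive rate (made-up number).
spread = tune_spread(target_fpr=0.05)
achieved = false_positive_rate(make_fake_data(spread))
```

Only the target KPI crosses the security perimeter in this sketch; the tuned spec and the fake data it generates can be handed to outside collaborators.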
Example #1 – Query failure
• Performance index is query failure with particular stack signature
• Tuning knobs include
– Table sizes
– Data distributions
– (potentially) field value realism
– (potentially) field cross correlations
The Original Conversation
Them: Hive broke, fix it.
Us: Sure! Can I see the data?
Them: No.
Us: OK. Can I see the stack trace?
Them: No.
Us: Can I log in to the system?
Them: No.
Us: What do you want me to do?
Them: Fix it.
The Broken Query
A Simpler Example Schema
• sales: sales_id (PK), customer_id (FK), time_id (FK), store_id (FK), item_id (FK), quantity, unit_price, discount
• customer: customer_id (PK), name, street1, city, state, zip
• time: time_id (PK), year, month, time, day, quarter
• store: store_id (PK), name, street, city, state, zip, region
• item: item_id (PK), SKU, description
customers:
[
  {"name":"customer_id", "class":"id"},
  {"name":"name", "class":"name", "type":"first_last"},
  {"name":"street", "class":"address"},
  {"class":"flatten", "value": {
    "class":"zip", "fields":"city,state,zip"}}
]
sales:
[
  {"name":"sales_id", "class":"id"},
  {"name":"customer_id", "class":"foreign-key", "size":"$customers"},
  {"name":"time_id", "class":"foreign-key", "size":"$times"},
  {"name":"store_id", "class":"foreign-key", "size":"$stores"},
  {"name":"item_id", "class":"foreign-key", "size":"$items"},
  {"name":"quantity", "class":"int", "skew":0.5},
  {"name":"unit_price", "class":"gamma", "dof":1, "scale":10},
  {"name":"discount", "class":"uniform", "min":0, "max":20},
  {"name":"exact_time", "class":"event",
   "start": "2014-01-01", "format":"yyyy-MM-dd HH:mm:ss",
   "rate": "10/d"}
]
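To show how a spec of this shape can drive generation, here is a toy Python interpreter for three of the field classes. It is not the synth tool used in the talk: `generate_rows` is a hypothetical function, the reading of "size":"$customers" as "look up the row count of the customers table" is an assumption, and foreign keys are drawn uniformly here, which may differ from what the real tool does.

```python
import random


def generate_rows(spec, n_rows, sizes, seed=0):
    """Generate rows from a list of field specs (toy subset only)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        row = {}
        for field in spec:
            kind = field["class"]
            if kind == "id":
                # Sequential primary key.
                row[field["name"]] = i
            elif kind == "foreign-key":
                # Assumed meaning: "$customers" -> size of that table.
                size = sizes[field["size"].lstrip("$")]
                row[field["name"]] = rng.randrange(size)
            elif kind == "uniform":
                row[field["name"]] = rng.uniform(field["min"], field["max"])
            else:
                raise ValueError("unsupported class: " + kind)
        rows.append(row)
    return rows


# A cut-down version of the sales spec, limited to the supported classes.
spec = [
    {"name": "sales_id", "class": "id"},
    {"name": "customer_id", "class": "foreign-key", "size": "$customers"},
    {"name": "discount", "class": "uniform", "min": 0, "max": 20},
]
rows = generate_rows(spec, n_rows=5, sizes={"customers": 100})
```

The useful property is that the spec, not the data, is the artifact being tuned and shared: table sizes and distributions are the knobs.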
Data Flow
[Diagram: generate.py (Python) and templates drive one synth run per table (items, times, sales, stores, customers); each run writes a csv of the same name]
Sample Data
customer_id,name,street,zip,city,state
0,"Mark Long","8578 Pied River Flats","02630","BARNSTABLE","MA"
1,"Chris Lanier","90018 Lost Treasure Corner","06083","ENFIELD","CT"
2,"Bryant Brandon","30712 Bright Shadow Stroll","93922","CARMEL","CA"
3,"Norman Horn","66871 Dewy Bird Shoal","59727","DIVIDE","MT"
4,"Carmen Nowell","6053 Velvet Barn Glen","29329","CONVERSE","SC"
Results
• We had to match size, number of records, rough levels of skew
• Bug was in query planner
– For particular values of relative table size, the planner messed up
• Once we had the fault, we could slim down the tables
– Final example had 3 tables, 1000 records in the largest
Common Point of Compromise
• Scenario:
– Merchant 0 is compromised, leaks account data during compromise
– Fraud committed elsewhere during exploit
– High background level of fraud
– Limited detection rate for exploits
• Goal:
– Find merchant 0
• Meta-goal:
– Screen algorithms for this task without leaking sensitive data
Simulation Setup
[Plot: daily counts of compromises and frauds over days 0 to 100 (x axis: day, y axis: count), with the compromise period and the exploit period marked]
Simulation Strategy
• For each consumer
– Pick consumer parameters such as transaction rate, preferences
– Generate transactions until end of sim-time
• If the merchant is merchant 0 during the compromise period, possibly mark as compromised
• For all transactions, possibly mark as fraud; the probability depends on history
• Merchants are selected using hierarchical Pitman-Yor
• Restate data
– Flatten transaction streams
– Sort by time
• Tunables
– Compromise probability, transaction rates, background fraud, detection probability
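The strategy above can be sketched as a short Python simulation. It is a simplified stand-in, not the simulator from the talk: merchants are chosen uniformly instead of with a hierarchical Pitman-Yor process, detection probability is left out, the period boundaries and probabilities are made-up values, and `score_merchants` is a deliberately naive screening score added only to show what the synthetic data is for.

```python
import random
from collections import defaultdict

DAYS = 100
COMPROMISE = (20, 40)  # assumed days when merchant 0 leaks account data
EXPLOIT = (50, 80)     # assumed days when leaked accounts are abused


def simulate(n_consumers=3000, n_merchants=100, p_compromise=0.8,
             p_background=0.002, p_exploit=0.3, seed=1):
    """Return (time, consumer, merchant, fraud) tuples sorted by time."""
    rng = random.Random(seed)
    transactions = []
    for consumer in range(n_consumers):
        rate = rng.uniform(0.1, 0.3)  # consumer parameter: transactions/day
        compromised = False
        t = rng.expovariate(rate)
        while t < DAYS:
            # Simplification: uniform merchant choice, not Pitman-Yor.
            merchant = rng.randrange(n_merchants)
            if (merchant == 0 and COMPROMISE[0] <= t < COMPROMISE[1]
                    and rng.random() < p_compromise):
                compromised = True
            # Fraud probability depends on the consumer's history.
            p_fraud = p_background
            if compromised and EXPLOIT[0] <= t < EXPLOIT[1]:
                p_fraud = p_exploit
            transactions.append((t, consumer, merchant, rng.random() < p_fraud))
            t += rng.expovariate(rate)
    transactions.sort()  # flatten the per-consumer streams, sort by time
    return transactions


def score_merchants(transactions):
    """Naive score: share of a merchant's visitors who saw any fraud."""
    victims = {c for _, c, _, fraud in transactions if fraud}
    visitors = defaultdict(set)
    for _, c, m, _ in transactions:
        visitors[m].add(c)
    return {m: len(cs & victims) / len(cs) for m, cs in visitors.items()}


transactions = simulate()
scores = score_merchants(transactions)
suspect = max(scores, key=scores.get)
```

With these made-up settings the compromised merchant stands well clear of the background; the tunables are what you would adjust until the performance indicators match the real system.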
Performance Indicators to Match
• User and merchant population
• Transaction count/consumer
• Merchant propensity skew
• Level of detected fraud
• Spectrum of meta-model scores
Real bad guys
Results
• We matched general mechanism, rough transaction rates
• Model was tuned on synthetic data, tested on live data
• We found real bad guys on the first try
Summary
• Security can fail through too much and too little access
• Sharing widely can have significant benefits and substantial risks
• New levels of control available for masking and filtering of big data via Drill views
• Synthetic data with KPI matching provides sharing of realistic data without risk
Questions
Thank You
Ted Dunning, Chief Application Architect, MapR Technologies
tdunning@mapr.com
tdunning@apache.org
@mapr
maprtech
mapr-technologies

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 

Sharing Sensitive Data Securely

  • 1. © 2014 MapR Technologies
  • 2. Agenda • Two kinds of security failure – Buried treasure • But what could go wrong? – Horror stories • Sharing into controlled environments – Views, masking and fine-grained control • Sharing without sharing – When masking is not sufficient • Summary
  • 3. Locked Up Tight – The Cheapside Hoard • Between 1640 and 1666 somebody hid a cache of jewels under the floor of 30-32 Cheapside Road • They never came back for them … • The hoard was found by workmen in 1910 • Did the owners forget where they were? • Why didn’t their heirs or partners recover them?
  • 4. The Other Kind of Security Failure • Security can fail when there is a leak – Enigma decryption – Retail data compromise – Klaus Fuchs • Security also fails when data is not shared – AKA siloing – The many threads of 9/11 – The Cheapside hoard – Invisible technological opportunity cost
  • 5. Netflix • Shared anonymized data • Huge boost in state of the art for some kinds of recommendations • Anonymization shown to be a weak barrier • Lawsuit, security clamp-down everywhere
  • 6. Reference Data Attack [Diagram: Netflix (opaque id, [{date,movie,rating}...]) and IMDB (opaque id, [{date,movie}...]) feed a combined database]
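The linkage pictured on slide 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used against the actual Netflix data: the record layout, the sample records, the `link_ids` helper, and its `min_overlap` threshold are all hypothetical.

```python
from collections import defaultdict

def link_ids(anon, public, min_overlap=2):
    """Map each opaque id to the public id sharing the most (date, movie) pairs.

    anon:   {opaque_id: [(date, movie, rating), ...]}   (hypothetical layout)
    public: {public_id: [(date, movie), ...]}           (hypothetical layout)
    """
    # Index the public records by the fields both datasets have in common.
    index = defaultdict(set)
    for public_id, events in public.items():
        for date, movie in events:
            index[(date, movie)].add(public_id)

    matches = {}
    for opaque_id, events in anon.items():
        overlap = defaultdict(int)
        for date, movie, _rating in events:
            for public_id in index.get((date, movie), ()):
                overlap[public_id] += 1
        if overlap:
            best = max(overlap, key=overlap.get)
            # Require several shared events before claiming a match.
            if overlap[best] >= min_overlap:
                matches[opaque_id] = best
    return matches

anon = {
    "u1": [("2005-01-03", "A", 5), ("2005-01-09", "B", 2), ("2005-02-01", "C", 4)],
    "u2": [("2005-01-03", "A", 3)],
}
public = {
    "alice": [("2005-01-03", "A"), ("2005-01-09", "B")],
    "bob": [("2005-03-01", "D")],
}
linked = link_ids(anon, public)
```

Once an opaque id is linked, every rating recorded under it is attached to a named person, including ratings that person never made public.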
  • 7. The Moral • If there is something to correlate, anonymization may fail • When I say “may”, you should read “will”
  • 8. NY Cab • Hack license and medallion number hashed using MD5 • No correlation data to work with • But cab (medallion) numbers have only a few forms • So we can generate hashes for all 20 million (or so) medallions
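The enumeration attack on slide 8 can be sketched as follows. The slide only says that medallion numbers have "a few forms"; the single pattern used here (digit, letter, digit, digit) and the helper names are assumptions for illustration, and a real attack would enumerate every valid form.

```python
import hashlib
import itertools
import string

def md5_hex(text):
    """Unsalted MD5, as described for the released taxi data."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

def build_lookup(pattern):
    """Hash every identifier matching `pattern` and return {hash: identifier}.

    `pattern` is a hypothetical mini-language: 'D' is a digit, 'L' an uppercase letter.
    """
    alphabets = [string.digits if c == "D" else string.ascii_uppercase for c in pattern]
    lookup = {}
    for chars in itertools.product(*alphabets):
        identifier = "".join(chars)
        lookup[md5_hex(identifier)] = identifier
    return lookup

# 10 * 26 * 10 * 10 = 26,000 candidates: trivial to enumerate.
lookup = build_lookup("DLDD")

hashed = md5_hex("5X55")       # what a "masked" record would contain
recovered = lookup[hashed]     # reversed by table lookup
```

Because the input space is small and the hash is unsalted, the hash behaves as a reversible encoding rather than as a mask.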
  • 9. So What? • What correlations are there? • NYC medallions are public information anyway • Taxis operate in the public realm
  • 10. So What?
  • 11. Paparazzo + Timestamp + Taxi = Who and Where • See http://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546 and http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset/
  • 12. Extended Moral • Correlations are more common than we thought • Masking PII is not sufficient for public datasets • Theoretically, no solution is possible • Pragmatically, never bet against cleverness • Must change the game
  • 13. Alternative Strategies Public disclosure + Simple masking Public disclosure + Simple masking Public disclosure + Simple masking
  • 14. Key Elements of Masking • Opaque or format preserving? • Random or reversible or one-way? • Simple omission? • Right to be forgotten?
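Two of the choices on slide 14 can be contrasted in code. This is one possible reading of the slide's questions rather than anything the deck specifies: `mask_one_way` and `RandomMask` are hypothetical names, and keyed HMAC-SHA256 is an assumed choice of one-way function.

```python
import hashlib
import hmac
import secrets

def mask_one_way(value, key):
    """One-way, opaque mask: a keyed hash that cannot be rebuilt without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

class RandomMask:
    """Random, reversible mask: tokens carry no information about the value.

    Reversal is only possible for whoever holds the mapping table, and
    deleting an entry makes the token permanently unlinkable.
    """

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def mask(self, value):
        if value not in self._forward:
            token = secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def unmask(self, token):
        return self._reverse[token]

    def forget(self, value):
        # One way to honor a "right to be forgotten" request.
        token = self._forward.pop(value, None)
        if token is not None:
            del self._reverse[token]

key = secrets.token_bytes(32)
hashed = mask_one_way("alice@example.com", key)

table = RandomMask()
token = table.mask("alice@example.com")
```

Unlike the unsalted hash in the taxi example, the keyed variant cannot be reversed by enumerating inputs unless the key also leaks.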
  • 15. Releasing Public Data • Why? – Required – For research – For support • How? – New technology based on KPI-preserving random data • Three use cases
  • 16. Secure Development is Hard [Diagram labels: System knowledge; Observed data; Training algorithm; Model; New measurements; Model; Anomaly scores; Model deployment]
  • 17. Secure Development is Hard [Diagram labels: System knowledge; Observed data; Training algorithm; Model; New measurements; Model; Anomaly scores; Model deployment] • Outside collaborators are outside the security perimeter • They can’t see the data and they can’t tune new algorithms to fit reality
  • 18. How To Make Realistic Data [Diagram labels: System under test; Live data; Failure signatures; Fake data; Failure signatures]
  • 19. Parametric Simulation [Diagram labels: Match here; Live data; System under test; Failure signatures; Fake data; Failure signatures; Fake data; System under test; Failure signatures] • Parametric matching of failure signatures allows emulation of complex data properties • Matching on KPIs and failure modes guarantees practical fidelity
  • 20. The Method • Pick realistic and important KPIs and failure measures – False positive rate – Scale invariant score distribution – Internal performance metrics (# of candidates searched, similar) • Build emulation roughly based on real system • Tune data spec to match KPIs using real models • Export data spec to alternative models • Re-tune data spec to match on alternative models
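The "tune data spec to match KPIs" step on slide 20 can be pictured as a search over generator parameters. The sketch below is a toy, not the procedure used in the talk: it assumes a single tunable (a Zipf-style skew exponent), a single KPI (the share of records that fall on the most popular key), and bisection as the search method. All names are hypothetical.

```python
def top_key_share(skew, n_keys=1000):
    """KPI of the synthetic data: fraction of records hitting the most popular key.

    Keys are assumed to follow a Zipf-like law with weight 1 / rank**skew.
    """
    weights = [1.0 / (rank ** skew) for rank in range(1, n_keys + 1)]
    return weights[0] / sum(weights)

def tune_skew(target_share, lo=0.0, hi=5.0, tol=1e-6):
    """Bisect on the skew until the synthetic KPI matches the one measured on live data.

    This works because top_key_share increases monotonically with skew.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if top_key_share(mid) < target_share:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Suppose the data owner reports only this number from the live system.
measured_share = 0.2
skew = tune_skew(measured_share)
```

Only the KPI value crosses the security perimeter; the outside collaborator never sees a live record.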
  • 21. Example #1 – Query failure • Performance index is query failure with particular stack signature • Tuning knobs include – Table sizes – Data distributions – (potentially) field value realism – (potentially) field cross correlations
  • 22. The Original Conversation • Them: Hive broke, fix it.
  • 23. The Original Conversation • Them: Hive broke, fix it. • Us: Sure! Can I see the data? • Them: No.
  • 24. The Original Conversation • Them: Hive broke, fix it. • Us: Sure! Can I see the data? • Them: No. • Us: OK. Can I see the stack trace? • Them: No.
  • 25. The Original Conversation • Them: Hive broke, fix it. • Us: Sure! Can I see the data? • Them: No. • Us: OK. Can I see the stack trace? • Them: No. • Us: Can I log in to the system? • Them: No.
  • 26. The Original Conversation • Them: Hive broke, fix it. • Us: Sure! Can I see the data? • Them: No. • Us: OK. Can I see the stack trace? • Them: No. • Us: Can I log in to the system? • Them: No. • Us: What do you want me to do? • Them: Fix it.
  • 27. The Broken Query
  • 28. A Simpler Example Schema
      sales: sales_id (PK), customer_id (FK), time_id (FK), store_id (FK), item_id (FK), quantity, unit_price, discount
      customer: customer_id (PK), name, street1, city, state, zip
      time: time_id (PK), year, month, time, day, quarter
      store: store_id (PK), name, street, city, state, zip, region
      item: item_id (PK), SKU, description
  • 29. A Simpler Example (same schema as slide 28, shown with the data specs for the customer and sales tables)
      customer:
      [
        {"name":"customer_id", "class":"id"},
        {"name":"name", "class":"name", "type":"first_last"},
        {"name":"street", "class":"address"},
        {"class":"flatten", "value": { "class":"zip", "fields":"city,state,zip"}}
      ]
      sales:
      [
        {"name":"sales_id", "class":"id"},
        {"name":"customer_id", "class":"foreign-key", "size":"$customers"},
        {"name":"time_id", "class":"foreign-key", "size":"$times"},
        {"name":"store_id", "class":"foreign-key", "size":"$stores"},
        {"name":"item_id", "class":"foreign-key", "size":"$items"},
        {"name":"quantity", "class":"int", "skew":0.5},
        {"name":"unit_price", "class":"gamma", "dof":1, "scale":10},
        {"name":"discount", "class":"uniform", "min":0, "max":20},
        {"name":"exact_time", "class":"event", "start": "2014-01-01", "format":"yyyy-MM-dd HH:mm:ss", "rate": "10/d"}
      ]
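To make the idea of a data spec concrete, here is a small Python interpreter for a subset of the spec format shown on slide 29. It is a sketch, not the generator used in the talk: the `generate` function is hypothetical, only four field classes are handled, `size` is taken as a plain integer rather than a template variable such as `$customers`, and the meaning given to each class (for example, treating `dof` as the gamma shape parameter) is a guess.

```python
import random

def generate(spec, n_rows, seed=0):
    """Produce n_rows dictionaries from a list of field descriptions."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        row = {}
        for field in spec:
            kind = field["class"]
            if kind == "id":
                value = i                                  # sequential primary key
            elif kind == "foreign-key":
                value = rng.randrange(field["size"])       # key into a table of `size` rows
            elif kind == "uniform":
                value = rng.uniform(field["min"], field["max"])
            elif kind == "gamma":
                # Assumption: "dof" is the shape and "scale" the scale parameter.
                value = rng.gammavariate(field["dof"], field["scale"])
            else:
                raise ValueError(f"unsupported class: {kind}")
            row[field["name"]] = value
        rows.append(row)
    return rows

sales_spec = [
    {"name": "sales_id", "class": "id"},
    {"name": "customer_id", "class": "foreign-key", "size": 50},
    {"name": "unit_price", "class": "gamma", "dof": 1, "scale": 10},
    {"name": "discount", "class": "uniform", "min": 0, "max": 20},
]
rows = generate(sales_spec, 100)
```

Keeping the spec as data means table sizes and distributions, the tuning knobs listed on slide 21, can be changed without touching the generator code.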
  • 30. Data Flow [Diagram labels: Python: generate.py; templates; synth: items, times, sales, stores, customers; csv: items, times, sales, stores, customers]
  • 31. Sample Data
      customer_id,name,street,zip,city,state
      0,"Mark Long","8578 Pied River Flats","02630","BARNSTABLE","MA"
      1,"Chris Lanier","90018 Lost Treasure Corner","06083","ENFIELD","CT"
      2,"Bryant Brandon","30712 Bright Shadow Stroll","93922","CARMEL","CA"
      3,"Norman Horn","66871 Dewy Bird Shoal","59727","DIVIDE","MT"
      4,"Carmen Nowell","6053 Velvet Barn Glen","29329","CONVERSE","SC"
  • 32. Results • We had to match size, number of records, rough levels of skew • Bug was in query planner – For particular values of relative table size, planner messed up • Once we had the fault, we could slim down the tables – Final example had 3 tables, 1000 records in the largest
  • 33. Common Point of Compromise • Scenario: – Merchant 0 is compromised, leaks account data during compromise – Fraud committed elsewhere during exploit – High background level of fraud – Limited detection rate for exploits • Goal: – Find merchant 0 • Meta-goal: – Screen algorithms for this task without leaking sensitive data
  • 34. Simulation Setup [Plot: daily count of compromises and frauds against day (0 to 100), with the compromise period and the exploit period marked]
  • 35. Simulation Strategy • For each consumer – Pick consumer parameters such as transaction rate, preferences – Generate transactions until end of sim-time • If merchant 0 during compromise time, possibly mark as compromised • For all transactions, possibly mark as fraud, probability depends on history • Merchants are selected using hierarchical Pitman-Yor • Restate data – Flatten transaction streams – Sort by time • Tunables – Compromise probability, transaction rates, background fraud, detection probability
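The simulation strategy on slide 35 can be sketched roughly as below. This is a simplified illustration under several assumptions: a single shared Pitman-Yor process stands in for the hierarchical one, detection probability is not modeled, and every numeric default (rates, windows, probabilities) is invented for the example.

```python
import random

class PitmanYor:
    """Pitman-Yor process sampled with the Chinese-restaurant construction."""

    def __init__(self, alpha, discount, rng):
        self.alpha, self.discount, self.rng = alpha, discount, rng
        self.counts = []   # visits per merchant
        self.total = 0

    def sample(self):
        k = len(self.counts)
        u = self.rng.random() * (self.alpha + self.total)
        new_weight = self.alpha + self.discount * k
        if u < new_weight:                 # open a new merchant
            self.counts.append(1)
            self.total += 1
            return k
        u -= new_weight
        for i, c in enumerate(self.counts):
            u -= c - self.discount         # existing merchant i has weight c - discount
            if u < 0:
                break
        else:
            i = k - 1                      # guard against floating-point round-off
        self.counts[i] += 1
        self.total += 1
        return i

def simulate(n_consumers=100, days=100, compromise=(20, 30), exploit=(40, 60),
             p_compromise=0.5, p_background=0.002, p_exploit=0.05, seed=1):
    rng = random.Random(seed)
    merchants = PitmanYor(alpha=10.0, discount=0.5, rng=rng)
    transactions = []
    for consumer in range(n_consumers):
        rate = rng.uniform(0.2, 2.0)       # transactions per day (assumed range)
        compromised = False
        t = rng.expovariate(rate)
        while t < days:
            merchant = merchants.sample()
            in_compromise = compromise[0] <= t < compromise[1]
            if merchant == 0 and in_compromise and rng.random() < p_compromise:
                compromised = True
            # Fraud probability depends on the consumer's history.
            p_fraud = p_background
            if compromised and exploit[0] <= t < exploit[1]:
                p_fraud += p_exploit
            transactions.append((t, consumer, merchant, rng.random() < p_fraud))
            t += rng.expovariate(rate)
    transactions.sort()                    # flatten streams and sort by time
    return transactions

transactions = simulate()
```

Each tuple is `(time, consumer, merchant, is_fraud)`. The tunables named on the slide correspond to the keyword arguments, which would be adjusted until the indicators of the synthetic stream match those of the live system.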
  • 36. Performance Indicators to Match • User and merchant population • Transaction count/consumer • Merchant propensity skew • Level of detected fraud • Spectrum of meta-model scores
  • 37. © 2014 MapR Technologies 37
  • 38. Real bad guys
  • 39. Results • We matched general mechanism, rough transaction rates • Model was tuned on synthetic data, tested on live data • We found real bad guys on the first try
  • 40. Summary • Security can fail through too much and too little access • Sharing widely can have significant benefits and substantial risks • New levels of control available for masking and filtering of big data via Drill views • Synthetic data with KPI matching provides sharing of realistic data without risk
  • 41. Questions
  • 42. Thank You • Ted Dunning, Chief Application Architect, MapR Technologies • tdunning@mapr.com • tdunning@apache.org • @mapr • maprtech • mapr-technologies