Rob Murphy
Adversarial Modeling
Graph, Machine Learning, Text Analytics and Agile DM
1 Context of Problem
2 Machine Learning
3 Graph Theory
4 Text Analytics
5 All Together (Agile / agile)
© DataStax, All Rights Reserved.
Who am I?
Rob Murphy, Vanguard Solution Architect, DataStax
rmurphy@datastax.com
• Data focused software engineer
• 3 years with DataStax
• 11+ years in Computational Science and general science
informatics
• 18+ years designing and building data driven/centric systems
• Old school Agile guy
• “Data Scientist” at heart
Where does this work come from?
• Thesis research
• Pre-DataStax work supporting various U.S. Federal Agencies
• Work in direct support of DataStax customers
• NO SECRET SAUCE SHARED HERE
Problem Space
It is a very very big problem space…
Identity Theft / Synthetic Identities
• 2014 and 2015 saw high-profile breaches of several retailers where tens of millions of customer
records were stolen.
• The theft of twenty-one million security clearance records was discovered in June of 2015 by the
U.S. Office of Personnel Management (OPM).
• Stolen data are bought, sold and traded actively, providing enriched data sources for fraudulent
activities.
• Everything we do is online, providing a de-personalized and highly efficient platform for fraud.
• Coordinated and sophisticated networks of people exist to share data, share operational
knowledge and actively coordinate efforts to subvert fraud protections in place.
Synthetic Identities
• Real identities are modified and/or
combined to form multiple synthetic
identities
• “New” identities are real enough in key
properties that they pass review of
many business and informatics
systems
“Bad Actors”
• Can be a first-person problem (they are who they are)
• Or, assumed / synthetic identities
• Difficult to detect; not all “bad actor” data is in “the system”
• Sophisticated actors have very subtle, if not non-existent, predictive attributes
• Everyone has patterns
Thinking like an adversary
• Dedicated individuals and groups of individuals are actively working to identify, subvert,
avoid and exploit any logical, physical or process controls in place.
• Weaknesses in physical, system or process controls are shared and exploited en masse
• Changes to controls are recognized and behaviors modified
• Organizations that want and need to detect and prevent fraud must see some of their
customers, stakeholders or applicants as adversaries
• Think more like a bank; funds are behind lock and key with more substantial protection as
the amount grows
• To respond to and engage with adversaries, you have to be agile and capable, and approach the
work understanding its purpose: to make fraudulent activities challenging to the point that they
are not worth pursuing (a very, very big goal)
Assumptions of Adversarial Modeling
• Dedicated individuals and groups of individuals are actively working to identify, subvert,
avoid and exploit any logical, physical or process controls in place.
• Adversarial Modeling as a process must be grounded in data mining, data modeling and software
engineering methodologies while embracing change in the most dynamic and natural way
possible.
• Any process that creates silos around capabilities and communications adds complexity and
inefficiency to the fight.
• Data mining alone, as a technology ecosystem or focused process, will not be sufficient
when engaged with an adversary.
• Software engineering as a capability and the related processes and technologies must be part of
the larger, adversarial effort.
• No single technology or tool has the sensitivity needed to quickly and proactively
identify fraudulent patterns; the adversary is committed to exploiting any opportunity and
leveraging it until it is no longer an option. An ecosystem is needed in this fight.
Machine Learning
[Figure: annotated image; callouts read "Lighting from below", "Eye makeup" and "RAGE!!!!"]
Attribute based thinking
Supervised Learning, Right?
• NO!!!!
• Mostly No.
• Maybe…
• Yes if you are willing to experiment with unsupervised learning derived
(“experimental”) labels and dig in.
• First lessons learned? Don’t assume anything about the problem,
explore the data first then define the technical problem.
Why not supervised learning?
• There are more cold or warm-start problems in this space than not.
• Data are incorrectly labeled more often than not.
• Why? There is always more fraud than you think there is.
• Supervised learning algorithms are not accurate when “fraud” and “not fraud”
look exactly the same.
• Data are often not labeled at all.
Unsupervised Learning
• High-dimension data is the norm
• Exploratory Data Analysis is mandatory; you must understand the context and data
• Principal Component Analysis is your friend
• Clustering is your very best friend
• Clusters very often do not map to ‘labels’ (if they exist)
• Experimental labels generated through unsupervised learning can be incredibly useful
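As a minimal sketch of how clustering can produce "experimental" labels, the block below runs a plain k-means over a handful of two-feature rows. The feature values, the choice of k = 2 and the standard-library-only implementation are assumptions for illustration; the talk does not say which clustering algorithm or tooling was used, and the PCA step is left out.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k rows picked at random as starting centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels


# Hypothetical, already-scaled feature rows (one row per application).
points = [
    (1.0, 0.9), (1.2, 1.1), (0.8, 1.0), (1.1, 0.8),  # one dense group
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),              # a second, distant group
]
labels = kmeans(points, k=2)
```

The cluster indices in `labels` carry no meaning by themselves; they become experimental labels only after someone investigates what each cluster actually contains.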
Visualization
• Visualization of clusters leverages a
powerful computing engine, the
human brain
• Patterns in data are often only
apparent when visualized well
Back to Supervised Learning (sometimes)
• Experimental labels facilitate a cycle of effective learning but are difficult to explain to
process-bound organizations (government)
• Stick to human understandable algorithms for final predictions
• Tree-based algorithms
• Logistic regression
• Naïve Bayes
• “Black Box” algorithms are very effective as a guide or ‘b-team’ review
• Neural Networks
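To illustrate why a human-understandable model is easy to explain, here is a rough sketch of the smallest possible tree-based classifier, a one-split decision stump, fitted to experimental labels. The rows, the labels and the one-directional rule ("predict 1 when the feature exceeds the threshold") are assumptions made for this example, not the models used in the original work.

```python
def fit_stump(rows, labels):
    """Find the single feature/threshold split with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            # Rule under test: predict 1 when feature f is greater than t.
            errors = sum((r[f] > t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def predict(stump, row):
    feature, threshold = stump
    return int(row[feature] > threshold)


# Hypothetical feature rows with experimental labels from a clustering pass.
rows = [
    (1.0, 0.9), (1.2, 1.1), (0.8, 1.0), (1.1, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]
experimental_labels = [0, 0, 0, 0, 1, 1, 1]

stump = fit_stump(rows, experimental_labels)
```

The fitted model is a single sentence ("flag the row when feature 0 is above the threshold"), which is the kind of rule a reviewer can check by hand.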
“Fit” of Machine Learning
• Highly effective for mature fraud detection systems / organizations (well labeled data)
• Less effective for cold and/or warm-start problems
• Requires a holistic and dynamic approach to building a ‘ground truth’ of clearly and cleanly labeled
data for classification
• Absolutely requires a solid data mining approach with supportive business practices to research
and validate data mining work.
• Very important for detecting non-networked synthetic identities and “bad actors”; it is worth the
effort to invest in a solid data mining process
Graph Theory
G = (V, E)
Property Graph
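[Figure: a property graph with two vertices joined by one edge. Vertex Person (name = Rob) is linked by an "attends" edge to vertex Event (name = Cassandra Summit, year = 2016). Source: https://markorodriguez.com/2011/02/08/property-graph-algorithms/]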
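The Person/Event example on this slide can be written down as a small data structure. This is only a sketch of the property graph idea, G = (V, E) with labels and key/value properties on both vertices and edges; the class and field names are made up for illustration and are not the DSE Graph or TinkerPop API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: Vertex  # vertex the edge starts from
    in_v: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


rob = Vertex("Person", {"name": "Rob"})
summit = Vertex("Event", {"name": "Cassandra Summit", "year": 2016})
attends = Edge("attends", rob, summit)

# G = (V, E)
V = [rob, summit]
E = [attends]
```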
Networks mean relationships
• Coordinated fraud means networks exist
• Network detection is possible around key areas where efficiency is needed for financial
gain
• Key vertex labels, by pattern, are highly predictive
• Graph visualization engages the human computer in pattern detection
• Graph density coefficient (~ degree distribution)
• Community detection
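A rough sketch of the graph measures named above, on a tiny undirected graph of hypothetical applications that share phone numbers and addresses. Connected components are used here as a deliberately crude stand-in for community detection; real community detection algorithms are more involved, and none of the identifiers below come from the original work.

```python
from collections import defaultdict


def build_adjacency(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def density(adj):
    """Undirected density: edges present divided by edges possible."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def components(adj):
    """Connected components, a crude stand-in for community detection."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            v = stack.pop()
            if v in group:
                continue
            group.add(v)
            stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical edges: an application linked to a phone number or address it used.
edges = [
    ("app1", "phone1"), ("app2", "phone1"), ("app3", "phone1"),
    ("app3", "addr1"), ("app4", "addr2"),
]
adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
groups = components(adj)
```

In this toy data the shared phone number has the highest degree, which is the kind of vertex that tends to sit at the center of a coordinated network.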
Network Discovery
• Networks of fraud / activity are easier
to discover.
• Easily understood visually and by the
“business” subject matter experts.
• Various discovery algorithms and
patterns.
• Not rocket science!!!
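// Walk out from one caseApp vertex for 50 steps, collecting every edge
// touched into the side-effect subgraph 'subGraph'.
g.V("{member_id=0, community_id=374707, ~label=caseApp, group_id=1}").
  repeat(__.bothE().subgraph('subGraph').inV()).
  times(50).cap('subGraph').next()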
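For readers without a Gremlin console, the same idea (start at one vertex and pull out the surrounding network) can be sketched in plain Python as a breadth-first search. This is an approximation under stated assumptions: edges are treated as undirected, the vertex names are invented, and the function is not a line-for-line translation of the traversal above.

```python
from collections import deque


def neighborhood_edges(edges, start, max_hops):
    """Return every edge touched within max_hops of start (undirected BFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append((a, b))
        adj.setdefault(b, []).append((a, b))
    depth = {start: 0}
    queue = deque([start])
    found = set()
    while queue:
        v = queue.popleft()
        if depth[v] == max_hops:
            continue  # do not expand past the hop limit
        for a, b in adj.get(v, []):
            found.add((a, b))
            other = b if v == a else a
            if other not in depth:
                depth[other] = depth[v] + 1
                queue.append(other)
    return found


# Hypothetical edge list: one chain plus one unrelated pair.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
two_hops = neighborhood_edges(edges, "a", 2)
everything = neighborhood_edges(edges, "a", 50)
```

A large hop limit such as 50 effectively returns the whole connected network around the starting vertex, while the unrelated pair is never reached.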
Vertex Degree
Text Analytics
Text Analytics (a little secret sauce?)
• Sentiment Analysis
• Classification / Categorization
• Topic extraction
• Similarity (Search)
Documents, form fields, narratives…
• How similar are documents from different identities?
• How similar are form fields and narratives?
• Are key features/attributes of the identity represented in the
text?
• Text becomes a “top level” entity for Machine Learning and
Graph
Cosine Similarity
• “Math” to determine how similar text is
to other text in a corpus
• Run-time computation can be
expensive if not optimized
• Produces similarity score as ideal
input to machine learning / graph
databases
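The "math" is the cosine of the angle between two term vectors: the dot product of the vectors divided by the product of their lengths. A minimal sketch follows, assuming raw word counts split on whitespace; production systems typically add TF-IDF weighting and proper tokenization, which are omitted here, and the sample sentences are invented.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


same = cosine_similarity("lost my card", "lost my card")
partial = cosine_similarity("lost my card", "lost my wallet")
unrelated = cosine_similarity("lost my card", "weather report")
```

Scores run from 0 (no shared terms) to 1 (identical term distribution), which makes them easy to threshold or to feed into a model.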
Full-text search
• Scalable, distributed and efficient
• Cosine similarity as core ‘similarity’
driver
• Highly tunable for keywords and other
search factors
• Useful for run-time retrieval and
similarity determination
Text + Graph
• Document similarity to corpus
determined at ingest/runtime
• Similarity threshold determined
• High similarity score documents /
text are ‘linked’ via an edge
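One way to picture the linking step, as a sketch only: score every pair of documents and emit an edge for each pair at or above the threshold. The document ids, the narratives and the 0.8 threshold are invented for illustration, and the all-pairs loop stands in for what a full-text search engine would do far more efficiently at scale.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_edges(docs, threshold):
    """Return (id_a, id_b, score) for each document pair at or above threshold."""
    vectors = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


# Hypothetical narratives keyed by application id.
docs = {
    "app1": "i lost my job and need assistance",
    "app2": "i lost my job and need help",
    "app3": "requesting a replacement card",
}
text_edges = similarity_edges(docs, threshold=0.8)
```

Each tuple in `text_edges` can then be written to the graph as a "similar text" edge between the two documents, with the score stored as an edge property.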
Text + ML
• Document similarity to corpus
determined at ingest/runtime
• Similarity becomes a feature and is
incorporated into the data mining
process
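A sketch of what "similarity becomes a feature" can look like in practice: the highest similarity between a new document and the existing corpus is appended to that record's feature row. Using the maximum score is an assumption made for this example, as are the feature values and narratives; a mean or a count of matches above a threshold would be equally plausible choices.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def add_similarity_feature(row, text, corpus):
    """Append the highest similarity between text and any corpus document."""
    vec = Counter(text.lower().split())
    scores = [cosine(vec, Counter(doc.lower().split())) for doc in corpus]
    return row + [max(scores, default=0.0)]


# Hypothetical existing features for one record, plus its narrative text.
corpus = ["i lost my job and need help", "requesting a replacement card"]
row = [3, 0.5]
enriched = add_similarity_feature(row, "i lost my job and need assistance", corpus)
```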
Agile / agile
KDD
• Knowledge Discovery in Databases
• First widely adopted Data Mining
Process
• Waterfall with some ability to return to
previous steps
• Better suited to reporting and
traditional statistical analysis
CRISP-DM
• Cross Industry Standard Process for
Data Mining (CRISP-DM)
• Was published in 2000 as the output
of a group of private industry
practitioners and software engineers
from Daimler-Benz, SPSS and NCR
• Established as the de-facto process
model for data mining
(KDNuggets.com, 2014).
Scrum
• “Gateway Drug” for most agile teams
• Pervasive adoption
• Some haters (have to admit it)
• LOTS of tooling
• LOTS of community knowledge
• WORKING PRODUCT BASED
Adversarial Modeling (needs a team!)
• Software engineering / application development skills are mandatory
• Data science skills are mandatory
• Domain knowledge skills are mandatory
• No longer the work of skill silos
• Cross functional teams bridge the skills gaps between engineering and data focused individuals
• Highly effective team-based approach
• Adversarial thinking requires rapid response times and agility
Agile – DM???
• Focus on CROSS FUNCTIONAL
TEAMS
• DEPLOYABLE “Product” ready at the
end of every iteration
• “Agility” for rapid response to changes
in Adversary's behavior
• Tool rich environment
• Can look like Kanban, XP and others.
A platform approach; ensembles on many levels
Scale, availability, flexibility…
DSE Graph
NetworkX
Ensemble of data “models” and tools
Ensemble of approaches
No single model…
• No single approach proved to be
wholly effective
• Graph and Text stand alone but also
greatly enrich Machine Learning
• Together, an ensemble of data
models, predictive models and
approaches proved to be highly
effective
Thank you!
Rob Murphy – rmurphy@datastax.com

More Related Content

What's hot

Building and Maintaining Bulletproof Systems with DataStax
Building and Maintaining Bulletproof Systems with DataStaxBuilding and Maintaining Bulletproof Systems with DataStax
Building and Maintaining Bulletproof Systems with DataStaxDataStax
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016DataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesMapR Technologies
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...Data Con LA
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionDataWorks Summit
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...DataStax
 
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...DataStax
 
سکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرسکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرdatastack
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 

What's hot (20)

Building and Maintaining Bulletproof Systems with DataStax
Building and Maintaining Bulletproof Systems with DataStaxBuilding and Maintaining Bulletproof Systems with DataStax
Building and Maintaining Bulletproof Systems with DataStax
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
Productionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best PracticesProductionizing Hadoop: 7 Architectural Best Practices
Productionizing Hadoop: 7 Architectural Best Practices
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
Data Privacy at Scale
Data Privacy at ScaleData Privacy at Scale
Data Privacy at Scale
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
 
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column EncryptionProtect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
Don’t Get Caught in a PCI Pickle: Meet Compliance and Protect Payment Card Da...
 
Big Data Platform Industrialization
Big Data Platform Industrialization Big Data Platform Industrialization
Big Data Platform Industrialization
 
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
Deploying, Backups, and Restore w Datastax + Azure at Albertsons/Safeway (Gur...
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
سکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابرسکوهای ابری و مدل های برنامه نویسی در ابر
سکوهای ابری و مدل های برنامه نویسی در ابر
 
Apache Hadoop 3
Apache Hadoop 3Apache Hadoop 3
Apache Hadoop 3
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 

Viewers also liked

Webinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph DatabasesWebinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph DatabasesDataStax
 
PageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop SummitPageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop SummitOfer Mendelevitch
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...DataStax
 
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...Daniele Gianni
 
CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...
CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...
CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...Steven Wardell
 
Graph Analysis & HPC Techniques for Realizing Urban OS
Graph Analysis & HPC Techniques for Realizing Urban OSGraph Analysis & HPC Techniques for Realizing Urban OS
Graph Analysis & HPC Techniques for Realizing Urban OShisato matsuo
 
Company Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiency
Company Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiencyCompany Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiency
Company Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiencyUmesh Bhutoria
 
Black box approach- Using Technology & Data to drive Energy Efficiency Invest...
Black box approach- Using Technology & Data to drive Energy Efficiency Invest...Black box approach- Using Technology & Data to drive Energy Efficiency Invest...
Black box approach- Using Technology & Data to drive Energy Efficiency Invest...Umesh Bhutoria
 
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...DataStax
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...DataStax
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...DataStax
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesDataStax
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseMo Patel
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...DataStax
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Andrea Dal Pozzolo
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016DataStax
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
 
Webinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data PlatformWebinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data PlatformDataStax
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSEDataStax
 

Viewers also liked (20)

Webinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph DatabasesWebinar: Fighting Fraud with Graph Databases
Webinar: Fighting Fraud with Graph Databases
 
PageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop SummitPageRank for anomaly detection - Hadoop Summit
PageRank for anomaly detection - Hadoop Summit
 
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
Webinar - Macy’s: Why Your Database Decision Directly Impacts Customer Experi...
 
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
The DEVS-Driven Modeling Language: Syntax and Semantics Definition by Meta-Mo...
 
CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...
CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...
CISummit 2013: Busting Fraud Rings - The Cases of Healthcare & Financial Serv...
 
Graph Analysis & HPC Techniques for Realizing Urban OS
Graph Analysis & HPC Techniques for Realizing Urban OSGraph Analysis & HPC Techniques for Realizing Urban OS
Graph Analysis & HPC Techniques for Realizing Urban OS
 
Company Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiency
Company Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiencyCompany Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiency
Company Deck E-Cube Energy #EnergyAnalytics #EnergyEfficiency
 
Black box approach- Using Technology & Data to drive Energy Efficiency Invest...
Black box approach- Using Technology & Data to drive Energy Efficiency Invest...Black box approach- Using Technology & Data to drive Energy Efficiency Invest...
Black box approach- Using Technology & Data to drive Energy Efficiency Invest...
 
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
 
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
Using Approximate Data for Small, Insightful Analytics (Ben Kornmeier, Protec...
 
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databasesGive sense to your Big Data w/ Apache TinkerPop™ & property graph databases
Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Webinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data PlatformWebinar: Transforming Customer Experience Through an Always-On Data Platform
Webinar: Transforming Customer Experience Through an Always-On Data Platform
 
Building Killr Applications with DSE
Building Killr Applications with DSEBuilding Killr Applications with DSE
Building Killr Applications with DSE
 

Rob Murphy Adversarial Modeling Graph, ML, Text Analytics and Agile DM

  • 1. Rob Murphy Adversarial Modeling Graph, Machine Learning, Text Analytics and Agile DM
  • 2. Agenda • 1 Context of Problem • 2 Machine Learning • 3 Graph Theory • 4 Text Analytics • 5 All Together (Agile / agile)
  • 3. Who am I? Rob Murphy, Vanguard Solution Architect, DataStax rmurphy@datastax.com • Data-focused software engineer • 3 years with DataStax • 11+ years in Computational Science and general science informatics • 18+ years designing and building data-driven/centric systems • Old school Agile guy • “Data Scientist” at heart
  • 4. Where does this work come from? • Thesis research • Pre-DataStax work supporting various U.S. Federal Agencies • Work in direct support of DataStax customers • NO SECRET SAUCE SHARED HERE
  • 5. Problem Space It is a very very big problem space…
  • 6. Identity Theft / Synthetic Identities • 2014 and 2015 saw high-profile breaches of several retailers where tens of millions of customer records were stolen. • The theft of twenty-one million security clearance records was discovered in June of 2015 by the U.S. Office of Personnel Management (OPM). • Stolen data are actively bought, sold and traded, providing enriched data sources for fraudulent activities. • Everything we do is online, providing a de-personalized and highly efficient platform for fraud. • Coordinated and sophisticated networks of people exist to share data, share operational knowledge and actively coordinate efforts to subvert fraud protections in place.
  • 7. Synthetic Identities • Real identities are modified and/or combined to form multiple synthetic identities • “New” identities are real enough in key properties that they pass review of many business and informatics systems
  • 8. “Bad Actors” • Can be a first-person problem (they are who they are) • Or, assumed / synthetic identities • Difficult to detect; not all “bad actor” data is in “the system” • Sophisticated actors have very subtle, if not non-existent, predictive attributes • Everyone has patterns
  • 9. Thinking like an adversary • Dedicated individuals and groups of individuals are actively working to identify, subvert, avoid and exploit any logical, physical or process controls in place. • Weaknesses in physical, system or process controls are shared and exploited en masse • Changes to controls are recognized and behaviors modified • Organizations that want and need to detect and prevent fraud must see some of their customers, stakeholders or applicants as adversaries • Think more like a bank; funds are behind lock and key with more substantial protection as the amount grows • To respond to and engage with adversaries, you have to be agile, capable and approach the work understanding the purpose: to make fraudulent activities challenging to the point they are not worth pursuing (very very big goal)
  • 10. Assumptions of Adversarial Modeling • Dedicated individuals and groups of individuals are actively working to identify, subvert, avoid and exploit any logical, physical or process controls in place. • Adversarial Modeling as a process must be grounded in data mining, data modeling and software engineering methodologies while embracing change in the most dynamic and natural way possible. • Any process that creates silos around capabilities and communications adds complexity and inefficiency to the fight. • Data mining alone, as a technology ecosystem or focused process, will not be sufficient when engaged with an adversary. • Software engineering as a capability and the related processes and technologies must be part of the larger, adversarial effort. • One technology or tool is incapable of the sensitivity needed to quickly and proactively identify fraudulent patterns; the adversary is committed to exploiting any opportunity and leveraging it until it is no longer an option. An ecosystem is needed in this fight.
  • 12. Attribute based thinking • [Figure annotations: lighting from below, eye makeup, RAGE!!!!]
  • 13. Supervised Learning, Right? • NO!!!! • Mostly No. • Maybe… • Yes, if you are willing to experiment with labels derived from unsupervised learning (“experimental” labels) and dig in. • First lesson learned? Don’t assume anything about the problem; explore the data first, then define the technical problem.
  • 14. Why not supervised learning? • There are more cold or warm-start problems in this space than not. • Data are incorrectly labeled more often than not. • Why? There is always more fraud than you think there is. • Supervised learning algorithms are not accurate when “fraud” and “not fraud” look exactly the same. • Data are often not labeled at all.
  • 15. Unsupervised Learning • High-dimension data is the norm • Exploratory Data Analysis is mandatory; you must understand the context and data • Principal Component Analysis is your friend • Clustering is your very best friend • Clusters very often do not map to ‘labels’ (if they exist) • Experimental labels generated through unsupervised learning can be incredibly useful
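The idea of "experimental labels" from clustering can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: the `kmeans` helper, the `applications` data, and the choice of k-means itself are all assumptions made for the example, and the first-k-points initialisation is deliberately naive so the result is reproducible.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster index per input point."""
    # Naive, deterministic initialisation: the first k points (an assumption
    # made for reproducibility; real work would use a better seeding scheme).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels

# Hypothetical, already-scaled features for six applications.
applications = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (8.0, 8.2), (7.9, 8.1), (8.3, 7.7),
]
experimental_labels = kmeans(applications, k=2)
```

The cluster indices carry no meaning by themselves; a subject matter expert still has to review each cluster before its index is treated as a label for later supervised learning.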
  • 16. Visualization • Visualization of clusters leverages a powerful computing engine: the human brain • Patterns in data are often only apparent when visualized well
  • 17. Back to Supervised Learning (sometimes) • Experimental labels facilitate a cycle of effective learning but are difficult to explain to process-bound organizations (government) • Stick to human-understandable algorithms for final predictions: tree-based algorithms, logistic regression, Naïve Bayes • “Black Box” algorithms, such as neural networks, are very effective as a guide or ‘b-team’ review
  • 18. “Fit” of Machine Learning • Highly effective for mature fraud detection systems / organizations (well labeled data) • Less effective for cold and/or warm-start problems • Requires a holistic and dynamic approach to building a ‘ground truth’ of clearly and cleanly labeled data for classification • Absolutely requires a solid data mining approach with supportive business practices to research and validate data mining work. • Very important for detecting non-networked synthetic identities and “bad actors”; worth the effort to invest in a solid data mining process
  • 20. G = (V, E)
  • 21. Property Graph • [Figure: a Vertex labeled Person (name = Rob) joined by an Edge labeled attends to a Vertex labeled Event (name = Cassandra Summit); year = 2016] • Source: https://markorodriguez.com/2011/02/08/property-graph-algorithms/
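The property graph on this slide can be written down directly as data. The sketch below is illustrative only: the `Vertex` and `Edge` classes are hypothetical and are not the DSE Graph or TinkerPop API, and attaching `year` to the Event vertex is an assumption, since the flattened slide text does not say which element carries it.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    label: str
    out_v: Vertex  # vertex the edge leaves
    in_v: Vertex   # vertex the edge arrives at
    properties: dict = field(default_factory=dict)

# The slide's example: a Person attends an Event.
rob = Vertex("Person", {"name": "Rob"})
# Assumption: year = 2016 is a property of the Event vertex.
summit = Vertex("Event", {"name": "Cassandra Summit", "year": 2016})
attends = Edge("attends", out_v=rob, in_v=summit)

# G = (V, E): the graph is just the two collections together.
V = [rob, summit]
E = [attends]
```

Labels say what kind of thing a vertex or edge is; properties carry the key/value details, which is what separates a property graph from a bare G = (V, E).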
  • 22. Networks mean relationships • Coordinated fraud means networks exist • Network detection is possible around key areas where efficiency is needed for financial gain • Key vertex labels, by pattern, are highly predictive • Graph visualization engages the human computer in pattern detection • Graph density coefficient (~ degree distribution) • Community detection
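Vertex degree and graph density are simple to compute from an edge list. The sketch below assumes an undirected simple graph, for which density is 2|E| / (|V|(|V| - 1)); the vertex names (`app1`, `phone1`, and so on) are made up for illustration and do not come from the talk.

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each vertex (undirected)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def density(num_vertices, num_edges):
    """Fraction of possible undirected edges that actually exist."""
    if num_vertices < 2:
        return 0.0
    return 2 * num_edges / (num_vertices * (num_vertices - 1))

# Hypothetical data: three applications sharing one phone number.
edges = [
    ("app1", "phone1"),
    ("app2", "phone1"),
    ("app3", "phone1"),
    ("app3", "addr1"),
]
degree = degrees(edges)
graph_density = density(len(degree), len(edges))
```

Here `phone1` has degree 3, which is the kind of shared, high-degree vertex that tends to stand out when applications are supposed to be independent.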
  • 24. Network Discovery • Networks of fraud / activity are easier to discover. • Easily understood visually and by the “business” subject matter experts. • Various discovery algorithms and patterns. • Not rocket science!!! • Example traversal: g.V("{member_id=0, community_id=374707, ~label=caseApp, group_id=1}").repeat(__.bothE().subgraph('subGraph').inV()).times(50).cap('subGraph').next()
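The traversal on this slide starts from one vertex and repeatedly follows edges to collect the surrounding subgraph. A rough plain-Python analogue is a hop-limited breadth-first search. This is a sketch of the general idea under stated assumptions, not a translation of the Gremlin semantics: `discover_network`, the adjacency dictionary, and the vertex names are all hypothetical.

```python
from collections import defaultdict, deque

def discover_network(adjacency, seed, max_hops):
    """Return every vertex reachable from seed within max_hops edges."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop budget spent on this path; do not expand further
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return seen

# Hypothetical edges; direction is ignored by adding both directions.
edges = [
    ("app1", "phone1"), ("app2", "phone1"),
    ("app2", "addr1"), ("app3", "addr1"),
    ("app9", "phone9"),  # unrelated applicant
]
adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

network = discover_network(adjacency, "app1", max_hops=50)
one_hop = discover_network(adjacency, "app1", max_hops=1)
```

With a generous hop limit the search returns the whole connected network around `app1` and leaves the unrelated `app9` out, which is the set an analyst would then visualize.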
  • 25. Vertex Degree
  • 28. Text Analytics (a little secret sauce?) • Sentiment Analysis • Classification / Categorization • Topic extraction • Similarity (Search)
  • 29. Documents, form fields, narratives… • How similar are documents from different identities? • How similar are form fields and narratives? • Are key features/attributes of the identity represented in the text? • Text becomes a “top level” entity for Machine Learning and Graph
  • 30. Cosine Similarity • “Math” to determine how similar text is to other text in a corpus • Run-time computation can be expensive if not optimized • Produces similarity score as ideal input to machine learning / graph databases
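Cosine similarity treats each text as a vector and measures the angle between vectors: cos θ = (A · B) / (‖A‖ ‖B‖). The sketch below uses raw whitespace-split term counts as the vectors, which is a simplifying assumption; production search engines typically weight and normalize terms differently. The function name and sample strings are illustrative only.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

identical = cosine_similarity("lost my wallet", "lost my wallet")
disjoint = cosine_similarity("lost my wallet", "new home address")
partial = cosine_similarity("alpha beta", "alpha gamma")
```

Scores fall between 0 (no shared terms) and 1 (same term distribution), which makes them easy to threshold or feed into a model as a numeric feature.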
  • 31. Full-text search • Scalable, distributed and efficient • Cosine similarity as core ‘similarity’ driver • Highly tunable for keywords and other search factors • Useful for run-time retrieval and similarity determination
  • 32. Text + Graph • Document similarity to corpus determined at ingest/runtime • Similarity threshold determined • High similarity score documents / text are ‘linked’ via an edge
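Turning similarity scores into graph edges is a thresholding step. The sketch below assumes pairwise scores have already been computed elsewhere; the `similarity_edges` helper, the `similar_to` edge label, the 0.8 threshold, and the document ids are all hypothetical choices for illustration.

```python
def similarity_edges(scores, threshold):
    """Keep only document pairs whose score meets the threshold.

    scores maps (doc_a, doc_b) -> similarity in [0, 1].
    Returns (doc_a, doc_b, properties) tuples ready to load as edges.
    """
    return [
        (doc_a, doc_b, {"label": "similar_to", "score": score})
        for (doc_a, doc_b), score in sorted(scores.items())
        if score >= threshold
    ]

# Hypothetical precomputed scores between three narratives.
scores = {
    ("doc1", "doc2"): 0.92,
    ("doc1", "doc3"): 0.15,
    ("doc2", "doc3"): 0.81,
}
edges = similarity_edges(scores, threshold=0.8)
```

Keeping the score as an edge property lets later traversals or visualizations filter on it again without recomputing similarity.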
  • 33. Text + ML • Document similarity to corpus determined at ingest/runtime • Similarity becomes a feature and is incorporated into the data mining process
  • 35. KDD • Knowledge Discovery in Databases • First widely adopted Data Mining Process • Waterfall with some ability to return to previous steps • Better suited to reporting and traditional statistical analysis
  • 36. CRISP-DM • Cross Industry Standard Process for Data Mining (CRISP-DM) • Published in 2000 as the output of a group of private industry practitioners and software engineers from Daimler-Benz, SPSS and NCR • Established as the de facto process model for data mining (KDNuggets.com, 2014).
  • 37. Scrum • “Gateway Drug” for most agile teams • Pervasive adoption • Some haters (have to admit it) • LOTS of tooling • LOTS of community knowledge • WORKING PRODUCT BASED
  • 38. Adversarial Modeling (needs a team!) • Software engineering / application development skills are mandatory • Data science skills are mandatory • Domain knowledge skills are mandatory • No longer the work of skill silos • Cross functional teams bridge the skills gaps between engineering and data focused individuals • Highly effective team-based approach • Adversarial thinking requires rapid response times and agility
  • 39. Agile – DM??? • Focus on CROSS FUNCTIONAL TEAMS • DEPLOYABLE “Product” ready at the end of every iteration • “Agility” for rapid response to changes in Adversary's behavior • Tool rich environment • Can look like Kanban, XP and others.
  • 40. A platform approach; ensembles on many levels
  • 41. Scale, availability, flexibility… • [Figure: platform components, including DSE Graph and NetworkX]
  • 42. Ensemble of data “models” and tools
  • 43. Ensemble of approaches • No single model… • No single approach proved to be wholly effective • Graph and Text stand alone but also greatly enrich Machine Learning • Together, an ensemble of data models, predictive models and approaches proved to be highly effective
  • 44. Thank you! Rob Murphy – rmurphy@datastax.com

Editor's Notes

  1. Networks are what make synthetic identity fraud so effective
  2. From “The Enemy Within”. Attributes = features.