SlideShare a Scribd company logo
Spark, GraphX, and blockchains: Building a
behavioral analytics platform for forensics,
fraud, and finance
Agenda
● Why the blockchain and analytics (business)
● Blockchain data complexity
● Existing challenges
● New approach
● What to look for
● Use cases
BlockCypher Background
CRYPTIV
BlockCypher: Blockchain Web Services
Our Customers
Who We Are
BlockCypher Blockchain Web Services
IDENTITY
Data Endpoint
Webhooks Websockets
New Transaction
Endpoint
Address Wallet API
Multisig Transaction
TRANSACTION
Payment Forwarding
API
Confidence Factor API
Microtransaction API
Asset API
Contract API
Address Balance
Endpoint
ANALYTICS
Analytics Engines and
Parameters
Create/Get Analytics
Job
Anomaly and Fraud
Detection
Outsource Infrastructure
BlockCypher Allows You to...
6+
months
less time
35%+
less
costs
Why the blockchain and
blockchain analytics?
Why the Blockchain?
Decentralized
Transactions
Transparency
Security
Why Blockchain Analytics?
Market
Adoption
Regulation
7x+
<2y
Why Blockchain Analytics?
Technology Challenges
● data searching
● high-speed access to
data for analytics
Blockchain Today
What makes a blockchain?
(the 5 minute version)
The Problem Statement
How do you create a:
● Useful
● Decentralized
● Resilient
network?
01- What Makes a Blockchain
A Useful Network
● Founded on Cryptography to provide concrete, resilient ownership of tokens;
● Monetary policy built into the network
● A powerful scripting language to extend capabilities further:
○ Multisignature
○ Smart Contracts
01- What Makes a Blockchain
A Decentralized Network
01- What Makes a Blockchain
A Decentralized Network
01- What Makes a Blockchain
A Decentralized Network
01- What Makes a Blockchain
A Decentralized Network
01- What Makes a Blockchain
A Decentralized Network
01- What Makes a Blockchain
A Decentralized Network
01- What Makes a Blockchain
A Resilient Network
The Byzantine Generals Problem
01- What Makes a Blockchain
Consensus… But of What?
A Global Transaction Ledger
01- What Makes a Blockchain
Input
Input
Input
Output
Output
ID: Hash
Metadata
A Global Transaction Ledger
01- What Makes a Blockchain
Input
Input
Input
Output
Output
ID: Hash
Metadata
A Global Transaction Ledger
01- What Makes a Blockchain
Input
Input
Input
Output
Output
ID: Hash
Metadata
ID: Hash (PoW)
Previous Hash
Metadata
A “Block Chain”
01- What Makes a Blockchain
ID: Hash (PoW)
Previous Hash
Metadata
ID: Hash (PoW)
Previous Hash
Metadata
Hash (PoW)
vious Hash
tadata
Verifiable, including Proof-of-Work and all
consensus rules, back to genesis
Inputs and Outputs
01- What Makes a Blockchain
Input
Input
Input
Output
Output
ID: Hash
Metadata
Input
Input
Input
Output
Output
ID: Hash
Metadata
Input
Input
Input
Output
Output
ID: Hash
Metadata
Inputs and Outputs
Outputs and inputs include script: code executed by all fully validating nodes to
determine spending authority.
01- What Makes a Blockchain
Input
Input
Input
Output
Output
ID: Hash
Metadata
Pseudonymity in Bitcoin
Ownership identifiers are addresses:
● Most commonly owned by a single entity
● Sometimes a representation of not-yet-exposed code (script)
01- What Makes a Blockchain
When are Identities not Identities?
1. HD Wallets
01- What Makes a Blockchain
k1
k2
k3
kn
A1
A2
A3
An
When are Identities not Identities?
1. HD Wallets
2. Pay to Script Hash (output bound to script):
a. Multisignature
01- What Makes a Blockchain
A1
A2
A3
OP_2 [Pubkey] [Pubkey] [Pubkey] OP_3 OP_CHECKMULTISIG
When are Identities not Identities?
1. HD Wallets
2. Pay to Script Hash (output bound to script):
a. Multisignature
01- What Makes a Blockchain
A1
A2
A3
OP_2 [Pubkey] [Pubkey] [Pubkey] OP_3 OP_CHECKMULTISIG
H(x)
Aapparent
Current Tools are Lacking
Agents still use manual investigation and traversal
Traditional tools lack the flexibility to evolve alongside bitcoin’s usage patterns and
identity distributions
Behavioral Analytics bridges the gap: focus on patterns of movements, rather than
easily-manipulated properties
02- Review of Previous Work
You need a multi-layered solution
02- Review of Previous Work
Structured
Queries
Graph
Traversal
Hand-tuned
Heuristics
Machine
Learning
Don’t discount the human element
02- Review of Previous Work
Structured
Queries
Graph
Traversal
Hand-tuned
Heuristics
Machine
Learning
Shared Spending Authority
Known wallet/service behaviors
1bones2iaA7eijMrCUYYrY5xurwZNyYqU
Building a Practical Blockchain Analytics Platform
for Forensics
1. Representing the Graph
Structured
Queries
Graph
Traversal
Hand-tuned
Heuristics
Machine
Learning
Graphs make natural blockchain representations...
Great for representing broader interactions, relationships, and trends
(plus they look cool)
03- Representing the Graph
But not all graphs are created equal.
Transformation to the Address-Address graph creates a loss of resolution and
introduces assumptions
03- Representing the Graph
This is especially true with Spark
03- Representing the Graph
Implication: Operations will
vary significantly in both
parallelism and data locality
based on edge versus
vertex property distinctions.
Different graphs serve different purposes...
Address-Address
Address-Tx
Output-Output
Input-Output
Transaction-Transaction
03- Representing the Graph
Tx1
Tx2
Tx4
Tx3
… and have different properties
03- Representing the Graph
Directed Acyclic Bipartite
Address/Address Y N N
Address/Transaction Y N Y
Input/Output Y Y Y
Output/Output Y Y N
Transaction/Transaction Y Y N
Why Spark + GraphX?
Single node graph databases great for performance re: traversal, etc.
But Spark+GraphX gives:
● One infrastructure for graphs, ML models, computation
● Access to graph properties in training, eg. in-degree
● Compute + Storage scalability
03- Representing the Graph
2. Building the ML Pipeline
Structured
Queries
Graph
Traversal
Hand-tuned
Heuristics
Machine
Learning
04- Building the ML Pipeline
What you think will be a
problem
A Venn Diagram: Starting a ML Project
What will actually be a
problem
Algorithm Selection
Scaling
Data Quality
Feature Selection
Skynet Training Set
Determination
04- Building the ML Pipeline
A simplifying recommendation
Start with unsupervised methods and iterate:
● Faster ramp-up
● Fewer parameters to tune
● Less overfitting risk
04- Building the ML Pipeline
It begins and ends with data
Data
SourcesData
SourcesData
SourcesData
Sources
Feature Extraction
Selection,
Transformation,
Scaling
Model
Training &
Application
Feature extraction is harder than it looks...
Which features to include?
Balance risk of overfitting with need to capture relevant parameters.
04- Building the ML Pipeline
… but it all comes down to Var(x)
When considering a variable, ask two questions:
● What variance does this variable introduce to the system?
● Is this variance correlated to the result I want?
04- Building the ML Pipeline
Blockchain data is only one part of a larger picture
02- Bitcoin’s Data Set
Raw Blockchain Data
100GB (est.)
6MB/hr (approx.)
Denormalization
Network and Transient
Data
Derived Data
~1TB
Big graphs need lots of tuning
● Edge/Vertex distinctions
○ Balance Locality, Parallelism
● Partition Strategies
● Data Denormalization
○ Eg. Reducing number, value of transactions between addresses to properties
03- Representing the Graph
Blockchain data has some nuance...
Global consensus follows the chain with the most work:
04- Building the ML Pipeline
Blockchain data has some nuance...
Global consensus follows the chain with the most work:
04- Building the ML Pipeline
Blockchain data has some nuance...
Global consensus follows the chain with the most work:
04- Building the ML Pipeline
… but there are solutions.
A processing delay can address virtually all rewrite cases, but hinders real-time
analytics efforts.
The answer: Lambda Architecture
04- Building the ML Pipeline
Lambda Architecture
04- Building the ML Pipeline
Real-time Input
Batching
ML
Model
ML Pipeline Model Update
Feat.
Extract
Model
Application
Data
Transform
Result
Why it works: Aggregates are slow to move...
… but what is true in the aggregate cannot predict individual measurements.
04- Building the ML Pipeline
Example: Tx Clustering and Anomaly
Detection
Lambda Architecture
05- Tx Clustering Example
Real-time Input
Batching
ML
Model
ML Pipeline Model Update
Feat.
Extract
Model
Application
Data
Transform
Result
Focus: Batching Process
05- Tx Clustering Example
Real-time Input
Batching
ML
Model
ML Pipeline Model Update
Feat.
Extract
Model
Application
Data
Transform
Result
Delay in processing addresses mutability concerns
05- Tx Clustering Example
Blockchain
Artificial
Delay
Processing
Latency
Trained Model
Coverage
Trained Model
Usage
time
Feature extraction from live node data
05- Tx Clustering Example
Input
Input
Input
Output
Output
ID: Hash
Metadata
{
~20 features, including:
● Transaction shape
(inputs, outputs)
● Value distribution
● Input types and ages
Cassandra Data store
Data is fed into a kmeans pipeline
07- Tx Clustering Example
Spark
Pull latest
Tx Batch
Tx
Feature
Extractor
Feature
Scaler
Inc.
Update
Model
Cassandra Data store
Spark 2.0 ML pipelines give you pipeline persistence
05- Tx Clustering Example
Spark
Pull latest
Tx Batch
Tx
Feature
Extractor
Feature
Scaler
Inc.
Update
Model
Cassandra Data store
Streaming pipeline reuses data components
05- Tx Clustering Example
Spark
Feature
Scaler
Tx
Feature
Extractor
Streaming input
Trained
ML Model
Cassandra Data store
Trained model used for real-time anomaly detection
05- Tx Clustering Example
Feature
Scaler
Tx
Feature
Extractor
Trained
ML Model
Streaming input
Tx Type &
Anomaly result
Cassandra Data store
Also used to color graphs for efficient traversal decisionmaking
05- Tx Clustering Example
Feature
Scaler
Tx
Feature
Extractor
Trained
ML Model
Streaming input
Transaction
Type Result
Cassandra Data store
Also used to color graphs for efficient traversal decisionmaking
05- Tx Clustering Example
Feature
Scaler
Tx
Feature
Extractor
Trained
ML Model
Streaming input
Transaction
Type Result
Cassandra Data store
Also used to color graphs for efficient traversal decisionmaking
05- Tx Clustering Example
Feature
Scaler
Tx
Feature
Extractor
Trained
ML Model
Streaming input
Transaction
Type Result
What to look for in Blockchain Analytics
❏ Infrastructure to ingest and transform metadata
❏ Multi-layered solution
❏ Graphs that address multiple purposes
❏ Resources for tuning graphs
❏ Real-time analytics (e.g. Lambda architecture)
Use Cases: Cybercrime
Ransomware $1B in 2016
Approaches
● Reactive - Follow the money
● Proactive - Anomaly detection
Source: FBI, SonicWall
Demo
Use Cases
Use Case(s) Industry
AML and financial crimes, regulatory compliance Financial Services
Provider/patient demographic data analysis, Claims fraud Healthcare
Mobile payments, Identity management Telecommunications
Supply chain analytics Manufacturing
www.blockcypher.com
@blockcypher
Data Transformation Machine Learning Modeling
Private Chain
...

More Related Content

What's hot

Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog Food
Databricks
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
DataWorks Summit/Hadoop Summit
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Databricks
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Databricks
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Vasia Kalavri
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache Spark
Databricks
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Spark Summit
 
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
Databricks
 
Operationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer NoriOperationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer Nori
Databricks
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
Databricks
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
Spark Summit
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
Flink Forward
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Big Data Spain
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
Serg Masyutin
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Spark Summit
 
Data Pipeline at Tapad
Data Pipeline at TapadData Pipeline at Tapad
Data Pipeline at Tapad
Toby Matejovsky
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
Rahul Kumar
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Flink Forward
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
Databricks
 

What's hot (20)

Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog Food
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache Spark
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
 
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
A Practical Approach to Building a Streaming Processing Pipeline for an Onlin...
 
Operationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer NoriOperationalizing Machine Learning at Scale with Sameer Nori
Operationalizing Machine Learning at Scale with Sameer Nori
 
Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
 
Spark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko KorndorfSpark Summit EU talk by Heiko Korndorf
Spark Summit EU talk by Heiko Korndorf
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
 
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
Online Security Analytics on Large Scale Video Surveillance System by Yu Cao ...
 
Data Pipeline at Tapad
Data Pipeline at TapadData Pipeline at Tapad
Data Pipeline at Tapad
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 

Viewers also liked

Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACsBlockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
Melanie Swan
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰
台灣資料科學年會
 
Philosophy of Deep Learning
Philosophy of Deep LearningPhilosophy of Deep Learning
Philosophy of Deep Learning
Melanie Swan
 
「資料視覺化」有志一同場次 at 2016 台灣資料科學年會
「資料視覺化」有志一同場次 at 2016 台灣資料科學年會「資料視覺化」有志一同場次 at 2016 台灣資料科學年會
「資料視覺化」有志一同場次 at 2016 台灣資料科學年會
台灣資料科學年會
 
Blockchain Economic Theory
Blockchain Economic TheoryBlockchain Economic Theory
Blockchain Economic Theory
Melanie Swan
 
Blockchain Payment Channels Explained
Blockchain Payment Channels ExplainedBlockchain Payment Channels Explained
Blockchain Payment Channels Explained
Melanie Swan
 
Artificial Intelligence & Blockchain Synergy
Artificial Intelligence & Blockchain SynergyArtificial Intelligence & Blockchain Synergy
Artificial Intelligence & Blockchain Synergy
BICA Labs
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies
台灣資料科學年會
 
[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人
台灣資料科學年會
 
[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務
台灣資料科學年會
 
Blockchain Consensus Protocols
Blockchain Consensus ProtocolsBlockchain Consensus Protocols
Blockchain Consensus Protocols
Melanie Swan
 
[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法
台灣資料科學年會
 
Future of AI: Blockchain and Deep Learning
Future of AI: Blockchain and Deep LearningFuture of AI: Blockchain and Deep Learning
Future of AI: Blockchain and Deep Learning
Melanie Swan
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
台灣資料科學年會
 
[系列活動] Python爬蟲實戰
[系列活動] Python爬蟲實戰[系列活動] Python爬蟲實戰
[系列活動] Python爬蟲實戰
台灣資料科學年會
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
台灣資料科學年會
 
[系列活動] 使用 R 語言建立自己的演算法交易事業
[系列活動] 使用 R 語言建立自己的演算法交易事業[系列活動] 使用 R 語言建立自己的演算法交易事業
[系列活動] 使用 R 語言建立自己的演算法交易事業
台灣資料科學年會
 
[系列活動] 一日搞懂生成式對抗網路
[系列活動] 一日搞懂生成式對抗網路[系列活動] 一日搞懂生成式對抗網路
[系列活動] 一日搞懂生成式對抗網路
台灣資料科學年會
 
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
台灣資料科學年會
 
給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班
台灣資料科學年會
 

Viewers also liked (20)

Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACsBlockchain Cloudminds: Human-Machine Pooled-Mind DACs
Blockchain Cloudminds: Human-Machine Pooled-Mind DACs
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰
 
Philosophy of Deep Learning
Philosophy of Deep LearningPhilosophy of Deep Learning
Philosophy of Deep Learning
 
「資料視覺化」有志一同場次 at 2016 台灣資料科學年會
「資料視覺化」有志一同場次 at 2016 台灣資料科學年會「資料視覺化」有志一同場次 at 2016 台灣資料科學年會
「資料視覺化」有志一同場次 at 2016 台灣資料科學年會
 
Blockchain Economic Theory
Blockchain Economic TheoryBlockchain Economic Theory
Blockchain Economic Theory
 
Blockchain Payment Channels Explained
Blockchain Payment Channels ExplainedBlockchain Payment Channels Explained
Blockchain Payment Channels Explained
 
Artificial Intelligence & Blockchain Synergy
Artificial Intelligence & Blockchain SynergyArtificial Intelligence & Blockchain Synergy
Artificial Intelligence & Blockchain Synergy
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies
 
[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人[系列活動] 一天搞懂對話機器人
[系列活動] 一天搞懂對話機器人
 
[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務
 
Blockchain Consensus Protocols
Blockchain Consensus ProtocolsBlockchain Consensus Protocols
Blockchain Consensus Protocols
 
[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法
 
Future of AI: Blockchain and Deep Learning
Future of AI: Blockchain and Deep LearningFuture of AI: Blockchain and Deep Learning
Future of AI: Blockchain and Deep Learning
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
 
[系列活動] Python爬蟲實戰
[系列活動] Python爬蟲實戰[系列活動] Python爬蟲實戰
[系列活動] Python爬蟲實戰
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 
[系列活動] 使用 R 語言建立自己的演算法交易事業
[系列活動] 使用 R 語言建立自己的演算法交易事業[系列活動] 使用 R 語言建立自己的演算法交易事業
[系列活動] 使用 R 語言建立自己的演算法交易事業
 
[系列活動] 一日搞懂生成式對抗網路
[系列活動] 一日搞懂生成式對抗網路[系列活動] 一日搞懂生成式對抗網路
[系列活動] 一日搞懂生成式對抗網路
 
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
 
給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班
 

Similar to Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for Forensics, Fraud, and Finance with Karen Hsu

A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
HostedbyConfluent
 
New Business Models enabled by Blockchain
New Business Models enabled by BlockchainNew Business Models enabled by Blockchain
New Business Models enabled by Blockchain
Slash
 
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future VisionMLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
BATbern
 
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
Ambassador Labs
 
How to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, StripeHow to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, Stripe
HostedbyConfluent
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
confluent
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
HostedbyConfluent
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
Chester Chen
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
Matt Lucas
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
Rittman Analytics
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
Renzo Tomà
 
J.burke HackMiami6
J.burke HackMiami6J.burke HackMiami6
J.burke HackMiami6
Jesse Burke
 
New Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLNew Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQL
confluent
 
IBM presents: Hyperledger Fabric Hands On Workshop - part 1
IBM presents: Hyperledger Fabric Hands On Workshop - part 1IBM presents: Hyperledger Fabric Hands On Workshop - part 1
IBM presents: Hyperledger Fabric Hands On Workshop - part 1
Grant Steinfeld
 
Microservices at ibotta pitfalls and learnings
Microservices at ibotta pitfalls and learningsMicroservices at ibotta pitfalls and learnings
Microservices at ibotta pitfalls and learnings
Matthew Reynolds
 
Monitoring Large-Scale Apache Spark Clusters at Databricks
Monitoring Large-Scale Apache Spark Clusters at DatabricksMonitoring Large-Scale Apache Spark Clusters at Databricks
Monitoring Large-Scale Apache Spark Clusters at Databricks
Anyscale
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2
 
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Michael Noll
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
Lionel Mommeja
 

Similar to Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for Forensics, Fraud, and Finance with Karen Hsu (20)

A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
New Business Models enabled by Blockchain
New Business Models enabled by BlockchainNew Business Models enabled by Blockchain
New Business Models enabled by Blockchain
 
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future VisionMLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision
 
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
2017 Microservices Practitioner Virtual Summit: Microservices at Squarespace ...
 
How to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, StripeHow to mutate your immutable log | Andrey Falko, Stripe
How to mutate your immutable log | Andrey Falko, Stripe
 
Stream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream SharingStream Processing with Flink and Stream Sharing
Stream Processing with Flink and Stream Sharing
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
J.burke HackMiami6
J.burke HackMiami6J.burke HackMiami6
J.burke HackMiami6
 
New Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQLNew Approaches for Fraud Detection on Apache Kafka and KSQL
New Approaches for Fraud Detection on Apache Kafka and KSQL
 
IBM presents: Hyperledger Fabric Hands On Workshop - part 1
IBM presents: Hyperledger Fabric Hands On Workshop - part 1IBM presents: Hyperledger Fabric Hands On Workshop - part 1
IBM presents: Hyperledger Fabric Hands On Workshop - part 1
 
Microservices at ibotta pitfalls and learnings
Microservices at ibotta pitfalls and learningsMicroservices at ibotta pitfalls and learnings
Microservices at ibotta pitfalls and learnings
 
Monitoring Large-Scale Apache Spark Clusters at Databricks
Monitoring Large-Scale Apache Spark Clusters at DatabricksMonitoring Large-Scale Apache Spark Clusters at Databricks
Monitoring Large-Scale Apache Spark Clusters at Databricks
 
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics PlatformWSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform
 
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
Now You See Me, Now You Compute: Building Event-Driven Architectures with Apa...
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 

Recently uploaded (20)

一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 

Spark, GraphX, and Blockchains: Building a Behavioral Analytics Platform for Forensics, Fraud, and Finance with Karen Hsu

  • 1. Spark, GraphX, and blockchains: Building a behavioral analytics platform for forensics, fraud, and finance
  • 2. Agenda ● Why the blockchain and analytics (business) ● Blockchain data complexity ● Existing challenges ● New approach ● What to look for ● Use cases
  • 5. BlockCypher: Blockchain Web Services Our Customers Who We Are
  • 6. BlockCypher Blockchain Web Services IDENTITY Data Endpoint Webhooks Websockets New Transaction Endpoint Address Wallet API Multisig Transaction TRANSACTION Payment Forwarding API Confidence Factor API Microtransaction API Asset API Contract API Address Balance Endpoint ANALYTICS Analytics Engines and Parameters Create/Get Analytics Job Anomaly and Fraud Detection
  • 7. Outsource Infrastructure BlockCypher Allows You to... 6+ months less time 35%+ less costs
  • 8. Why the blockchain and blockchain analytics?
  • 11. Why Blockchain Analytics? Technology Challenges ● data searching ● high-speed access to data for analytics Blockchain Today
  • 12. What makes a blockchain? (the 5 minute version)
  • 13. The Problem Statement How do you create a: ● Useful ● Decentralized ● Resilient network? 01- What Makes a Blockchain
  • 14. A Useful Network ● Founded on Cryptography to provide concrete, resilient ownership of tokens; ● Monetary policy built into the network ● A powerful scripting language to extend capabilities further: ○ Multisignature ○ Smart Contracts 01- What Makes a Blockchain
  • 15. A Decentralized Network 01- What Makes a Blockchain
  • 16. A Decentralized Network 01- What Makes a Blockchain
  • 17. A Decentralized Network 01- What Makes a Blockchain
  • 18. A Decentralized Network 01- What Makes a Blockchain
  • 19. A Decentralized Network 01- What Makes a Blockchain
  • 20. A Decentralized Network 01- What Makes a Blockchain
  • 21. A Resilient Network The Byzantine Generals Problem 01- What Makes a Blockchain
  • 23. A Global Transaction Ledger 01- What Makes a Blockchain Input Input Input Output Output ID: Hash Metadata
  • 24. A Global Transaction Ledger 01- What Makes a Blockchain Input Input Input Output Output ID: Hash Metadata
  • 25. A Global Transaction Ledger 01- What Makes a Blockchain Input Input Input Output Output ID: Hash Metadata ID: Hash (PoW) Previous Hash Metadata
  • 26. A “Block Chain” 01- What Makes a Blockchain ID: Hash (PoW) Previous Hash Metadata ID: Hash (PoW) Previous Hash Metadata Hash (PoW) vious Hash tadata Verifiable, including Proof-of-Work and all consensus rules, back to genesis
  • 27. Inputs and Outputs 01- What Makes a Blockchain Input Input Input Output Output ID: Hash Metadata Input Input Input Output Output ID: Hash Metadata Input Input Input Output Output ID: Hash Metadata
  • 28. Inputs and Outputs Outputs and inputs include script: code executed by all fully validating nodes to determine spending authority. 01- What Makes a Blockchain Input Input Input Output Output ID: Hash Metadata
  • 29. Pseudonymity in Bitcoin Ownership identifiers are addresses: ● Most commonly owned by a single entity ● Sometimes a representation of not-yet-exposed code (script) 01- What Makes a Blockchain
  • 30. When are Identities not Identities? 1. HD Wallets 01- What Makes a Blockchain k1 k2 k3 kn A1 A2 A3 An
  • 31. When are Identities not Identities? 1. HD Wallets 2. Pay to Script Hash (output bound to script): a. Multisignature 01- What Makes a Blockchain A1 A2 A3 OP_2 [Pubkey] [Pubkey] [Pubkey] OP_3 OP_CHECKMULTISIG
  • 32. When are Identities not Identities? 1. HD Wallets 2. Pay to Script Hash (output bound to script): a. Multisignature 01- What Makes a Blockchain A1 A2 A3 OP_2 [Pubkey] [Pubkey] [Pubkey] OP_3 OP_CHECKMULTISIG H(x) Aapparent
  • 33. Current Tools are Lacking Agents still use manual investigation and traversal Traditional tools lack the flexibility to evolve alongside bitcoin’s usage patterns and identity distributions Behavioral Analytics bridges the gap: focus on patterns of movements, rather than easily-manipulated properties 02- Review of Previous Work
  • 34. You need a multi-layered solution 02- Review of Previous Work Structured Queries Graph Traversal Hand-tuned Heuristics Machine Learning
  • 35. Don’t discount the human element 02- Review of Previous Work Structured Queries Graph Traversal Hand-tuned Heuristics Machine Learning Shared Spending Authority Known wallet/service behaviors 1bones2iaA7eijMrCUYYrY5xurwZNyYqU
  • 36. Building a Practical Blockchain Analytics Platform for Forensics 1. Representing the Graph Structured Queries Graph Traversal Hand-tuned Heuristics Machine Learning
  • 37. Graphs make natural blockchain representations... Great for representing broader interactions, relationships, and trends (plus they look cool) 03- Representing the Graph
  • 38. But not all graphs are created equal. Transformation to the Address-Address graph creates a loss of resolution and introduces assumptions 03- Representing the Graph
  • 39. This is especially true with Spark 03- Representing the Graph Implication: Operations will vary significantly in both parallelism and data locality based on edge versus vertex property distinctions.
  • 40. Different graphs serve different purposes... Address-Address Address-Tx Output-Output Input-Output Transaction-Transaction 03- Representing the Graph Tx1 Tx2 Tx4 Tx3
  • 41. … and have different properties 03- Representing the Graph Directed Acyclic Bipartite Address/Address Y N N Address/Transaction Y N Y Input/Output Y Y Y Output/Output Y Y N Transaction/Transaction Y Y N
  • 42. Why Spark + GraphX? Single node graph databases great for performance re: traversal, etc. But Spark+GraphX gives: ● One infrastructure for graphs, ML models, computation ● Access to graph properties in training, eg. in-degree ● Compute + Storage scalability 03- Representing the Graph
  • 43. 2. Building the ML Pipeline Structured Queries Graph Traversal Hand-tuned Heuristics Machine Learning
  • 44. 04- Building the ML Pipeline What you think will be a problem A Venn Diagram: Starting a ML Project What will actually be a problem Algorithm Selection Scaling Data Quality Feature Selection Skynet Training Set Determination
  • 45. 04- Building the ML Pipeline A simplifying recommendation Start with unsupervised methods and iterate: ● Faster ramp-up ● Fewer parameters to tune ● Less overfitting risk
  • 46. 04- Building the ML Pipeline It begins and ends with data Data SourcesData SourcesData SourcesData Sources Feature Extraction Selection, Transformation, Scaling Model Training & Application
  • 47. Feature extraction is harder than it looks... Which features to include? Balance risk of overfitting with need to capture relevant parameters. 04- Building the ML Pipeline
  • 48. … but it all comes down to Var(x) When considering a variable, ask two questions: ● What variance does this variable introduce to the system? ● Is this variance correlated to the result I want? 04- Building the ML Pipeline
  • 49. Blockchain data is only one part of a larger picture 02- Bitcoin’s Data Set Raw Blockchain Data 100GB (est.) 6MB/hr (approx.) Denormalization Network and Transient Data Derived Data ~1TB
  • 50. Big graphs need lots of tuning ● Edge/Vertex distinctions ○ Balance Locality, Parallelism ● Partition Strategies ● Data Denormalization ○ Eg. Reducing number, value of transactions between addresses to properties 03- Representing the Graph
  • 51. Blockchain data has some nuance... Global consensus follows the chain with the most work: 04- Building the ML Pipeline
  • 52. Blockchain data has some nuance... Global consensus follows the chain with the most work: 04- Building the ML Pipeline
  • 53. Blockchain data has some nuance... Global consensus follows the chain with the most work: 04- Building the ML Pipeline
  • 54. … but there are solutions. A processing delay can address virtually all rewrite cases, but hinders real-time analytics efforts. The answer: Lambda Architecture 04- Building the ML Pipeline
  • 55. Lambda Architecture 04- Building the ML Pipeline Real-time Input Batching ML Model ML Pipeline Model Update Feat. Extract Model Application Data Transform Result
  • 56. Why it works: Aggregates are slow to move... … but what is true in the aggregate cannot predict individual measurements. 04- Building the ML Pipeline
  • 57. Example: Tx Clustering and Anomaly Detection
  • 58. Lambda Architecture 05- Tx Clustering Example Real-time Input Batching ML Model ML Pipeline Model Update Feat. Extract Model Application Data Transform Result
  • 59. Focus: Batching Process 05- Tx Clustering Example Real-time Input Batching ML Model ML Pipeline Model Update Feat. Extract Model Application Data Transform Result
  • 60. Delay in processing addresses mutability concerns 05- Tx Clustering Example Blockchain Artificial Delay Processing Latency Trained Model Coverage Trained Model Usage time
  • 61. Feature extraction from live node data 05- Tx Clustering Example Input Input Input Output Output ID: Hash Metadata { ~20 features, including: ● Transaction shape (inputs, outputs) ● Value distribution ● Input types and ages
  • 62. Cassandra Data store Data is fed into a kmeans pipeline 07- Tx Clustering Example Spark Pull latest Tx Batch Tx Feature Extractor Feature Scaler Inc. Update Model
  • 63. Cassandra Data store Spark 2.0 ML pipelines give you pipeline persistence 05- Tx Clustering Example Spark Pull latest Tx Batch Tx Feature Extractor Feature Scaler Inc. Update Model
  • 64. Cassandra Data store Streaming pipeline reuses data components 05- Tx Clustering Example Spark Feature Scaler Tx Feature Extractor Streaming input Trained ML Model
  • 65. Cassandra Data store Trained model used for real-time anomaly detection 05- Tx Clustering Example Feature Scaler Tx Feature Extractor Trained ML Model Streaming input Tx Type & Anomaly result
  • 66. Cassandra Data store Also used to color graphs for efficient traversal decisionmaking 05- Tx Clustering Example Feature Scaler Tx Feature Extractor Trained ML Model Streaming input Transaction Type Result
  • 67. Cassandra Data store Also used to color graphs for efficient traversal decisionmaking 05- Tx Clustering Example Feature Scaler Tx Feature Extractor Trained ML Model Streaming input Transaction Type Result
  • 68. Cassandra Data store Also used to color graphs for efficient traversal decisionmaking 05- Tx Clustering Example Feature Scaler Tx Feature Extractor Trained ML Model Streaming input Transaction Type Result
  • 69. What to look for in Blockchain Analytics ❏ Infrastructure to ingest and transform metadata ❏ Multi-layered solution ❏ Graphs that address multiple purposes ❏ Resources for tuning graphs ❏ Real-time analytics (e.g. Lambda architecture)
  • 70. Use Cases: Cybercrime Ransomware $1B in 2016 Approaches ● Reactive - Follow the money ● Proactive - Anomaly detection Source: FBI, SonicWall
  • 71. Demo
  • 72. Use Cases Use Case(s) Industry AML and financial crimes, regulatory compliance Financial Services Provider/patient demographic data analysis, Claims fraud Healthcare Mobile payments, Identity management Telecommunications Supply chain analytics Manufacturing
  • 74.
  • 75. Data Transformation Machine Learning Modeling Private Chain ...