OracleCode_Berlin_Jun2018_AnalyzeBitcoinTransactionDataUsingAsGraph

Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
Analyzing Blockchain and Bitcoin Transaction Data as Graph
Oracle Code | 2018-06-12 | Funkhaus Berlin
Karin Patenge |  karin.patenge@oracle.com
Business Development Manager Technology
Oracle Deutschland B.V. & Co. KG
Hans Viehmann |  hans.viehmann@oracle.com
Product Manager Spatial and Graph Technologies
Oracle Corporation

Acknowledgement
• This presentation is based on the work of:
• Zhe (Alan) Wu
• Architect for Graph and Semantic Technologies @ Oracle Corporation
• Email: alan.wu@oracle.com
@kpatenge @alanzwu @SpatialHannes

Agenda
• Modeling of Bitcoin Transactions
• Questions of Interest
• Data Processing Workflow
• Summary
• Q&A

Setting the Scene: Analyze Bitcoin Transaction Data

Setting the Scene: Interesting Patterns in Bitcoin Transaction Data
Source: http://blockchain.info

What does a Bitcoin Transaction look like?
• A transaction has input(s) and output(s)
– An input comes from an output of a(nother) transaction
TX hash: 6f7cf9580f1c2dfb3c4d5d043cdbb128c640e3f20161245aa7372e9666168516
TX outputSum : 10000000000
-- TX Input from: ff3dc8b461305acc5900d31602f2dafebfc406e5b050b14a352294f0965e0bf6:0
-- TX Input from: 2db69558056d0132d9848851fd20329be9cd590fa5ae2b3c55f58931f42e27f7:0
-- TX Output value: 10000000000
-- TX Output scriPubAddr: 12higDjoCCNXSA95xZMWUdPvXNmkAduhWv
Note: Amounts are given in satoshi. 1,000,000 is 0.01 BTC, so the output value 10000000000 is 100 BTC.
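The sample can be read programmatically along these lines. This is a minimal Python sketch, assuming amounts are integer satoshi (1 BTC = 100,000,000 satoshi, consistent with the note) and that the number after the colon in a "TX Input from" reference is the index of the spent output. The helper names are made up for illustration.

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000  # 1 BTC = 10^8 satoshi


def satoshi_to_btc(satoshi):
    """Convert an integer satoshi amount to BTC without float rounding."""
    return Decimal(satoshi) / SATOSHI_PER_BTC


def parse_input_ref(ref):
    """Split '<tx hash>:<n>' into (referenced tx hash, output index).

    Assumption: <n> is the position of the spent output in that transaction.
    """
    tx_hash, index = ref.rsplit(":", 1)
    return tx_hash, int(index)


# Values taken from the sample transaction above.
output_sum = 10000000000
first_input = "ff3dc8b461305acc5900d31602f2dafebfc406e5b050b14a352294f0965e0bf6:0"

print(satoshi_to_btc(output_sum), "BTC")
print(parse_input_ref(first_input))
```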

What does a Bitcoin Transaction look like?
[Figure: TX1, TX3, TX8, TX9 and Addr X, K, L, Y, Z connected by payments ($)]

What does a Graph look like?
• A graph has vertices (entities), edges (relationships), and properties
–Also known as linked data
[Figure: TX1, TX3, TX8, TX9 and Addr X, K, L, Y, Z drawn as vertices, connected by edges ($)]

Modeling Bitcoin Transactions as a Graph
• Model 1
– Vertices: Transaction, Address
– Edges: Transaction references (TX -> TX, TX -> Addr)
• Model 2
– Vertices: Transaction, Address
– Edges: Transaction's indirect reference to Address (Addr -> TX -> Addr)
• Model 3
– Vertices: Address
– Edges: Address to Address payment (Addr -> Addr)
[Figure: one diagram per model, each with TX1, TX3, TX8, TX9 and Addr X, K, L, Y, Z connected by payments ($)]

Bitcoin Transactions as a Graph: Money Flow
• Graph Model 3
– What is Addr X's contribution to Addr K?
– Given an input address i, output address o
-> Contribution of i to o is:
[formula shown as an image on the slide]
[Figure: TX1, TX3, TX8, TX9 and Addr X, K, L, Y, Z, with input addresses i, output address o and edges labeled "Amount"]
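One common way to attribute money flow between addresses in a multi-input transaction is a proportional split. The Python sketch below assumes that rule; it is an assumption for illustration and is not taken from the slide. Transaction fees are ignored, and the addresses and amounts are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def address_contributions(inputs, outputs):
    """Return {(input_addr, output_addr): amount} for one transaction.

    inputs and outputs are lists of (address, amount_in_satoshi).
    ASSUMPTION: each input address funds every output in proportion
    to its share of the total input amount.
    """
    total_in = sum(amount for _, amount in inputs)
    edges = defaultdict(Fraction)
    for in_addr, in_amount in inputs:
        share = Fraction(in_amount, total_in)
        for out_addr, out_amount in outputs:
            edges[(in_addr, out_addr)] += share * out_amount
    return dict(edges)


# Hypothetical transaction: X and Y pay K and L.
edges = address_contributions(
    inputs=[("X", 3000), ("Y", 1000)],
    outputs=[("K", 2000), ("L", 2000)],
)
for (src, dst), amount in sorted(edges.items()):
    print(f"{src} -> {dst}: {float(amount):.0f}")
```

Each resulting pair corresponds to one Addr -> Addr edge of Model 3, with the amount as an edge property.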


Bitcoin Transactions as a Graph: Workflow
Retrieving & Parsing Data -> Data Preparation -> Graph Generation & Loading -> Graph Querying & Analysis -> Graph Visualization
[Figure: workflow steps, annotated "Functions of a Graph Database"]

Modeling Data as Graphs
The more connected the data is, the better a Graph fits
Oracle NoSQL DB with Big Data Spatial and Graph
Graphic source: http://www.ateam-oracle.com/intro-to-graphs-at-oracle/

What is a Property Graph?
• A set of nodes (aka vertices)
– each vertex has a unique identifier
– each vertex has a set of in/out edges
– each vertex has a collection of key-value properties
• A set of edges
– each edge has a unique identifier
– each edge has a head/tail vertex
– each edge has a label denoting type of relationship between two vertices
– each edge has a collection of key-value properties
• Blueprints Java APIs
• Implementations
– Oracle (Spatial and Graph, Big Data Spatial and Graph), Neo4j, DataStax (Titan), InfiniteGraph, Dex, Sail, MongoDB, …
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
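The property graph model described on this slide can be sketched as a small data structure. This is an illustrative Python sketch, not the Blueprints API; the class and method names are made up. The IDs, label and amount come from the sample flat files in this deck.

```python
class PropertyGraph:
    """Minimal in-memory property graph: vertices, labeled edges, key-value properties."""

    def __init__(self):
        # vertex_id -> {"props": {...}, "in": [edge_id, ...], "out": [edge_id, ...]}
        self.vertices = {}
        # edge_id -> {"tail": vertex_id, "head": vertex_id, "label": str, "props": {...}}
        self.edges = {}

    def add_vertex(self, vertex_id, **props):
        vertex = self.vertices.setdefault(vertex_id, {"props": {}, "in": [], "out": []})
        vertex["props"].update(props)
        return vertex

    def add_edge(self, edge_id, tail, head, label, **props):
        # Create missing endpoints so every edge has a tail and a head vertex.
        self.add_vertex(tail)
        self.add_vertex(head)
        self.edges[edge_id] = {"tail": tail, "head": head, "label": label, "props": props}
        self.vertices[tail]["out"].append(edge_id)
        self.vertices[head]["in"].append(edge_id)


g = PropertyGraph()
g.add_vertex(1, bt_addr="1111111111111111111114oLvT2")
g.add_edge(1, tail=317335, head=91594, label="contrib", amount=5.0e9)
g.add_edge(2, tail=357443, head=91594, label="contrib", amount=5.0e9)

print(len(g.vertices), len(g.edges))
print(g.vertices[91594]["in"])
```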

Property Graph Support
[Figure: architecture diagram with the following elements]
• Interfaces: REST/Web Service/Notebooks; Java, Groovy, Python, …; Java APIs; Java APIs/JDBC/SQL/PLSQL
• Graph Analytics: Parallel In-Memory Graph Analytics (PGX) / Graph Querying (PGQL); Apache Spark
• Graph Data Access Layer (DAL): Blueprints & Lucene/SolrCloud; RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON)
• Property Graph formats: GraphML, GML, GraphSON, Flat Files
• Scalable and Persistent Storage Management: Oracle NoSQL Database, Oracle RDBMS, Apache HBase

Demo Environment
• Available for free: Oracle Big Data Lite VM 4.11 running in Oracle VirtualBox
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– Oracle NoSQL Database (kvlite: unclustered -> 1 node, no replication)
– Big Data Spatial and Graph (BDSG) 2.4
  • Property Graph Analytics Engine (PGX), Property Graph Query Language (PGQL)
  • Gremlin, Apache Groovy (Shell)
  • Zeppelin Notebook with PGX Interpreter
– Property Graph Format
  • Oracle Flat Files
– Cytoscape 3.6.0
  • Big Data Spatial and Graph 2.4 support installed

Oracle Flat File Format: Vertices
Bitcoin transaction data sample:
[oracle@bigdatalite data]$ head -n 5 btc.opv
1,bt_addr,1,1111111111111111111114oLvT2,,
2,bt_addr,1,11126yHiXjavR3oNVwV2GRNso2ah4MnZtm,,
3,bt_addr,1,11128BtJwtyW4q9eRe3zts6BB4jg4uKLv8,,
4,bt_addr,1,111HnjYiCubyhPjtmZ7jEQjYcYBpKZHvJ,,
5,bt_addr,1,111KHWctzJ8tsTbittCDVzmTHVjxQR2g4,,
[oracle@bigdatalite data]$
Definition:
Field #  Name        Description
1        vertex_ID   An integer that uniquely identifies the vertex
2        key_name    The name of the key in the key-value pair
3        value_type  1=String, 2=Integer, 3=Float, ...
4        value       The encoded, non-null value of key_name when it is neither numeric nor a timestamp (date)
5        value       The encoded, non-null value of key_name when it is numeric
6        value       The encoded, non-null value of key_name when it is a timestamp (date)
Source: http://blockchain.info
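To make the vertex layout concrete, here is a minimal Python sketch that reads lines in this format into a dictionary. It only handles the three value types named on the slide and does not decode escaped characters in encoded values; the function name is made up for illustration.

```python
import csv
import io

# Only the value types listed on the slide; the real format defines more.
VALUE_TYPES = {1: str, 2: int, 3: float}


def parse_opv(text):
    """Parse .opv lines into {vertex_id: {key_name: value}}."""
    vertices = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        vertex_id, key_name, value_type = int(row[0]), row[1], int(row[2])
        # Field 4 carries string values, field 5 carries numeric values.
        raw = row[3] if value_type == 1 else row[4]
        vertices.setdefault(vertex_id, {})[key_name] = VALUE_TYPES[value_type](raw)
    return vertices


sample_opv = """1,bt_addr,1,1111111111111111111114oLvT2,,
2,bt_addr,1,11126yHiXjavR3oNVwV2GRNso2ah4MnZtm,,
"""
vertices = parse_opv(sample_opv)
print(vertices[1])
```

A vertex with several properties appears on several lines with the same vertex_ID, which is why the sketch groups by ID.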

Oracle Flat File Format: Edges
Bitcoin transaction data sample:
[oracle@bigdatalite data]$ head -n 5 btc.ope
1,317335,91594,contrib,trans_hash,1,4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,
1,317335,91594,contrib,amount,3,,5.0E9,
2,357443,91594,contrib,amount,3,,5.0E9,
3,352850,91594,contrib,amount,3,,5.0E9,
4,308829,91594,contrib,amount,3,,5.0E9,
5,314511,11714,contrib,trans_hash,1,2e8250e9f3f8043cdad60f747982275fee2a1836ebb48b2f620d03371be8e3f6,,
5,314511,11714,contrib,amount,3,,5.0E9,
[oracle@bigdatalite data]$
Definition:
Field #  Name              Description
1        edge_ID           An integer that uniquely identifies the edge
2        source_vertex_ID  The vertex_ID of the outgoing tail of the edge
3        dest_vertex_ID    The vertex_ID of the incoming head of the edge
4        edge_label        The encoded label of the edge, which describes the relationship between the two vertices
5        key_name          The encoded name of the key in a KV pair
6        value_type        1=String, 2=Integer, 3=Float, ...
7        value             The encoded, non-null value of key_name when it is neither numeric nor a timestamp (date)
8        value             The encoded, non-null value of key_name when it is numeric
9        value             The encoded, non-null value of key_name when it is a timestamp (date)
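The edge layout can be read the same way. The following Python sketch groups the property lines of each edge by edge_ID; it covers only string, integer and float values, skips decoding of encoded characters, and uses a made-up function name.

```python
import csv
import io


def parse_ope(text):
    """Parse .ope lines into {edge_id: {"src", "dst", "label", "props"}}."""
    edges = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        edge_id, src, dst = int(row[0]), int(row[1]), int(row[2])
        label, key_name, value_type = row[3], row[4], int(row[5])
        edge = edges.setdefault(
            edge_id, {"src": src, "dst": dst, "label": label, "props": {}}
        )
        if value_type == 1:      # string value in field 7
            value = row[6]
        elif value_type == 2:    # integer value in field 8
            value = int(row[7])
        elif value_type == 3:    # float value in field 8
            value = float(row[7])
        else:
            raise ValueError(f"value_type {value_type} not handled in this sketch")
        edge["props"][key_name] = value
    return edges


sample_ope = """1,317335,91594,contrib,trans_hash,1,4391b11d991e7c9ad4f9a1a5a7ea9ed7f234643b0c883f49511e1394a5ab8ff5,,
1,317335,91594,contrib,amount,3,,5.0E9,
2,357443,91594,contrib,amount,3,,5.0E9,
"""
edges = parse_ope(sample_ope)
print(edges[1]["src"], "->", edges[1]["dst"], edges[1]["props"]["amount"])
```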

Graph Generation and Loading using Vertices & Edges files
// Start Groovy Shell connecting to Oracle NoSQL DB
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
./gremlin-opg-nosql.sh
server = new ArrayList();
server.add("bigdatalite.localdomain:5000");
// Create a graph config with graph name "btc"
// Name of key-value store is "kvstore"
// Make sure to add all vertex/edge properties needed
cfg = GraphConfigBuilder.forPropertyGraphNosql()
  .setName("btc")
  .setStoreName("kvstore")
  .setHosts(server)
  .addVertexProperty("bt_addr", PropertyType.STRING, "NA")
  .addEdgeProperty("amount", PropertyType.FLOAT, 1.0f)
  .hasEdgeLabel(true)
  .setLoadEdgeLabel(true)
  .setMaxNumConnections(2)
  .build();
// Create an instance of the graph
opg = OraclePropertyGraph.getInstance(cfg);
opg.getKVStoreConfig();
// Prepare for data load: remove any existing data of this graph
opg.setClearTableDOP(2);
opg.clearRepository();
// Create an instance for the graph data loader
opgdl = OraclePropertyGraphDataLoader.getInstance();
// Flat files with vertices & edges of Bitcoin txs
vfile = "/home/oracle/Documents/BTC/data/btc.opv";
efile = "/home/oracle/Documents/BTC/data/btc.ope";
// Load data into the graph (last argument: degree of parallelism)
opgdl.loadData(opg, vfile, efile, 2);
// Do some checks
// Count vertices and edges
opg.countVertices();
opg.countEdges();
// Get vertices and edges
opg.getVertices();
opg.getEdges();
...
// Shut down instance and close shell
opg.shutdown();
:q

Graph Querying and Analysis
PGX – Graph Analytics Engine
• Toolkit for In-Memory, Parallel Graph Analysis containing
– PGX shell
– Analyst API with a large collection of built-in Graph algorithms
– and more
• Developed by Oracle Labs
– http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html
– https://event.cwi.nl/grades/2018/07-VanRest.pdf
– https://docs.oracle.com/cd/E56133_01/latest/tutorials/index.html
PGQL – Property Graph Query Language
• SQL-like Graph Pattern Matching
– The WHERE clause is a set of comma-separated constraints
• Developed by Oracle Labs
– http://pgql-lang.org/
• Proposed for standardization

Analyze Bitcoin Transaction Data using PGX
• Start PGX server
/opt/oracle/oracle-spatial-graph/property_graph/pgx/bin/start-server
• Start / Return to Groovy Shell
// Create in-memory analyst session
session = Pgx.createSession("session_ID_1");
analyst = session.createAnalyst();
// Read the graph from Oracle NoSQL DB into memory
pgxGraph = session.readGraphWithProperties(opg.getConfig());
// Working with In-Memory Analyst
// Execute PageRank
rank = analyst.pagerank(pgxGraph, 0.0001, 0.85, 100);
// Get top 10 vertices by PageRank
rank.getTopKValues(10);
// Betweenness centrality
bc = analyst.vertexBetweennessCentrality(pgxGraph);
// Get top 10 vertices by betweenness centrality
bc.getTopKValues(10);
...
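To illustrate what a PageRank call computes, here is a textbook power-iteration sketch in Python. It is not PGX's implementation. It assumes the three numeric arguments of the call above are the error tolerance (0.0001), the damping factor (0.85) and the maximum number of iterations (100); the small graph is hypothetical.

```python
def pagerank(out_edges, tolerance=0.0001, damping=0.85, max_iterations=100):
    """Textbook PageRank. out_edges maps a vertex to the list of its successors."""
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_edges.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for v, targets in out_edges.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        error = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if error < tolerance:
            break
    return rank


def top_k(rank, k):
    """Return the k highest-ranked (vertex, score) pairs."""
    return sorted(rank.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical address graph: X, Y and Z all pay K, and K pays X.
graph = {"X": ["K"], "Y": ["K"], "Z": ["K"], "K": ["X"]}
scores = pagerank(graph)
print(top_k(scores, 2))
```

In this toy graph the address that receives payments from all others ends up ranked first, which is the kind of hub the top-10 list is meant to surface.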

Analyze Bitcoin Transaction Data using PGX
Using Zeppelin Notebook with PGX Interpreter

Querying Property Graph Data using PGQL
• Topology constraints
▪ (n)-[e]->(m)
▪ (n)-[e1]->(m1), (n)-[e2]->(m2)
▪ (n1)-[e1]->(n2)-[e2]->(n3)-[e3]->(n4)
▪ (n1)-[e1]->(n2)<-[e2]-(n3)
• Label matching
▪ (x:Person) -[e:likes]-> (y:Person)
▪ (:Person) -[:likes]-> (:Person)
▪ (x:Student|Professor) -[e:likes|knows]-> (y:Student|Professor)
• Value constraints
▪ (x) -> (y), x.name = 'John', y.age > 25
• In-Line constraints
▪ (n WITH name = 'John' OR name = 'James', type = 'Person') -[e WITH type = 'workAt', workHours < 40]-> ()
• …
Syntax form                                               Example
Basic form                                                (n)-[e]->(m)
Omit variable name of the source vertex                   ()-[e]->(m)
Omit variable name of the destination vertex              (n)-[e]->()
Omit variable names in both vertices                      ()-[e]->()
Omit variable name in edge                                (n)-->(m)
Omit variable name in edge (alternative, one dash)        (n)->(m)
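The idea behind a single-edge pattern with label and value constraints can be sketched in a few lines of Python. This only illustrates the matching semantics and is not how PGQL is implemented; the function name and the data are hypothetical.

```python
def match_edges(vertices, edges, edge_label=None, where=lambda n, e, m: True):
    """Yield (n, e, m) bindings for the pattern (n)-[e:edge_label]->(m).

    vertices: {vertex_id: {property: value}}
    edges: list of {"src": id, "dst": id, "label": str, ...}
    where: value constraints evaluated on each candidate binding
    """
    for e in edges:
        if edge_label is not None and e["label"] != edge_label:
            continue
        n, m = vertices[e["src"]], vertices[e["dst"]]
        if where(n, e, m):
            yield n, e, m


# Hypothetical data.
people = {
    1: {"name": "John", "age": 30},
    2: {"name": "Mary", "age": 28},
    3: {"name": "Bob", "age": 22},
}
relations = [
    {"src": 1, "dst": 2, "label": "likes"},
    {"src": 1, "dst": 3, "label": "likes"},
    {"src": 2, "dst": 1, "label": "knows"},
]

# Roughly: (x) -[:likes]-> (y), x.name = 'John', y.age > 25
matches = list(
    match_edges(
        people,
        relations,
        edge_label="likes",
        where=lambda x, e, y: x["name"] == "John" and y["age"] > 25,
    )
)
print([y["name"] for _, _, y in matches])
```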

Query Bitcoin Transaction Data using PGQL
// Some PGQL queries
// Explore relationships in the graph: number of edges per label
pgxResultSet = pgxGraph.queryPgql("SELECT e.label(), count(*) " +
  "WHERE (n) -[e]-> (m) GROUP BY e.label() ORDER BY count(*) DESC");
pgxResultSet.print();
// Find the 10 most collaborative Bitcoin addresses
pgxResultSet = pgxGraph.queryPgql("SELECT n, count(*) " +
  "WHERE (n) -[e:contrib]-> (m) GROUP BY n ORDER BY count(*) DESC LIMIT 10");
// Print the first 3 rows
pgxResultSet.print(3);
// Find the least collaborative Bitcoin addresses
pgxResultSet = pgxGraph.queryPgql("SELECT n, count(*) " +
  "WHERE (n) -[e:contrib]-> (m) GROUP BY n ORDER BY count(*) ASC");
// InDegree count
pgxResultSet = pgxGraph.queryPgql("SELECT y.id(), y.bt_addr, x.inDegree() " +
  "WHERE (x) -> (y), x.inDegree() > 1000 ORDER BY x.inDegree() DESC");
...
https://blogs.oracle.com/bigdataspatialgraph/how-many-ways-to-run-property-graph-query-language-pgql-in-bdsg-i
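For readers without a PGX session at hand, the grouping queries can be mimicked on a plain edge list. This Python sketch counts outgoing "contrib" edges per source (the GROUP BY n ... ORDER BY count(*) DESC pattern) and incoming edges per destination; the edge tuples are the five edges of the btc.ope sample, and the function names are made up.

```python
from collections import Counter


def top_sources(edges, label, limit=10):
    """Count edges with the given label per source vertex, highest first."""
    counts = Counter(src for src, dst, lbl in edges if lbl == label)
    return counts.most_common(limit)


def in_degrees(edges):
    """Number of incoming edges per destination vertex."""
    return Counter(dst for src, dst, lbl in edges)


# (source_vertex_ID, dest_vertex_ID, edge_label) from the btc.ope sample.
sample_edges = [
    (317335, 91594, "contrib"),
    (357443, 91594, "contrib"),
    (352850, 91594, "contrib"),
    (308829, 91594, "contrib"),
    (314511, 11714, "contrib"),
]

print(top_sources(sample_edges, "contrib", limit=3))
print(in_degrees(sample_edges).most_common(1))
```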

Query Bitcoin Transaction Data using PGQL
Using Zeppelin Notebook with PGX Interpreter

Visualize Bitcoin Transaction Data using Cytoscape

Pattern Analysis 01

Pattern Analysis 02: Addresses with incoming TXs only

Pattern Analysis 03: Degree of Centrality

Summary
• Graph databases are powerful tools, complementing relational databases
– Especially strong for analysis of graph topology and connectedness
• Graph analytics offer new insight
– Especially into relationships, dependencies and behavioural patterns
• Oracle Property Graph technology offers
– Comprehensive analytics through various APIs, integration with relational database
– Scalable, parallel in-memory processing
– Secure and scalable graph storage using Oracle NoSQL, HBase or Oracle Database
• Available both on premises and in the Cloud
Graph capabilities in Oracle Big Data Spatial and Graph

Property Graph running in the Oracle Cloud

Additional Information: PGX - Built-in Package
• Rich set of built-in parallel graph algorithms
• … and parallel graph mutation operations

Resources on Oracle's Property Graph Support
• Getting Started – Creating a Property Graph on Oracle Database by Arthur Dayton (Vlamis Software Solutions)
https://blogs.oracle.com/oraclespatial/getting-started-creating-a-property-graph-on-oracle-database
• Improve your Meetup Experience using Graph Analytics by Karin Patenge (Oracle)
https://de.slideshare.net/kpatenge
• Big Data Spatial and Graph In-Memory Analyst Java API:
https://docs.oracle.com/bigdata/bda411/PGXJV/toc.htm
• Oracle Big Data Spatial and Graph on Oracle.com:
www.oracle.com/database/big-data-spatial-and-graph
• OTN product page (white papers, software downloads, documentation, tutorials):
www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
• Oracle Big Data Lite Virtual Machine - a free sandbox to get started:
www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
• Hands On Lab for Big Data Spatial:
tinyurl.com/BDSG-HOL
• Blog – Examples, Tips & Tricks:
blogs.oracle.com/bigdataspatialgraph
