Phisher Detection in Ethereum Transaction Networks
Yash Jhaveri, Kavish Shah, Varun Mehta,
Khushi Naik, Hitansh Surani, Andre Lebecki
The Problem
● Phishing scams in Ethereum involve malicious actors creating fake
addresses or platforms to deceive users into transferring funds to
them.
● The Ethereum blockchain records all transactions, and these are
represented as a graph where each address is a node, and each
transaction is an edge connecting two nodes.
● In this graph, phishing addresses are nodes that represent
fraudulent accounts. The challenge in detecting phishing lies in
understanding the complex relationships between
nodes—specifically how legitimate and phishing addresses are
connected.
● Traditional phishing detection methods often fail to capture the
intricate patterns in these relationships, making it difficult to
distinguish between legitimate and fraudulent addresses.
● We propose a solution based on Graph Convolutional Networks
(GCNs), which are well-suited for handling large, sparse graphs like
Ethereum’s.
Dataset Overview
● The dataset used in the study contains Ethereum transaction
data, including both legitimate and fraudulent activities.
● With roughly 13.5 million edges (transactions) and 3 million nodes
(addresses) but only about 1,165 phishing nodes, the Ethereum
dataset is highly imbalanced.
● This imbalance makes it challenging for detection models to
identify phishing addresses effectively, as the majority of
addresses are legitimate.
● To tackle the imbalance problem, we carefully curated the node
selection process and re-sampled transactions involving illicit nodes
to produce a more balanced dataset (a sketch of one such strategy
follows the table below).
● The dataset also highlights the challenge of dealing with sparse
graphs, where most nodes have only a few connections, making
it harder to detect patterns indicative of phishing.
Dataset  | Nodes     | Edges      | Illicit Nodes
Ethereum | 2,973,489 | 13,551,303 | 1,165
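The curation details are specific to our pipeline; as an illustration, one common rebalancing strategy is to keep every illicit node and undersample the legitimate ones. The ratio and helper below are illustrative assumptions, not our exact pre-processing code.

```python
import numpy as np

rng = np.random.default_rng(0)

def undersample(node_ids, labels, ratio=10):
    """Keep all illicit nodes and roughly `ratio` legitimate nodes per illicit one.
    Illustrative only: the ratio and strategy are assumptions, not the study's."""
    illicit = node_ids[labels == 1]
    licit = node_ids[labels == 0]
    keep = rng.choice(licit, size=min(len(licit), ratio * len(illicit)), replace=False)
    return np.concatenate([illicit, keep])

# Toy usage: 1,000 addresses of which 20 are illicit.
nodes = np.arange(1_000)
labels = (nodes < 20).astype(int)
subset = undersample(nodes, labels)   # 20 illicit + 200 sampled legitimate nodes
```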
Key Node Properties
Objective: Identify patterns and trends for each node.
Key Features Extracted and Used:
● Indegree:
○ Number of transactions received by the node.
● Outdegree:
○ Number of transactions sent by the node.
● Degree:
○ Total number of transactions in which the node is involved.
● Instrength:
○ Total amount of cryptocurrency received by the node.
● Outstrength:
○ Total amount of cryptocurrency sent by the node.
● Strength:
○ Total amount of cryptocurrency transacted.
● Number of Neighbours:
○ The number of other nodes interacting with this node.
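As a rough illustration, these features can be computed directly from a transaction edge list; the pandas sketch below assumes illustrative column names (sender, receiver, amount) rather than the actual data schema.

```python
import pandas as pd

# Illustrative edge list: one row per transaction.
edges = pd.DataFrame({
    "sender":   ["0xA", "0xA", "0xB", "0xC"],
    "receiver": ["0xB", "0xC", "0xC", "0xA"],
    "amount":   [1.5, 0.2, 3.0, 0.7],           # ETH transferred per transaction
})

# Out-degree / out-strength: transactions sent and total amount sent per address.
out_stats = edges.groupby("sender")["amount"].agg(outdegree="count", outstrength="sum")
# In-degree / in-strength: transactions received and total amount received per address.
in_stats = edges.groupby("receiver")["amount"].agg(indegree="count", instrength="sum")

feats = out_stats.join(in_stats, how="outer").fillna(0)
feats["degree"] = feats["indegree"] + feats["outdegree"]          # total transactions
feats["strength"] = feats["instrength"] + feats["outstrength"]    # total amount moved

# Number of distinct neighbours (counterparties) per address.
pairs = pd.concat([
    edges.rename(columns={"sender": "node", "receiver": "nbr"})[["node", "nbr"]],
    edges.rename(columns={"receiver": "node", "sender": "nbr"})[["node", "nbr"]],
])
feats["n_neighbours"] = pairs.groupby("node")["nbr"].nunique()

print(feats)
```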
Previous Methods: RiWalk
What is RiWalk?
● A random-walk-based embedding method that captures structural
and contextual information of nodes.
● These walks generate feature vectors that represent the node’s
connections and its local environment within the graph.
● RiWalk is chosen over other embedding algorithms, such as node2vec,
because it produces high-quality embeddings before any neural
network or classifier is trained.
● It is also highly effective in handling large, sparse graphs like
Ethereum’s transaction network.
Embedding Integration Workflow:
Step 1: Generate node embeddings using RiWalk.
Step 2: Merge these embeddings with engineered node
features.
Step 3: Input the combined features into classifiers.
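A minimal sketch of this workflow with pandas, assuming the RiWalk embeddings and engineered features have been exported keyed by address (file and column names are illustrative, not the actual artefacts).

```python
import pandas as pd

# Step 1 output: RiWalk embeddings, one row per address.
embeddings = pd.read_csv("riwalk_embeddings.csv", index_col="address")
features = pd.read_csv("node_features.csv", index_col="address")        # engineered features
labels = pd.read_csv("labels.csv", index_col="address")["is_phishing"]

# Step 2: merge embeddings with the engineered node features, keyed by address.
combined = embeddings.join(features, how="inner")

# Step 3: the combined matrix (and labels) become the classifier input.
X = combined.loc[labels.index.intersection(combined.index)]
y = labels.loc[X.index]
```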
Baseline Models - RF and LR
● RiWalk embeddings of the Ethereum transaction graph were fed
into two classic classifiers:
○ Logistic Regression model (linear, predicting the probability
of an address being phishing)
○ Random Forest (50 trees, max_depth=5, max_features=10)
● Logistic Regression achieved 96.8 % overall accuracy but
performed poorly on the rare phishing class—61.5 % precision,
just 13.7 % recall (F1 = 0.225)—meaning it missed over 86 % of
actual phishers.
● Random Forest raised overall accuracy to 97.2 % and phishing
precision to 76.1 % with 23.2 % recall (F1 = 0.355), yet still failed
to detect more than three-quarters of phishing addresses.
Model               | Accuracy % | Precision % | Recall % | F1-Score
Logistic Regression | 96.8       | 61.5        | 13.7     | 0.225
Random Forest       | 97.2       | 76.1        | 23.2     | 0.355
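A hedged sketch of how such baselines could be set up with scikit-learn; the synthetic data stands in for the real embeddings, and only the Random Forest hyperparameters are taken from the slide above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Stand-in for the real data: in practice X would be the RiWalk embeddings
# (merged with the engineered features) and y the phishing / legitimate labels.
X, y = make_classification(n_samples=20_000, n_features=64,
                           weights=[0.995, 0.005], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Logistic Regression: linear model predicting the probability of phishing.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Random Forest with the hyperparameters listed above.
rf = RandomForestClassifier(n_estimators=50, max_depth=5,
                            max_features=10).fit(X_train, y_train)

for name, model in [("Logistic Regression", lr), ("Random Forest", rf)]:
    print(name)
    print(classification_report(y_test, model.predict(X_test), digits=3))
```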
Conclusion:
● Both models achieve high weighted accuracy thanks to
the dominant non-phishing class.
● However, neither can reliably recall the minority phishing
nodes—highlighting the need for graph-based approaches
that leverage structural information.
Graph Convolutional Networks
● GCN is a neural-network-based approach that works well with
graph data and takes a graph as its input.
● Matrix multiplication is the core operation of GCNs.
● Each layer aggregates the features of a node's neighbours in the
surrounding network.
● GCNs can be used to extract embeddings, which can then be passed
to a neural network or any other ML algorithm.
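As a rough illustration (not our exact model), one GCN layer in PyTorch could look like the sketch below; the normalised adjacency matrix a_hat and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat · H · W),
    i.e. neighbourhood aggregation followed by a learned linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        # a_hat: (N, N) normalised adjacency matrix; h: (N, in_dim) node features.
        agg = torch.sparse.mm(a_hat, h) if a_hat.is_sparse else a_hat @ h
        return torch.relu(self.linear(agg))

# Toy usage: 4 nodes with 3 features each, mapped to 8-dimensional embeddings.
a_hat = torch.eye(4)            # stand-in for the normalised adjacency matrix
h = torch.randn(4, 3)
layer = GCNLayer(3, 8)
z = layer(a_hat, h)             # embeddings usable by a downstream classifier
```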
Our Approach
● Just as with the baseline pipeline, we use a GCN to process
the graph structure and obtain node embeddings.
● We use the same Random Forest and Logistic Regression algorithms
for an apples-to-apples comparison.
● We also experiment with a neural network for the
classification task.
Challenges: Architecture Selection and Training Time
● Selecting the right number of layers and tuning the
parameters for the best results took considerable
experimentation and revision.
● One major challenge during experimentation was the
slow training and testing speed of the model.
○ Digging deeper into the architecture, we found that for
large, sparse graphs, multiplying the adjacency matrix
densely spends most of the compute on zero entries,
increasing compute time without adding value to the model.
Experimentation with layers
Preferred Approach: Mid-size GCN + Sparse Operations
After several rounds of experimentation:
● Architecture:
○ 3 convolutional layers
○ batch normalization implemented after conv1 and
conv2
● Training Time:
○ The adjacency matrix representing the graph was
converted into a sparse tensor, and sparse operations
were applied for memory efficiency and faster
computation (see the sketch below).
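A minimal sketch of the sparse conversion in PyTorch, on a toy adjacency matrix; the real ~3M × 3M Ethereum adjacency is never materialised densely, which is the point of the conversion.

```python
import torch

# Tiny dense adjacency for illustration only.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])

adj_sparse = adj.to_sparse()              # keep only the non-zero entries (COO format)
h = torch.randn(3, 4)                     # node feature matrix

out = torch.sparse.mm(adj_sparse, h)      # same result as adj @ h, but skips the zeros
assert torch.allclose(out, adj @ h)
```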
Results for Sparse Operations
● Reduced training and testing time by 16% in a
GPU Environment.
● Did not have much effect on the scores
Evaluating Performance by Comparing
RiWalk and GCN Embeddings
Metric        | RiWalk + RF | RiWalk + LR | GCN + RF | GCN + LR
Test F1-Score | 0.36        | 0.23        | 0.62     | 0.57
Test Accuracy | 97.2%       | 96.8%       | 97.0%    | 96.5%
Although RiWalk embeddings slightly outperform GCN embeddings on
accuracy, the more important metric for this binary classification
problem is the F1-score: it reflects performance on both classes and
therefore accounts for the imbalance in the dataset.
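A tiny worked illustration of why accuracy alone is misleading here: a model that flags almost nothing can still score high accuracy while its F1 on the phishing class collapses.

```python
from sklearn.metrics import accuracy_score, f1_score

# 1,000 addresses, 10 of them phishing; the model flags only 2 of the 10.
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 2 + [0] * 8 + [0] * 990

print(accuracy_score(y_true, y_pred))          # 0.992 -> looks excellent
print(f1_score(y_true, y_pred, pos_label=1))   # ~0.33 -> exposes the missed phishers
```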
Best Model Performance - GCN + NN
We pass the GCN embeddings to a neural network consisting
of a ReLU activation followed by a sigmoid output to
classify our nodes.
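A minimal sketch of such a classifier head in PyTorch, assuming the GCN embeddings are already computed; the hidden size and names are illustrative, not the exact architecture used.

```python
import torch
import torch.nn as nn

class PhisherHead(nn.Module):
    """Classifier head on top of the GCN embeddings: a ReLU-activated hidden
    layer followed by a sigmoid output (layer sizes here are illustrative)."""
    def __init__(self, embed_dim, hidden_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                 # probability that the node is a phisher
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)

# Toy usage on random "embeddings"; in practice z comes from the trained GCN.
z = torch.randn(5, 16)
probs = PhisherHead(embed_dim=16)(z)      # values in (0, 1); threshold at 0.5
```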
Conclusion
● Understood the problem of phishing in
cryptocurrencies and why efficient detection methods are
required for global, large-scale adoption of crypto.
● Tackled the imbalance problem using sampling
techniques.
● Explored RiWalk and GCN techniques to tackle
graph-based challenges.
● GCN + NN performs best, even though RiWalk-based
models gave slightly better accuracy.
● The use of sparse operations reduced training and testing
time by 16% in a GPU environment.
THANK YOU!
