Phisher Detection in Ethereum Transaction Networks
Yash Jhaveri, Kavish Shah, Varun Mehta,
Khushi Naik, Hitansh Surani, Andre Lebecki
The Problem
● Phishing scams in Ethereum involve malicious actors creating fake
addresses or platforms to deceive users into transferring funds to
them.
● The Ethereum blockchain records all transactions, and these are
represented as a graph where each address is a node, and each
transaction is an edge connecting two nodes.
● In this graph, phishing addresses are nodes that represent
fraudulent accounts. The challenge in detecting phishing lies in
understanding the complex relationships between
nodes—specifically how legitimate and phishing addresses are
connected.
● Traditional phishing detection methods often fail to capture the
intricate patterns in these relationships, making it difficult to
distinguish between legitimate and fraudulent addresses.
● We propose a solution based on Graph Convolutional Networks
(GCNs), which are well-suited for handling large, sparse graphs like
Ethereum’s.
Dataset Overview
● The dataset used in the study contains Ethereum transaction
data, including both legitimate and fraudulent activities.
● With roughly 13.5 million edges (transactions) and 3 million nodes
(addresses) but only about 1,165 phishing nodes, the Ethereum
dataset is highly imbalanced.
● This imbalance makes it challenging for detection models to
identify phishing addresses effectively, as the majority of
addresses are legitimate.
● To tackle the imbalance problem, we carefully curated the node
selection process and re-sampled transactions involving illicit nodes
to produce a more balanced dataset (a sketch of one such strategy
follows the table below).
● The dataset also highlights the challenge of dealing with sparse
graphs, where most nodes have only a few connections, making
it harder to detect patterns indicative of phishing.
Dataset  | Nodes     | Edges      | Illicit Nodes
Ethereum | 2,973,489 | 13,551,303 | 1,165
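The curation details are specific to our pipeline; as an illustration, one common rebalancing strategy is to keep every illicit node and undersample the legitimate ones. The ratio and helper below are illustrative assumptions, not our exact pre-processing code.

```python
import numpy as np

rng = np.random.default_rng(0)

def undersample(node_ids, labels, ratio=10):
    """Keep all illicit nodes and roughly `ratio` legitimate nodes per illicit one.
    Illustrative only: the ratio and strategy are assumptions, not the study's."""
    illicit = node_ids[labels == 1]
    licit = node_ids[labels == 0]
    keep = rng.choice(licit, size=min(len(licit), ratio * len(illicit)), replace=False)
    return np.concatenate([illicit, keep])

# Toy usage: 1,000 addresses of which 20 are illicit.
nodes = np.arange(1_000)
labels = (nodes < 20).astype(int)
subset = undersample(nodes, labels)   # 20 illicit + 200 sampled legitimate nodes
```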
Key Node Properties
Objective: Identify patterns and trends for each node.
Key Features Extracted and Used:
● Indegree:
○ Number of transactions received by the node.
● Outdegree:
○ Number of transactions sent by the node.
● Degree:
○ Total number of transactions in which the node is involved.
● Instrength:
○ Total amount of cryptocurrency received by the node.
● Outstrength:
○ Total amount of cryptocurrency sent by the node.
● Strength:
○ Total amount of cryptocurrency transacted.
● Number of Neighbours:
○ The number of other nodes interacting with this node.
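As a rough illustration, these features can be computed directly from a transaction edge list; the pandas sketch below assumes illustrative column names (sender, receiver, amount) rather than the actual data schema.

```python
import pandas as pd

# Illustrative edge list: one row per transaction.
edges = pd.DataFrame({
    "sender":   ["0xA", "0xA", "0xB", "0xC"],
    "receiver": ["0xB", "0xC", "0xC", "0xA"],
    "amount":   [1.5, 0.2, 3.0, 0.7],           # ETH transferred per transaction
})

# Out-degree / out-strength: transactions sent and total amount sent per address.
out_stats = edges.groupby("sender")["amount"].agg(outdegree="count", outstrength="sum")
# In-degree / in-strength: transactions received and total amount received per address.
in_stats = edges.groupby("receiver")["amount"].agg(indegree="count", instrength="sum")

feats = out_stats.join(in_stats, how="outer").fillna(0)
feats["degree"] = feats["indegree"] + feats["outdegree"]          # total transactions
feats["strength"] = feats["instrength"] + feats["outstrength"]    # total amount moved

# Number of distinct neighbours (counterparties) per address.
pairs = pd.concat([
    edges.rename(columns={"sender": "node", "receiver": "nbr"})[["node", "nbr"]],
    edges.rename(columns={"receiver": "node", "sender": "nbr"})[["node", "nbr"]],
])
feats["n_neighbours"] = pairs.groupby("node")["nbr"].nunique()

print(feats)
```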
Previous Methods: RiWalk
What is RiWalk?
● A random-walk-based embedding method that captures structural
and contextual information of nodes.
● These walks generate feature vectors that represent the node’s
connections and its local environment within the graph.
● RiWalk is chosen over other embedding algorithms, such as node2vec,
because it produces high-quality embeddings before any neural
network or classifier is trained.
● It is also highly effective in handling large, sparse graphs like
Ethereum’s transaction network.
Embedding Integration Workflow:
Step 1: Generate node embeddings using RiWalk.
Step 2: Merge these embeddings with engineered node
features.
Step 3: Input the combined features into classifiers.
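A minimal sketch of this workflow with pandas, assuming the RiWalk embeddings and engineered features have been exported keyed by address (file and column names are illustrative, not the actual artefacts).

```python
import pandas as pd

# Step 1 output: RiWalk embeddings, one row per address.
embeddings = pd.read_csv("riwalk_embeddings.csv", index_col="address")
features = pd.read_csv("node_features.csv", index_col="address")        # engineered features
labels = pd.read_csv("labels.csv", index_col="address")["is_phishing"]

# Step 2: merge embeddings with the engineered node features, keyed by address.
combined = embeddings.join(features, how="inner")

# Step 3: the combined matrix (and labels) become the classifier input.
X = combined.loc[labels.index.intersection(combined.index)]
y = labels.loc[X.index]
```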
Baseline Models - RF and LR
● RiWalk embeddings of the Ethereum transaction graph were fed
into two classic classifiers:
○ Logistic Regression model (linear, predicting the probability
of an address being phishing)
○ Random Forest (50 trees, max_depth=5, max_features=10)
● Logistic Regression achieved 96.8 % overall accuracy but
performed poorly on the rare phishing class—61.5 % precision,
just 13.7 % recall (F1 = 0.225)—meaning it missed over 86 % of
actual phishers.
● Random Forest raised overall accuracy to 97.2 % and phishing
precision to 76.1 % with 23.2 % recall (F1 = 0.355), yet still failed
to detect more than three-quarters of phishing addresses.
Model               | Accuracy % | Precision % | Recall % | F1-Score
Logistic Regression | 96.8       | 61.5        | 13.7     | 0.225
Random Forest       | 97.2       | 76.1        | 23.2     | 0.355
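A hedged sketch of how such baselines could be set up with scikit-learn; the synthetic data stands in for the real embeddings, and only the Random Forest hyperparameters are taken from the slide above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Stand-in for the real data: in practice X would be the RiWalk embeddings
# (merged with the engineered features) and y the phishing / legitimate labels.
X, y = make_classification(n_samples=20_000, n_features=64,
                           weights=[0.995, 0.005], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Logistic Regression: linear model predicting the probability of phishing.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Random Forest with the hyperparameters listed above.
rf = RandomForestClassifier(n_estimators=50, max_depth=5,
                            max_features=10).fit(X_train, y_train)

for name, model in [("Logistic Regression", lr), ("Random Forest", rf)]:
    print(name)
    print(classification_report(y_test, model.predict(X_test), digits=3))
```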
Conclusion:
● Both models achieve high weighted accuracy thanks to
the dominant non-phishing class.
● However, neither can reliably recall the minority phishing
nodes—highlighting the need for graph-based approaches
that leverage structural information.
Graph Convolutional Networks
● GCN is a neural-network-based approach that works well with
graph data and takes a graph as its input.
● Matrix multiplication is the core operation of GCNs.
● Each layer aggregates the features of a node's neighbours in the
surrounding network.
● GCNs can be used to extract embeddings, which can then be passed
to a neural network or any other ML algorithm.
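As a rough illustration (not our exact model), one GCN layer in PyTorch could look like the sketch below; the normalised adjacency matrix a_hat and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution layer: H' = ReLU(A_hat · H · W),
    i.e. neighbourhood aggregation followed by a learned linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        # a_hat: (N, N) normalised adjacency matrix; h: (N, in_dim) node features.
        agg = torch.sparse.mm(a_hat, h) if a_hat.is_sparse else a_hat @ h
        return torch.relu(self.linear(agg))

# Toy usage: 4 nodes with 3 features each, mapped to 8-dimensional embeddings.
a_hat = torch.eye(4)            # stand-in for the normalised adjacency matrix
h = torch.randn(4, 3)
layer = GCNLayer(3, 8)
z = layer(a_hat, h)             # embeddings usable by a downstream classifier
```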
Our Approach
● Just as with the baseline pipeline, we use a GCN to process
the graph structure and obtain node embeddings.
● We use the same Random Forest and Logistic Regression algorithms
for an apples-to-apples comparison.
● We also experiment with a neural network for the
classification task.
Challenges: Architecture Selection and Training Time
● Selecting the right number of layers and tuning the
parameters for the best results took considerable
experimentation and revision.
● One major challenge during experimentation was the
slow training and testing speed of the model.
○ Digging deeper into the architecture, we found that for
large, sparse graphs, multiplying the adjacency matrix
densely spends most of the compute on zero entries,
increasing compute time without adding value to the model.
Experimentation with layers
Preferred Approach: Mid-size GCN + Sparse Operations
After several rounds of experimentation:
● Architecture:
○ 3 convolutional layers
○ batch normalization implemented after conv1 and
conv2
● Training Time:
○ The adjacency matrix representing the graph was
converted into a sparse tensor, and sparse operations
were applied for memory efficiency and faster
computation (see the sketch below).
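A minimal sketch of the sparse conversion in PyTorch, on a toy adjacency matrix; the real ~3M × 3M Ethereum adjacency is never materialised densely, which is the point of the conversion.

```python
import torch

# Tiny dense adjacency for illustration only.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])

adj_sparse = adj.to_sparse()              # keep only the non-zero entries (COO format)
h = torch.randn(3, 4)                     # node feature matrix

out = torch.sparse.mm(adj_sparse, h)      # same result as adj @ h, but skips the zeros
assert torch.allclose(out, adj @ h)
```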
Results for Sparse Operations
● Reduced training and testing time by 16% in a
GPU Environment.
● Did not have much effect on the scores
Evaluating Performance by Comparing
RiWalk and GCN Embeddings
Metric        | RiWalk + RF | RiWalk + LR | GCN + RF | GCN + LR
Test F1-Score | 0.36        | 0.23        | 0.62     | 0.57
Test Accuracy | 97.2%       | 96.8%       | 97.0%    | 96.5%
Although RiWalk embeddings slightly outperform GCN embeddings on
accuracy, the more important metric for this binary classification
problem is the F1-score: it reflects performance on both classes and
therefore accounts for the imbalance in the dataset.
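A tiny worked illustration of why accuracy alone is misleading here: a model that flags almost nothing can still score high accuracy while its F1 on the phishing class collapses.

```python
from sklearn.metrics import accuracy_score, f1_score

# 1,000 addresses, 10 of them phishing; the model flags only 2 of the 10.
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 2 + [0] * 8 + [0] * 990

print(accuracy_score(y_true, y_pred))          # 0.992 -> looks excellent
print(f1_score(y_true, y_pred, pos_label=1))   # ~0.33 -> exposes the missed phishers
```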
Best Model Performance - GCN + NN
We pass the GCN embeddings to a neural network consisting
of a ReLU activation followed by a sigmoid output to
classify our nodes.
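A minimal sketch of such a classifier head in PyTorch, assuming the GCN embeddings are already computed; the hidden size and names are illustrative, not the exact architecture used.

```python
import torch
import torch.nn as nn

class PhisherHead(nn.Module):
    """Classifier head on top of the GCN embeddings: a ReLU-activated hidden
    layer followed by a sigmoid output (layer sizes here are illustrative)."""
    def __init__(self, embed_dim, hidden_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                 # probability that the node is a phisher
        )

    def forward(self, z):
        return self.net(z).squeeze(-1)

# Toy usage on random "embeddings"; in practice z comes from the trained GCN.
z = torch.randn(5, 16)
probs = PhisherHead(embed_dim=16)(z)      # values in (0, 1); threshold at 0.5
```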
Conclusion
● Understood the problem of phishing in
cryptocurrencies and why efficient detection methods are
required for global, large-scale adoption of crypto.
● Tackled the imbalance problem using sampling
techniques.
● Explored RiWalk and GCN techniques to tackle
graph-based challenges.
● GCN + NN performs best, even though RiWalk-based
models gave slightly better accuracy.
● The use of sparse operations reduced training and testing
time by 16% in a GPU environment.
THANK YOU!
