The document provides an analysis of building a complete dataframe of historical data from the Uniswap V2 ETH-DAI liquidity pool. It describes collecting data on over 59,000 minting and burning events through over 200,000 API calls to retrieve smart contract data. The resulting dataframe contains 27 columns with information on reserves, prices, liquidity levels, fees earned, and more. It explores analyzing profitability, impermanent loss, and volume without accessing individual swap data.
Uniswap V2 Eth-Dai Pool Analysis
Building a Complete Dataframe
Jonny H.
jonny@cfmm.house
Jeff Wentworth
jeff@curvegrid.com
Garrette David
garrette@cfmm.house
May 2021
Automated Market Makers (AMMs) such as Uniswap are one of the cornerstones
of decentralised finance (DeFi). However, the historical data available on
the liquidity pools that make up these AMMs is still quite limited. Websites
such as Zapper, Sommelier and APY.vision display informative liquidity pool
metrics for their users, but we are more interested in the raw datasets from
which such metrics are created, and which can also be used for backtesting
strategies. This type of blockchain data can be quite difficult to find.
Certain projects have raised millions of dollars to provide blockchain data
query tools, yet anyone who has tried to use these tools to build a complete
dataset on a Uniswap pool will know how challenging this can be, particularly
when one can only query data that has already been arranged into some
predefined subset. We wanted the ability to query historical smart contract
data directly, so that we could find exactly the information we needed when
building our dataset. We decided to use the Curvegrid MultiBaas system to
build our dataset, along with Python Pandas and our own mathematical models
and algorithms.
Curvegrid’s MultiBaas blockchain middleware provides real-time and
historical access to the output of smart contract functions, as well as
enriched and aggregated smart contract time series event data. Applications
can query MultiBaas’ REST API, and users can integrate MultiBaas into their
spreadsheets via a plugin.
Pandas is a powerful, open source data analysis and manipulation tool built
on top of the Python programming language. It is widely used in financial
analytics, and many Quantitative Analysts will be familiar with it.
Our goal was to build the most complete view of a Uniswap trading pair (we
started with Eth-Dai), in the most efficient way possible, by examining
both smart contract function output and time series event data. The only
events we tracked were Mints and Burns of the Uni-V2 Eth-Dai LP token,
i.e. the events emitted when liquidity is added to or removed from the pool.
We did not track individual swaps: the number of swaps on the higher-volume
pools is so large that importing them would have required many millions of
API calls. We felt that there might be a more efficient way of doing this
analysis.
How can we analyse the profitability of a liquidity pool, the fees generated,
the volume traded and the impermanent loss encountered, without looking at a
single swap? This was our challenge. We used several simple mathematical
models, based on our study of the Uniswap smart contracts, to infer what
the fees earned and impermanent loss encountered must have been, using only
the balances and amounts recorded each time liquidity was added to or
removed from the pool.
Our latest solution uses over 1,000 lines of Python code and over 200,000
API calls to the MultiBaas system. We queried both event data and historical
Ethereum archive node data to construct the dataframe described below.
Some of the columns are populated directly from the results of API calls;
others are derived from mathematical models and calculations involving
preceding columns.
We saved the dataframe as .csv, .xlsx and .pkl files, each of which can
be accessed via the following link. We encourage the reader to download
the data and run their own analysis on it. The dataframe has 59912 rows,
each representing a Mint or Burn event, and 27 columns, each representing
either a parameter associated with the specific event or a snapshot of the
state of the liquidity pool at the time of the event. The rest of this
article explains what each column represents, so that the reader can
understand how to interact with the data.
Index: In order to accurately index blockchain events, we constructed a
primary key from the following components:
• BlockNum - The block number in which the event (mint/burn) occurred.
• txIndexInBlock - The index of the transaction within the block.
• eventIndexInLog - The index of the event within the log (block).
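The key can be sketched in pandas as follows. This is a minimal illustration, assuming the three components are stored in integer columns named exactly as above; the sample rows are made up. Sorting on block, then transaction, then log index recovers on-chain order, and `verify_integrity=True` makes pandas raise an error if two events ever share a key.

```python
import pandas as pd

# The three components of the composite primary key, in on-chain order.
KEY = ["BlockNum", "txIndexInBlock", "eventIndexInLog"]


def index_events(df: pd.DataFrame) -> pd.DataFrame:
    """Sort Mint/Burn rows into on-chain order and index them by the composite key."""
    ordered = df.sort_values(KEY)  # block first, then tx index, then log index
    # verify_integrity=True raises ValueError if two rows share the same key.
    return ordered.set_index(KEY, verify_integrity=True)


# Illustrative rows (made-up block numbers and indices).
events = pd.DataFrame({
    "BlockNum": [101, 100, 101],
    "txIndexInBlock": [3, 0, 1],
    "eventIndexInLog": [7, 2, 4],
    "event": ["Burn", "Mint", "Mint"],
})
print(index_events(events)["event"].tolist())  # ['Mint', 'Mint', 'Burn']
```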
There are many cases where multiple mints and burns happen in the same
block, so we needed a more accurate index than simply block number or
timestamp. Out of 59912 events examined, the distribution of mints and
burns per block can be seen in Table 1.
Table 1: Number of Mints/Burns in a Block

Events in the block    Frequency (number of blocks)
-------------------    ----------------------------
1                      55541
2                      3717
3                      517
4                      113
5                      18
6                      6
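A tally like the one in Table 1 can be produced with a short pandas sketch, assuming one row per Mint/Burn event and a BlockNum column; the sample block numbers are made up. In this tally, summing the frequencies gives the number of blocks, while the number of events is the sum of each event count multiplied by its frequency.

```python
import pandas as pd


def events_per_block_distribution(df: pd.DataFrame) -> pd.Series:
    """Return how many blocks contain 1, 2, 3, ... Mint/Burn events."""
    events_in_block = df.groupby("BlockNum").size()     # events in each block
    return events_in_block.value_counts().sort_index()  # blocks per event count


# Illustrative rows: block 100 has 1 event, 101 has 2, 102 has 3, 103 has 1.
sample = pd.DataFrame({"BlockNum": [100, 101, 101, 102, 102, 102, 103]})
print(events_per_block_distribution(sample).to_dict())  # {1: 2, 2: 1, 3: 1}
```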
Timestamp: The timestamp of the block in which the event was confirmed.
Every block has a recorded timestamp, expressed as the number of seconds
since the Unix epoch (00:00:00 UTC on January 1st, 1970), which can be
converted into a regular date and time. The earliest timestamp in the
dataset is 1589413386, which corresponds to May 13th 2020, when the pool
went live. The very first event is a Mint in block 10060850, in which
200 Dai and 1 Eth were added to the pool.
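Converting a block timestamp needs only Python's standard library; the sketch below checks the earliest timestamp quoted above. For a whole column, `pd.to_datetime(df["Timestamp"], unit="s", utc=True)` does the same in pandas, assuming the column is named Timestamp.

```python
from datetime import datetime, timezone


def block_time(unix_seconds: int) -> datetime:
    """Convert a block timestamp (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


print(block_time(1589413386).isoformat())  # 2020-05-13T23:43:06+00:00
```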
LPChange: The amount of Uni-V2 Eth-Dai LP tokens being created or
destroyed. When one decides to provide liquidity to a Uni-V2 pool, they
receive a token which acts as a claim on their proportion of the pool. This
figure is sometimes given as an output of the smart contract event and can
therefore be requested as part of an API call to MultiBaas. In other
instances we had to calculate it by comparing the number of LP tokens in
existence before and after the event. For a Mint, the amount of LP tokens
created is given by the following equation, which we found on line 422 of
the source code for the Uniswap V2 Eth-Dai liquidity pool:
$$
\text{LP tokens created} = \min\left(\frac{a_0 \cdot LPS}{R_0},\ \frac{a_1 \cdot LPS}{R_1}\right) \tag{1}
$$
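where:
a0 is amount0,
a1 is amount1,
R0 is Reserve0, taken immediately before the Mint,
R1 is Reserve1, taken immediately before the Mint,
LPS is the supply of Uni-V2 Eth-Dai LP tokens at the time of the Mint,
before the new tokens are created.
These terms are explained in more detail below.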
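Equation (1) translates directly into code. The function below is a sketch in floating point, whereas the contract itself works in integer (wei) units; the argument names are our own and the example values are illustrative.

```python
def lp_tokens_created(a0: float, a1: float, r0: float, r1: float, lps: float) -> float:
    """Equation (1): LP tokens minted when a0 Dai and a1 Eth are added.

    r0, r1 are the reserves and lps the LP supply immediately before the Mint.
    """
    if lps <= 0 or r0 <= 0 or r1 <= 0:
        # The pool's very first Mint follows a different rule in the contract
        # (square root of a0 * a1, minus a small permanently locked amount).
        raise ValueError("equation (1) needs a non-empty pool")
    return min(a0 * lps / r0, a1 * lps / r1)


# A deposit in the pool's current ratio: both terms give the same share.
print(lp_tokens_created(a0=100.0, a1=0.5, r0=1000.0, r1=5.0, lps=70.0))  # 7.0
```

Because of the `min`, depositing more of one token than the current ratio requires does not mint extra LP tokens.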
Uni-V3 will make this metric slightly more difficult to track considering the
conditional liquidity provision ranges that are being introduced.
amount0: This refers to the amount of Dai being added to or removed from
the liquidity pool.
amount1: This refers to the amount of Eth being added to or removed from
the liquidity pool.
Reserve0: This was originally the total amount of Dai in the liquidity pool
at the end of the block in which the event occurred. It consists of Dai
added by LP providers, Dai swapped in for Eth, and Dai earned in fees; we
pick these apart in later columns. Whenever there was more than one
mint/burn in a block, we performed some extra calculations involving
amount0, so that the Reserve0 column accurately represents the amount of
Dai directly after the mint/burn, as opposed to at the end of the block.
Even though no mint/burn is confirmed until its whole block is, calculating
the reserves this way makes the subsequent fee and impermanent loss
calculations a little easier.
Reserve1: This was the total amount of Eth in the liquidity pool at the end
of the block in which the event occurred, including Eth from LP providers,
Eth swapped in, and Eth earned in fees. As with Reserve0, we used amount1 to
calculate a more accurate version of this column, representing the state
directly after the event rather than at the end of the block.
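The adjustment from end-of-block to post-event reserves can be sketched as below. This is not necessarily the exact calculation behind the dataset: it assumes that rows are sorted by the (BlockNum, txIndexInBlock, eventIndexInLog) index, that a hypothetical `eventType` column holds 'Mint' or 'Burn', that amount0/amount1 are unsigned, and that no swaps occur between the mints/burns of a single block. Under those assumptions, the reserve directly after an event equals the end-of-block reserve minus the net amounts of the events that follow it in the same block.

```python
import pandas as pd


def post_event_reserve(df: pd.DataFrame, reserve_col: str, amount_col: str) -> pd.Series:
    """Estimate the reserve directly after each event from end-of-block reserves.

    Assumes rows in on-chain order, a hypothetical 'eventType' column
    ('Mint'/'Burn'), unsigned amounts and no swaps between events in a block.
    """
    # Mints add to the reserve, burns remove from it.
    signed = df[amount_col].where(df["eventType"] == "Mint", -df[amount_col])
    # Net amount of the events that come *after* each row within its block.
    later = signed.groupby(df["BlockNum"]).transform(
        lambda s: s.iloc[::-1].cumsum().iloc[::-1] - s
    )
    return df[reserve_col] - later


# Illustrative rows: block 1 has a Mint of 10 Dai then a Burn of 4 Dai.
sample_events = pd.DataFrame({
    "BlockNum": [1, 1, 2],
    "eventType": ["Mint", "Burn", "Mint"],
    "amount0": [10.0, 4.0, 5.0],
    "Reserve0": [106.0, 106.0, 200.0],  # end-of-block reserves
})
print(post_event_reserve(sample_events, "Reserve0", "amount0").tolist())  # [110.0, 106.0, 200.0]
```

The same function applies to Reserve1 by passing "Reserve1" and "amount1".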
LpSupply: The number of Uni-V2 Eth-Dai LP tokens in existence at the end
of the block in which the event occurred. These LP tokens are generally
held in the wallets of the users who provided the liquidity, but sometimes
they are held by a third-party contract, if the user has provided liquidity
to Uniswap through another platform.
contractName: The name of the contract calling the mint/burn function.
This is generally the Uni-V2 Router, but many other contracts interact
with the Uniswap Eth-Dai pool. In fact, we found over 100 different
contracts adding and removing liquidity, including 1-Inch, Zapper, Zerion
and even Sushiswap migration contracts. Interestingly, we also found 1180
interactions from unknown contracts. Some more sophisticated market
participants write their own smart contracts to interact with Uniswap,
rather than using the Uniswap UI. While we cannot see the specific functions
of these custom-made contracts, we can still see their effects on the pool.
We are separately conducting a more detailed study of these interacting
contracts, with this dataframe as our starting point.
contractAddress: The address of the contract described in the previous
column.
txName: The name of the contract function, called in the transaction, that
triggers the mint/burn. Common examples are addLiquidityETH,
removeLiquidityETHWithPermit, startExecution (Zerion), ZapOut (ZapperFi)
and many more.
txHash: The transaction hash of the mint/burn.
Wallet: The Ethereum address of the user who is adding/removing liquidity.
One simple application of this column is to sort the dataframe by largest
transactions and create a subset of “Whale” wallets.
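A sketch of that whale ranking, assuming the amount0, amount1, Wallet and Eth Price columns described in this article and valuing Dai at exactly $1 (an assumption). Absolute values are taken so that mints and burns both count towards a wallet's size, whichever sign convention the amount columns use; the sample wallets are made up.

```python
import pandas as pd


def top_wallets(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Rank wallets by the gross USD value of liquidity they added or removed."""
    # Dai assumed worth $1; Eth valued at the pool price in the 'Eth Price' column.
    usd = df["amount0"].abs() + df["amount1"].abs() * df["Eth Price"]
    return usd.groupby(df["Wallet"]).sum().nlargest(n)


wallet_events = pd.DataFrame({
    "Wallet": ["0xA", "0xB", "0xA"],
    "amount0": [100.0, 50.0, 0.0],
    "amount1": [1.0, 0.0, 2.0],
    "Eth Price": [200.0, 200.0, 200.0],
})
print(top_wallets(wallet_events, n=1).to_dict())  # {'0xA': 700.0}
```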
All of the columns described above were populated primarily from API re-
quests to the Curvegrid MultiBaas system. The remainder of the columns
described below are created from calculations on pre-existing columns. Some
of these calculations were quite simple, others were a lot more intricate. We
took particular care to consider that some blocks had multiple mints or burns,
and how this would affect some of the rolling calculations.
Eth Price: The Eth price on Uniswap at the time of the event, calculated
simply as the ratio of the reserves:

$$
\text{Eth Price} = \frac{R_0}{R_1} \tag{2}
$$
This represents the Eth price on Uniswap, which can fall out of line with
the Eth price on centralised exchanges. Such a gap is an opportunity for
arbitrageurs to step in and profit, which in turn closes the gap and brings
Uniswap prices (ratios) back in line with the rest of the market.
Total Liquidity: This is the USD value of the combined amount of Eth
and Dai in the liquidity pool, at the end of the block in which the Mint/Burn
has occurred. The graph below shows how the liquidity changed over time
(red). We can also see how the Eth price has changed (blue) over the same
time period.
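Equation (2) and the Total Liquidity column can be sketched together, assuming Dai is worth exactly $1. Because the Eth side is priced at the pool's own ratio R0/R1, the two halves of the pool are always equal in value, so Total Liquidity works out to twice Reserve0. The first sample row uses the pool's first Mint (200 Dai and 1 Eth); the second row is illustrative.

```python
import pandas as pd


def add_price_and_liquidity(df: pd.DataFrame) -> pd.DataFrame:
    """Add the 'Eth Price' (equation (2)) and 'Total Liquidity' columns."""
    out = df.copy()
    out["Eth Price"] = out["Reserve0"] / out["Reserve1"]  # Dai per Eth
    # Dai assumed worth exactly $1; Eth valued at the pool's own price.
    out["Total Liquidity"] = out["Reserve0"] + out["Reserve1"] * out["Eth Price"]
    return out


reserves = pd.DataFrame({
    "Reserve0": [200.0, 40_000_000.0],  # Dai
    "Reserve1": [1.0, 10_000.0],        # Eth
})
print(add_price_and_liquidity(reserves)[["Eth Price", "Total Liquidity"]])
```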
We can see that there was a huge drop in liquidity in November 2020, just
before the price of Eth really started to move upwards. Did LP providers
remove their liquidity in anticipation of the swift appreciation of the Eth
price, to avoid the impermanent loss they might encounter as a result of the
price divergence? What’s important to understand here is that the liquidity
never recovered. The small recovery in January 2021 is illusory: since
liquidity is measured in USD, the “recovery” we see here is merely an
appreciation in the value of the Eth already in the pool, not an influx of
liquidity. We can factor the Eth price out of the liquidity analysis, and
the situation then looks more like the graph below.
It is clear from the above image that the liquidity never really recovered
when analysed in terms of the number of tokens in the pool.
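One way to factor out the Eth price, which may differ from the method used for the graph above, is to track the geometric mean of the reserves, sqrt(Reserve0 × Reserve1) = sqrt(k). Swaps change this quantity only through the fees they leave in the pool, while mints and burns scale it in proportion to the liquidity added or removed. In the illustrative example, a fee-free swap that quadruples the Eth price doubles the USD liquidity, yet the token-based measure stays flat.

```python
import numpy as np
import pandas as pd


def token_liquidity(df: pd.DataFrame) -> pd.Series:
    """Price-neutral pool depth: sqrt(Reserve0 * Reserve1), i.e. sqrt(k)."""
    return np.sqrt(df["Reserve0"] * df["Reserve1"])


# Illustrative: a fee-free swap moves the Eth price from 100 to 400 Dai.
pool = pd.DataFrame({"Reserve0": [100.0, 200.0], "Reserve1": [1.0, 0.5]})
usd_liquidity = 2 * pool["Reserve0"]   # Total Liquidity with Dai at $1
print(usd_liquidity.tolist())          # [200.0, 400.0]
print(token_liquidity(pool).tolist())  # [10.0, 10.0]
```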
Another strong possibility for the liquidity drop off in the Uni Eth-Dai pool
during November 2020 is users migrating their liquidity to Sushiswap. The
liquidity in the Eth-Dai pool on Sushiswap increased from $34m to $120m
during the same time period as the drop in liquidity on Uniswap. We can
also see in our dataframe the increased usage of the SushiSwap migration
contract. This contract was designed to migrate liquidity from Uniswap
pools to Sushiswap pools, in what’s known as a “Vampire Attack”. If we
filter our dataframe to only look at Sushi migration events during November
2020, we can see that $4.8 million were transferred directly from the Uni Eth-
Dai liquidity pool to the Sushi pool using the Sushi migration contract alone.
Millions more were withdrawn from Uni and deposited to Sushi manually.
We are looking at this in more detail in our study on the various contracts
interacting with Uniswap.
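A filter of that kind might look like the sketch below. The name fragment used in the example and the `eventType` column are hypothetical, so the actual contractName strings in the dataset should be checked first; Dai is valued at $1 and Eth at the Eth Price column, and the sample rows are made up.

```python
import pandas as pd


def migrated_usd(df: pd.DataFrame, name_fragment: str, start: str, end: str) -> float:
    """USD value of Burns by contracts whose name contains `name_fragment`,
    for events with start <= time < end (UTC).

    Assumes a hypothetical 'eventType' column ('Mint'/'Burn'), Unix-second
    'Timestamp' values, Dai worth $1 and Eth valued at 'Eth Price'.
    """
    times = pd.to_datetime(df["Timestamp"], unit="s", utc=True)
    mask = (
        df["contractName"].str.contains(name_fragment, case=False, na=False, regex=False)
        & (df["eventType"] == "Burn")
        & (times >= pd.Timestamp(start, tz="UTC"))
        & (times < pd.Timestamp(end, tz="UTC"))
    )
    hits = df.loc[mask]
    usd = hits["amount0"].abs() + hits["amount1"].abs() * hits["Eth Price"]
    return float(usd.sum())


# Illustrative rows (made-up values).
example = pd.DataFrame({
    "Timestamp": [1605398400, 1602720000, 1605830400],  # 2020-11-15, 2020-10-15, 2020-11-20 UTC
    "contractName": ["SushiSwap Migrator", "SushiSwap Migrator", "UniswapV2Router02"],
    "eventType": ["Burn", "Burn", "Burn"],
    "amount0": [100.0, 1000.0, 50.0],
    "amount1": [1.0, 0.0, 1.0],
    "Eth Price": [500.0, 500.0, 500.0],
})
print(migrated_usd(example, "migrat", "2020-11-01", "2020-12-01"))  # 600.0
```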
Lp Token Value: The USD value of the combined amount of Dai and Eth
equivalent to 1 Uni-V2 Eth-Dai LP token at the time. This was calculated by
dividing the Total Liquidity (USD value) by the total LP token supply at
that time. Tracking the value of 1 LP token over time is a simple way to
understand the return from providing liquidity to the pool, since both the
fees earned and the impermanent loss encountered are incorporated into this
value.
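Given the Total Liquidity and LpSupply columns, the LP token value and its cumulative return from a chosen starting row can be sketched as follows; the `start` parameter and the return series are our own additions for illustration, and the sample values are made up.

```python
import pandas as pd


def lp_token_return(df: pd.DataFrame, start: int = 0) -> pd.Series:
    """Cumulative return of 1 LP token from the row at position `start`."""
    value = df["Total Liquidity"] / df["LpSupply"]  # Lp Token Value in USD
    return value.iloc[start:] / value.iloc[start] - 1


history = pd.DataFrame({"Total Liquidity": [400.0, 1000.0], "LpSupply": [10.0, 20.0]})
print(lp_token_return(history).tolist())  # [0.0, 0.25]
```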
Lp Token TVH (Total Value if Held): This column tracks the value of the
‘Holding Portfolio’, i.e. the amounts of Eth and Dai that backed 1 LP token
at the beginning of the time series. In this case, however, the Eth and Dai
are not put into the liquidity pool, so the initial amounts are unchanged
throughout the time series and the value changes only with the price of
Eth. This can be viewed as the opportunity cost of depositing your Eth and
Dai into the LP (earning fees and encountering impermanent loss), rather
than simply holding the Eth and Dai in a wallet. Comparing Lp Token Value
to Lp Token TVH is a simple way to see whether entering the liquidity pool
was more or less profitable than holding the assets in a wallet.
In the graph above we can see that providing liquidity was more profitable
than holding until near the end of 2020. Once the price of Eth really
started to appreciate, the impermanent loss encountered by the pool became
greater than the fees being earned, meaning liquidity providers were now
worse off for providing liquidity.
The graph below tracks this profitability comparison as a percentage of the
value of 1 LP token. When this graph turns negative, it is not suggesting
that LP provision is yielding negative returns. Rather, it is suggesting that
providing liquidity had become less profitable than simply holding the assets.
It is also important to note that this analysis is based on comparing strategies
from the inception of the pool until the time of this analysis. Selecting
different starting points can give quite a different result.
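The TVH column and the percentage comparison can be sketched as follows, assuming Dai is worth $1, rows are in event order, and the starting point is chosen by position (the `start` parameter and the `LP vs Hold %` column name are hypothetical). Changing `start` is a direct way to explore the dependence on the starting point described above. The sample rows are made up: a fee-free swap quadruples the Eth price.

```python
import pandas as pd


def lp_vs_hold(df: pd.DataFrame, start: int = 0) -> pd.DataFrame:
    """Compare 1 LP token with holding the Dai/Eth that backed it at `start`."""
    first = df.iloc[start]
    dai_per_lp = first["Reserve0"] / first["LpSupply"]  # Dai behind 1 LP token at start
    eth_per_lp = first["Reserve1"] / first["LpSupply"]  # Eth behind 1 LP token at start
    out = df.iloc[start:].copy()
    out["Lp Token TVH"] = dai_per_lp + eth_per_lp * out["Eth Price"]
    # Positive: providing liquidity beat holding; negative: holding did better.
    out["LP vs Hold %"] = (out["Lp Token Value"] - out["Lp Token TVH"]) / out["Lp Token Value"] * 100
    return out


history = pd.DataFrame({
    "Reserve0": [1000.0, 2000.0],
    "Reserve1": [10.0, 5.0],
    "LpSupply": [4.0, 4.0],
    "Eth Price": [100.0, 400.0],
    "Lp Token Value": [500.0, 1000.0],
})
print(lp_vs_hold(history)[["Lp Token TVH", "LP vs Hold %"]])
```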
Dai Earned: This column represents the amount of Dai earned in fees by
the pool since the last mint/burn.
Eth Earned: This column represents the amount of Eth earned in fees by
the pool since the last mint/burn.
To calculate both the Dai Earned and Eth Earned figures, we considered the
difference between the amounts of Dai and Eth in the pool now and after the
previous mint/burn. We also had to account for the amount minted or burned
in the event, so as not to count it as fees earned. The challenging part was
accounting for how the Eth-Dai ratio had changed due to swapping, and
filtering this ratio change out of our calculations. Taking all of this into
account, we came up with the following set of equations, which describes
the “problem”.
$$
\text{Dai Earned} = R_0 - a_0 - x \tag{3}
$$
$$
\text{Eth Earned} = R_1 - a_1 - y \tag{4}
$$
where x and y represent the amounts of Dai and Eth that would exist after
the previous event, should no fees have been earned. x and y are such that
they satisfy the following ratio related equations:
$$
xy = k_{-1} \tag{5}
$$
$$
\frac{x}{y} = \frac{R_0 - a_0}{R_1 - a_1} \tag{6}
$$
k is calculated as R0 × R1, so $k_{-1}$ represents this value taken from
the block of the previous mint/burn event. Solving the above for x and y and
substituting into equations (3) and (4) gives the following equations for
the Dai earned and the Eth earned by the liquidity pool since the last
event:
$$
\text{Dai Earned} = R_0 - a_0 - \sqrt{\frac{(R_0 - a_0)\,k_{-1}}{R_1 - a_1}} \tag{7}
$$
$$
\text{Eth Earned} = R_1 - a_1 - \sqrt{\frac{(R_1 - a_1)\,k_{-1}}{R_0 - a_0}} \tag{8}
$$
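Equations (7) and (8) can be sketched in Python as below. Like equations (3) to (8), the sketch assumes that subtracting a0 and a1 from the post-event reserves recovers the pool's state just before the event, which means treating the amounts of a Burn as negative. The example uses made-up reserves in which swap fees have grown a 100 Dai / 1 Eth pool to 110 Dai / 1.1 Eth, with no liquidity added in the current event.

```python
import math


def fees_earned(r0: float, r1: float, a0: float, a1: float,
                k_prev: float) -> tuple[float, float]:
    """Equations (7) and (8): Dai and Eth earned in fees since the previous event.

    r0, r1: Reserve0/Reserve1 directly after the current event.
    a0, a1: amount0/amount1 of the current event, signed so that r0 - a0 and
            r1 - a1 are the reserves just before it (negative for a Burn).
    k_prev: Reserve0 * Reserve1 directly after the previous Mint/Burn.
    """
    dai_before = r0 - a0  # R0 - a0
    eth_before = r1 - a1  # R1 - a1
    # x and y: reserves at the same ratio, but with the previous k (no fees).
    x = math.sqrt(dai_before * k_prev / eth_before)
    y = math.sqrt(eth_before * k_prev / dai_before)
    return dai_before - x, eth_before - y


print(tuple(round(v, 6) for v in fees_earned(110.0, 1.1, 0.0, 0.0, 100.0)))  # (10.0, 0.1)
```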
Value of Eth Earned: This is simply the previous column (Eth Earned)
multiplied by the price of Eth at that time.
Fees Earned: Dai Earned + Value of Eth Earned.
Fees Earned per LP: Fees Earned divided by LP supply.
Volume: The fees earned by the pool are 0.3% of the trading volume, so the
trading volume is the Fees Earned divided by 0.003. This column displays the
implied trading volume since the last mint/burn event. It is important to
understand that the time between such events is arbitrary, so care must be
taken when using this data to calculate average volume over a fixed period
(daily, weekly, etc.).
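The remaining fee columns and the implied volume follow directly from Dai Earned and Eth Earned. This is a sketch assuming Dai is worth $1 and the 0.3% fee rate stated above; the function name and its dictionary output are our own, and the inputs are illustrative.

```python
FEE_RATE = 0.003  # 0.3% of each trade is paid to liquidity providers in Uniswap V2


def fee_columns(dai_earned: float, eth_earned: float, eth_price: float,
                lp_supply: float) -> dict:
    """Value of Eth Earned, Fees Earned, Fees Earned per LP and implied Volume."""
    value_of_eth = eth_earned * eth_price
    fees = dai_earned + value_of_eth  # USD, with Dai assumed worth $1
    return {
        "Value of Eth Earned": value_of_eth,
        "Fees Earned": fees,
        "Fees Earned per LP": fees / lp_supply,
        "Volume": fees / FEE_RATE,  # implied volume since the last event
    }


print(fee_columns(dai_earned=10.0, eth_earned=0.1, eth_price=100.0, lp_supply=50.0))
```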
IL Nominal: The value in USD of the impermanent loss encountered by
the pool since the previous event.
IL %: The Impermanent Loss (I.L.) of the period since the previous event
expressed as a percentage of the Holding Portfolio. The Holding Portfolio
refers to where the LP provider has maintained the same Eth-Dai ratio as
that of the previous event, rather than having the ratio change by being a
part of the liquidity pool. We can calculate this a few different ways. One
way is to manually compare the ratios in the pool at the time of the event
versus the Holding Portfolio. We also created the following equation to model
percentage I.L., from time n to time n+1, based purely on price information:
$$
\text{IL\%} = \left(\frac{2\sqrt{p_0\, p_1\, \hat{p}_0\, \hat{p}_1}}{p_0\, \hat{p}_1 + \hat{p}_0\, p_1} - 1\right) \times 100 \tag{9}
$$
where,
p0 represents the price of asset 0 at time n
p1 represents the price of asset 1 at time n
p̂0 represents the price of asset 0 at time n + 1
p̂1 represents the price of asset 1 at time n + 1.
The above equation calculates impermanent loss only for liquidity pools
which are weighted 50/50 and have similar trading mechanics to Uniswap.
Similar formulas can be found in various other reviews of impermanent loss,
for example the following article by Uniswap or this Medium article by
DefiYield.info.
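Equation (9) can be written as a short Python function; this is a sketch, and the argument names are our own. Holding Dai's price at $1 while Eth moves from $100 to $400 gives an I.L. of about −20%, the familiar figure for a fourfold price change in a 50/50 pool, and unchanged prices give zero.

```python
import math


def impermanent_loss_pct(p0: float, p1: float, p0_hat: float, p1_hat: float) -> float:
    """Equation (9): I.L. between time n and n+1 for a 50/50 constant-product pool,
    as a percentage of the Holding Portfolio (negative means a loss)."""
    ratio = 2 * math.sqrt(p0 * p1 * p0_hat * p1_hat) / (p0 * p1_hat + p0_hat * p1)
    return (ratio - 1) * 100


# Dai stays at $1 while Eth moves from $100 to $400.
print(impermanent_loss_pct(1.0, 100.0, 1.0, 400.0))
```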
We are also conducting a separate, more detailed study of impermanent loss
using our datasets, from various points of view. For example, we can look at
I.L. from the point of view of each individual wallet (user), and calculate
the I.L. actually encountered by each liquidity provider during the time in
which they provided liquidity. We can then study whether there is a certain
level of I.L. users are willing to tolerate before they pull their liquidity
and their impermanent loss becomes permanent. This statistic may be more
important than the hypothetical I.L. between two arbitrary times or price
points.
Currently there is no consensus across the sector on exactly how to quantify
and price the risk of I.L., beyond these hypothetical calculations. This is
something we are examining further. The payoff for providing liquidity
depends on the fees earned (a function of volume and the fee rate), the
share of the liquidity pool owned, and the price divergence of the two
assets in question. An LP provider wants high volume, and thus high fees,
and very little price divergence in order to maximise profits. This payoff
is not dissimilar to writing options, more specifically to being the writer
of a straddle: the option writer earns the option premium (analogous to
fees), but can lose a lot if the price moves significantly in either
direction. We can use our dataframe to calculate historical fees, price
divergences and volatilities, and incorporate these into well-tested option
pricing models to better understand how to price the risk versus reward of
providing liquidity. Uni-V3 adds another layer of complexity with its
concentrated liquidity. However, the general ‘shape’ of AMMs has so far been
conserved across networks, with Cosmos Gravity Dex being a prime example.
Our dataframes and models are a continuous work in progress, and are
constantly being edited and updated as we learn. Using our models and
algorithms together with Curvegrid’s MultiBaas system, we can do this type
of analysis (historical and live) on any token pair, on any AMM, on any
blockchain. We highly encourage blockchain data explorers to contact us
so we can expand our analysis and understanding of AMMs in a systemic
sense. We are still exploring what is possible and anticipate using
additional tools as they become available.
Movements in mints and burns are examples of metrics that are not so
obvious but give indications of pool and trading pair health. Future work
on these dataframes will include additional parameters such as gas prices,
Bitcoin variance and altcoin correlations, to find relationships and bring
meaningful clarity to the forces that drive AMM markets. We encourage the
reader to download and interact with the dataset, perhaps run some of their
own analysis, and share it. Should you wish to contact us regarding
licensing or collaboration, please email hello@cfmm.house. We particularly
encourage critical analysis of our methods and models and challenges to our
assumptions.