Bitcoin Price Detection with Pyspark presentation

BITCOIN PRICE DETECTION
WITH RANDOM FOREST
CS 556 - BIG DATA ANALYSIS TERM PROJECT
YAKUP GORUR - S017327

OUTLINE
• MOTIVATION
• CRYPTOCURRENCY DATA
• RANDOM FOREST
• RESULT

BITCOIN
CRYPTOCURRENCY DATA
• OPEN
• HIGH
• LOW
• CLOSE
• Volume (BTC)
• Volume (Currency)
• Weighted Price

CANDLE STICK CHART
CRYPTOCURRENCY DATA
• OPEN PRICE:
The open is the first price traded
during the candlestick.
• HIGH PRICE:
The high is the highest price traded during
the candlestick.
• LOW PRICE:
The low is the lowest price traded
during the candlestick.
• CLOSE PRICE:
The close is the last price traded during the
candlestick.
Image via Wikipedia
The color of the candle body indicates whether the closing price was
higher or lower than the opening price:
• green bar, called an ‘up-bar’: the close is higher than the open
• red bar, called a ‘down-bar’: the close is lower than the open
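The four candlestick values and the bar color can be derived from the trade prices of one interval. The following is a minimal sketch; the function names and the sample prices are hypothetical, and the "flat" case (close equals open) is an assumption, since the slide does not cover it.

```python
def candle_from_trades(prices):
    """Summarise the trade prices of one interval as an OHLC candle."""
    if not prices:
        raise ValueError("an interval needs at least one trade")
    return {
        "open": prices[0],     # first price traded
        "high": max(prices),   # highest price traded
        "low": min(prices),    # lowest price traded
        "close": prices[-1],   # last price traded
    }


def bar_type(candle):
    """Classify a candle by comparing its close with its open."""
    if candle["close"] > candle["open"]:
        return "up-bar"      # drawn green
    if candle["close"] < candle["open"]:
        return "down-bar"    # drawn red
    return "flat"            # assumption: close equals open


# Hypothetical trade prices within one candlestick interval
candle = candle_from_trades([100.0, 104.5, 98.0, 103.0])
print(candle, bar_type(candle))
```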

CANDLE STICK CHART
CRYPTOCURRENCY DATA
• Volume (BTC):
Volume, in BTC, traded on the exchange
during a given measurement
interval.
• Volume (Currency):
Volume, in USD, traded on the exchange
during a given measurement interval.
• Weighted Price:
Average traded price during the interval, weighted by volume
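The three fields are related if the weighted price is read as a volume-weighted average price, which is an assumption here; the slide only calls it a measure of the average price. Under that reading it equals the currency volume divided by the BTC volume. The function name and the sample trades below are hypothetical.

```python
def weighted_price(volume_btc, volume_currency):
    """Volume-weighted average price: currency traded per BTC traded."""
    if volume_btc == 0:
        return None  # no trades in the interval, so no price is defined
    return volume_currency / volume_btc


# Hypothetical interval: (BTC quantity, USD price) for each trade
trades = [(2.0, 100.0), (1.0, 130.0)]
vol_btc = sum(qty for qty, _ in trades)            # Volume (BTC)
vol_usd = sum(qty * price for qty, price in trades)  # Volume (Currency)
print(weighted_price(vol_btc, vol_usd))  # 110.0
```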

CRYPTOCURRENCY DATA
HOW BIG
DATA             PREDICTION                 2009 - 2018 for BITCOIN   x Number of Market ~27000   x Number of Coins ~5000
WEEKLY DATA      EVERY DAY OR EVERY WEEK    ~468 weeks                ~12,636,000 weeks           ~60 B weeks
DAILY DATA       EVERY DAY                  ~3,285 days               ~88 M days                  ~440 B days
HOURLY DATA      EVERY HOUR                 ~78,840 hours             ~2 B hours                  ~10 T hours
PER MINUTE DATA  EVERY MINUTE (REAL TIME)   ~4,730,400 minutes        ~120 B minutes              ~600 T minutes
Row multipliers: WEEKLY to DAILY x 7, DAILY to HOURLY x 24, HOURLY to PER MINUTE x 60
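The single-market figures can be reproduced with simple arithmetic. This sketch assumes 9 years of 52 weeks and 365 days each, which matches the Bitcoin counts on the slide; the multiplied totals come out slightly higher than the slide's rounded values (for example about 127 B minutes rather than ~120 B).

```python
YEARS = 9          # 2009 to 2018, as on the slide
MARKETS = 27_000   # approximate number of markets (slide figure)
COINS = 5_000      # approximate number of coins (slide figure)

weeks = YEARS * 52     # assumption: 52 weeks per year
days = YEARS * 365     # assumption: 365 days per year
hours = days * 24
minutes = hours * 60

# Columns: rows for Bitcoin alone, times markets, times markets and coins
for name, n in [("weeks", weeks), ("days", days),
                ("hours", hours), ("minutes", minutes)]:
    print(f"{name:8s} {n:>12,} {n * MARKETS:>20,} {n * MARKETS * COINS:>26,}")
```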

RANDOM FOREST
• Handles both classification and regression tasks
• Built from decision trees, the basic building
block of a random forest
• Fast to build
• Fully parallelizable

RANDOM FOREST IN SPARK
• Because each tree in a Random Forest is
trained independently, multiple trees
can be trained in parallel.
• Random Forests use a different
subsample of the data to train each
tree. Instead of replicating data
explicitly, Spark saves memory by using
a TreePoint structure, which stores
the number of replicas of each
instance in each subsample.
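The replica-count idea can be illustrated in plain Python. This is not Spark's TreePoint implementation, only a sketch of the same bookkeeping: instead of copying rows into each tree's subsample, keep one count per instance per tree. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def bootstrap_counts(n_instances, n_trees, rate=1.0, seed=0):
    """For each tree, count how often each instance is drawn with replacement."""
    rng = random.Random(seed)
    n_draws = int(rate * n_instances)
    counts = []
    for _ in range(n_trees):
        drawn = Counter(rng.randrange(n_instances) for _ in range(n_draws))
        # One small integer per instance replaces the copied rows
        counts.append([drawn.get(i, 0) for i in range(n_instances)])
    return counts  # counts[t][i] = replicas of instance i for tree t


counts = bootstrap_counts(n_instances=6, n_trees=3)
for tree_id, row in enumerate(counts):
    print(f"tree {tree_id}: {row}")
```

A count of 0 means the instance is left out of that tree's subsample, and a count of 2 or more means it is weighted as if it appeared that many times.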

RANDOM FOREST ON MAPREDUCE
BASIC MAPREDUCE STRUCTURE
1. Read the training data from a CSV file.
2. For n trees, launch n mappers.
3. For each mapper, create a 90% subset of the training data, sampled with replacement.
4. Each mapper takes its subset as input when it runs.
5. After receiving its data, each mapper builds a tree and
produces predictions for the test dataset.
6. Each mapper passes each test instance and its predicted label to the reducer as key and value.
7. The reducer counts the labels for each key and outputs the majority label.
8. Write the results to the output file.
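The mapper and reducer roles in the steps above can be simulated in a single Python process. This is a sketch under stated assumptions, not the project's code: a nearest-class-mean rule stands in for a real decision tree, the test key is the index of the test instance, and all names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_toy_tree(sample):
    """Toy stand-in for a decision tree: predict the label with the nearest class mean."""
    by_label = defaultdict(list)
    for x, label in sample:
        by_label[label].append(x)
    means = {label: sum(xs) / len(xs) for label, xs in by_label.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def mapper(tree_id, train, test, rate=0.9):
    """Steps 3 to 6: draw a 90% sample with replacement, fit, emit (key, label)."""
    rng = random.Random(tree_id)
    sample = [rng.choice(train) for _ in range(int(rate * len(train)))]
    model = train_toy_tree(sample)
    return [(key, model(x)) for key, x in enumerate(test)]


def reducer(pairs):
    """Step 7: count the labels for each key and keep the majority label."""
    votes = defaultdict(Counter)
    for key, label in pairs:
        votes[key][label] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


# Hypothetical one-feature training set and two test instances
train = [(float(x), "down") for x in range(1, 6)] + \
        [(float(x), "up") for x in range(8, 13)]
test = [1.5, 9.5]

# Step 2: one mapper per tree (run sequentially here instead of in parallel)
pairs = [pair for tree_id in range(5) for pair in mapper(tree_id, train, test)]
print(reducer(pairs))
```

Because the mappers share nothing except the input data, they could run on separate workers; only the emitted key and value pairs have to reach the reducer.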

RESULT
DATA REPRESENTATION
Window Size=7
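One common way to use a window size of 7 is to take the previous 7 values as features and the next value as the target. The slide gives only the window size, so this framing, the function name, and the sample prices are assumptions.

```python
def make_windows(prices, window=7):
    """Turn a price series into (features, target) pairs with a sliding window."""
    rows = []
    for i in range(len(prices) - window):
        features = prices[i:i + window]   # the previous `window` values
        target = prices[i + window]       # the value to predict
        rows.append((features, target))
    return rows


# Hypothetical price series of 10 values
prices = [float(p) for p in range(100, 110)]
for features, target in make_windows(prices):
    print(features, "->", target)
```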

RESULT
DATA PREDICTION
RMSE: 180
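RMSE (root mean squared error) is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same unit as the predicted price. A minimal sketch with hypothetical values:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices: every prediction is off by exactly 2
print(rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0]))  # 2.0
```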

RESULT
DATA PREDICTION
Finance world: “Past performance is not an indicator of future outcomes.”
