@ODSC
OPEN DATA SCIENCE CONFERENCE
London | September 19 - 22, 2018
Olaf de Leeuw
Can we predict the Bitcoin Price with LSTM Sentiment Analysis?
Olaf de Leeuw
Data Scientist at Dataworkz
This is where the idea arose, after a full day of talks at ODSC London 2017
Collecting the data
Market Data Collector | Twitter Data Collector
Not only tweets with #Bitcoin but all kinds of news-related tweets were collected
Exploring the Twitter data
• All the Twitter data is stored in Elasticsearch…,
• We don't know exactly what it looks like yet…,
• We want to create a Recurrent Neural Network with LSTM in TensorFlow…,
• So it's a good thing Python has Elasticsearch and TensorFlow modules! (A minimal query sketch follows below.)
Short demo
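The live demo is not captured in the slides. As a stand-in, here is a minimal sketch of the kind of exploratory query it covered, assuming the Python elasticsearch module and a local index named "tweets" (the host, index and field names are assumptions, not the talk's actual setup):

from elasticsearch import Elasticsearch

# Connect to a local cluster (host and index name are assumptions).
es = Elasticsearch(["http://localhost:9200"])

# Fetch a handful of tweets to see what the documents look like.
response = es.search(index="tweets", body={"query": {"match_all": {}}}, size=5)

for hit in response["hits"]["hits"]:
    print(hit["_source"].get("text"))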
Predicting the sentiment of a tweet: positive or negative?
1 million tweets… How do we analyze these tweets, and how do we put them into a deep learning algorithm?
Deep learning needs scalars or matrices of scalars as input.
For example, a convolutional neural network uses the pixels of images for object recognition.
Likewise, text/speech needs to be vectorized before analyzing it.
"Only words or word encodings provide no useful information regarding the relationships that may exist between the individual symbols" (tensorflow.org).
So: vectorization of our tweets…
Word2Vec is the answer
The basic ideas behind a Word2Vec model
A Word2Vec model is a neural network with one hidden layer.
This hidden layer is a matrix with dimension N x D, where D is the length of a vector representing a word.
The input is a one-hot vector of a word and has dimension N x 1, where N is the number of words in your dictionary.
The output layer is a vector with the probabilities that the input word is a neighbour of each word in the dictionary.
This hidden layer is exactly what we are looking for!
Pre-trained Word2Vec models
• Available on the Stanford NLP website (https://nlp.stanford.edu/projects/glove/) as GloVe vectors.
• Data is available with different numbers of words and several vector dimensions.
• In this project a set of 400k words is used, with vectors of dimension 50 x 1 (a loading sketch follows below).
• The data consists of a word list and a matrix:
❖ The word list contains 400k words, each represented by a number.
❖ The matrix has dimension 400k x 50: for each word, a vector representation of length 50.
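As an illustration, a minimal Python sketch of loading such a set, assuming the standard 400k-word, 50-dimensional GloVe file glove.6B.50d.txt in the working directory:

import numpy as np

words, vectors = [], []
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        words.append(parts[0])                         # the word itself
        vectors.append([float(x) for x in parts[1:]])  # its 50 numbers

word_to_index = {w: i for i, w in enumerate(words)}     # word list: word -> number
embedding_matrix = np.array(vectors, dtype=np.float32)  # the 400k x 50 matrix

print(embedding_matrix.shape)  # (400000, 50)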
Long Short-Term Memories: why should we use them?
source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks are sufficient if you want to predict, for instance, the sentiment of:
“The movie was really bad”
The problem arises when the relevant information is much further away or spread out over multiple sentences:
“This is the best day ever. The weather is beautiful and I got a new job. However the movie I just saw
was really bad”
In a simpler recurrent network this may be predicted as negative. Long Short-Term Memories can deal with the information in the whole text.
First, an intuitive interpretation
• The complete network consists of n such layers.
• At each layer you put in the next word of your text, Xt, and add it to the already stored information.
• A number of updates and calculations are done, and finally there is some output, ht, and we move on to the next layer.
And now step by step…
Step by Step: the main information line
• On this line all the information is stored, and it loops through all the cells until all words are processed.
• Within each cell, information is added, removed and updated.
Step by Step: the forget gate
• The next word, Xt, is added to the cell, along with the information from the previous cell, ht−1.
• The sigmoid function determines which information from ht−1 is kept, e.g.:
When Xt is a new subject, you may want to forget the old one, which is stored in the cell state on the main line.
• The outcome is multiplied with information from the cell state Ct−1.
Step by Step: the forget gate - example
• Assume the word at Xt is “bitcoin”. As stated earlier, we use word vectors.
• The vector is multiplied by a weight matrix W_{x,f} with dimension 50 x (num LSTM units), and after that a bias is added. In formula notation: X_t·W_{x,f} + b_{x,f}
• We work with 50-d vectors and 64 LSTM units, so the formula gives us a vector of length 64.
• Finally this is put into the sigmoid function, σ(X_t·W_{x,f} + b_{x,f}), and the outcome goes to the cell state Ct.
• Together with the previous state ht−1 (the slide lists its term σ(h_{t−1}·W_{h,f} + b_{h,f}) separately), the complete forget gate in the standard formulation becomes:

f_t = σ(X_t·W_{x,f} + h_{t−1}·W_{h,f} + b_f)
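As a toy illustration of the equation above (not the talk's code): a numpy sketch with the talk's 50-d vectors and 64 LSTM units, where the weights are random stand-ins for what the optimizer would learn:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 50, 64                        # vector dimension, number of LSTM units
W_xf = np.random.randn(D, H) * 0.1   # input-to-forget weights (random start)
W_hf = np.random.randn(H, H) * 0.1   # hidden-to-forget weights
b_f = np.zeros(H)                    # forget-gate bias

x_t = np.random.randn(D)             # stand-in word vector, e.g. for "bitcoin"
h_prev = np.zeros(H)                 # previous output h_{t-1}
C_prev = np.random.randn(H)          # previous cell state C_{t-1}

f_t = sigmoid(x_t @ W_xf + h_prev @ W_hf + b_f)  # each entry in (0, 1)
C_after_forget = f_t * C_prev                    # scale down what is "forgotten"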
Step by Step: the input gate
• The input gate consists of two functions:
1. A sigmoid function is used to determine what kind of information we would like to store, e.g. the new subject.
2. A tanh function is used to determine the content of the information, e.g. is the new subject male or female?
• The output of these functions together is added to the current cell state Ct (see the equations below).
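Not spelled out on the slide; in the same notation as the forget-gate example, the standard input-gate equations (cf. colah's blog) are:

i_t = σ(X_t·W_{x,i} + h_{t−1}·W_{h,i} + b_i)      (sigmoid: which items to update)
C̃_t = tanh(X_t·W_{x,C} + h_{t−1}·W_{h,C} + b_C)   (tanh: the candidate content)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t                   (the updated cell state)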
Step by Step: the output gate
• The output gate filters some information from the current cell state.
• A sigmoid decides what we are going to output, and the tanh function makes sure the values are between -1 and 1:
If we saw a new subject, the output will be whether the subject is male or female, singular or plural.
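Again not on the slide; in the same notation, the standard output-gate equations are:

o_t = σ(X_t·W_{x,o} + h_{t−1}·W_{h,o} + b_o)
h_t = o_t ⊙ tanh(C_t)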
The full model:
Tweets (word indices) → GloVe word vectors → RNN with LSTM → Labels ([0,1] or [1,0])
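The slide shows this pipeline as a diagram only. Below is a minimal TensorFlow 1.x sketch of such a graph; the sizes (sequence length 60, 64 LSTM units, the 400k x 50 GloVe matrix) come from earlier slides, everything else is an assumption, not the talk's actual code:

import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

max_seq_len, lstm_units, num_classes = 60, 64, 2
glove_matrix = np.zeros((400000, 50), dtype=np.float32)  # stand-in for the real GloVe matrix

input_ids = tf.placeholder(tf.int32, [None, max_seq_len])  # tweets as word indices
labels = tf.placeholder(tf.float32, [None, num_classes])   # one-hot: [0,1] or [1,0]

# Look up the word vector for every index in every tweet.
embedded = tf.nn.embedding_lookup(tf.constant(glove_matrix), input_ids)

# One LSTM layer, with dropout on its outputs.
cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_units)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.75)
outputs, _ = tf.nn.dynamic_rnn(cell, embedded, dtype=tf.float32)

# Classify from the last LSTM output; train with softmax cross entropy and Adam.
logits = tf.layers.dense(outputs[:, -1, :], num_classes)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)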
Hyperparameters:
There are a lot of choices you have to make before training the RNN with LSTM (a sample configuration follows the list).
• Length of the sequence: the number of LSTM cells.
• Number of LSTM units: comparable to the number of units in a layer of a regular NN.
• Iterations: how often you run the model during training. Each iteration runs one batch.
• Batch size: the number of tweets you run per iteration.
• Optimizer: the function that tries to minimize the loss. Often-used optimizers are Gradient Descent and Adam.
• DropoutWrapper and its keep probability: the probability of keeping information; it helps prevent overfitting.
• Learning rate: too big and your model may not converge, too small and training may take ages.
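To make these choices concrete, a hypothetical configuration: sequence length and unit count come from earlier slides, batch size and iteration count from the speaker's example run; the remaining values are common defaults, not the talk's actual settings:

hyperparams = {
    "max_seq_length": 60,    # number of LSTM cells (max words per tweet)
    "lstm_units": 64,        # units per LSTM cell
    "iterations": 100000,    # training steps, one batch each
    "batch_size": 64,        # tweets per batch
    "optimizer": "adam",     # or "sgd"
    "keep_prob": 0.75,       # DropoutWrapper keep probability
    "learning_rate": 0.001,
}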
Loss function:
The loss function we use is softmax cross entropy.
• Softmax function: it squashes an output vector of real numbers to a vector of real numbers between 0 and 1 that add up to 1:

S(v)_i = e^{v_i} / Σ_{k=1..N} e^{v_k}

• Cross entropy is an often-used alternative to the well-known squared error and is defined by:

H(y, S) = −Σ_i y_i·log(S_i)

where S_i is the output of the softmax function. Cross entropy is only useful when the input is a probability distribution, hence the softmax function.
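A small numpy sketch of the two definitions above (a sketch of the math, not the TensorFlow op itself):

import numpy as np

def softmax(v):
    e = np.exp(v - v.max())        # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(y, s):
    return -np.sum(y * np.log(s))  # y: one-hot label, s: softmax output

logits = np.array([1.2, -0.3])     # raw model output for one tweet
probs = softmax(logits)            # roughly [0.82, 0.18], sums to 1
print(cross_entropy(np.array([1.0, 0.0]), probs))  # small loss: label matches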
Optimization of the loss function:
The optimization functions used in this model are Gradient Descent and the Adam optimizer. The Adam optimizer is an extension of Stochastic Gradient Descent (SGD). The SGD update is defined as

W_{t+1} = W_t − α·∇_t

SGD maintains a single learning rate for all parameter updates. Adam has a learning rate for each network weight, and they are separately adapted.
• Adam: Adaptive Moment Estimation.
• Adam stores the first and second moments (mean and variance) of the decaying average of the past gradients:

m_t = β_1·m_{t−1} + (1 − β_1)·∇_t
v_t = β_2·v_{t−1} + (1 − β_2)·∇_t²

These variables are used to update the parameters/weights of the model:

W_{t+1} = W_t − α·m_t / (√v_t + ε)

http://ruder.io/optimizing-gradient-descent/index.html#adam
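A toy numpy version of the update above (the bias correction of m_t and v_t that full Adam also applies is omitted here, as on the slide):

import numpy as np

def adam_step(W, grad, m, v, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad      # first moment (mean)
    v = beta2 * v + (1 - beta2) * grad**2   # second moment (variance)
    W = W - alpha * m / (np.sqrt(v) + eps)  # per-weight adapted step
    return W, m, v

W = np.array([0.5, -0.2])                   # toy weights
m, v = np.zeros_like(W), np.zeros_like(W)
W, m, v = adam_step(W, grad=np.array([0.1, -0.4]), m=m, v=v)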
Demo: model training and TensorBoard
The result:
sentiment of our 1 million tweets and the Bitcoin rate
How about the ‘derivative’ of the sentiment?
• If the sentiment is getting better, the derivative is positive,
• If the sentiment is getting worse, the derivative is negative,
• If the sentiment is stable, the derivative is zero.
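In code this is just a first difference of the (assumed daily) sentiment series, e.g.:

import numpy as np

daily_sentiment = np.array([0.52, 0.55, 0.55, 0.48])  # made-up values
derivative = np.diff(daily_sentiment)                 # [ 0.03  0.   -0.07]
# positive -> improving, zero -> stable, negative -> worsening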
Discussion and conclusion
• Recurrent Neural Networks with LSTM are powerful tools to work with,
• The mathematics behind them is complicated; the code, however, is not that hard to understand,
• There are many parameters to tune,
• The Bitcoin price and tweet sentiment are not related according to this model.
Some possible improvements:
• Use a training set with the same kind of tweets as the actual set,
• Use other keywords than only news- and finance-related topics,
• Put a higher weight on tweets that were retweeted more than others.
Thank you all for coming
★Questions: https://www.linkedin.com/in/olaf-de-leeuw-6a2b073b/
★Code/Notebooks: https://github.com/olafdeleeuw/ODSC-London-2018
Editor's Notes
  1. Leicester Square. We wanted to learn about RNNs with LSTM and sentiment analysis. Needed a cool topic, so: Bitcoin.
  2. We built an application in Java that collects Twitter data and stores it in ES. We ran the collector for a couple of weeks. We collected tweets with finance- and news-related items. The Bitcoin data is stored in MySQL.
  3. Opportunity to learn some new things: ES. I wanted to learn about LSTM and TensorFlow. So I needed ES, TensorFlow and recurrent NNs —> Python.
  4. We collected 1 million tweets, but an RNN needs vectors, not strings. Example about image recognition. Strings provide no useful info to an RNN. How to convert the data? Vectorization.
  5. Words related in semantics, meaning and context are closer to each other.
  6. Word2Vec is a neural network with 1 hidden layer. Input is a 1-hot vector: see picture on the next slide. Its length is the number of words in your dictionary, in my case 400k. Output of the NN is a vector with probabilities, for all words in the dict, that your input word is the neighbour. The hidden layer is the vector matrix we want. It has dim 400k x 50 and is the vector representation of all the words in our dictionary.
  7. The hidden layer is the word vector matrix. We don't need the output layer here.
  8. You can train a Word2Vec yourself, but you need a lot of text and it is not the purpose of this talk, so I used a pre-trained model. There are sets available with vector dims from 50 to 300. So we split our tweets into words, and each word in the tweet is converted to a word vector.
  9. Use an RNN with LSTM when a regular RNN is not good enough, i.e. when there is too much information and it is spread out. Ref: Colah's blog.
  10. N cells, usually about the number of words; in our case the max length of tweets is about 60. At each cell you put in a new word of your tweet. In the cell, the input of the new word and the output of the previous cell are used to update your information about the sentiment, about your prediction.
  11. Main line, stores all relevant information. This goes from beginning to end, to the output. The information is updated in each cell based on new words, via multiplication and addition.
  12. Next word added, just like information about the previous state. Sigmoid determines what to throw away from this —> 0: all, 1: nothing. Example: a new subject may be interesting and you may want to throw away the old subject. The output of the sigmoid is multiplied with the current cell state to throw away this irrelevant data.
  13. Bitcoin as a vector —> Xt via GloVe. Multiply by a weight matrix and add a bias. In the model we start with a random normally distributed weight matrix and a constant bias. Via optimization algorithms such as SGD or Adam these weights and biases are updated. The outcomes are multiplied with Ct to throw away the information you don't need anymore.
  14. At the input gate you do 2 things: determine which items you want to update, e.g. the new subject; determine what information you want to update, e.g. plural or singular, male or female. This is added (not multiplied) to Ct because you want to add information.
  15. In the last gate, the information we would like to output is filtered. This information is also sent to the forget gate of the next cell. A sigmoid function determines which items are output, such as the new subject in the previous example. A tanh function on the cell state determines what information the model outputs at this timestep.
  16. Start with all the tweets. Split them into lists. Create indices of words. Create vectors with the GloVe dictionary/dataset. Run the RNN model with LSTM —> check the loss, optimize with for example Adam. Evaluate the output labels.
  17. Explain hyperparameters. Batch size and number of iterations may influence the overfitting of your model. My example subset: batch size 64 and 100k iterations.
  18. For each item you want the chances to sum up to one —> softmax, e.g. 0.4 for pos and 0.6 for neg. So in fact it creates a probability distribution. The normal squared error causes non-convex loss functions for classification, therefore cross entropy: this makes sure we have a convex problem.
  19. Adam is better suited because it has a learning rate for each parameter, where SGD has one for all. Changing epsilon can help prevent fluctuations; in my model it didn't.
  20. One period without predictions, because I had no data. Skiing :)