Deep Learning for Stock Prediction
Yue Zhang
My research areas
Machine Learning
Natural Language Processing
Applications
Text synthesis
Machine translation
Information extraction
Market prediction
Sentiment analysis
Syntactic analysis
This talk
• Reading news from the Internet and predicting the stock market
Outline
• Event-driven prediction
• Two extensions
Introduction
• Is it possible?
– Random walk theory
– Efficient market hypothesis
– Human/algorithm trading
• Examples
– Shares of Apple Inc. fell as trading began in New York on Tuesday morning, the day after former CEO Steve Jobs passed away
– Google’s stock falls after grim earnings come out early
Why events?
• Previous work
– Bag-of-words
– Named Entities
– Noun Phrases
• Examples
– Oracle Corp would sue Google Inc., claiming Google’s Android operating system…
– Microsoft agrees to buy Nokia’s mobile phone business for $7.2 billion.
Method
• Event Representation
– E = (O1, P, O2, T)
– Actor
– Event
– Object
– Time
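The tuple E = (O1, P, O2, T) can be pictured as a small record type. The sketch below is only an illustration: the class name `Event`, its field names, and the use of an ISO date string for T are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Structured event E = (O1, P, O2, T)."""
    actor: str                  # O1: who performs the action
    action: str                 # P: the event (predicate) itself
    obj: Optional[str]          # O2: what the action is performed on
    time: Optional[str] = None  # T: timestamp, an ISO date string here (assumption)


event = Event(
    actor="Microsoft",
    action="buy",
    obj="Nokia's mobile phone business",
    time="2013-01-01",  # placeholder date, for illustration only
)
print(event)
```

Freezing the dataclass makes events hashable, which is convenient if they are later counted or used as dictionary keys.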
Method
• Event Extraction
– Syntactic parsing
– Open information extraction
Method
• Event Generalization
– First, we construct a morphological analysis tool based on the WordNet stemmer to extract lemma forms of inflected words
– Second, we generalize each verb to its class name in VerbNet
• For example
– Instant view: Private sector adds 114,000 jobs in July.
– (Private sector, adds, 114,000 jobs)
– (private sector, multiply_class, 114,000 job)
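The two generalization steps can be traced on the slide's example with a toy sketch. The lookup tables `LEMMAS` and `VERB_CLASSES` below are hypothetical stand-ins for WordNet-based lemmatization and VerbNet class lookup, and they cover only this one example.

```python
# Hypothetical lookup tables standing in for WordNet lemmas and VerbNet classes.
LEMMAS = {"adds": "add", "jobs": "job"}
VERB_CLASSES = {"add": "multiply_class"}


def lemmatize(phrase: str) -> str:
    """Lower-case a phrase and map each token to its lemma if one is known."""
    return " ".join(LEMMAS.get(tok, tok) for tok in phrase.lower().split())


def generalize(event: tuple) -> tuple:
    """Apply both generalization steps to an (O1, P, O2) tuple."""
    o1, p, o2 = event
    verb = lemmatize(p)
    # Fall back to the lemma itself when no verb class is known.
    return (lemmatize(o1), VERB_CLASSES.get(verb, verb), lemmatize(o2))


generalized = generalize(("Private sector", "adds", "114,000 jobs"))
print(generalized)
```

A real pipeline would replace both tables with calls into a lemmatizer and a VerbNet index; the control flow would stay the same.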
Method
• Model
– Input: events
– Output: two-way movement (up or down)
• Training: historical data
• Testing: future data
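One way to set up this task is sketched below. It assumes that the two-way label for a day is +1 when the next closing price is higher and -1 otherwise, and that the train/test split is chronological; both choices, the 80/20 ratio, and the prices are assumptions for illustration.

```python
def movement_labels(closes):
    """Label each day +1 if the next close is higher, otherwise -1."""
    return [1 if nxt > cur else -1 for cur, nxt in zip(closes, closes[1:])]


def chronological_split(items, train_fraction=0.8):
    """Split without shuffling, so training data always precedes test data."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


closes = [100.0, 101.5, 101.0, 103.2, 102.8, 104.0]  # made-up prices
labels = movement_labels(closes)
train, test = chronological_split(labels)
print(labels, train, test)
```

Keeping the split chronological avoids training on news that was published after the days being predicted.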
Method
• Prediction Model
– Linear model
• Most previous work uses linear models to predict the stock market. To make direct comparisons, this paper constructs a linear prediction model by using an SVM with a linear kernel
– Nonlinear model
• Intuitively, the relationship between events and the stock market may be more complex than linear, due to hidden and indirect relationships. We exploit a deep neural network model, the hidden layers of which are useful for learning such hidden relationships
[Figure: deep neural network. Input layer: features φ1, φ2, φ3, …, φM extracted from news documents. Hidden layers. Output layer: Class +1 (the polarity of the stock price movement is positive) and Class -1 (the polarity of the stock price movement is negative).]
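A forward pass through a network of this shape can be sketched in plain Python. Everything concrete here is an assumption: the layer sizes, the sigmoid hidden units, the softmax output, and the random weights. The sketch shows only how features φ1…φM flow to a two-way decision, not how the model in the talk was trained.

```python
import math
import random

random.seed(0)


def init_layer(n_in, n_out):
    """Random weight matrix (n_out rows of n_in weights) and a zero bias."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(x, layer):
    """Compute W x + b for one layer."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(x, layers):
    """Sigmoid hidden layers followed by a softmax over the two classes."""
    h = x
    for layer in layers[:-1]:
        h = [1.0 / (1.0 + math.exp(-z)) for z in affine(h, layer)]
    logits = affine(h, layers[-1])
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


M = 8                          # number of input features (assumption)
sizes = [M, 16, 16, 2]         # two hidden layers (assumption)
layers = [init_layer(a, b) for a, b in zip(sizes, sizes[1:])]
phi = [random.random() for _ in range(M)]   # one made-up feature vector
probs = forward(phi, layers)
label = 1 if probs[0] >= probs[1] else -1   # index 0 stands for class +1
print(probs, label)
```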
Method
• Feature Representation
– Bag-of-words
• TF*IDF
– Events
• O1, P, O2, O1 + P, P + O2, O1 + P + O2
• For example
– (Microsoft, buy, Nokia's mobile phone business)
– (#arg1=Microsoft, #action=buy, #arg2=Nokia's mobile phone business, #arg1_action=Microsoft buy, #action_arg2=buy Nokia's mobile phone business, #arg1_action_arg2=Microsoft buy Nokia's mobile phone business)
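The six event features listed above can be generated mechanically from an (O1, P, O2) tuple. The function name `event_features` is an assumption; the feature prefixes follow the slide's example.

```python
def event_features(o1: str, p: str, o2: str) -> list:
    """Expand an (O1, P, O2) tuple into the six features listed above."""
    return [
        f"#arg1={o1}",
        f"#action={p}",
        f"#arg2={o2}",
        f"#arg1_action={o1} {p}",
        f"#action_arg2={p} {o2}",
        f"#arg1_action_arg2={o1} {p} {o2}",
    ]


features = event_features("Microsoft", "buy", "Nokia's mobile phone business")
for feature in features:
    print(feature)
```

The combined features are what make events sparser than single words: each full triple is seen far less often than any of its parts.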
Experiments
• Data Description
– We use publicly available financial news from Reuters and Bloomberg over the period from October 2006 to November 2013. This time span witnesses a severe economic downturn in 2007-2010, followed by a modest recovery in 2011-2013. There are 106,521 documents in total from Reuters News and 447,145 from Bloomberg News.
– We mainly focus on predicting the Standard & Poor's 500 (S&P 500) index; index values and stock prices are obtained from Yahoo Finance.
Experiments
• Evaluation Metrics
– Accuracy and MCC (Matthews correlation coefficient)
• Overall Results
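Both metrics can be computed from the confusion counts of a two-way prediction. The sketch below uses the standard MCC definition; the function name and the sample labels are made up for illustration, and returning 0.0 when the denominator vanishes is a convention chosen here.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, MCC) for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


y_true = [1, 1, -1, -1, 1, -1]   # made-up labels
y_pred = [1, -1, -1, 1, 1, -1]   # made-up predictions
acc, mcc = accuracy_and_mcc(y_true, y_pred)
print(acc, mcc)
```

MCC stays near zero for a classifier that always predicts the majority direction, which accuracy alone would not reveal on an imbalanced stretch of the market.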
Experiments
• Experiments with Different Numbers of Hidden Layers of the Deep Neural Network Model
Experiments
• Experiments with Different Quantities of Data
Experiments
• Individual Stock Prediction
Conclusion
• Events are useful.
– Events are more useful representations than bags-of-words for the task of stock market prediction.
• Hidden relations are useful.
– A deep neural network model can be more accurate at predicting the stock market than the linear model.
• Robust results obtained.
– Our approach can achieve stable experimental results on S&P 500 index prediction and individual stock prediction over a large amount of data (eight years of stock prices and more than 550,000 pieces of news).
• Quality of information is more important than quantity.
– The most relevant information (i.e. news title vs. news content, individual company news vs. all news) is better than more, but less relevant, information.
Two extensions
• Better event encodings
• Long-term history
Two extensions
• Event sparsity
– Using structured events to predict stock market movement suffers from increased data sparsity
(Actor = Microsoft, Action = sues, Object = Barnes & Noble)
Two extensions
• Modeling the long-term effect of events
– The effect becomes weaker
– Little has been done
Event Embedding
• Previous work
– Learning entity embedding (Socher et al. 2013)
Neural Tensor Network
• Neural Network: $f(Wx + b)$
• Neural Tensor Network: $f\left(e_1^{\top} W^{[1:k]} e_2 + V \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b\right)$
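Neural Tensor Network for Event Embedding
[Figure: network diagram with inputs O1, P, O2; tensors T1, T2, T3; intermediate nodes R1, R2; output U]
Neural Tensor Network for Event Embedding
[Figure: detail of the diagram with inputs O1, P; tensor T1; output R1]
$R_1 = f\left(O_1^{\top} T_1^{[1:k]} P + V \begin{bmatrix} O_1 \\ P \end{bmatrix} + b\right)$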
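The tensor layer for R1 can be sketched in plain Python and reused for the other two nodes. Several things here are assumptions: f is taken to be tanh, the number of tensor slices k equals the embedding size d so that R1 and R2 can feed the top layer, R2 is computed from (P, O2) through T2 and U from (R1, R2) through T3 (as the node labels in the diagram suggest), and all weights and embeddings are random placeholders.

```python
import math
import random

random.seed(0)
D = 4   # embedding size d; k = d slices so outputs can feed the next layer (assumption)


def rand_matrix(rows, cols):
    """Small random matrix as a list of rows."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_ntn():
    """Parameters of one tensor layer: d slices of d x d, V (d x 2d), and b."""
    return {
        "T": [rand_matrix(D, D) for _ in range(D)],
        "V": rand_matrix(D, 2 * D),
        "b": [0.0] * D,
    }


def ntn(a, c, params):
    """Compute f(a^T T^[1:k] c + V [a; c] + b) with f = tanh (assumption)."""
    concat = a + c
    out = []
    for slice_, v_row, bias in zip(params["T"], params["V"], params["b"]):
        bilinear = sum(a[i] * slice_[i][j] * c[j]
                       for i in range(D) for j in range(D))
        linear = sum(v * x for v, x in zip(v_row, concat))
        out.append(math.tanh(bilinear + linear + bias))
    return out


# Made-up embeddings for the three event arguments.
o1 = [random.random() for _ in range(D)]
p = [random.random() for _ in range(D)]
o2 = [random.random() for _ in range(D)]

t1, t2, t3 = init_ntn(), init_ntn(), init_ntn()
r1 = ntn(o1, p, t1)     # actor combined with action
r2 = ntn(p, o2, t2)     # action combined with object
u = ntn(r1, r2, t3)     # event embedding U
print(u)
```

Each tensor slice contributes one output dimension, so the bilinear term lets every component of the actor embedding interact multiplicatively with every component of the action embedding.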
Training
• Minimize the margin loss
• 500 iterations
• Standard back-propagation
[Figure: margin loss formula with annotations: corrupted event built by random replacement with an object; regularization weight, set to 0.0001; parameters]
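A margin loss of the kind named on this slide might look like the sketch below. The margin of 1, the L2 form of the regularizer, and the function and argument names are assumptions; only the regularization weight 0.0001 comes from the slide. `score_true` and `score_corrupt` stand for the model's scores of a real event and of the same event with an argument randomly replaced.

```python
def margin_loss(score_true, score_corrupt, params, reg_weight=0.0001):
    """Hinge-style margin loss plus an L2 penalty on the parameters."""
    hinge = max(0.0, 1.0 - score_true + score_corrupt)
    l2 = sum(w * w for w in params)
    return hinge + reg_weight * l2


params = [0.5, -0.5]                       # made-up parameter values
good = margin_loss(0.9, 0.2, params)       # real event scores higher
bad = margin_loss(0.2, 0.9, params)        # corrupted event scores higher
print(good, bad)
```

The loss is zero (apart from the penalty) once the real event outscores the corrupted one by at least the margin, so training pushes the two scores apart rather than toward fixed targets.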
Deep Prediction Model
• We model long-, mid-, and short-term events
– Long-term events (last month)
– Mid-term events (last week)
– Short-term events (last day)
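Grouping events into the three horizons can be sketched as follows. The window lengths of 1, 7, and 30 days, the choice to let the windows overlap (a one-day-old event also counts as mid- and long-term), and all names are assumptions for illustration.

```python
from datetime import date

HORIZONS = {"short": 1, "mid": 7, "long": 30}   # window lengths in days (assumption)


def split_by_horizon(events, today):
    """Group (date, event) pairs into short-, mid-, and long-term windows."""
    windows = {name: [] for name in HORIZONS}
    for day, event in events:
        age = (today - day).days          # how many days before the prediction day
        for name, length in HORIZONS.items():
            if 1 <= age <= length:
                windows[name].append(event)
    return windows


today = date(2013, 9, 10)                 # made-up prediction day
events = [
    (date(2013, 9, 9), "a"),              # 1 day old
    (date(2013, 9, 5), "b"),              # 5 days old
    (date(2013, 8, 20), "c"),             # 21 days old
    (date(2013, 7, 1), "d"),              # 71 days old, outside every window
]
windows = split_by_horizon(events, today)
print(windows)
```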
Deep Prediction Model
• Architecture
Deep Prediction Model
• Convolution and Max-pooling
– Convolution layer to obtain local features
– Max-pooling to determine the globally most representative feature
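The two operations can be illustrated on a short sequence of event embeddings. This is a toy sketch with a single filter and no bias or nonlinearity; the window width of three events, the embedding values, and the filter weights are made up.

```python
def convolve(sequence, kernel):
    """Slide a window over the event sequence; one dot product per position."""
    width = len(kernel)
    features = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        features.append(sum(w * x
                            for k_vec, e_vec in zip(kernel, window)
                            for w, x in zip(k_vec, e_vec)))
    return features


def max_pool(features):
    """Keep only the strongest local response."""
    return max(features)


# Four made-up 2-dimensional event embeddings, ordered by time.
events = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.5, 0.5]]
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # one filter spanning 3 events (assumption)
local = convolve(events, kernel)
pooled = max_pool(local)
print(local, pooled)
```

Because pooling keeps one value per filter regardless of sequence length, a month of events and a week of events both reduce to fixed-size vectors that later layers can combine.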
Experiments
• Baselines
Model                        | Input            | Method
Luss and d’Aspremont [2012]  | Bag of words     | NN
Ding et al. [2014] (E-NN)    | Structured event | NN
WB-NN                        | Word embedding   | NN
WB-CNN                       | Word embedding   | CNN
E-CNN                        | Structured event | CNN
EB-NN                        | Event embedding  | NN
EB-CNN                       | Event embedding  | CNN
Experiments
• Findings
– Events are better features than words
– Reducing sparsity is helpful in the task
– CNN-based models are more powerful
Experiments
• 15 companies from S&P 500
– Consisting of high-, mid-, and low-ranking companies
– Evaluation metrics: accuracy and MCC
Conclusion
• Event embedding-based document representations are better than discrete event-based methods
• A deep CNN can help capture the longer-term influence of news events
Current
• More technical enhancements
• More markets
– China’s A market
– Chinese syntactic and semantic analysis
– Chinese Open IE
