Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Financial market prediction has always been one of the hottest topics in data science and machine learning. However, the prediction algorithm is just a small piece of the puzzle. Building a data stream pipeline that constantly combines the latest price information with high-volume historical data is extremely challenging on traditional platforms: it requires a lot of code and careful thought about how to scale or move to the cloud. This session walks through the architecture and implementation details of an application built on open-source tools, demonstrating how to build a stock prediction solution with no source code beyond a few lines of R and the web interface, which consumes data in real time through a RESTful endpoint. The solution leverages in-memory data grid technology for high-speed ingestion, combining real-time data streaming with distributed processing for stock indicator algorithms.

Published in: Software

Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

  1. Implementing a highly scalable stock prediction system with R, Apache Geode and Spring XD. SpringOne2GX, Washington, DC. Fred Melo (@fredmelo_br), William Markito (@william_markito). Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
  2. About us: Fred Melo, Technical Director for Data, fmelo@pivotal.io, @fredmelo_br. William Markito, Enterprise Architect for GemFire, wmarkito@pivotal.io, @william_markito.
  4. It's all about DATA: Data Sources, Look for patterns, Prediction
  5. What do we want to build? A "Smart System"
  6. … in our specific case: Trading Data feeds a "Smart System".
     Historical (Historical Data Repository): learns with historical trends. "How were the medium average price and relative strength reading when the latest failures happened?"
     Real-Time: evaluates live data. "According to historical trends, there's an 80% chance this stock's price might go downhill within the next hour."
     Live data becomes historical over time.
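The two indicators quoted on this slide can be computed from a plain price series. The sketch below is a minimal illustration in plain Python, not the speakers' R code. It assumes that "medium average" means a simple moving average and that "relative strength" means the Relative Strength Index computed from simple averages of gains and losses. The function names and the window parameter are hypothetical.

```python
from typing import List, Optional


def moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window of prices is available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def relative_strength_index(prices: List[float], window: int) -> List[Optional[float]]:
    """RSI from simple averages of gains and losses over `window` price changes."""
    out: List[Optional[float]] = [None] * len(prices)
    changes = [b - a for a, b in zip(prices, prices[1:])]
    for i in range(window, len(prices)):
        recent = changes[i - window : i]  # the `window` changes ending at price i
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window (flat prices also land here)
        else:
            rs = avg_gain / avg_loss
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out
```

Both functions return one value per input price, so the results line up row by row with the price series when they are later used as model inputs.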
  8. The ML pipeline data flow. Diagram labels: Live Data; Data Temperature (Hot, Cold); Apache Geode / GemFire; Greenplum DB; Machine Learning model; Spring XD.
     1 - Live data is ingested into the grid
     2 - Trained ML model compares new data to historical patterns
     3 - Results are pushed immediately to deployed applications
     4 - "Hot" data ages, becoming part of the historical dataset
     5 - Re-training is triggered, updating the model with the latest historical data
  9. Simplified demo model. Diagram labels: Live Data; Data Temperature (Hot, Warm); Apache Geode / GemFire; Machine Learning model; Spring XD.
     1 - Live data is ingested into the grid
     2 - Trained ML model compares new data to historical patterns
     3 - Results are pushed immediately to deployed applications
     4 - Re-training is triggered, updating the model with the latest historical data
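The four numbered steps of the simplified demo model can be traced in a toy, single-process sketch. Everything here is an assumption made for illustration: the class name, the `retrain_every` parameter, and above all the "model", which is just the mean of the prices seen so far. The real system uses Geode, Spring XD and a trained ML model for these roles.

```python
from typing import Callable, Dict, List


class DemoPipeline:
    """Toy stand-in for the four numbered steps; not the speakers' implementation."""

    def __init__(self, retrain_every: int) -> None:
        self.grid: Dict[int, float] = {}   # stands in for the in-memory data grid
        self.history: List[float] = []     # data the model is trained on
        self.subscribers: List[Callable[[int, float], None]] = []
        self.retrain_every = retrain_every
        self.baseline = 0.0                # the "model": mean of historical prices
        self.retrain_count = 0

    def retrain(self) -> None:
        # Step 4: refresh the model from the latest historical data.
        self.baseline = sum(self.history) / len(self.history)
        self.retrain_count += 1

    def ingest(self, tick_id: int, price: float) -> float:
        self.grid[tick_id] = price         # Step 1: live data lands in the grid
        signal = price - self.baseline     # Step 2: compare new data to the learned pattern
        for notify in self.subscribers:    # Step 3: push the result to applications
            notify(tick_id, signal)
        self.history.append(price)
        if len(self.history) % self.retrain_every == 0:
            self.retrain()
        return signal
```

A subscriber is any callable taking a tick id and a signal, for example `pipeline.subscribers.append(lambda tick_id, signal: print(tick_id, signal))`.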
  10. SpringXD: Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native. Pipeline stages: Transform, Enrich, Filter, Split, Sink. Diagram labels: Real data, Simulator, Indicators, Machine Learning, Predict, Dashboard; numbered steps 1, 2, 3; regions /Stocks, /TechIndicators, /Predictions.
  11. Eating it in small bites…
  12. SpringXD, GemFire
  13. Apache Geode & GemFire Concepts
      • Cache: configurable through XML, Java
      • Region: distributed j.u.Map on steroids; highly available, redundant
      • Member: Locator, Server, Client
      • Callbacks: Listener, Writer, AsyncEventListener, Parallel/Serial
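The Region and Callbacks bullets can be pictured with a small in-process analogy: a map whose writes pass through a writer callback before they happen and notify listeners after they happen. This is a sketch only. `ToyRegion` and its attributes are hypothetical names, not the Geode API, and nothing here is distributed or redundant.

```python
from typing import Callable, Dict, Generic, List, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ToyRegion(Generic[K, V]):
    """A named map with write callbacks; an analogy for a region, not Geode's API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[K, V] = {}
        # Writer: called before the write and may veto it by raising.
        self.writer: Optional[Callable[[K, V], None]] = None
        # Listeners: called after the write with (key, old value, new value).
        self.listeners: List[Callable[[K, Optional[V], V], None]] = []

    def put(self, key: K, value: V) -> None:
        if self.writer is not None:
            self.writer(key, value)  # an exception here aborts the write
        old = self._data.get(key)
        self._data[key] = value
        for listener in self.listeners:
            listener(key, old, value)

    def get(self, key: K) -> Optional[V]:
        return self._data.get(key)
```

Separating the "before" hook from the "after" hooks keeps validation apart from notification: a rejected write never reaches the listeners.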
  14. Apache Geode & GemFire, why? • Performance • Consistency • Resiliency
  15. Apache Geode & GemFire, why? Pivotal GemFire High Availability and Fault Tolerance in 6 acts:
      • Failing data copies are replaced transparently
      • Data is replicated to other clusters and sites (WAN)
      • Network segmentations ("split brain") are identified and fixed automatically
      • Client and cluster disconnections are handled gracefully
      • Data is persisted on local disk for ultimate durability
      • Failed function executions are restarted automatically
      © Copyright 2014 Pivotal. All rights reserved.
  16. Some interesting cases… *
      China Railway Corporation: 5,700 train stations; 4.5 million tickets per day; 20 million daily users; 1.4 billion page views per day; 40,000 visits per second
      Indian Railways: 7,000 stations; 72,000 miles of track; 23 million passengers daily; 120,000 concurrent users; 10,000 transactions per minute
      * http://pivotal.io/big-data/pivotal-gemfire
  17. Use cases and industries
      Indian Railways: population 1,251,695,616
      China Railway Corporation: population 1,401,586,609
      World: ~7,349,000,000; the two together are ~36% of the world population
  18. Apache Geode & Pivotal GemFire
      Pivotal GemFire: • Commercial product available since 2004 • Native clients in Java, C++, C#, REST • Event Subscriptions and Continuous Queries • Configurable WAN Gateway between clusters • Enterprise Support, commercial features
      Apache Geode: • Open Sourced in April/2015 • Java Native Client, REST • 98% of GemFire API • Event subscriptions • ~30 contributors • Under Incubation
  20. SpringXD Basic Concepts • Streams • Pipelines • Sources • Sinks • Filters • Taps
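The stream vocabulary on this slide maps naturally onto chained Python generators. The sketch below is an analogy under that assumption, not Spring XD code: a source emits messages, a filter drops some, a tap copies messages to a side channel without disturbing the main flow, and a sink writes what is left. All function names and the sample symbols are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def source(items: Iterable[dict]) -> Iterator[dict]:
    """Source: emits messages into the stream."""
    yield from items


def filter_step(messages: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter: passes along only the messages that match the predicate."""
    return (m for m in messages if predicate(m))


def tap(messages: Iterable[dict], side_channel: List[dict]) -> Iterator[dict]:
    """Tap: copies each message to a side channel and forwards it unchanged."""
    for m in messages:
        side_channel.append(m)
        yield m


def sink(messages: Iterable[dict], store: List[dict]) -> None:
    """Sink: terminal step that writes every remaining message to a destination."""
    store.extend(messages)


# Hypothetical ticks: one valid price and one that the filter should drop.
ticks = [{"symbol": "AAA", "price": 110.0}, {"symbol": "BBB", "price": 0.0}]
tapped: List[dict] = []
stored: List[dict] = []
sink(filter_step(tap(source(ticks), tapped), lambda m: m["price"] > 0), stored)
```

Because the tap sits before the filter, `tapped` receives both ticks while `stored` receives only the one that passed the filter.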
  21. SpringXD Basic Concepts
  22. A simple example
      Full stream definition:
        twittersearch --consumerKey=XXX --consumerSecret=XXX --query=SpringOne2GX --outputType=application/json | gemfire-json-server --useLocator=true --host=localhost --port=10334 --regionName=tweets --keyExpression=payload.getField('id_str')
      Shortened form:
        twittersearch --query=SpringOne2GX | gemfire-json-server --host=localhost --regionName=tweets
  24. Apache Spark Concepts • RDD • DataFrame • Driver • Worker
      "An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."
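The quoted definition has two ideas worth seeing in miniature: the collection is immutable, and it is split into partitions that can be processed independently. The sketch below imitates both in plain Python on a single machine. `ToyRDD` and its method names are hypothetical and only loosely modeled on Spark's; nothing here is distributed.

```python
from typing import Callable, Iterable, List, Tuple


class ToyRDD:
    """Immutable collection split into partitions; an analogy, not Spark's API."""

    def __init__(self, partitions: Iterable[Iterable[object]]) -> None:
        # Tuples make each partition read-only once the collection is built.
        self._partitions: Tuple[Tuple[object, ...], ...] = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, items: Iterable[object], num_partitions: int) -> "ToyRDD":
        """Split items into at most num_partitions contiguous chunks."""
        data = list(items)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    @property
    def num_partitions(self) -> int:
        return len(self._partitions)

    def map(self, fn: Callable[[object], object]) -> "ToyRDD":
        """Return a new ToyRDD; each partition is transformed independently."""
        return ToyRDD(tuple(fn(x) for x in part) for part in self._partitions)

    def collect(self) -> List[object]:
        """Gather all partitions back into one list, in order."""
        return [x for part in self._partitions for x in part]
```

Since `map` never touches the original partitions, the source collection stays usable after any number of transformations, which is the property that lets a real cluster recompute a lost partition from its inputs.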
  25. Apache Spark Concepts • RDD • DataFrame • Driver • Worker
  26. Machine Learning Model (e.g. Linear Regression): price(x), medium avg (x) and relative strength (x) go in; medium avg (x+1) comes out
  27. Machine Learning Model (e.g. Linear Regression)
      Features: price(x), medium avg (x), relative strength (x)
      Label: medium avg (x+1)
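The features-and-label setup on this slide can be tried end to end with a small ordinary least squares fit. This is a sketch in plain Python under stated assumptions: it uses the normal equations with Gaussian elimination rather than Spark or R, the function names are hypothetical, and each feature row is assumed to be (price, medium avg, relative strength) at time x with the label being the medium avg at x+1.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(features: Sequence[Sequence[float]], labels: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, w1, w2, ...]."""
    rows = [[1.0, *f] for f in features]  # prepend 1.0 for the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], feature_row: Sequence[float]) -> float:
    """Apply fitted weights to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))
```

Training means calling `fit_linear` on historical rows; prediction on a live tick is then a single call to `predict` with that tick's three feature values.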
  30. Learn more!
      https://github.com/Pivotal-Open-Source-Hub/geode-security-samples
      https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT
      https://github.com/Pivotal-Open-Source-Hub/geode-social-demo
      http://pivotal-open-source-hub.github.io/StockInference-Spark/
  31. Thank you. @william_markito @fredmelo_br
      Related: Building Highly-Scalable Spring Applications with In-Memory, Distributed Data Grids, by John Blum & Luke Shannon. September 15, 2015, 10:30, Salon M
      http://pivotal-open-source-hub.github.io/StockInference-Spark/
