SlideShare a Scribd company logo
1 of 37
‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#›
Building a Stock Prediction system with
Machine Learning using Geode, Spring XD
e Spark MLLib
William Markito
@william_markito
Fred Melo
@fredmelo_br
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
It's all about DATA
Data Sources
Look for patterns
Prediction
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
medium avg
(x+1)
relative
strength (x)
medium avg (x)
price(x)
Machine Learning Model
(e.g. Linear Regression)
© Copyright 2014 Pivotal. All rights reserved.
Transform Sink
SpringXD
Extensible
Open-Source
Fault-Tolerant
Horizontally Scalable
Cloud-Native
Machine Learning
Enrich Filter
Split
Dashboard
Indicators
1
2
Predict
3
Real data
Simulator
/Stocks
/TechIndicators
/Predictions
‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#›
Apache Geode (incubating)
Introduction
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Introduction
A distributed, memory-based data management platform for
data oriented apps that need:
High performance, scalability, resiliency and continuous
availability
Fast access to critical data set
Location aware distributed data processing
Event driven data architecture
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Cache
In-memory storage and management for
your data
Configurable through XML, Spring, Java
API or CLI
Collection of Region
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Region
Distributed java.util.Map on steroids
(Key/Value)
Consistent API regardless of where or how data
is stored
Observable (reactive)
Highly available, redundant on cache Member
(s).
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Region
Local, Replicated or Partitioned
In-memory or persistent
Redundant
LRU
Overflow
LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Member
A process that has a connection to the system
A process that has created a cache
Embeddable within your application
Client
Locator
Server
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Client cache
A process connected to the Geode server(s)
Can have a local copy of the data
Can be notified about events on the servers
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Listeners
CacheWriter / CacheListener
AsyncEventListener (queue / batch)
Parallel or Serial
Conflation
© Copyright 2014 Pivotal. All rights reserved. 19
Apache Geode (incubating)
Currently under incubation in Apache Software Foundation
Welcome contributions and contributors
Code and Patches
Bugs, feature requests
Documentation and content
Any form of feedback
© Copyright 2014 Pivotal. All rights reserved. 20
Code
New features
Bug fixes (patches)
Writing tests
Documentation
Wiki
Web site
User guides
Community
Join our mailing lists (Ask or answer)
Become a speaker
Find and report bugs
Testing a release candidate or beta
Apache Geode (incubating)
© Copyright 2014 Pivotal. All rights reserved. 21
JIRA - https://issues.apache.org/jira/browse/GEODE
GitHub - https://github.com/apache/incubator-geode
Mailing lists:
Development - dev@geode.incubator.apache.org
Users - user@geode.incubator.apache.org
Wiki - cwiki.apache.org/confluence/display/GEODE
StackOverflow - http://stackoverflow.com/questions/tagged/geode+or+gemfire
Apache Geode (incubating)
‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#›
SpringXD
Introduction
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
 A stream is composed from modules. Each module is deployed to a container and its
channels are bound to the transport.
‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#›
Apache Zeppelin
(incubating)
Introduction
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Web based REPL
Iterative & Exploratory
Support for Data Ingestion
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Multi interpreters
Markdown
Shell
Spark
Geode
Python…
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
Sharing through URLs without Reports
‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#›
Apache Spark
Introduction
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
RDD
Dataframe
Driver
Worker
"An RDD in Spark is simply an immutable distributed collection of objects.
Each RDD is split into multiple partitions, which may be computed on different nodes
of the cluster. RDDs can contain any type of Python, Java, or Scala objects,
including user-defined classes."
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
RDD
Dataframe
Driver
Worker
“A dataframe is a distributed collection of rows organized into named columns. An
abstraction for selecting, filtering and plotting structured data (pandas), previously
known as SchemaRDD."
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Concepts
RDD
Dataframe
Driver
Worker
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Summary
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Summary
• Integration
• Spark, JDBC, Geode
• HDFS, Twitter, File, Mail…
• Data pipeline orchestration
• Intuitive DSL
• Streaming & Analytics
• Distributed and scalable
• Web based REPL
• Multiple Interpreters
• Apache Spark
• Markdown
• Flink
• Python
• Geode…
• Iterative & Exploratory
‹#›© 2015 Pivotal Software, Inc. All rights reserved.
Summary
• Fast data processing
• Columnar queries
• RDDs
• Machine Learning
• Analytics & Streaming
• Fast data store and processing
• In-memory & Persistent
• Highly Consistent
• Transaction processing
• Thousands of concurrent
clients
© Copyright 2014 Pivotal. All rights reserved. 36
Source Code
http://pivotal-open-source-hub.github.io/StockInference-Spark/
Building a Stock Prediction system with Machine Learning using Geode, SpringXD and Spark MLLib

More Related Content

What's hot

Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017Karanjeet Singh
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Databricks
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkLi Jin
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...Spark Summit
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaSpark Summit
 
Uber's data science workbench
Uber's data science workbenchUber's data science workbench
Uber's data science workbenchRan Wei
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101Data Con LA
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"Wes McKinney
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Docker data science pipeline
Docker data science pipelineDocker data science pipeline
Docker data science pipelineDataWorks Summit
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...DataWorks Summit
 
When Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaWhen Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaDatabricks
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoDatabricks
 
Enancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIEnancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIDatabricks
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at UberDataWorks Summit
 

What's hot (20)

Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017Sparkler Presentation for Spark Summit East 2017
Sparkler Presentation for Spark Summit East 2017
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
 
Pandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySparkPandas UDF: Scalable Analysis with Python and PySpark
Pandas UDF: Scalable Analysis with Python and PySpark
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
Secured (Kerberos-based) Spark Notebook for Data Science: Spark Summit East t...
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
 
Uber's data science workbench
Uber's data science workbenchUber's data science workbench
Uber's data science workbench
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
Apache Druid 101
Apache Druid 101Apache Druid 101
Apache Druid 101
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Docker data science pipeline
Docker data science pipelineDocker data science pipeline
Docker data science pipeline
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...
 
When Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaWhen Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu Ma
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
 
Enancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIEnancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AI
 
Geospatial data platform at Uber
Geospatial data platform at UberGeospatial data platform at Uber
Geospatial data platform at Uber
 

Viewers also liked

Apache Geode (incubating) Introduction with Docker
Apache Geode (incubating) Introduction with DockerApache Geode (incubating) Introduction with Docker
Apache Geode (incubating) Introduction with DockerWilliam Markito Oliveira
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine LearningCarol McDonald
 
Time Series Analysis with Spark
Time Series Analysis with SparkTime Series Analysis with Spark
Time Series Analysis with SparkSandy Ryza
 
Machine Learning with Spark MLlib
Machine Learning with Spark MLlibMachine Learning with Spark MLlib
Machine Learning with Spark MLlibTodd McGrath
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Libraryjeykottalam
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkDB Tsai
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Sparkdatamantra
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkDatabricks
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibTaras Matyashovsky
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkPetr Zapletal
 
Practical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlibPractical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlibDatabricks
 
Machine Learning With Spark
Machine Learning With SparkMachine Learning With Spark
Machine Learning With SparkShivaji Dutta
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingDerek Kane
 

Viewers also liked (15)

How to Contribute to Apache Geode
How to Contribute to Apache GeodeHow to Contribute to Apache Geode
How to Contribute to Apache Geode
 
Apache Geode (incubating) Introduction with Docker
Apache Geode (incubating) Introduction with DockerApache Geode (incubating) Introduction with Docker
Apache Geode (incubating) Introduction with Docker
 
Apache Spark Machine Learning
Apache Spark Machine LearningApache Spark Machine Learning
Apache Spark Machine Learning
 
Time Series Analysis with Spark
Time Series Analysis with SparkTime Series Analysis with Spark
Time Series Analysis with Spark
 
Machine Learning with Spark MLlib
Machine Learning with Spark MLlibMachine Learning with Spark MLlib
Machine Learning with Spark MLlib
 
MLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning LibraryMLlib: Spark's Machine Learning Library
MLlib: Spark's Machine Learning Library
 
Large-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache SparkLarge-Scale Machine Learning with Apache Spark
Large-Scale Machine Learning with Apache Spark
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
Introduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlibIntroduction to ML with Apache Spark MLlib
Introduction to ML with Apache Spark MLlib
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on Spark
 
Practical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlibPractical Machine Learning Pipelines with MLlib
Practical Machine Learning Pipelines with MLlib
 
Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
 
Machine Learning With Spark
Machine Learning With SparkMachine Learning With Spark
Machine Learning With Spark
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
 

Similar to Building a Stock Prediction system with Machine Learning using Geode, SpringXD and Spark MLLib

Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingAll Things Open
 
피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일
피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일 피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일
피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일 VMware Tanzu Korea
 
Pivotal Cloud Platform Roadshow Keynote
Pivotal Cloud Platform Roadshow KeynotePivotal Cloud Platform Roadshow Keynote
Pivotal Cloud Platform Roadshow Keynotecornelia davis
 
Spark meets Spring
Spark meets SpringSpark meets Spring
Spark meets Springmark_fisher
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorCask Data
 
Building Cloud Native Applications with Oracle Autonomous Database.
Building Cloud Native Applications with Oracle Autonomous Database.Building Cloud Native Applications with Oracle Autonomous Database.
Building Cloud Native Applications with Oracle Autonomous Database.Oracle Developers
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8Janu Jahnavi
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8Janu Jahnavi
 
Reducing the Risks of Migrating Off Oracle
Reducing the Risks of Migrating Off OracleReducing the Risks of Migrating Off Oracle
Reducing the Risks of Migrating Off OracleEDB
 
Pivotal Digital Transformation Forum: Data Science Technical Overview
Pivotal Digital Transformation Forum: Data Science Technical OverviewPivotal Digital Transformation Forum: Data Science Technical Overview
Pivotal Digital Transformation Forum: Data Science Technical OverviewVMware Tanzu
 
Node summit workshop
Node summit workshopNode summit workshop
Node summit workshopShubhra Kar
 
Introducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFireIntroducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFireJohn Blum
 
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...SAP Cloud Platform
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal GemfireIn-Memory Computing Summit
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahData Con LA
 
Removing Barriers Between Dev and Ops
Removing Barriers Between Dev and OpsRemoving Barriers Between Dev and Ops
Removing Barriers Between Dev and OpsVMware Tanzu
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 

Similar to Building a Stock Prediction system with Machine Learning using Geode, SpringXD and Spark MLLib (20)

Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
 
피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일
피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일 피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일
피보탈 클라우드 파운드리 밋업 - 2017년 2월 24일
 
Pivotal Cloud Platform Roadshow Keynote
Pivotal Cloud Platform Roadshow KeynotePivotal Cloud Platform Roadshow Keynote
Pivotal Cloud Platform Roadshow Keynote
 
Spark meets Spring
Spark meets SpringSpark meets Spring
Spark meets Spring
 
Logging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data CollectorLogging infrastructure for Microservices using StreamSets Data Collector
Logging infrastructure for Microservices using StreamSets Data Collector
 
Building Cloud Native Applications with Oracle Autonomous Database.
Building Cloud Native Applications with Oracle Autonomous Database.Building Cloud Native Applications with Oracle Autonomous Database.
Building Cloud Native Applications with Oracle Autonomous Database.
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Reducing the Risks of Migrating Off Oracle
Reducing the Risks of Migrating Off OracleReducing the Risks of Migrating Off Oracle
Reducing the Risks of Migrating Off Oracle
 
Pivotal Digital Transformation Forum: Data Science Technical Overview
Pivotal Digital Transformation Forum: Data Science Technical OverviewPivotal Digital Transformation Forum: Data Science Technical Overview
Pivotal Digital Transformation Forum: Data Science Technical Overview
 
Node summit workshop
Node summit workshopNode summit workshop
Node summit workshop
 
Introducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFireIntroducing Apache Geode and Spring Data GemFire
Introducing Apache Geode and Spring Data GemFire
 
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
Overview and Walkthrough of the Application Programming Model with SAP Cloud ...
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
 
IoT Use Cases with MapR
IoT Use Cases with MapRIoT Use Cases with MapR
IoT Use Cases with MapR
 
Marcin Szałowicz - MySQL Workbench
Marcin Szałowicz - MySQL WorkbenchMarcin Szałowicz - MySQL Workbench
Marcin Szałowicz - MySQL Workbench
 
Removing Barriers Between Dev and Ops
Removing Barriers Between Dev and OpsRemoving Barriers Between Dev and Ops
Removing Barriers Between Dev and Ops
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 

Recently uploaded

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 

Recently uploaded (20)

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 

Building a Stock Prediction system with Machine Learning using Geode, SpringXD and Spark MLLib

  • 1. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#› Building a Stock Prediction system with Machine Learning using Geode, Spring XD e Spark MLLib William Markito @william_markito Fred Melo @fredmelo_br
  • 2.
  • 3. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. It's all about DATA Data Sources Look for patterns Prediction
  • 4. ‹#›© 2015 Pivotal Software, Inc. All rights reserved.
  • 5. ‹#›© 2015 Pivotal Software, Inc. All rights reserved.
  • 6.
  • 7. ‹#›© 2015 Pivotal Software, Inc. All rights reserved.
  • 8. medium avg (x+1) relative strength (x) medium avg (x) price(x) Machine Learning Model (e.g. Linear Regression)
  • 9.
  • 10. © Copyright 2014 Pivotal. All rights reserved. Transform Sink SpringXD Extensible Open-Source Fault-Tolerant Horizontally Scalable Cloud-Native Machine Learning Enrich Filter Split Dashboard Indicators 1 2 Predict 3 Real data Simulator /Stocks /TechIndicators /Predictions
  • 11. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#› Apache Geode (incubating) Introduction
  • 12. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Introduction A distributed, memory-based data management platform for data oriented apps that need: High performance, scalability, resiliency and continuous availability Fast access to critical data set Location aware distributed data processing Event driven data architecture
  • 13. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Cache In-memory storage and management for your data Configurable through XML, Spring, Java API or CLI Collection of Region
  • 14. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Region Distributed java.util.Map on steroids (Key/Value) Consistent API regardless of where or how data is stored Observable (reactive) Highly available, redundant on cache Member (s).
  • 15. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Region Local, Replicated or Partitioned In-memory or persistent Redundant LRU Overflow LOCAL LOCAL_HEAP_LRU LOCAL_OVERFLOW LOCAL_PERSISTENT LOCAL_PERSISTENT_OVERFLOW PARTITION PARTITION_HEAP_LRU PARTITION_OVERFLOW PARTITION_PERSISTENT PARTITION_PERSISTENT_OVERFLOW PARTITION_PROXY PARTITION_PROXY_REDUNDANT PARTITION_REDUNDANT PARTITION_REDUNDANT_HEAP_LRU PARTITION_REDUNDANT_OVERFLOW PARTITION_REDUNDANT_PERSISTENT PARTITION_REDUNDANT_PERSISTENT_OVERFLOW REPLICATE REPLICATE_HEAP_LRU REPLICATE_OVERFLOW REPLICATE_PERSISTENT REPLICATE_PERSISTENT_OVERFLOW REPLICATE_PROXY
  • 16. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Member A process that has a connection to the system A process that has created a cache Embeddable within your application Client Locator Server
  • 17. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Client cache A process connected to the Geode server(s) Can have a local copy of the data Can be notified about events on the servers
  • 18. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Listeners CacheWriter / CacheListener AsyncEventListener (queue / batch) Parallel or Serial Conflation
  • 19. © Copyright 2014 Pivotal. All rights reserved. 19 Apache Geode (incubating) Currently under incubation in Apache Software Foundation Welcome contributions and contributors Code and Patches Bugs, feature requests Documentation and content Any form of feedback
  • 20. © Copyright 2014 Pivotal. All rights reserved. 20 Code New features Bug fixes (patches) Writing tests Documentation Wiki Web site User guides Community Join our mailing lists (Ask or answer) Become a speaker Find and report bugs Testing a release candidate or beta Apache Geode (incubating)
  • 21. © Copyright 2014 Pivotal. All rights reserved. 21 JIRA - https://issues.apache.org/jira/browse/GEODE GitHub - https://github.com/apache/incubator-geode Mailing lists: Development - dev@geode.incubator.apache.org Users - user@geode.incubator.apache.org Wiki - cwiki.apache.org/confluence/display/GEODE StackOverflow - http://stackoverflow.com/questions/tagged/geode+or+gemfire Apache Geode (incubating)
  • 22. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#› SpringXD Introduction
  • 23. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts
  • 24. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts  A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.
  • 25. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#› Apache Zeppelin (incubating) Introduction
  • 26. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Web based REPL Iterative & Exploratory Support for Data Ingestion
  • 27. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Multi interpreters Markdown Shell Spark Geode Python…
  • 28. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts Sharing through URLs without Reports
  • 29. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. ‹#› Apache Spark Introduction
  • 30. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts RDD Dataframe Driver Worker "An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."
  • 31. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts RDD Dataframe Driver Worker “A dataframe is a distributed collection of rows organized into named columns. An abstraction for selecting, filtering and plotting structured data (pandas), previously known as SchemaRDD."
  • 32. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Concepts RDD Dataframe Driver Worker
  • 33. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Summary
  • 34. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Summary • Integration • Spark, JDBC, Geode • HDFS, Twitter, File, Mail… • Data pipeline orchestration • Intuitive DSL • Streaming & Analytics • Distributed and scalable • Web based REPL • Multiple Interpreters • Apache Spark • Markdown • Flink • Python • Geode… • Iterative & Exploratory
  • 35. ‹#›© 2015 Pivotal Software, Inc. All rights reserved. Summary • Fast data processing • Columnar queries • RDDs • Machine Learning • Analytics & Streaming • Fast data store and processing • In-memory & Persistent • Highly Consistent • Transaction processing • Thousands of concurrent clients
  • 36. © Copyright 2014 Pivotal. All rights reserved. 36 Source Code http://pivotal-open-source-hub.github.io/StockInference-Spark/