Real time stock processing with apache nifi, apache flink and apache kafka
1.
Real-Time Stock Processing
With Apache NiFi, Apache Flink and Apache Kafka
Timothy Spann, Principal DataFlow Field Engineer @
Pierre Villard, Senior Product Manager @
2.
Who?
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal DataFlow Field Engineer. Princeton Future of Data Meetup.
https://github.com/tspannhw/EverythingApacheNiFi
https://community.cloudera.com/t5/Community-Articles/Real-Time-Stock-Processing-With-A
pache-NiFi-and-Apache-Kafka/ta-p/249221
Pierre Villard
Twitter & Github - @pvillard31 // Blog: www.pierrevillard.com
Committer and PMC member for Apache NiFi (in the community since 2015)
Senior Product Manager at Cloudera for products around Apache NiFi, NiFi Registry and MiNiFi
Previously at Google & Hortonworks
3.
What?
This talk is about ingesting real-time data from many sources and build a dashboard on top it to track in
real time what our stocks are.
This use case is a good example to show the combination of some of the best Apache solutions for
streaming applications.
4.
NiFi, Kafka and Flink in a few numbers
- Apache NiFi (version 1.12.x) - created and open sourced by the NSA - initial release in 2006
350+ contributors, 1200+ people in Slack, 3.1M+ docker pulls
Many sub-projects: NiFi, MiNiFi Java, MiNiFi C++, NiFi Registry, etc
- Apache Kafka (version 2.6.x) - created and open sourced by LinkedIn - initial release in 2011
700+ contributors
- Apache Flink (version 1.11.x) - initial release in 2011
750+ contributors, 2nd top repository by number of commits, top active project on mailing lists
6.
Analyze
Streaming OLAP
Analytics & Time Series
Store Powered by
Druid & Kudu
Buffer
Apache Kafka
Topics
Ingest Gateway
Powered by Kafka
Distribute
Apache NiFi
Data Flow Apps
Powered by NiFi
Buffer
Apache Kafka
Syndicate
topics
Syndicate Services
Powered by Kafka
Collect
Syndicate
topics
Syndicate Services
Powered by Kafka
Replication /
Data Deployment
Analyze
Streaming Analytics Apps
Stream Processing
Powered by Flink
Streaming Reference Architecture
Data Collection
at the Edge
Apache NiFi / MiNiFi
- sensors, IoT
- databases
- file systems
- app sidecar
- live streams
- MQ
- logs
- network
Anything… you
name it!
7.
Where?
CDP services are optimized for the elastic
compute & ‘always-on’ storage services provided
by any cloud provider
Web service hosted and managed by Cloudera
Hosted in the your cloud environment, but
managed by the CDP Management Console
Shared Data Experience (SDX) technologies form
a secure and governed data lake backed by object
storage (S3, ADLS, GCS)
Flow Management Streams Messaging Streaming Analytics
8.
How? This use case architecture
Stock Data
Logs
Errors
Aggregates
Other data
SQL
Analytics
It appears that you have an ad-blocker running. By whitelisting SlideShare on your ad-blocker, you are supporting our community of content creators.
Hate ads?
We've updated our privacy policy.
We’ve updated our privacy policy so that we are compliant with changing global privacy regulations and to provide you with insight into the limited ways in which we use your data.
You can read the details below. By accepting, you agree to the updated privacy policy.