SlideShare a Scribd company logo
Developing High Frequency
Indicators Using Real-Time Tick Data
on Apache Superset and Druid
CBRT Big Data Team
Emre Tokel, Kerem Başol, M. Yağmur Şahin
Zekeriya Besiroglu / Komtas Bilgi Yonetimi
21 March 2019 Barcelona
Agenda
WHO WE ARE
CBRT & Our Team
PROJECT DETAILS
Before, Test Cluster,
Phase 1-2-3, Prod
Migration
HIGH FREQUENCY
INDICATORS
Importance & Goals
CURRENT ARCHITECTURE
Apache Kafka, Spark,
Druid & Superset
WORK IN
PROGRESS
Further analyses
FUTURE PLANS
6
5
4
3
2
1
Who We Are
1
Our Solutions
Data Management
• Data Governance Solutions
• Next Generation Analytics
• 360 Engagement
• Data Security
Analytics
• Data Warehouse Solutions
• Customer Journey Analytics
• Advanced Marketing Analytics Solutions
• Industry-specific analytic use cases
• Online Customer Data Platform
• IoT Analytics
• Analytic Lab Solution
Big Data & AI
• Big Data & AI Advisory Services
• Big Data & AI Accelerators
• Data Lake Foundation
• EDW Optimization / Offloading
• Big Data Ingestion and Governance
• AI Implementation – Chatbot
• AI Implementation – Image Recognition
Security Analytics
• Security Analytic Advisory Services
• Integrated Law Enforcement Solutions
• Cyber Security Solutions
• Fraud Analytics Solutions
• Governance, Risk & Compliance Solutions
• +20 IT , +18 DB&DWH
• +7 BIG DATA
• Lead Archtitect &Big Data /Analytics
@KOMTAS
• Instructor&Consultant
• ITU,MEF,Şehir Uni. BigData Instr.
• Certified R programmer
• Certified Hadoop Administrator
Our Organization
§ The Central Bank of the Republic of Turkey is primarily responsible for steering the
monetary and exchange rate policies in Turkey.
o Price stability
o Financial stability
o Exchange rate regime
o The privilege of printing and issuing banknotes
o Payment systems
• Big Data Engineer• Big Data Engineer
M. Yağmur Şahin Emre Tokel Kerem Başol
• Big Data Team Leader
High Frequency
Indicators
2
1
Importance and Goals
§ To observe foreign exchange markets in real-time
o Are there any patterns regarding to specific time intervals during the day?
o Is there anything to observe before/after local working hours throughout the whole day?
o What does the difference between bid/ask prices tell us?
§ To be able to detect risks and take necessary policy measures in a timely manner
o Developing liquidity and risk indicators based real-time tick data
o Visualizing observations for decision makers in real-time
o Finally, discovering possible intraday seasonality
§ Wouldn’t it be great to be able to correlate with news flow as well?
Project Details 3
2
1
Development of High Frequency Indicators Using Real-Time Tick
Data on Apache Superset and Druid
Phase 1
Prod
migratio
n
Next
phases
Test
Cluster
Phase 2 Phase 3
Test Cluster
§ Our first studies on big data have started on very humble servers
o 5 servers with 32 GB RAM for each
o 3 TB storage
§ HDP 2.6.0.3 installed
o Not the latest version back then
§ Technical difficulties
o Performance problems
o Apache Druid indexing
o Apache Superset maturity
Development of High Frequency Indicators Using Real-Time Tick
Data on Apache Superset and Druid
Phase 1
Prod
migratio
n
Next
phases
Test
Cluster
Phase 2 Phase 3
TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI
Thomson Reuters Enterprise Platform (TREP)
§ Thomson Reuters provides its subscribers with an enterprise platform that they can
collect the market data as it is generated
§ Each financial instrument on TREP has a unique code called RIC
§ The event queue implemented by the platform can be consumed with the provided
Java SDK
§ We developed a Java application for consuming this event queue to collect tick-data
according to required RICs
TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI
Apache Kafka
§ The data flow is very fast and quite dense
o We published the messages containing tick data collected by our Java application to a message
queue
o Twofold analysis: Batch and real-time
§ We decided to use Apache Kafka residing on our test big data cluster
§ We created a topic for each RIC on Apache Kafka and published data to related topics
TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI
Apache NiFi
§ In order to manage the flow, we decided to use Apache NiFi
§ We used KafkaConsumer processor to consume messages from Kafka queues
§ The NiFi flow was designed to be persisted on MongoDB
Our NiFi
Flow
TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI
MongoDB
§ We had prepared data in JSON format with our Java application
§ Since we have MongoDB installed on our enterprise systems, we decided to persist
this data to MongoDB
§ Although MongoDB is not a part of HDP, it seemed as a good choice for our
researchers to use this data in their analyses
TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI
Apache Zeppelin
§ We provided our researchers with access to Apache Zeppelin and connection to
MongoDB via Python
§ By doing so, we offered an alternative to the tools on local computers and provided a
unified interface for financial analysis
Business Intelligence on Client Side
§ Our users had to download daily tick-data manually from their Thomson Reuters
Terminals and work on Excel
§ Users were then able to access tick-data using Power BI
o We also provided our users with a news timeline along with the tick-data
We needed more!
§ We had to visualize the data in real-time
o Analysis on persisted data using MongoDB, PowerBI and Apache Zeppelin was not enough
TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI
Development of High Frequency Indicators Using Real-Time Tick
Data on Apache Superset and Druid
Phase 1
Prod
migratio
n
Next
phases
Test
Cluster
Phase 2 Phase 3
TREP
API
Apache
Kafka
Apache
Druid
Apache
Superset
Apache Druid
§ We needed a database which was able to:
o Answer ad-hoc queries (slice/dice) for a limited window efficiently
o Store historic data and seamlessly integrate current and historic data
o Provide native integration with possible real-time visualization frameworks (preferably from
Apache stack)
o Provide native integration with Apache Kafka
§ Apache Druid addressed all the aforementioned requirements
§ Indexing task was achieved using Tranquility
TREP
API
Apache
Kafka
Apache
Druid
Apache
Superset
Apache Superset
§ Apache Superset was the obvious alternative for real-time visualization since tick-data
was stored on Apache Druid
o Native integration with Apache Druid
o Freely available on Hortonworks service stack
§ We prepared real-time dashboards including:
o Transaction Count
o Bid / Ask Prices
o Contributor Distribution
o Bid - Ask Spread
We needed more, again!
§ Reliability issues with Druid
§ Performance issues
§ Enterprise integration requirements
Development of High Frequency Indicators Using Real-Time Tick
Data on Apache Superset and Druid
Phase 1
Prod
migratio
n
Next
phases
Test
Cluster
Phase 2 Phase 3
Architecture
Internet Data
Enterprise Content
Social Media/Media
Micro Level Data
Commercial Data Vendors
Ingestion
Big Data Platform Data Science
GovernanceData Sources
Development of High Frequency Indicators Using Real-Time Tick
Data on Apache Superset and Druid
Phase 1
Prod
migratio
n
Next
phases
Test
Cluster
Phase 2 Phase 3
TREP API Apache Kafka
Apache
Hive + Druid
Integration
Apache Spark
Apache
Superset
Apache Hive + Druid Integration
§ After setting up our production environment (using HDP 3.0.1.0) and started to
feed data, we realized that data were scattered and we were missing the option to
co-utilize these different data sources
§ We then realized that Apache Hive was already providing Kafka & Druid indexing
service in the form of a simple table creation and querying facility for Druid from
Hive
TREP API Apache Kafka
Apache
Hive + Druid
Integration
Apache Spark
Apache
Superset
Apache Spark
§ Due to additional calculation requirements of our users, we decided to utilize Apache
Spark
§ With Apache Spark 2.4, we used Spark Streaming and Spark SQL contexts together in
the same application
§ In our Spark application
o For every 5 seconds, a 30-second window is created
o On each window, outlier boundaries are calculated
o Outlier data points are detected
Current Architecture
4
3
2
1
Current Architecture & Progress So Far
Java Application
Kafka Topic (real-time)
Kafka Topic (windowed)
TREP Event Queue
Consume Publish
Spark Application
Consume
Publish
Druid Datasource
(real-time)
Druid Datasource
(windowed)
Superset Dashboard
(tick data)
Superset Dashboard
(outlier)
TREP Data Flow
Windowed Spark Streaming
Tick-Data Dashboard
Outlier Dashboard
Work in Progress
5
4
3
2
1
Implementing…
§ Moving average calculation (20-day window)
§ Volatility Indicator
§ Average True Range Indicator (moving average)
o [ max(t) - min(t) ]
o [ max(t) - close(t-1) ]
o [ max(t) - close(t-1) ]
Future Plans
6
5
4
3
2
1
To-Do List
§ Matching data subscription
§ Bringing historical tick data into real-time analysis
§ Possible use of machine learning for intraday indicators
Thank you!
Q & A

More Related Content

What's hot

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
Guang Xu
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
kbajda
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
ScyllaDB
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit... Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Databricks
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
Databricks
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
Databricks
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
Milind Bhandarkar
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
Martin Traverso
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stack
Sylvain Wallez
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
Novita Sari
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
Prasanna Rajaperumal
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
Tao Feng
 
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageBuilding Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Databricks
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform
HANCOM MDS
 

What's hot (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit... Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow FlightThe Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stack
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
 
Hoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoopHoodie: Incremental processing on hadoop
Hoodie: Incremental processing on hadoop
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and StorageBuilding Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform
 

Similar to Developing high frequency indicators using real time tick data on apache superset and druid

Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
DataWorks Summit
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
SnappyData
 
Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
Sheetal Dolas
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Big Data Spain
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
kgshukla
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Databricks
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Landon Robinson
 
PNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsPNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data Analytics
John Evans
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
confluent
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
DataWorks Summit
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
Swami Sundaramurthy
 
Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2
Traveloka
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Landon Robinson
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
Nitin Kumar
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
Joan Viladrosa Riera
 
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan ViladrosarieraApache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Spark Summit
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Databricks
 

Similar to Developing high frequency indicators using real time tick data on apache superset and druid (20)

Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
 
Apache Spark Streaming
Apache Spark StreamingApache Spark Streaming
Apache Spark Streaming
 
Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous ApplicationsHeadaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
 
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
Spark + AI Summit 2019: Headaches and Breakthroughs in Building Continuous Ap...
 
PNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data AnalyticsPNDA - Platform for Network Data Analytics
PNDA - Platform for Network Data Analytics
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
 
Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2Traveloka's data journey — Traveloka data meetup #2
Traveloka's data journey — Traveloka data meetup #2
 
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
Spark + AI Summit 2019: Apache Spark Listeners: A Crash Course in Fast, Easy ...
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
 
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan ViladrosarieraApache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 

More from Zekeriya Besiroglu

Bigdata : Big picture
Bigdata : Big pictureBigdata : Big picture
Bigdata : Big picture
Zekeriya Besiroglu
 
Oracle Rac Performance Tunning Tips&Tricks
Oracle Rac Performance Tunning Tips&TricksOracle Rac Performance Tunning Tips&Tricks
Oracle Rac Performance Tunning Tips&Tricks
Zekeriya Besiroglu
 
Weblogic performance tips&tricks
Weblogic performance tips&tricksWeblogic performance tips&tricks
Weblogic performance tips&tricks
Zekeriya Besiroglu
 
Dba için oracle veritabanı 11g yeni özellikleri
Dba için oracle veritabanı 11g yeni özellikleriDba için oracle veritabanı 11g yeni özellikleri
Dba için oracle veritabanı 11g yeni özellikleri
Zekeriya Besiroglu
 
Oracle Cloud G in gidişi C nin gelişi
Oracle Cloud G in gidişi C nin gelişiOracle Cloud G in gidişi C nin gelişi
Oracle Cloud G in gidişi C nin gelişi
Zekeriya Besiroglu
 
Oracle Database & Oracle Datawarehouse Best Practices
Oracle Database & Oracle Datawarehouse Best PracticesOracle Database & Oracle Datawarehouse Best Practices
Oracle Database & Oracle Datawarehouse Best Practices
Zekeriya Besiroglu
 

More from Zekeriya Besiroglu (7)

Bigdata : Big picture
Bigdata : Big pictureBigdata : Big picture
Bigdata : Big picture
 
Oracle Rac Performance Tunning Tips&Tricks
Oracle Rac Performance Tunning Tips&TricksOracle Rac Performance Tunning Tips&Tricks
Oracle Rac Performance Tunning Tips&Tricks
 
Oracle semineri,
Oracle semineri, Oracle semineri,
Oracle semineri,
 
Weblogic performance tips&tricks
Weblogic performance tips&tricksWeblogic performance tips&tricks
Weblogic performance tips&tricks
 
Dba için oracle veritabanı 11g yeni özellikleri
Dba için oracle veritabanı 11g yeni özellikleriDba için oracle veritabanı 11g yeni özellikleri
Dba için oracle veritabanı 11g yeni özellikleri
 
Oracle Cloud G in gidişi C nin gelişi
Oracle Cloud G in gidişi C nin gelişiOracle Cloud G in gidişi C nin gelişi
Oracle Cloud G in gidişi C nin gelişi
 
Oracle Database & Oracle Datawarehouse Best Practices
Oracle Database & Oracle Datawarehouse Best PracticesOracle Database & Oracle Datawarehouse Best Practices
Oracle Database & Oracle Datawarehouse Best Practices
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Developing high frequency indicators using real time tick data on apache superset and druid

  • 1. Developing High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid CBRT Big Data Team Emre Tokel, Kerem Başol, M. Yağmur Şahin Zekeriya Besiroglu / Komtas Bilgi Yonetimi 21 March 2019 Barcelona
  • 2. Agenda WHO WE ARE CBRT & Our Team PROJECT DETAILS Before, Test Cluster, Phase 1-2-3, Prod Migration HIGH FREQUENCY INDICATORS Importance & Goals CURRENT ARCHITECTURE Apache Kafka, Spark, Druid & Superset WORK IN PROGRESS Further analyses FUTURE PLANS 6 5 4 3 2 1
  • 4. Our Solutions Data Management • Data Governance Solutions • Next Generation Analytics • 360 Engagement • Data Security Analytics • Data Warehouse Solutions • Customer Journey Analytics • Advanced Marketing Analytics Solutions • Industry-specific analytic use cases • Online Customer Data Platform • IoT Analytics • Analytic Lab Solution Big Data & AI • Big Data & AI Advisory Services • Big Data & AI Accelerators • Data Lake Foundation • EDW Optimization / Offloading • Big Data Ingestion and Governance • AI Implementation – Chatbot • AI Implementation – Image Recognition Security Analytics • Security Analytic Advisory Services • Integrated Law Enforcement Solutions • Cyber Security Solutions • Fraud Analytics Solutions • Governance, Risk & Compliance Solutions
  • 5. • +20 IT , +18 DB&DWH • +7 BIG DATA • Lead Archtitect &Big Data /Analytics @KOMTAS • Instructor&Consultant • ITU,MEF,Şehir Uni. BigData Instr. • Certified R programmer • Certified Hadoop Administrator
  • 6. Our Organization § The Central Bank of the Republic of Turkey is primarily responsible for steering the monetary and exchange rate policies in Turkey. o Price stability o Financial stability o Exchange rate regime o The privilege of printing and issuing banknotes o Payment systems
  • 7. • Big Data Engineer• Big Data Engineer M. Yağmur Şahin Emre Tokel Kerem Başol • Big Data Team Leader
  • 9. Importance and Goals § To observe foreign exchange markets in real-time o Are there any patterns regarding to specific time intervals during the day? o Is there anything to observe before/after local working hours throughout the whole day? o What does the difference between bid/ask prices tell us? § To be able to detect risks and take necessary policy measures in a timely manner o Developing liquidity and risk indicators based real-time tick data o Visualizing observations for decision makers in real-time o Finally, discovering possible intraday seasonality § Wouldn’t it be great to be able to correlate with news flow as well?
  • 11. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid Phase 1 Prod migratio n Next phases Test Cluster Phase 2 Phase 3
  • 12. Test Cluster § Our first studies on big data have started on very humble servers o 5 servers with 32 GB RAM for each o 3 TB storage § HDP 2.6.0.3 installed o Not the latest version back then § Technical difficulties o Performance problems o Apache Druid indexing o Apache Superset maturity
  • 13. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid Phase 1 Prod migratio n Next phases Test Cluster Phase 2 Phase 3
  • 14. TREP API Apache Kafka Apache NiFi MongoDB Apache Zeppelin & Power BI
  • 15. Thomson Reuters Enterprise Platform (TREP) § Thomson Reuters provides its subscribers with an enterprise platform that they can collect the market data as it is generated § Each financial instrument on TREP has a unique code called RIC § The event queue implemented by the platform can be consumed with the provided Java SDK § We developed a Java application for consuming this event queue to collect tick-data according to required RICs
  • 16. TREP API Apache Kafka Apache NiFi MongoDB Apache Zeppelin & Power BI
  • 17. Apache Kafka § The data flow is very fast and quite dense o We published the messages containing tick data collected by our Java application to a message queue o Twofold analysis: Batch and real-time § We decided to use Apache Kafka residing on our test big data cluster § We created a topic for each RIC on Apache Kafka and published data to related topics
  • 18. TREP API Apache Kafka Apache NiFi MongoDB Apache Zeppelin & Power BI
  • 19. Apache NiFi § In order to manage the flow, we decided to use Apache NiFi § We used KafkaConsumer processor to consume messages from Kafka queues § The NiFi flow was designed to be persisted on MongoDB
  • 21. TREP API Apache Kafka Apache NiFi MongoDB Apache Zeppelin & Power BI
  • 22. MongoDB § We had prepared data in JSON format with our Java application § Since we have MongoDB installed on our enterprise systems, we decided to persist this data to MongoDB § Although MongoDB is not a part of HDP, it seemed as a good choice for our researchers to use this data in their analyses
  • 23. TREP API Apache Kafka Apache NiFi MongoDB Apache Zeppelin & Power BI
  • 24. Apache Zeppelin § We provided our researchers with access to Apache Zeppelin and connection to MongoDB via Python § By doing so, we offered an alternative to the tools on local computers and provided a unified interface for financial analysis
  • 25. Business Intelligence on Client Side § Our users had to download daily tick-data manually from their Thomson Reuters Terminals and work on Excel § Users were then able to access tick-data using Power BI o We also provided our users with a news timeline along with the tick-data
  • 26. We needed more! § We had to visualize the data in real-time o Analysis on persisted data using MongoDB, PowerBI and Apache Zeppelin was not enough
  • 27. TREP API Apache Kafka Apache NiFi MongoDB Apache Zeppelin & Power BI
  • 28. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid Phase 1 Prod migratio n Next phases Test Cluster Phase 2 Phase 3
  • 30. Apache Druid § We needed a database which was able to: o Answer ad-hoc queries (slice/dice) for a limited window efficiently o Store historic data and seamlessly integrate current and historic data o Provide native integration with possible real-time visualization frameworks (preferably from Apache stack) o Provide native integration with Apache Kafka § Apache Druid addressed all the aforementioned requirements § Indexing task was achieved using Tranquility
  • 32. Apache Superset § Apache Superset was the obvious alternative for real-time visualization since tick-data was stored on Apache Druid o Native integration with Apache Druid o Freely available on Hortonworks service stack § We prepared real-time dashboards including: o Transaction Count o Bid / Ask Prices o Contributor Distribution o Bid - Ask Spread
  • 33. We needed more, again! § Reliability issues with Druid § Performance issues § Enterprise integration requirements
  • 34. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid Phase 1 Prod migratio n Next phases Test Cluster Phase 2 Phase 3
  • 35. Architecture Internet Data Enterprise Content Social Media/Media Micro Level Data Commercial Data Vendors Ingestion Big Data Platform Data Science GovernanceData Sources
  • 36. Development of High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid Phase 1 Prod migratio n Next phases Test Cluster Phase 2 Phase 3
  • 37. TREP API Apache Kafka Apache Hive + Druid Integration Apache Spark Apache Superset
  • 38. Apache Hive + Druid Integration § After setting up our production environment (using HDP 3.0.1.0) and started to feed data, we realized that data were scattered and we were missing the option to co-utilize these different data sources § We then realized that Apache Hive was already providing Kafka & Druid indexing service in the form of a simple table creation and querying facility for Druid from Hive
  • 39. TREP API Apache Kafka Apache Hive + Druid Integration Apache Spark Apache Superset
  • 40. Apache Spark § Due to additional calculation requirements of our users, we decided to utilize Apache Spark § With Apache Spark 2.4, we used Spark Streaming and Spark SQL contexts together in the same application § In our Spark application o For every 5 seconds, a 30-second window is created o On each window, outlier boundaries are calculated o Outlier data points are detected
  • 41.
  • 43. Current Architecture & Progress So Far Java Application Kafka Topic (real-time) Kafka Topic (windowed) TREP Event Queue Consume Publish Spark Application Consume Publish Druid Datasource (real-time) Druid Datasource (windowed) Superset Dashboard (tick data) Superset Dashboard (outlier)
  • 49. Implementing… § Moving average calculation (20-day window) § Volatility Indicator § Average True Range Indicator (moving average) o [ max(t) - min(t) ] o [ max(t) - close(t-1) ] o [ max(t) - close(t-1) ]
  • 51. To-Do List § Matching data subscription § Bringing historical tick data into real-time analysis § Possible use of machine learning for intraday indicators