Visualizing Big Data in Realtime
Sasha Parfenov
sashap@apache.org
June 15, 2017
Agenda
Apache Apex
DataTorrent RTS
Real-time Dashboards and Widgets
App Data Framework
Apache Apex AutoMetrics
Exporting and Packaging Dashboards
Q&A
What is Apache Apex?
✓ Platform and Runtime Engine - enables development of scalable and fault-tolerant distributed applications for processing streaming and batch data
✓ Highly Scalable - linear scalability to billions of events per second
✓ Highly Performant - millisecond end-to-end latency
✓ Fault Tolerant - automatically recovers from failures
✓ Stateful - guarantees that application state is preserved
✓ YARN Native - uses Hadoop YARN for resource management
✓ Developer Friendly - exposes an easy API for developing Operators, which can include any custom logic written in Java
✓ Malhar Library - library of many popular operators and application examples
○ Input / Output Connectors - File Systems, RDBMS, NoSQL, Messaging, Social, …
○ Compute Operators - Parsers, Transforms, Stats, ML, Scripting, …
✓ Integrations - Calcite, SAMOA, Beam, NiFi, Geode, Bigtop, etc.
apex.apache.org
Apache Apex Use Cases
[Diagram labels: Data Sources; Streaming Computation (Op1, Op2, Op3, Op4) on Hadoop (YARN + HDFS); Real-time Analytics & Visualizations; Actions & Insights; Data Targets]
Apache Apex Enables “Shift Left”
Apex Application Development
● Application DAG is made up of connected operators and streams
● Stream is a sequence of data tuples
● Operator takes one or more input streams, performs computations and emits one or more output streams
● Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
● Operator has many instances that run in parallel, and each instance is single-threaded
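To make the operator and stream vocabulary concrete, here is a minimal plain-Java sketch of two operators connected by a stream. It does not use the Apex API: the `Operator` interface, the `run` helper and the `MiniDag` class are hypothetical stand-ins invented for illustration, and a real Apex application would wire operators through a DAG instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MiniDag {
    // Hypothetical stand-in for an operator: it receives one input tuple
    // and may emit zero or more output tuples downstream.
    interface Operator<I, O> {
        void process(I tuple, Consumer<O> emit);
    }

    // Pushes every tuple of an input stream through one operator and
    // collects the emitted tuples as the output stream.
    static <I, O> List<O> run(List<I> stream, Operator<I, O> op) {
        List<O> out = new ArrayList<>();
        for (I tuple : stream) {
            op.process(tuple, out::add);
        }
        return out;
    }

    // A two-operator pipeline: split lines into words, then measure each word.
    static List<Integer> wordLengths(List<String> lines) {
        Operator<String, String> splitter = (line, emit) -> {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    emit.accept(word);
                }
            }
        };
        Operator<String, Integer> measure = (word, emit) -> emit.accept(word.length());
        return run(run(lines, splitter), measure);
    }

    public static void main(String[] args) {
        // prints [6, 4, 7, 2, 6]
        System.out.println(wordLengths(List.of("apache apex", "streams of tuples")));
    }
}
```

The sketch runs everything in one thread and in batch; it only shows how tuples flow from one operator's output to the next operator's input.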
Apache Apex & DataTorrent RTS
[Diagram: DataTorrent RTS platform stack, with these labels]
Solutions for Business: Fraud & Security | Ad Tech | ETL Pipelines | IoT & Industrial
Ingestion & Data Prep: FileSync | Kafka-to-HDFS | JDBC-to-HDFS | HDFS-to-HDFS | S3-to-HDFS
Application Templates
Awesome Visual Tools: GUI Application Assembly | Management & Monitoring | Real-Time Data Visualization
Dev Framework: High-level API | Transformation | ML & Score | SQL | Analytics | Batch Support
Apache Apex / Core: Apex-Malhar Operator Library | Apache Apex Core
Big Data Infrastructure: Hadoop 2.x - YARN + HDFS | On Prem & Cloud
DataTorrent RTS Visualization Demo
Realtime App Visualizations
● Apex App Visualizations
○ Events & Logs
○ Logical & Physical DAGs
○ Tuple Recordings
○ Stats & Metrics
○ Data Queries & Results
● Dashboards
○ Configurable
○ Export/Import via Apex app packages
● Widgets
○ Real-time data streams
○ Visualizations include tables, charts, maps, ...
○ Configurable
○ Support external development and dynamic
loading from Apex app packages.
Connecting Dashboards to App Data
[Diagram: Apex Applications with AppData Support <-> DataTorrent RTS Gateway (dtGateway) <-> DataTorrent RTS Dashboard & Widgets; flows labeled "query" and "results"]
App Data Framework
App Data Framework Documentation
http://docs.datatorrent.com/app_data_framework/
Data Sources are Query + Source + Result
operators exposed via Gateway Topics
App Data Framework Schema & Data Queries
Enables Real-time Visualization Widgets
[Diagram: messages exchanged between Console and Gateway: Schema Subscribe, Data Subscribe, Data Publish, Schema Publish, Data Query, Data Renew, Schema Query]
App Data Framework Schema Queries
1. Request application data sources
http://<gateway-host:port>/ws/v2/applications/<appId>
{
  ...
  "appDataSources": [
    {
      "name": "SnapshotServer.queryResult",
      "context": {...},
      "query": {
        "topic": "TwitterHashtagQueryDemo",
        ...
      },
      "result": {
        "topic": "TwitterHashtagQueryResultDemo",
        ...
      }
    }
  ]
}
2. Subscribe to schema result on a unique topic
ws://<gateway-host:port>/pubsub
{
  "type": "subscribe",
  "topic": "TwitterHashtagQueryResultDemo.0.20716154835833223"
}
3. Request schema from published DataSource topic
ws://<gateway-host:port>/pubsub
{
  "type": "publish",
  "topic": "TwitterHashtagQueryDemo",
  "data": {
    "id": 0.20716154835833223,
    "type": "schemaQuery",
    "context": {...}
  }
}
4. DataSource responds on unique topic
{
  "topic": "TwitterHashtagQueryResultDemo.0.20716154835833223",
  "data": {
    "id": "0.20716154835833223",
    "type": "schemaResult",
    "data": [
      {
        "values": [
          {"name": "hashtag", "type": "string"},
          {"name": "count", "type": "integer"}
        ],
        "schemaType": "snapshot",
        "schemaVersion": "1.0"
      }
    ]
  },
  "type": "data"
}
App Data Framework Data Queries
1. Subscribe to data result on a unique topic
ws://<gateway-host:port>/pubsub
{
  "type": "subscribe",
  "topic": "TwitterHashtagQueryResultDemo.0.6760250790172551"
}
2. Request data on query topic with matching id
ws://<gateway-host:port>/pubsub
{
  "type": "publish",
  "topic": "TwitterHashtagQueryDemo",
  "data": {
    "id": 0.6760250790172551,
    "type": "dataQuery",
    "data": {
      "fields": [
        "hashtag",
        "count"
      ]
    },
    "countdown": 30,
    "incompleteResultOK": true
  }
}
3. Data is published on the unique result topic
{
  "topic": "TwitterHashtagQueryResultDemo.0.6760250790172551",
  "data": {
    "id": "0.6760250790172551",
    "type": "dataResult",
    "data": [
      {
        "count": "1398",
        "hashtag": "iHeartApache"
      },
      {
        "count": "1415",
        "hashtag": "ApexBigDataWorld"
      },
      {
        "count": "1498",
        "hashtag": "StreamingBigData"
      },
      {
        "count": "1521",
        "hashtag": "ApacheApex"
      },
      {
        "count": "1728",
        "hashtag": "DataTorrentRTS"
      },
      ...
    ],
    "countdown": "29"
  },
  "type": "data"
}
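The subscribe and publish messages in the data query flow share one id: the client subscribes to the result topic suffixed with the id, then publishes a query carrying the same id. The following is a minimal sketch of building those two messages as JSON strings. The class and method names (`AppDataMessages`, `uniqueTopic`, `subscribe`, `dataQuery`) are hypothetical, the sketch does no JSON escaping, and it only builds the text; sending it over the gateway's pubsub WebSocket is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

public class AppDataMessages {
    // The unique result topic is the result topic plus "." plus the query id,
    // following the pattern visible in the example messages.
    static String uniqueTopic(String resultTopic, String id) {
        return resultTopic + "." + id;
    }

    // Step 1: subscribe to the unique result topic.
    static String subscribe(String resultTopic, String id) {
        return "{\"type\":\"subscribe\",\"topic\":\"" + uniqueTopic(resultTopic, id) + "\"}";
    }

    // Step 2: publish a dataQuery with the same id on the query topic.
    // Assumes topic and field names contain no characters that need escaping.
    static String dataQuery(String queryTopic, String id, List<String> fields, int countdown) {
        String fieldList = fields.stream()
            .map(f -> "\"" + f + "\"")
            .collect(Collectors.joining(","));
        return "{\"type\":\"publish\",\"topic\":\"" + queryTopic + "\",\"data\":{"
            + "\"id\":" + id + ",\"type\":\"dataQuery\","
            + "\"data\":{\"fields\":[" + fieldList + "]},"
            + "\"countdown\":" + countdown + ",\"incompleteResultOK\":true}}";
    }

    public static void main(String[] args) {
        String id = "0.6760250790172551"; // id reused from the example messages
        System.out.println(subscribe("TwitterHashtagQueryResultDemo", id));
        System.out.println(dataQuery("TwitterHashtagQueryDemo", id, List.of("hashtag", "count"), 30));
    }
}
```

A schema query would presumably follow the same shape with "schemaQuery" as the type, as the schema example suggests.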
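Apache Apex App Data with AutoMetrics
Easiest way to expose custom data in Apache Apex apps
import com.datatorrent.api.AutoMetric;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.common.util.BaseOperator;

public class LineReceiver extends BaseOperator
{
  // Tuples processed in the current streaming window
  @AutoMetric
  long evalsPerWindow;

  // Tuples processed since the operator started
  @AutoMetric
  long evalsTotal;

  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
  {
    @Override
    public void process(String s)
    {
      evalsPerWindow++;
      evalsTotal++;
    }
  };

  @Override
  public void beginWindow(long windowId)
  {
    // Reset only the per-window metric; the total keeps accumulating
    evalsPerWindow = 0;
  }
}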
Example Operators with @AutoMetric
JsonParser.java, PojoToAvro.java, POJOKafkaOutputOperator.java
Custom Aggregators for non-numeric fields
Apache Apex - Building Custom Aggregators
Requesting AutoMetrics Data via StrAM API
http://<appMasterTrackingUrl>/ws/v2/stram/physicalPlan
{
  "operators": [{
    "name": "picalc",
    "metrics": {
      "evalsPerWindow": "23000",
      "evalsTotal": "1005787500"
    }
  }]
}
Get StrAM URL with Apex CLI
$ apex
apex> connect <appId>
apex (appId)> get-app-info
... "appMasterTrackingUrl": "node24.datatorrent.com:40466" …
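As a rough sketch of the request described here, the snippet below turns the `appMasterTrackingUrl` reported by the Apex CLI into the physical plan URL and prepares a GET request with the JDK 11 HTTP client API. The class and method names are hypothetical, and it assumes the tracking URL is a bare host:port served over plain http, as in the CLI output shown; the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class StramMetricsRequest {
    // Builds http://<appMasterTrackingUrl>/ws/v2/stram/physicalPlan.
    // Assumption: the tracking URL carries no scheme and is reachable over http.
    static URI physicalPlanUri(String appMasterTrackingUrl) {
        return URI.create("http://" + appMasterTrackingUrl + "/ws/v2/stram/physicalPlan");
    }

    // Prepares (but does not send) the GET request for the physical plan.
    static HttpRequest physicalPlanRequest(String appMasterTrackingUrl) {
        return HttpRequest.newBuilder(physicalPlanUri(appMasterTrackingUrl))
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = physicalPlanRequest("node24.datatorrent.com:40466");
        System.out.println(request.method() + " " + request.uri());
        // Sending it needs a reachable application master, for example:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The response body would then contain the per-operator "metrics" object shown in the example.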
Snapshot Schema Apps
Key Operators Enabling TopN Computation and Visualization
// Computes the top counts over a sliding window
WindowedTopCounter<String> topCounts = dag.addOperator("TopCounter", new WindowedTopCounter<String>());
// Serves the latest snapshot of results according to the snapshot schema
AppDataSnapshotServerMap snapshotServer = dag.addOperator("SnapshotServer", new AppDataSnapshotServerMap());
snapshotServer.setSnapshotSchemaJSON(SNAPSHOT_SCHEMA);
snapshotServer.setTableFieldToMapField(conversionMap);
// Query operator is embedded in the snapshot server and receives queries from the gateway
PubSubWebSocketAppDataQuery wsQuery = new PubSubWebSocketAppDataQuery();
wsQuery.setUri(uri);
snapshotServer.setEmbeddableQueryInfoProvider(wsQuery);
// Result operator publishes query results back to the gateway
PubSubWebSocketAppDataResult wsResult = dag.addOperator("QueryResult", new PubSubWebSocketAppDataResult());
wsResult.setUri(uri);
Operator.InputPort<String> queryResultPort = wsResult.input;
Snapshot Schema for SnapshotServer Operator
{
  "values": [{"name": "url", "type": "string"},
             {"name": "count", "type": "integer"}]
}
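To illustrate what a TopN snapshot looks like as data, here is a plain-Java sketch that ranks a map of counts and produces rows with the "url" and "count" fields named in the snapshot schema on this slide. It is not the WindowedTopCounter or AppDataSnapshotServerMap implementation; `TopNSnapshot` and `topN` are hypothetical names, and windowing is left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopNSnapshot {
    // Returns the n highest counts as rows shaped like the snapshot schema:
    // one "url" string and one "count" integer per row. Assumes n >= 0.
    static List<Map<String, Object>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically for a stable order.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("url", e.getKey());
            row.put("count", e.getValue());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("apex.apache.org", 5, "example.org", 3, "example.com", 1);
        // prints [{url=apex.apache.org, count=5}, {url=example.org, count=3}]
        System.out.println(topN(counts, 2));
    }
}
```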
Available SnapshotServer Implementations
AppDataSnapshotServerMap.java
AppDataSnapshotServerPOJO.java
Example Applications with Snapshot Schema
TwitterTopCounterApplication.java (twitter)
ApplicationAppData.java (pi demo)
Twitter Demo Logical Plan with Snapshot Schema
Dimensions Schema Apps
Dimensions Schema for DimensionsComputation Operator
{
  "keys":
    [{"name":"channel","type":"string","enumValues":["Mobile","Online","Store"]},
     {"name":"region","type":"string","enumValues":["Dallas","New York","San Francisco", ... ]},
     {"name":"product","type":"string","enumValues":["Laptops","Printers","Routers", ...]}],
  "timeBuckets":["1m", "1h", "1d", "5m"],
  "values":
    [{"name":"sales","type":"double","aggregators":["SUM"]},
     {"name":"discount","type":"double","aggregators":["SUM"]},
     {"name":"tax","type":"double","aggregators":["SUM"]}],
  "dimensions":
    [{"combination":[]},
     {"combination":["region"]},
     {"combination":["product"]},
     {"combination":["channel","product"]},
     {"combination":["channel","region","product"]}]
}
// full schema -> salesGenericEventSchema.json
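Each "combination" in the dimensions schema names the keys to group by, and each value is aggregated (here with SUM) within those groups. The plain-Java sketch below shows that idea for the "sales" value only. It is an illustration under simplifying assumptions, not the DimensionsComputation operator: the `DimensionsSketch` class and `sumSales` method are hypothetical, events are plain string maps, and time buckets are ignored.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DimensionsSketch {
    // Sums the "sales" value per group for one dimension combination,
    // e.g. ["channel", "product"]. An empty combination yields one global sum
    // stored under the empty group key.
    static Map<String, Double> sumSales(List<Map<String, String>> events, List<String> combination) {
        Map<String, Double> sums = new TreeMap<>();
        for (Map<String, String> event : events) {
            // The group key is the event's values for the combination's keys.
            String groupKey = combination.stream()
                .map(event::get)
                .collect(Collectors.joining("|"));
            sums.merge(groupKey, Double.parseDouble(event.get("sales")), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map<String, String>> events = List.of(
            Map.of("channel", "Mobile", "region", "Dallas", "product", "Laptops", "sales", "100.0"),
            Map.of("channel", "Online", "region", "Dallas", "product", "Printers", "sales", "50.0"),
            Map.of("channel", "Mobile", "region", "New York", "product", "Laptops", "sales", "25.0"));
        System.out.println(sumSales(events, List.of()));
        System.out.println(sumSales(events, List.of("region")));
        System.out.println(sumSales(events, List.of("channel", "product")));
    }
}
```

Listing only the needed combinations in the schema keeps the number of maintained aggregates bounded, instead of computing every possible grouping of the keys.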
Key Operators Enabling Dimensions Computation and Visualization
DimensionsComputationFlexibleSingleSchemaMap dimensions =
    dag.addOperator("DimensionsComputation", DimensionsComputationFlexibleSingleSchemaMap.class);
AppDataSingleSchemaDimensionStoreHDHT store =
    dag.addOperator("Store", AppDataSingleSchemaDimensionStoreHDHT.class);
PubSubWebSocketAppDataQuery wsIn = new PubSubWebSocketAppDataQuery();
store.setEmbeddableQueryInfoProvider(wsIn);
PubSubWebSocketAppDataResult wsOut =
    dag.addOperator("QueryResult", new PubSubWebSocketAppDataResult());
Example Applications with Dimensions Schema
CDRDemoV2.java
SalesDemo.java
Sales Demo Logical Plan with Dimensions Schema
Exporting and Packaging Dashboards
1. Create and download dashboard from UI Console
2. Copy dashboards to Apex app project folder under
<Apex App>/src/main/resources/resources/ui/dashboards/
- TwitterDemo.dtdashboard
- SalesDemo.dtdashboard
3. Create ui.json in Apex app project folder under
<Apex App>/src/main/resources/resources/ui/ui.json
{
  "dashboards": [
    {
      "file": "TwitterDemo.dtdashboard"
    },
    {
      "name": "Sales Dimensions Demo",
      "file": "SalesDemo.dtdashboard",
      "appNames": ["SalesDemo-Sasha", "SalesDemo"]
    }
  ]
}
// "appNames" is used to auto-associate packaged dashboards with running apps
4. Compile Apex app project and verify that the .apa package contains
myApp.apa
  + resources/
    + ui/
      - ui.json
      + dashboards/
        - TwitterDemo.dtdashboard
        - SalesDemo.dtdashboard
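The package check can also be scripted. The sketch below assumes the .apa file is a zip-format archive and that the dashboards end up under resources/ui/ inside it, mirroring the src/main/resources/resources/ui/ project folder. `ApaDashboardCheck`, `missing` and `entryNames` are hypothetical names, not part of any Apex tool.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ApaDashboardCheck {
    // Returns the expected UI entries that are absent from the package:
    // ui.json plus one entry per dashboard file.
    static List<String> missing(Set<String> entryNames, List<String> dashboardFiles) {
        List<String> expected = new ArrayList<>();
        expected.add("resources/ui/ui.json");
        for (String file : dashboardFiles) {
            expected.add("resources/ui/dashboards/" + file);
        }
        expected.removeIf(entryNames::contains);
        return expected;
    }

    // Reads all entry names from the archive (assumption: .apa is zip-format).
    static Set<String> entryNames(Path apa) throws IOException {
        try (ZipFile zip = new ZipFile(apa.toFile())) {
            return zip.stream().map(ZipEntry::getName).collect(Collectors.toSet());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ApaDashboardCheck myApp.apa TwitterDemo.dtdashboard SalesDemo.dtdashboard
        Path apa = Path.of(args[0]);
        List<String> dashboards = List.of(args).subList(1, args.length);
        List<String> absent = missing(entryNames(apa), dashboards);
        System.out.println(absent.isEmpty() ? "all dashboard files packaged" : "missing: " + absent);
    }
}
```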
Questions?
Sasha Parfenov
sashap@apache.org
@utdsasha
Thank You!
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe to forums
○ Apex - http://apex.apache.org/community.html
○ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
○ @ApacheApex; Follow - https://twitter.com/apacheapex
○ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
○ https://datatorrent.com/product/start-up-accelerator/
• Big Data Application Templates/Examples - https://datatorrent.com/apphub
We Are Hiring!
jobs@datatorrent.com

More Related Content

What's hot

Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...DataWorks Summit/Hadoop Summit
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Data Con LA
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!DataWorks Summit
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsDataWorks Summit
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 

What's hot (20)

Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
High-Scale Entity Resolution in Hadoop
High-Scale Entity Resolution in HadoopHigh-Scale Entity Resolution in Hadoop
High-Scale Entity Resolution in Hadoop
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Creating the Internet of Your Things
Creating the Internet of Your ThingsCreating the Internet of Your Things
Creating the Internet of Your Things
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!Analyzing the World's Largest Security Data Lake!
Analyzing the World's Largest Security Data Lake!
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 

Similar to Visualizing Big Data in Realtime with Apache Apex and DataTorrent RTS

Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseHao Chen
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Stream analytics
Stream analyticsStream analytics
Stream analyticsrebeccatho
 
Apache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT ManagementApache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT ManagementApache StreamPipes
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementHao Chen
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Paulo Gutierrez
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorialmarpierc
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in ActionHao Chen
 
StreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics PlatformStreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics PlatformAtul Sharma
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekDatabricks
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextPrateek Maheshwari
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution Analytics
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) Surendar S
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft Private Cloud
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 

Similar to Visualizing Big Data in Realtime with Apache Apex and DataTorrent RTS (20)

Apache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San JoseApache Eagle at Hadoop Summit 2016 San Jose
Apache Eagle at Hadoop Summit 2016 San Jose
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Stream analytics
Stream analyticsStream analytics
Stream analytics
 
Apache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT ManagementApache StreamPipes – Flexible Industrial IoT Management
Apache StreamPipes – Flexible Industrial IoT Management
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture Evolvement
 
Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要Spark + AI Summit 2020 イベント概要
Spark + AI Summit 2020 イベント概要
 
XSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata TutorialXSEDE14 SciGaP-Apache Airavata Tutorial
XSEDE14 SciGaP-Apache Airavata Tutorial
 
Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in Action
 
StreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics PlatformStreamAnalytix - Multi-Engine Streaming Analytics Platform
StreamAnalytix - Multi-Engine Streaming Analytics Platform
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a Week
 
Apache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's NextApache Samza 1.0 - What's New, What's Next
Apache Samza 1.0 - What's New, What's Next
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013
 
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration) SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
SnapLogic- iPaaS (Elastic Integration Cloud and Data Integration)
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview PresentationMicrosoft SQL Server - StreamInsight Overview Presentation
Microsoft SQL Server - StreamInsight Overview Presentation
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Visualizing Big Data in Realtime with Apache Apex and DataTorrent RTS

  • 1. Visualizing Big Data in Realtime Sasha Parfenov sashap@apache.org June 15, 2017
  • 2. Agenda: Apache Apex; DataTorrent RTS; Real-time Dashboards and Widgets; App Data Framework; Apache Apex AutoMetrics; Exporting and Packaging Dashboards; Q&A
  • 3. What is Apache Apex?
      ✓ Platform and Runtime Engine - enables development of scalable and fault-tolerant distributed applications for processing streaming and batch data
      ✓ Highly Scalable - linear scalability to billions of events per second
      ✓ Highly Performant - millisecond end-to-end latency
      ✓ Fault Tolerant - automatically recovers from failures
      ✓ Stateful - guarantees that application state is preserved
      ✓ YARN Native - uses Hadoop YARN for resource management
      ✓ Developer Friendly - exposes an easy API for developing Operators, which can include any custom logic written in Java
      ✓ Malhar Library - library of many popular operators and application examples
          ○ Input / Output Connectors - File Systems, RDBMS, NoSQL, Messaging, Social, …
          ○ Compute Operators - Parsers, Transforms, Stats, ML, Scripting, …
      ✓ Integrations - Calcite, SAMOA, Beam, NiFi, Geode, Bigtop, etc.
      apex.apache.org
  • 4. Apache Apex Use Cases (diagram): Data Sources → Streaming Computation (operators Op1, Op2, Op3, Op4 running on Hadoop YARN + HDFS) → Actions & Insights (Real-time Analytics & Visualizations, Data Targets)
  • 5. Apache Apex Enables “Shift Left”
  • 6. Apex Application Development
      Application DAG is made up of connected operators and streams
      Stream is a sequence of data tuples
      Operator takes one or more input streams, performs computations & emits one or more output streams
      ● Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
      ● Operator has many instances that run in parallel, and each instance is single-threaded
  • 7. Apache Apex & DataTorrent RTS (layered architecture diagram; labels as extracted): Ingestion & Data Prep; Solutions for Business; Awesome Visual Tools; GUI Application Assembly; Management & Monitoring; Real-Time Data Visualization; Hadoop 2.x - YARN + HDFS | On Prem & Cloud; FileSync | Kafka-to-HDFS | JDBC-to-HDFS | HDFS-to-HDFS | S3-to-HDFS; Application Templates; Apex-Malhar Operator Library; Apache Apex Core; Big Data Infrastructure; Core; High-level API; Transformation; ML & Score; SQL; Analytics; Dev Framework; Batch Support; Apache Apex; Fraud & Security; Ad Tech; ETL Pipelines; IoT & Industrial
  • 9. Realtime App Visualizations
      ● Apex App Visualizations
          ○ Events & Logs
          ○ Logical & Physical DAGs
          ○ Tuple Recordings
          ○ Stats & Metrics
          ○ Data Queries & Results
      ● Dashboards
          ○ Configurable
          ○ Export/Import via Apex app packages
      ● Widgets
          ○ Real-time data streams
          ○ Visualizations include tables, charts, maps, ...
          ○ Configurable
          ○ Support external development and dynamic loading from Apex app packages
  • 10. Connecting Dashboards to App Data (diagram): Apex Applications with AppData Support; DataTorrent RTS Gateway (dtGateway); DataTorrent RTS Dashboard & Widgets; flows labeled "query" and "results"
  • 11. App Data Framework
      App Data Framework Documentation: http://docs.datatorrent.com/app_data_framework/
      Data Sources are Query + Source + Result operators exposed via Gateway Topics
      App Data Framework Schema & Data Queries enable real-time visualization widgets
      (Diagram: Console and Gateway exchanging Schema Subscribe, Schema Query, Schema Publish, Data Subscribe, Data Query, Data Renew, Data Publish)
  • 12. App Data Framework Schema Queries
      1. Request application data sources
         http://<gateway-host:port>/ws/v2/applications/<appId>
         {
           ...
           "appDataSources": [
             {
               "name": "SnapshotServer.queryResult",
               "context": {...},
               "query": { "topic": "TwitterHashtagQueryDemo", ... },
               "result": { "topic": "TwitterHashtagQueryResultDemo", ... }
             }
           ]
         }
      2. Subscribe to schema result on a unique topic
         ws://<gateway-host:port>/pubsub
         {
           "type": "subscribe",
           "topic": "TwitterHashtagQueryResultDemo.0.20716154835833223"
         }
      3. Request schema from published DataSource topic
         ws://<gateway-host:port>/pubsub
         {
           "type": "publish",
           "topic": "TwitterHashtagQueryDemo",
           "data": {
             "id": 0.20716154835833223,
             "type": "schemaQuery",
             "context": {...}
           }
         }
      4. DataSource responds on unique topic
         {
           "topic": "TwitterHashtagQueryResultDemo.0.20716154835833223",
           "data": {
             "id": "0.20716154835833223",
             "type": "schemaResult",
             "data": [
               {
                 "values": [
                   { "name": "hashtag", "type": "string" },
                   { "name": "count", "type": "integer" }
                 ],
                 "schemaType": "snapshot",
                 "schemaVersion": "1.0"
               }
             ]
           },
           "type": "data"
         }
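The subscribe and publish messages in steps 2 and 3 of slide 12 share one client-chosen id: it is appended to the result topic to form the unique topic, and it is repeated inside the query. A minimal sketch of building those two messages follows. The class and method names (`AppDataMessages`, `uniqueTopic`, `subscribe`, `schemaQuery`) are hypothetical helpers, not part of any DataTorrent library, and the elided `"context"` object from the slide is left out rather than guessed.

```java
// Hypothetical helper, not a DataTorrent API: builds the pubsub messages from slide 12.
class AppDataMessages {
    // The client listens on "<resultTopic>.<id>" so only it receives the answer.
    static String uniqueTopic(String resultTopic, String id) {
        return resultTopic + "." + id;
    }

    // Step 2: subscribe to the unique result topic.
    static String subscribe(String resultTopic, String id) {
        return String.format("{\"type\":\"subscribe\",\"topic\":\"%s\"}",
                uniqueTopic(resultTopic, id));
    }

    // Step 3: publish a schemaQuery on the query topic with the same id.
    // The id is kept as a String so its digits are written out unchanged.
    static String schemaQuery(String queryTopic, String id) {
        return String.format(
                "{\"type\":\"publish\",\"topic\":\"%s\",\"data\":{\"id\":%s,\"type\":\"schemaQuery\"}}",
                queryTopic, id);
    }

    public static void main(String[] args) {
        String id = "0.20716154835833223";
        System.out.println(subscribe("TwitterHashtagQueryResultDemo", id));
        System.out.println(schemaQuery("TwitterHashtagQueryDemo", id));
    }
}
```

Sending these strings over `ws://<gateway-host:port>/pubsub` is left to whichever WebSocket client the application already uses.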
  • 13. App Data Framework Data Queries
      1. Subscribe to data result on a unique topic
         ws://<gateway-host:port>/pubsub
         {
           "type": "subscribe",
           "topic": "TwitterHashtagQueryResultDemo.0.6760250790172551"
         }
      2. Request data on query topic with matching id
         ws://<gateway-host:port>/pubsub
         {
           "type": "publish",
           "topic": "TwitterHashtagQueryDemo",
           "data": {
             "id": 0.6760250790172551,
             "type": "dataQuery",
             "data": { "fields": [ "hashtag", "count" ] },
             "countdown": 30,
             "incompleteResultOK": true
           }
         }
      3. Data is published on the unique result topic
         {
           "topic": "TwitterHashtagQueryResultDemo.0.6760250790172551",
           "data": {
             "id": "0.6760250790172551",
             "type": "dataResult",
             "data": [
               { "count": "1398", "hashtag": "iHeartApache" },
               { "count": "1415", "hashtag": "ApexBigDataWorld" },
               { "count": "1498", "hashtag": "StreamingBigData" },
               { "count": "1521", "hashtag": "ApacheApex" },
               { "count": "1728", "hashtag": "DataTorrentRTS" },
               ...
             ],
             "countdown": "29"
           },
           "type": "data"
         }
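In the dataResult on slide 13 the `count` values arrive as JSON strings ("1398"), even though the schema declares the field as an integer, so a widget has to convert them before sorting or charting. The sketch below assumes the rows have already been parsed into maps by some JSON library; `HashtagRows`, `Row`, and `toSortedRows` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper: converts dataResult rows into typed, sorted rows.
class HashtagRows {
    static final class Row {
        final String hashtag;
        final long count;

        Row(String hashtag, long count) {
            this.hashtag = hashtag;
            this.count = count;
        }
    }

    // Parses the string-valued "count" field and sorts rows by count, largest first.
    static List<Row> toSortedRows(List<Map<String, String>> data) {
        List<Row> rows = new ArrayList<>();
        for (Map<String, String> entry : data) {
            rows.add(new Row(entry.get("hashtag"), Long.parseLong(entry.get("count"))));
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.count).reversed());
        return rows;
    }

    public static void main(String[] args) {
        // Sample rows copied from the slide's dataResult.
        List<Map<String, String>> data = List.of(
                Map.of("count", "1398", "hashtag", "iHeartApache"),
                Map.of("count", "1728", "hashtag", "DataTorrentRTS"),
                Map.of("count", "1521", "hashtag", "ApacheApex"));
        for (Row r : toSortedRows(data)) {
            System.out.println(r.hashtag + " " + r.count);
        }
    }
}
```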
  • 14. Apache Apex App Data with AutoMetrics
      Easiest way to expose custom data in Apache Apex apps

         import com.datatorrent.api.AutoMetric;
         import com.datatorrent.api.DefaultInputPort;
         import com.datatorrent.common.util.BaseOperator;

         public class LineReceiver extends BaseOperator {
           // Fields annotated with @AutoMetric are reported as operator metrics.
           @AutoMetric
           long evalsPerWindow;
           @AutoMetric
           long evalsTotal;

           public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
             @Override
             public void process(String s) {
               evalsPerWindow++;
               evalsTotal++;
             }
           };

           @Override
           public void beginWindow(long windowId) {
             // Reset the per-window counter at the start of every window.
             evalsPerWindow = 0;
           }
         }

      Example Operators with @AutoMetric: JsonParser.java, PojoToAvro.java, POJOKafkaOutputOperator.java
      Custom Aggregators for non-numeric fields: Apache Apex - Building Custom Aggregators
      Requesting AutoMetrics Data via StrAM API
         http://<appMasterTrackingUrl>/ws/v2/stram/physicalPlan
         {
           "operators": [{
             "name": "picalc",
             "metrics": {
               "evalsPerWindow": "23000",
               "evalsTotal": "1005787500"
             }
           }]
         }
      Get StrAM URL with Apex CLI
         $ apex
         apex> connect <appId>
         apex (appId)> get-app-info
         ...
         "appMasterTrackingUrl": "node24.datatorrent.com:40466"
         …
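The two metrics on slide 14 differ only in when they are reset: `evalsPerWindow` is cleared in `beginWindow`, while `evalsTotal` keeps growing. The sketch below imitates that lifecycle in plain Java with no Apex dependency, so it runs anywhere. `WindowCounterSketch` is a hypothetical stand-in for the operator, and the calls in `main` play the role the Apex engine would play at runtime.

```java
// Plain-Java stand-in for the LineReceiver operator; no Apex classes are used here.
class WindowCounterSketch {
    long evalsPerWindow; // would carry @AutoMetric in the real operator
    long evalsTotal;     // would carry @AutoMetric in the real operator

    // Called at the start of each streaming window: reset the per-window counter only.
    void beginWindow(long windowId) {
        evalsPerWindow = 0;
    }

    // Called once per incoming tuple.
    void process(String line) {
        evalsPerWindow++;
        evalsTotal++;
    }

    public static void main(String[] args) {
        WindowCounterSketch op = new WindowCounterSketch();
        op.beginWindow(1);
        op.process("a");
        op.process("b");
        op.beginWindow(2);
        op.process("c");
        // Window 2 saw one tuple; three tuples were seen overall.
        System.out.println(op.evalsPerWindow + " " + op.evalsTotal); // prints "1 3"
    }
}
```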
  • 15. Snapshot Schema Apps
      Key Operators Enabling TopN Computation and Visualization
         WindowedTopCounter<String> topCounts = dag.addOperator("TopCounter", new WindowedTopCounter<String>());
         AppDataSnapshotServerMap snapshotServer = dag.addOperator("SnapshotServer", new AppDataSnapshotServerMap());
         snapshotServer.setSnapshotSchemaJSON(SNAPSHOT_SCHEMA);
         snapshotServer.setTableFieldToMapField(conversionMap);
         PubSubWebSocketAppDataQuery wsQuery = new PubSubWebSocketAppDataQuery();
         wsQuery.setUri(uri);
         snapshotServer.setEmbeddableQueryInfoProvider(wsQuery);
         PubSubWebSocketAppDataResult wsResult = dag.addOperator("QueryResult", new PubSubWebSocketAppDataResult());
         wsResult.setUri(uri);
         Operator.InputPort<String> queryResultPort = wsResult.input;
      Snapshot Schema for SnapshotServer Operator
         {
           "values": [{"name": "url", "type": "string"},
                      {"name": "count", "type": "integer"}]
         }
      Available SnapshotServer Implementations: AppDataSnapshotServerMap.java, AppDataSnapshotServerPOJO.java
      Example Applications with Snapshot Schema: TwitterTopCounterApplication.java (twitter), ApplicationAppData.java (pi demo)
      (Figure: Twitter Demo Logical Plan with Snapshot Schema)
  • 16. Dimensions Schema Apps
      Dimensions Schema for DimensionsComputation Operator
         {
           "keys": [{"name":"channel","type":"string","enumValues":["Mobile","Online","Store"]},
                    {"name":"region","type":"string","enumValues":["Dallas","New York","San Francisco", ... ]},
                    {"name":"product","type":"string","enumValues":["Laptops","Printers","Routers", ...]}],
           "timeBuckets": ["1m", "1h", "1d", "5m"],
           "values": [{"name":"sales","type":"double","aggregators":["SUM"]},
                      {"name":"discount","type":"double","aggregators":["SUM"]},
                      {"name":"tax","type":"double","aggregators":["SUM"]}],
           "dimensions": [{"combination":[]},
                          {"combination":["region"]},
                          {"combination":["product"]},
                          {"combination":["channel","product"]},
                          {"combination":["channel","region","product"]}]
         }
         // full schema -> salesGenericEventSchema.json
      Key Operators Enabling Dimensions Computation and Visualization
         DimensionsComputationFlexibleSingleSchemaMap dimensions = dag.addOperator("DimensionsComputation", DimensionsComputationFlexibleSingleSchemaMap.class);
         AppDataSingleSchemaDimensionStoreHDHT store = dag.addOperator("Store", AppDataSingleSchemaDimensionStoreHDHT.class);
         PubSubWebSocketAppDataQuery wsIn = new PubSubWebSocketAppDataQuery();
         store.setEmbeddableQueryInfoProvider(wsIn);
         PubSubWebSocketAppDataResult wsOut = dag.addOperator("QueryResult", new PubSubWebSocketAppDataResult());
      Example Applications with Dimensions Schema: CDRDemoV2.java, SalesDemo.java
      (Figure: Sales Demo Logical Plan with Dimensions Schema)
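Each entry under "dimensions" in the schema on slide 16 names a combination of keys for which the SUM aggregates are kept, with the empty combination acting as the overall total. The sketch below illustrates that idea only, in plain Java. It is not how the DimensionsComputation operator is implemented, it ignores "timeBuckets" entirely, and `DimensionSumSketch`, `key`, and `sumSales` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Illustration only: SUM of "sales" per key combination, ignoring time buckets.
class DimensionSumSketch {
    // Builds an aggregate key such as "channel=Online|product=Laptops".
    // The empty combination yields "", which stands for the overall total.
    static String key(Map<String, String> event, List<String> combination) {
        StringJoiner joiner = new StringJoiner("|");
        for (String dim : combination) {
            joiner.add(dim + "=" + event.get(dim));
        }
        return joiner.toString();
    }

    // Adds each event's sales value into every combination it belongs to.
    static Map<String, Double> sumSales(List<Map<String, String>> events,
                                        List<List<String>> combinations) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Map<String, String> event : events) {
            double sales = Double.parseDouble(event.get("sales"));
            for (List<String> combination : combinations) {
                sums.merge(key(event, combination), sales, Double::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // Made-up sample events using key values that appear in the schema.
        List<Map<String, String>> events = List.of(
                Map.of("channel", "Online", "region", "Dallas", "product", "Laptops", "sales", "100.0"),
                Map.of("channel", "Store", "region", "Dallas", "product", "Printers", "sales", "200.0"));
        List<List<String>> combinations = List.of(
                List.of(), List.of("region"), List.of("product"));
        sumSales(events, combinations).forEach((k, v) -> System.out.println("[" + k + "] " + v));
    }
}
```

One event contributes to every configured combination, which is why the number of stored aggregates grows with both the number of combinations and the number of distinct key values.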
  • 17. Exporting and Packaging Dashboards
      1. Create and download dashboard from UI Console
      2. Copy dashboards to Apex app project folder under
         <Apex App>/src/main/resources/resources/ui/dashboards/
           - TwitterDemo.dtdashboard
           - SalesDemo.dtdashboard
      3. Create ui.json in Apex app project folder under
         <Apex App>/src/main/resources/resources/ui/ui.json
         {
           "dashboards": [
             { "file": "TwitterDemo.dtdashboard" },
             {
               "name": "Sales Dimensions Demo",
               "file": "SalesDemo.dtdashboard",
               "appNames": ["SalesDemo-Sasha", "SalesDemo"]
             }
           ]
         }
         // "appNames" is used to auto-associate packaged dashboards with running apps
      4. Compile Apex app project and verify .apa package has
         myApp.apa
           + resources/
             + ui/
               - ui.json
               + dashboards/
                 - TwitterDemo.dtdashboard
                 - SalesDemo.dtdashboard
  • 20. Resources
      • Apache Apex - http://apex.apache.org/
      • Subscribe to forums
          ○ Apex - http://apex.apache.org/community.html
          ○ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
      • Download - https://datatorrent.com/download/
      • Twitter
          ○ @ApacheApex; Follow - https://twitter.com/apacheapex
          ○ @DataTorrent; Follow - https://twitter.com/datatorrent
      • Meetups - http://meetup.com/topics/apache-apex
      • Webinars - https://datatorrent.com/webinars/
      • Videos - https://youtube.com/user/DataTorrent
      • Slides - http://slideshare.net/DataTorrent/presentations
      • Startup Accelerator Program - Full featured enterprise product
          ○ https://datatorrent.com/product/start-up-accelerator/
      • Big Data Application Templates/Examples - https://datatorrent.com/apphub
      We Are Hiring! jobs@datatorrent.com