SlideShare a Scribd company logo
SPARKTA
A real-time analytics platform
based on Apache Spark
London, May 2015
FIRST SPARK PLATFORM.
APR 2014
20+ INTERNATIONAL
PROJECTS
WITH SPARK
PLATFORM
OVERVIEW1
STRATIO
INGESTION
Customer lake
STRATIO
STREAMING
STRATIO
QUANTUM
STRATIO DEEP
STRATIO CROSSDATA
ODBC JBDC API Rest
CRM
ERP
Call
Center
BI
Internal
Data
External
Data
BI AD HOC APP
Hdfs S3 Elastic
Search
Mongo DB Cassandra Redis Oracle, DB2
Other
Databases
STRATIO DATAVIS
4
STRATIO
INGESTION
Customer lake
STRATIO
STREAMING
STRATIO
QUANTUM
STRATIO DEEP
STRATIO CROSSDATA
ODBC JBDC API Rest
CRM
ERP
Call
Center
BI
Internal
Data
External
data
BI AD HOC APP
Ingests,
transforms
Analyzes and
processes real
time streaming
A unified SQL interface
Machine Learning
and algorithms
Processes & combines with
Spark
STRATIO DATAVIS
Creates and designs
dashboards and reports
Hdfs S3 Elastic
Search
Mongo DB Cassandra Redis Oracle, DB2
Other
Databases
5
STRATIO
INGESTION
Ingests,
transforms
STRATIO
STREAMING
STRATIO
QUANTUM
STRATIO CROSSDATA
Analyzes & processes
A unified SQL interface
Machine Learning
and algorithms
ODBC JBDC API Rest
Streaming
Apache Kite
Apache Flume
CRM
ERP
Call
Center
BI
MLlib
Internal
Data
External
Data
BI AD HOC APP
Combines with Spark data from any
source
Customer lake
STRATIO DEEP
Processes & combines with Spark
Hdfs S3 Elastic
Search
Mongo DB Cassandra Redis Oracle, DB2
Other
Databases
STRATIO DATAVIS
Creates and designs
dashboards and reports
6
STRATIO
INGESTION
Hdfs S3 Elastic
Search
Mongo DB Cassandra Redis Oracle, DB2
Other
Databases
Ingests,
transforms
STRATIO
STREAMING
STRATIO
QUANTUM
STRATIO CROSSDATA
Analyzes &
processes
Consult & analyze. SQL interface
Machine Learning
& algorithms
ODBC JBDC API Rest
Streaming
Apache Kite
Apache Flume
CRM
ERP
Call
Center
BI
MLib
Internal
Data
External
Data
BI AD HOC APP
Data combination through time
Customer lake
STRATIO DEEP
Processes & combines with
Spark
Real-time
Ephemer
al tables
Past
Stored
tables
Future
Quantum
tables
STRATIO DATAVIS
Creates and designs
dashboards and reports
7
STRATIO DATAVIS
STRATIO
INGESTION
Ingests,
transforms
STRATIO
STREAMING
STRATIO
QUANTUM
STRATIO CROSSDATA
Analyzes &
processes
Consulta y analiza. Interfaz SQL
Machine Learning
& algorithms
ODBC JBDC API Rest
Streaming
Apache Kite
Apache Flume
CRM
ERP
Call
Center
BI
MLlib
Internal
Data
External
Data
Creates and designs
dashboards and reports
Customer lake
STRATIO DEEP
Processes & combines with Spark
Hdfs S3 Elastic
Search
Mongo DB Cassandra Redis Oracle, DB2
Other
Databases
INFORMATIONAL + OPERATIONAL
WITHOUT NEED TO REPLICATE DATA
Oracle, DB2
Other Databases Mongo DB TeradataOPERATIONAL
8
REAL-TIME:
Beyond cool dashboards2
The time is N W
We all know this story already
Social media and networking sites are a part of the fabric of
everyday life, changing the way the world shares and accesses
information.
The overwhelming amount of information gathered not only
from messages, updates and images but also readings from
sensors,GPS signals and many other sources was the origin of
a (big) technological revolution.
Remember? VOLUME, VARIETY & VELOCITY
CONFERENCE10
Look at these sexy infographics!
We all love data
visualization
Insights from this vast amount of data
allows us to learn from the users and
explore our own world.
We can follow in real-time the evolution
of a topic, an event or even an incident
just by exploring aggregated data.
CONFERENCE11
Delivering real-time business in the Internet
But beyond cool visualizations, there are
some core services delivered in real-time,
using aggregated data to answer common
questions in the fastest way.
These services are the heart of the
business behind their nice logos.
Site traffic, user engagement monitoring,
service health, APIs, internal monitoring
platforms, real-time dashboards…
Aggregated data feeds directly to end
users, publishers, and advertisers, among
others.
CONFERENCE12
Pushing business’ processes to perform faster
Digital companies, born to develop their services in real-time have changed
the expectations of many others businesses.
Real-time information makes it possible for a company to be much more agile
than its competitors, improving business answers, gaining insights on their
performance…
CONFERENCE13
Listen to your data…
CLIENTTPV
Accounts
Loans
and credits
Insurances
Broker
Mortgages
Cards
Deposits
ATM
Online
gateway
application logs
Social
networks
transactions
geolocation
CRM
Where as business intelligence is data gathered
for the purpose of analyzing trends over time,
operational intelligence provides a picture of
what is currently happening within a process.
And we can listen to almost everything! Orders,
transactions, clicks, calls, bookings, internal
services...
CONFERENCE14
…and start delivering real-time services
Real-time monitoring could be really nice, but your
company needs to work in the same way as digital
companies:
• Rethinking existing processes to deliver them
faster, better.
• Creating new opportunities for competitive
advantages.
CONFERENCE15
REAL-TIME
Challenges at Stratio2
Real-time fraud monitoring
DATA RECEIVER
REAL-TIME
AGGREGATION
CONSOLIDATION
Dashboardin
g
Reporting
FRAUD
DETECTION
Leveraging the power of Spark Streaming, we have developed some fraud detection
solutions, aggregating data in real-time to work better with machine learning
algorithms.
CONFERENCE17
Extract, Transform and Aggregate
By combining Apache Flume and Spark Streaming we have deployed complex
topologies to deal with data coming from heterogeneous sources.
The full solution allow us to transform and aggregate data on-the-fly
(data cleaning, normalization and enrichment)
REAL-TIME
AGGREGATION
Dashboardin
g
Reporting
CONFERENCE18
Custom data sources and storage
Each project requires
specific inputs and data
storages, dealing with
different kinds of
events.
From click stream
activity to bank
transactions...
DATA STREAM
LOADING
TRANSFORM
CUSTOM LOGS
CONFERENCE19
Towards a generic real-time aggregation platform
At Stratio, we have implemented several real-time analytic projects based
on Apache Spark, Kafka, Flume, Cassandra, or MongoDB.
These technologies were always a perfect fit, but soon we found ourselves
writing the same pieces of integration code over and over again.
This is how SPARKTA was born.
CONFERENCE20
ELSEWHERE3
#1 RainBird from Twitter
Some folks from twitter shared some thoughts
about their real-time needs at Strata (2011).
They worked on a “generic” platform in order to
deal with pre-calculated data from a huge number
of events.
It allows them to deal with:
• Data Structures
• Hierarchical Aggregation
• Temporal Aggregation
• Multiple Formulas
Still not open sourceCURRENT STATE
http://goo.gl/ykvQa
CONFERENCE22
#2 Countandra
Countandra is a hierarchical distributed counting
engine exploiting all the excellent write&read
performance of Cassandra.
It supports:
• Geographically distributed counting.
• Easy Http Based interface to insert counts.
• Hierarchical counting such as
com.mywebsite.music.
• Retrieves counts, sums and square in near real-
time.
• Simple Http queries provides desired output in Json
format
• Queries can be sliced by period such as lasthour
,lastyear and so on for minutely,hourly,daily,monthly
values
https://github.com/milindparikh/Countandra
Rather deprecatedCURRENT STATE
CONFERENCE23
#3 ThunderRain from Intel
ThunderRain is a Real-Time Analytical Processing
(RTAP) example using Spark and Shark, which
can be best characterized by the following four
salient properties:
• Data continuously streamed in & processed
in near real-time
• Real-time data queried and presented
in an online fashion
• Real-time and history data combined
and mined interactively
• Predominant RAM-based processing
https://github.com/thunderain-
project/thunderain
Rather deprecatedCURRENT STATE
CONFERENCE24
#4 TSAR from Twitter
TSAR (the TimeSeries AggregatoR) is a
flexible, reusable, end-to-end service
architecture on top of Summingbird.
Twitter really needs a truly robust real-
time aggregation service considering their
scaling and evolving needs.
They realized that many time-series
applications call for essentially the same
architecture, with only slight variations in
the data model.
https://blog.twitter.com/2014/tsar-a-timeseries-aggregator
Still not open sourceCURRENT STATE
CONFERENCE25
Towards a generic real-time aggregation platform
Some initiatives have tried to solve this problem, but until now most of them
were complex or obsolete while others were not open source.
For this reason, Stratio created SPARKTA: an open source and full-featured
platform for real-time analytics, based on Apache Spark.
This is why SPARKTA was conceived
CONFERENCE26
4
THIS IS
SPARKTA
Distributed, high-volume & pluggable analytics framework
Our goals:
Since Aryabhatta invented zero, Mathematicians such as John von Neuman have
been in pursuit of efficient counting and architects have constantly built systems that
computes counts quicker. In this age of social media, where 100s of 1000s events
take place every second, we designed a aggregation engine to deliver real-time
service
• Pure Spark!
• No need of coding, only declarative aggregation
workflows
• Data continuously streamed in & processed in near real-
time
• Ready to use out of the box
• Plug & play: flexible workflows (inputs, outputs, parsers,
etc…)
• High performance
• Scalable and fault tolerant
CONFERENCE28
Sparkta: A first look
DRIVER - SUPERVISOR
AGGREGATION POLICY
QUERY
SERVICES
Aggregation policy
definition is sent to the
engine
Allows multiple application to be
defined, each of which is bound to
a context, executing the
aggregation workflow
others
AGGREGATION WORKFLOW
CONFERENCE29
Sparkta: Deploy any number of real-time aggregation policies
DRIVER - SUPERVISOR
You can start
several workflows
at any time, and
also stop or
monitor them
CONFERENCE30
Sparkta: Key Technologies
+
Apache Kite SDK
INPUTS PROCESSING
RabbitMQ
ZeroMQ
Twitter
Flume
Kafka
....
OUTPUTS
..
..
CONFERENCE31
Sparkta: Define your real-time needs
AGGREGATION POLICY
Remember: no need to code anything.
Define your workflow in a JSON document, including:
INPUT Where is the data coming from?
OUTPUT(s) Where should aggregate data be stored?
DIMENSION(s) Which fields will you need for your real-time
needs?
ROLLUP(s) How do you want to aggregate the dimensions?
TRANSFORMATION(s) Which functions should be applied before aggregation?
SAVE RAW DATA Do you want to save raw events?
CONFERENCE32
Sparkta: Key Technologies
ROLLUPS
• Pass-through
• Time-based
• Secondly, minutely, hourly, daily,
monthly, yearly...
• Hierarchycal
• GeoRange: Areas with different sizes
(rectangles)
OPERATORS
• Max, min, count, sum
• Average, median
• Stdev, variance, count distinct
• Last value
• Full-text search
KiteSDK
CONFERENCE33
Sparkta SDK
INPUT
OUTPUT(s)
DIMENSION(s)
OPERATORS
TRANSFORMATION(s)
Sparkta has been conceived as an SDK.
You can extend several points of the platform to
fulfill your needs, such as adding new inputs,
outputs, operators, dimension types.
Add new functions to Apache Kite in order to
extend the data cleaning, enrichment and
normalization capabilities.
CONFERENCE34
NEXT STEPS5
Source: mydisguises.com
Next steps in our roadmap (1)
Sparkta is a work in progress, so we still have some nice features to
develop…
QUERY
SERVICES
ALARMS
Creating a REST services layer in order to query the
aggregated data allows us to isolate the final consumer
from the specific data storage
Features
- Time ranges
- Agreggation on time ranges
- Best rollup selection
For example, I want to know if we have earned over $3000 in
London in the last hour...
Remember operational intelligence!
CONFERENCE36
Next steps in our roadmap (II)
WEB
APPLICATION
DEPLOYING &
MONITORING
How about a nice web interface to create and manage policies?
Forget the JSON file and use your mouse to define the workflow :)
We have been working with Spark jobServer & Yarn, but it will be
nice to support Mesos, for example.
Hey, did you miss something? Do you have a great idea?
Let us know!
MORE AWESOMENESS
CONFERENCE37
OPEN SOURCE
& COMMUNITY6
OPEN TO YOUR IDEAS
www.stratio.com
@StratioBD
https://github.com/stratio/sparkta
SPARKTA is fully open source
Apache 2 License.
We are open to contributors & ideas
CONFERENCE39
DEMO TIME7
Do you want to try SPARKTA?
Use a full-featured sandbox to start trying SPARKTA
vagrant init “stratio/sparkta”
vagrant up
Just open a shell and type
CONFERENCE41
Do you want to try SPARKTA?
Getting some real-time stats from
#StrataHadoop
Our real-time policy defines some
rollups in order to know chatty users, hot
hashtags, and heatmaps from
StrataConf tweets.
We are using the standard Twitter input
from Spark Streaming, ElasticSearch
output & Kibana to display results
CONFERENCE42
BIG DATA
CHILD`S PLAY

More Related Content

What's hot

Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaDatabricks
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Altan Khendup
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Databricks
 
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark Summit
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
Dr. Mirko Kämpf
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
Databricks
 
Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta Lake
Databricks
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
Prasad Wagle
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
Mars Lan
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Spark Summit
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Spark Summit
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
Rajesh Muppalla
 
Druid Overview by Rachel Pedreschi
Druid Overview by Rachel PedreschiDruid Overview by Rachel Pedreschi
Druid Overview by Rachel Pedreschi
Brian Olsen
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Databricks
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
DataWorks Summit
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
Jen Aman
 
Data Pipelines With Streamsets
Data Pipelines With Streamsets Data Pipelines With Streamsets
Data Pipelines With Streamsets
Jowanza Joseph
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Databricks
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
Spark Summit
 

What's hot (20)

Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion StoicaSpark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
Spark Summit East 2015 Keynote -- Databricks CEO Ion Stoica
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
 
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
Spark and Hadoop at Production Scale-(Anil Gadre, MapR)
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
 
Data Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at ZalandoData Warehousing with Spark Streaming at Zalando
Data Warehousing with Spark Streaming at Zalando
 
Simplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta LakeSimplify and Scale Data Engineering Pipelines with Delta Lake
Simplify and Scale Data Engineering Pipelines with Delta Lake
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
Virtualizing Analytics with Apache Spark: Keynote by Arsalan Tavakoli
 
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
Analytics at the Real-Time Speed of Business: Spark Summit East talk by Manis...
 
Lambda architecture @ Indix
Lambda architecture @ IndixLambda architecture @ Indix
Lambda architecture @ Indix
 
Druid Overview by Rachel Pedreschi
Druid Overview by Rachel PedreschiDruid Overview by Rachel Pedreschi
Druid Overview by Rachel Pedreschi
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
 
Optimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystemOptimizing industrial operations using the big data ecosystem
Optimizing industrial operations using the big data ecosystem
 
Credit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph AnalysisCredit Fraud Prevention with Spark and Graph Analysis
Credit Fraud Prevention with Spark and Graph Analysis
 
Data Pipelines With Streamsets
Data Pipelines With Streamsets Data Pipelines With Streamsets
Data Pipelines With Streamsets
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
 

Similar to [Strata] Sparkta

Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
SingleStore
 
7_considerations_final
7_considerations_final7_considerations_final
7_considerations_finalJane Roberts
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
confluent
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
Timothy Spann
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Grid Dynamics
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
Paolo Castagna
 
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
confluent
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
AWS User Group Kochi
 
SAP Cloud Platform Community NL Kick-off
SAP Cloud Platform Community NL Kick-offSAP Cloud Platform Community NL Kick-off
SAP Cloud Platform Community NL Kick-off
Jan Penninkhof
 
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsData Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Anant Corporation
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
ScyllaDB
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
confluent
 
Oracle Digital Business Transformation and Internet of Things by Ermin Prašović
Oracle Digital Business Transformation and Internet of Things by Ermin PrašovićOracle Digital Business Transformation and Internet of Things by Ermin Prašović
Oracle Digital Business Transformation and Internet of Things by Ermin Prašović
Bosnia Agile
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
William Poos
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
confluent
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
confluent
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
Riccardo Zamana
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
Luciano Resende
 
FIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE OverviewFIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE Overview
FIWARE
 

Similar to [Strata] Sparkta (20)

Real-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQLReal-Time Analytics with Confluent and MemSQL
Real-Time Analytics with Confluent and MemSQL
 
7_considerations_final
7_considerations_final7_considerations_final
7_considerations_final
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...Open Blueprint for Real-Time  Analytics in Retail: Strata Hadoop World 2017 S...
Open Blueprint for Real-Time Analytics in Retail: Strata Hadoop World 2017 S...
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
 
SAP Cloud Platform Community NL Kick-off
SAP Cloud Platform Community NL Kick-offSAP Cloud Platform Community NL Kick-off
SAP Cloud Platform Community NL Kick-off
 
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsData Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?EDA Meets Data Engineering – What's the Big Deal?
EDA Meets Data Engineering – What's the Big Deal?
 
Oracle Digital Business Transformation and Internet of Things by Ermin Prašović
Oracle Digital Business Transformation and Internet of Things by Ermin PrašovićOracle Digital Business Transformation and Internet of Things by Ermin Prašović
Oracle Digital Business Transformation and Internet of Things by Ermin Prašović
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
SnapLogic's Latest Elastic iPaaS Release Adds Hybrid Links for Spark, Cortana...
 
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
Technical Deep Dive: Using Apache Kafka to Optimize Real-Time Analytics in Fi...
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
FIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE OverviewFIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE Overview
 

More from Stratio

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Stratio
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18
Stratio
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka Meetup
Stratio
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science Meetup
Stratio
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Stratio
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning
Stratio
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0
Stratio
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challenge
Stratio
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big Data
Stratio
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric Platform
Stratio
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
Stratio
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack”
Stratio
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Stratio
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
Stratio
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
Stratio
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model Trees
Stratio
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scalaStratio
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scalaStratio
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Stratio
 

More from Stratio (20)

Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
Mesos Meetup - Building an enterprise-ready analytics and operational ecosyst...
 
Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18Can an intelligent system exist without awareness? BDS18
Can an intelligent system exist without awareness? BDS18
 
Kafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka MeetupKafka and KSQL - Apache Kafka Meetup
Kafka and KSQL - Apache Kafka Meetup
 
Wild Data - The Data Science Meetup
Wild Data - The Data Science MeetupWild Data - The Data Science Meetup
Wild Data - The Data Science Meetup
 
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka MeetupUsing Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
Using Kafka on Event-driven Microservices Architectures - Apache Kafka Meetup
 
Ensemble methods in Machine Learning
Ensemble methods in Machine Learning Ensemble methods in Machine Learning
Ensemble methods in Machine Learning
 
Stratio Sparta 2.0
Stratio Sparta 2.0Stratio Sparta 2.0
Stratio Sparta 2.0
 
Big Data Security: Facing the challenge
Big Data Security: Facing the challengeBig Data Security: Facing the challenge
Big Data Security: Facing the challenge
 
Operationalizing Big Data
Operationalizing Big DataOperationalizing Big Data
Operationalizing Big Data
 
Artificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric PlatformArtificial Intelligence on Data Centric Platform
Artificial Intelligence on Data Centric Platform
 
Introduction to Artificial Neural Networks
Introduction to Artificial Neural NetworksIntroduction to Artificial Neural Networks
Introduction to Artificial Neural Networks
 
“A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack” “A Distributed Operational and Informational Technological Stack”
“A Distributed Operational and Informational Technological Stack”
 
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
Meetup: Cómo monitorizar y optimizar procesos de Spark usando la Spark Web - ...
 
Lunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelosLunch&Learn: Combinación de modelos
Lunch&Learn: Combinación de modelos
 
Meetup: Spark + Kerberos
Meetup: Spark + KerberosMeetup: Spark + Kerberos
Meetup: Spark + Kerberos
 
Distributed Logistic Model Trees
Distributed Logistic Model TreesDistributed Logistic Model Trees
Distributed Logistic Model Trees
 
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
Stratio's Cassandra Lucene index: Geospatial use cases - Big Data Spain 2016
 
Introduction to Asynchronous scala
Introduction to Asynchronous scalaIntroduction to Asynchronous scala
Introduction to Asynchronous scala
 
Functional programming in scala
Functional programming in scalaFunctional programming in scala
Functional programming in scala
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

[Strata] Sparkta

  • 1. SPARKTA A real-time analytics platform based on Apache Spark London, May 2015
  • 2. FIRST SPARK PLATFORM. APR 2014 20+ INTERNATIONAL PROJECTS WITH SPARK
  • 4. STRATIO INGESTION Customer lake STRATIO STREAMING STRATIO QUANTUM STRATIO DEEP STRATIO CROSSDATA ODBC JBDC API Rest CRM ERP Call Center BI Internal Data External Data BI AD HOC APP Hdfs S3 Elastic Search Mongo DB Cassandra Redis Oracle, DB2 Other Databases STRATIO DATAVIS 4
  • 5. STRATIO INGESTION Customer lake STRATIO STREAMING STRATIO QUANTUM STRATIO DEEP STRATIO CROSSDATA ODBC JBDC API Rest CRM ERP Call Center BI Internal Data External data BI AD HOC APP Ingests, transforms Analyzes and processes real time streaming A unified SQL interface Machine Learning and algorithms Processes & combines with Spark STRATIO DATAVIS Creates and designs dashboards and reports Hdfs S3 Elastic Search Mongo DB Cassandra Redis Oracle, DB2 Other Databases 5
  • 6. STRATIO INGESTION Ingests, transforms STRATIO STREAMING STRATIO QUANTUM STRATIO CROSSDATA Analyzes & processes A unified SQL interface Machine Learning and algorithms ODBC JBDC API Rest Streaming Apache Kite Apache Flume CRM ERP Call Center BI MLlib Internal Data External Data BI AD HOC APP Combines with Spark data from any source Customer lake STRATIO DEEP Processes & combines with Spark Hdfs S3 Elastic Search Mongo DB Cassandra Redis Oracle, DB2 Other Databases STRATIO DATAVIS Creates and designs dashboards and reports 6
  • 7. STRATIO INGESTION Hdfs S3 Elastic Search Mongo DB Cassandra Redis Oracle, DB2 Other Databases Ingests, transforms STRATIO STREAMING STRATIO QUANTUM STRATIO CROSSDATA Analyzes & processes Consult & analyze. SQL interface Machine Learning & algorithms ODBC JBDC API Rest Streaming Apache Kite Apache Flume CRM ERP Call Center BI MLib Internal Data External Data BI AD HOC APP Data combination through time Customer lake STRATIO DEEP Processes & combines with Spark Real-time Ephemer al tables Past Stored tables Future Quantum tables STRATIO DATAVIS Creates and designs dashboards and reports 7
  • 8. STRATIO DATAVIS STRATIO INGESTION Ingests, transforms STRATIO STREAMING STRATIO QUANTUM STRATIO CROSSDATA Analyzes & processes Consulta y analiza. Interfaz SQL Machine Learning & algorithms ODBC JBDC API Rest Streaming Apache Kite Apache Flume CRM ERP Call Center BI MLlib Internal Data External Data Creates and designs dashboards and reports Customer lake STRATIO DEEP Processes & combines with Spark Hdfs S3 Elastic Search Mongo DB Cassandra Redis Oracle, DB2 Other Databases INFORMATIONAL + OPERATIONAL WITHOUT NEED TO REPLICATE DATA Oracle, DB2 Other Databases Mongo DB TeradataOPERATIONAL 8
  • 10. The time is N W We all know this story already Social media and networking sites are a part of the fabric of everyday life, changing the way the world shares and accesses information. The overwhelming amount of information gathered not only from messages, updates and images but also readings from sensors,GPS signals and many other sources was the origin of a (big) technological revolution. Remember? VOLUME, VARIETY & VELOCITY CONFERENCE10
  • 11. Look at these sexy infographics! We all love data visualization Insights from this vast amount of data allows us to learn from the users and explore our own world. We can follow in real-time the evolution of a topic, an event or even an incident just by exploring aggregated data. CONFERENCE11
  • 12. Delivering real-time business in the Internet But beyond cool visualizations, there are some core services delivered in real-time, using aggregated data to answer common questions in the fastest way. These services are the heart of the business behind their nice logos. Site traffic, user engagement monitoring, service health, APIs, internal monitoring platforms, real-time dashboards… Aggregated data feeds directly to end users, publishers, and advertisers, among others. CONFERENCE12
  • 13. Pushing business’ processes to perform faster Digital companies, born to develop their services in real-time have changed the expectations of many others businesses. Real-time information makes it possible for a company to be much more agile than its competitors, improving business answers, gaining insights on their performance… CONFERENCE13
  • 14. Listen to your data… CLIENTTPV Accounts Loans and credits Insurances Broker Mortgages Cards Deposits ATM Online gateway application logs Social networks transactions geolocation CRM Where as business intelligence is data gathered for the purpose of analyzing trends over time, operational intelligence provides a picture of what is currently happening within a process. And we can listen to almost everything! Orders, transactions, clicks, calls, bookings, internal services... CONFERENCE14
  • 15. …and start delivering real-time services Real-time monitoring could be really nice, but your company needs to work in the same way as digital companies: • Rethinking existing processes to deliver them faster, better. • Creating new opportunities for competitive advantages. CONFERENCE15
  • 17. Real-time fraud monitoring DATA RECEIVER REAL-TIME AGGREGATION CONSOLIDATION Dashboardin g Reporting FRAUD DETECTION Leveraging the power of Spark Streaming, we have developed some fraud detection solutions, aggregating data in real-time to work better with machine learning algorithms. CONFERENCE17
  • 18. Extract, Transform and Aggregate By combining Apache Flume and Spark Streaming we have deployed complex topologies to deal with data coming from heterogeneous sources. The full solution allow us to transform and aggregate data on-the-fly (data cleaning, normalization and enrichment) REAL-TIME AGGREGATION Dashboardin g Reporting CONFERENCE18
  • 19. Custom data sources and storage Each project requires specific inputs and data storages, dealing with different kinds of events. From click stream activity to bank transactions... DATA STREAM LOADING TRANSFORM CUSTOM LOGS CONFERENCE19
  • 20. Towards a generic real-time aggregation platform At Stratio, we have implemented several real-time analytic projects based on Apache Spark, Kafka, Flume, Cassandra, or MongoDB. These technologies were always a perfect fit, but soon we found ourselves writing the same pieces of integration code over and over again. This is how SPARKTA was born. CONFERENCE20
  • 22. #1 RainBird from Twitter Some folks from twitter shared some thoughts about their real-time needs at Strata (2011). They worked on a “generic” platform in order to deal with pre-calculated data from a huge number of events. It allows them to deal with: • Data Structures • Hierarchical Aggregation • Temporal Aggregation • Multiple Formulas Still not open sourceCURRENT STATE http://goo.gl/ykvQa CONFERENCE22
  • 23. #2 Countandra Countandra is a hierarchical distributed counting engine exploiting all the excellent write&read performance of Cassandra. It supports: • Geographically distributed counting. • Easy Http Based interface to insert counts. • Hierarchical counting such as com.mywebsite.music. • Retrieves counts, sums and square in near real- time. • Simple Http queries provides desired output in Json format • Queries can be sliced by period such as lasthour ,lastyear and so on for minutely,hourly,daily,monthly values https://github.com/milindparikh/Countandra Rather deprecatedCURRENT STATE CONFERENCE23
  • 24. #3 ThunderRain from Intel ThunderRain is a Real-Time Analytical Processing (RTAP) example using Spark and Shark, which can be best characterized by the following four salient properties: • Data continuously streamed in & processed in near real-time • Real-time data queried and presented in an online fashion • Real-time and history data combined and mined interactively • Predominant RAM-based processing https://github.com/thunderain- project/thunderain Rather deprecatedCURRENT STATE CONFERENCE24
  • 25. #4 TSAR from Twitter TSAR (the TimeSeries AggregatoR) is a flexible, reusable, end-to-end service architecture on top of Summingbird. Twitter really needs a truly robust real- time aggregation service considering their scaling and evolving needs. They realized that many time-series applications call for essentially the same architecture, with only slight variations in the data model. https://blog.twitter.com/2014/tsar-a-timeseries-aggregator Still not open sourceCURRENT STATE CONFERENCE25
  • 26. Towards a generic real-time aggregation platform Some initiatives have tried to solve this problem, but until now most of them were complex or obsolete while others were not open source. For this reason, Stratio created SPARKTA: an open source and full-featured platform for real-time analytics, based on Apache Spark. This is why SPARKTA was conceived CONFERENCE26
  • 28. Distributed, high-volume & pluggable analytics framework Our goals: Since Aryabhatta invented zero, Mathematicians such as John von Neuman have been in pursuit of efficient counting and architects have constantly built systems that computes counts quicker. In this age of social media, where 100s of 1000s events take place every second, we designed a aggregation engine to deliver real-time service • Pure Spark! • No need of coding, only declarative aggregation workflows • Data continuously streamed in & processed in near real- time • Ready to use out of the box • Plug & play: flexible workflows (inputs, outputs, parsers, etc…) • High performance • Scalable and fault tolerant CONFERENCE28
  • 29. Sparkta: A first look DRIVER - SUPERVISOR AGGREGATION POLICY QUERY SERVICES Aggregation policy definition is sent to the engine Allows multiple application to be defined, each of which is bound to a context, executing the aggregation workflow others AGGREGATION WORKFLOW CONFERENCE29
  • 30. Sparkta: Deploy any number of real-time aggregation policies DRIVER - SUPERVISOR You can start several workflows at any time, and also stop or monitor them CONFERENCE30
  • 31. Sparkta: Key Technologies + Apache Kite SDK INPUTS PROCESSING RabbitMQ ZeroMQ Twitter Flume Kafka .... OUTPUTS .. .. CONFERENCE31
  • 32. Sparkta: Define your real-time needs AGGREGATION POLICY Remember: no need to code anything. Define your workflow in a JSON document, including: INPUT Where is the data coming from? OUTPUT(s) Where should aggregate data be stored? DIMENSION(s) Which fields will you need for your real-time needs? ROLLUP(s) How do you want to aggregate the dimensions? TRANSFORMATION(s) Which functions should be applied before aggregation? SAVE RAW DATA Do you want to save raw events? CONFERENCE32
  • 33. Sparkta: Key Technologies ROLLUPS • Pass-through • Time-based • Secondly, minutely, hourly, daily, monthly, yearly... • Hierarchycal • GeoRange: Areas with different sizes (rectangles) OPERATORS • Max, min, count, sum • Average, median • Stdev, variance, count distinct • Last value • Full-text search KiteSDK CONFERENCE33
  • 34. Sparkta SDK INPUT OUTPUT(s) DIMENSION(s) OPERATORS TRANSFORMATION(s) Sparkta has been conceived as an SDK. You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators, dimension types. Add new functions to Apache Kite in order to extend the data cleaning, enrichment and normalization capabilities. CONFERENCE34
  • 36. Next steps in our roadmap (1) Sparkta is a work in progress, so we still have some nice features to develop… QUERY SERVICES ALARMS Creating a REST services layer in order to query the aggregated data allows us to isolate the final consumer from the specific data storage Features - Time ranges - Agreggation on time ranges - Best rollup selection For example, I want to know if we have earned over $3000 in London in the last hour... Remember operational intelligence! CONFERENCE36
  • 37. Next steps in our roadmap (II) WEB APPLICATION DEPLOYING & MONITORING How about a nice web interface to create and manage policies? Forget the JSON file and use your mouse to define the workflow :) We have been working with Spark jobServer & Yarn, but it will be nice to support Mesos, for example. Hey, did you miss something? Do you have a great idea? Let us know! MORE AWESOMENESS CONFERENCE37
  • 39. OPEN TO YOUR IDEAS www.stratio.com @StratioBD https://github.com/stratio/sparkta SPARKTA is fully open source Apache 2 License. We are open to contributors & ideas CONFERENCE39
  • 41. Do you want to try SPARKTA? Use a full-featured sandbox to start trying SPARKTA vagrant init “stratio/sparkta” vagrant up Just open a shell and type CONFERENCE41
  • 42. Do you want to try SPARKTA? Getting some real-time stats from #StrataHadoop Our real-time policy defines some rollups in order to know chatty users, hot hashtags, and heatmaps from StrataConf tweets. We are using the standard Twitter input from Spark Streaming, ElasticSearch output & Kibana to display results CONFERENCE42

Editor's Notes

  1. Buscar reloj para reemplazar la O.
  2. Buscar reloj para reemplazar la O.
  3. Buscar reloj para reemplazar la O.
  4. Buscar reloj para reemplazar la O.
  5. Buscar reloj para reemplazar la O.
  6. Buscar reloj para reemplazar la O.
  7. Buscar reloj para reemplazar la O.
  8. Buscar reloj para reemplazar la O.
  9. Buscar reloj para reemplazar la O.
  10. Buscar reloj para reemplazar la O.
  11. Buscar reloj para reemplazar la O.
  12. Buscar reloj para reemplazar la O.
  13. Buscar reloj para reemplazar la O.
  14. Buscar reloj para reemplazar la O.
  15. Buscar reloj para reemplazar la O.
  16. Buscar reloj para reemplazar la O.
  17. Buscar reloj para reemplazar la O.
  18. Buscar reloj para reemplazar la O.
  19. Buscar reloj para reemplazar la O.
  20. Buscar reloj para reemplazar la O.
  21. Buscar reloj para reemplazar la O.
  22. Buscar reloj para reemplazar la O.
  23. Buscar reloj para reemplazar la O.
  24. Buscar reloj para reemplazar la O.
  25. Buscar reloj para reemplazar la O.
  26. Buscar reloj para reemplazar la O.
  27. Buscar reloj para reemplazar la O.