Sparkflows.io

Reducing cost and time-to-market for Big Data Analytics & Applications by 10X
Self-Service Big Data Analytics & Applications
Cut time-to-market from months to hours

Agenda
Problem
Sparkflows Solution
Differentiators

Data Analysts
Data Engineers
Data Scientists
It's challenging for users to get value out of the Data Lake
Data Lake
● Data Analytics, Data Preparation & Blending
● Machine Learning
● Streaming Applications
● Batch Applications
● Dashboards & Visualization
Needs a lot of coding on Big Data

Machine Learning
Classification
Regression
Clustering
Collaborative Filtering
Save/Load Model
Predict
Cross-Validator
NLP
CoreNLP
StanfordNLP
OCR
Tesseract
File Formats
CSV/TSV
Parquet
JSON
Avro
PDF
Images
Whole Files
Feature Generation
Tokenization
TF, IDF
OneHotEncoder
StringIndexer
Imputer
Scaler
Data Sources/Sinks
HDFS
S3
Kafka, Flume, Twitter
HBase
Solr
ETL
Joins, Unions
Filter
SQL, Scala, Python
GeoIP
ConcatColumns
Column Filter
Dedup
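Several of the Feature Generation nodes above, such as StringIndexer and OneHotEncoder, share their names with Spark ML transformers. The plain-Python sketch below shows roughly what such nodes compute on a single column; it is not Sparkflows or Spark code. It assumes Spark ML's default frequency-descending label order with alphabetical tie-breaking, and the function names `string_index` and `one_hot` are hypothetical.

```python
from collections import Counter

def string_index(values):
    """Map each distinct label to an integer index (hypothetical helper).

    The most frequent label gets index 0; ties are broken alphabetically,
    an assumption mirroring Spark ML's default 'frequencyDesc' ordering.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: i for i, label in enumerate(ordered)}
    return [mapping[v] for v in values], mapping

def one_hot(indices, size):
    """Expand each index into a dense 0/1 vector of length `size`."""
    return [[1 if i == idx else 0 for i in range(size)] for idx in indices]

plans = ["basic", "premium", "basic", "family", "basic", "premium"]
idx, mapping = string_index(plans)
vectors = one_hot(idx, len(mapping))
print(mapping)     # {'basic': 0, 'premium': 1, 'family': 2}
print(vectors[1])  # [0, 1, 0]
```

Spark ML's OneHotEncoder emits sparse vectors and, by default, drops the last category; the sketch keeps every category as a dense list for readability.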

Results In:
● Long time to production & value
● Hard to maintain and extend the pipelines/applications
● Very hard to collaborate across Business, Data Scientist, Data Engineer and IT
● Very complex deployment
● Hard to hand over code

Data Analysts
Data Engineers
Data Scientists
Spark
Relational
Batch + Streaming
Hadoop
Workflow / Application Repository
Nodes Repository
Future
● 100+ Nodes
● Entity Resolution
● Machine Learning
● Data Wrangling / ETL / Drools
● Sentiment Analysis
● Recommendations
● Churn Prediction
● Log Analytics
● Workflow Designer
● Preview Mode
● Execution Engine
● Visualization
+ SQL / Scala / Python
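The Workflow Designer, Preview Mode and Execution Engine above can be pictured as running a graph of nodes in dependency order. The sketch below is a hypothetical, single-machine Python illustration of that idea, not the Sparkflows engine: every node is a plain function over lists of row dictionaries, `run_workflow` and the node names are made up, and "preview" is assumed to mean truncating each node's output to a few rows. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_workflow(nodes, deps, preview_rows=None):
    """Run hypothetical workflow nodes in dependency order.

    nodes: node name -> function(list of upstream outputs) -> list of rows
    deps:  node name -> list of upstream node names ([] for sources)
    preview_rows: if set, keep only this many rows per node, as a rough
                  stand-in for a design-time preview
    """
    results = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in deps[name]]
        output = nodes[name](upstream)
        if preview_rows is not None:
            output = output[:preview_rows]
        results[name] = output
    return results

raw = [
    {"city": "NY", "fare": 12.5},
    {"city": "SF", "fare": 30.0},
    {"city": "NY", "fare": 7.0},
    {"city": "NY", "fare": 12.5},
]

def dedup(upstream):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in upstream[0]:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

nodes = {
    "source": lambda upstream: list(raw),
    "filter_ny": lambda upstream: [r for r in upstream[0] if r["city"] == "NY"],
    "dedup": dedup,
}
deps = {"source": [], "filter_ny": ["source"], "dedup": ["filter_ny"]}

full = run_workflow(nodes, deps)
preview = run_workflow(nodes, deps, preview_rows=2)
print(len(full["dedup"]), len(preview["dedup"]))  # 2 1
```

Keeping nodes as independent functions wired only through `deps` is what lets the same graph be re-run on a small sample while designing and on the full data when deployed.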

How Sparkflows Works
Workflow Editor
Rich Visualizations & Dashboards
Hundreds of Nodes
Batch & Streaming Engine
Interactive Execution
Easy Deployment & Configuration
Pre-built Workflows
● Telco Churn Prediction
● Housing Price Prediction
● Bike Sharing Analysis
● NY Taxi Data Analysis
● MovieLens Recommendations

Confidential Property of Sparkflows.io
Sparkflows Product Stack
Streaming Data
Kafka
Flume
Data Sources
HIVE/HBase
HDFS/S3
Solr
RDBMS
Apache Spark Cluster
Databricks
AWS
IBM Bluemix
On Prem
Azure
Visualizations
ETL/NLP/OCR
Model Building
Workflow Execution
Scala/SQL/Python
Data Wrangling
Data Analysis
Data Pipelines
Big Data Analytics / Applications
Visualization
Data Sinks
HIVE/HBase
HDFS/S3
Solr
RDBMS

Business Analyst
Data Scientist
Data Engineer
IT
● Do data analytics for business use cases by dragging and dropping nodes and using various datasets.
● Visualize and gain a deep understanding of the data.
● Build predictive models and apply predictions.
● Do predictive and analytical modeling with the drag-and-drop capabilities.
● Write custom SQL, Scala, Python to close the gaps.
● Blend static and real-time streams to build complex data pipelines.
● Build and deploy complex pipelines in minutes.
● Connect to various sources and sinks including Kafka, HDFS, S3, HBase, Solr.
● Build and expose custom nodes in Sparkflows for others to use.
● Embed SQL, Scala, Python within the workflow.
● Easily configure multi-tenancy and security for Sparkflows users.
● Connect workflow results to the platform of choice for visualization.
● Provision Hadoop infrastructure, monitor workflow jobs, and tune performance.

Why Now?
A Big Trend towards Building with Templates
StreamSets
iPhone Apps
Website Builders
Apache NiFi
StreamAnalytix (Impetus)
Alteryx

Dashboards
Combine output of various Workflows into Dashboards

Core Differentiators
● Easy & Natural to Use and Deploy
● Deep Integration with Hadoop - Security/Impersonation/HIVE/HBase/Solr
● Custom Nodes - Users can write their own Nodes and plug them into the UI
● Schema Propagation
● Interactive Execution at Design Time
● Rich Application Dashboards
● Growing Repository of Workflows for various Solutions
● Complex Nodes built out by Sparkflows - Dedup, Drools, OpenNLP, StanfordNLP, Tesseract, etc.
● Batch & Streaming - Nodes support both Batch & Streaming workloads
● Support for SQL, Scala, Jython as Nodes of the workflow
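Schema Propagation, listed above, lets a workflow editor know which columns every downstream node will see before any data is processed. A minimal sketch, assuming each node can be described by a function from an input schema (column name to type name) to an output schema; the node helpers, `propagate_schema`, and the type names are hypothetical, and the double-typed index column only mirrors how Spark ML's StringIndexer stores indices.

```python
# Hypothetical schema model: an ordered dict of column name -> type name.

def _require(schema, cols):
    """Raise if any of `cols` is absent from `schema`."""
    missing = [c for c in cols if c not in schema]
    if missing:
        raise ValueError(f"unknown columns: {missing}")

def column_filter(keep):
    """Node that keeps only the listed columns, in the listed order."""
    def propagate(schema):
        _require(schema, keep)
        return {c: schema[c] for c in keep}
    return propagate

def concat_columns(cols, output_col):
    """Node that appends a string column built from `cols`."""
    def propagate(schema):
        _require(schema, cols)
        return {**schema, output_col: "string"}
    return propagate

def string_indexer(input_col, output_col):
    """Node that appends a numeric index for a string column."""
    def propagate(schema):
        _require(schema, [input_col])
        if schema[input_col] != "string":
            raise TypeError(f"{input_col} must be a string column")
        return {**schema, output_col: "double"}
    return propagate

def propagate_schema(source_schema, steps):
    """Return the schema after each step of a linear workflow."""
    schemas = [source_schema]
    for step in steps:
        schemas.append(step(schemas[-1]))
    return schemas

source = {"first": "string", "last": "string", "age": "int", "plan": "string"}
steps = [
    concat_columns(["first", "last"], "full_name"),
    string_indexer("plan", "plan_idx"),
    column_filter(["full_name", "age", "plan_idx"]),
]
final = propagate_schema(source, steps)[-1]
print(final)  # {'full_name': 'string', 'age': 'int', 'plan_idx': 'double'}
```

Because each step validates its input columns, a misconfigured node (for example, indexing the integer `age` column) fails at design time instead of when the job runs on the cluster.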

Line of Products
Data Analytics (Analytics / Wrangling / Machine Learning)
Streaming Analytics
Applications

Building Big Data Analytics & Applications is very costly & time-consuming
Customer 360
Fraud Detection
Operations Analytics
Cyber Security
IoT Analytics
Analytics Applications
Not enough users are able to extract great value from the Data Lake

Needs a lot of coding on Big Data
Data Analytics, Data Preparation & Blending
Machine Learning
Streaming Applications
Batch Applications
Visualizations
