SparkFlow

Use Cases to Build & Deploy in < 30 min
Self-Serve Big Data Analytics & Applications

Agenda
Introduction
Sparkflows Solution
Use Cases

Introduction
100+ Building Blocks: ETL, ML, OCR, NLP, connect to various sources/sinks
Workflow Editor: powerful schema inference, schema propagation, interactive execution
Visualization & Dashboards
Prebuilt Workflows

Sparkflows Solution
Workflow Editor
Hundreds of Pre-built Nodes
Batch & Streaming Engine
Interactive Execution
Rich Visualizations & Dashboards
Easy Deployment & Configuration
Pre-built Workflows: Telco Churn Prediction, Housing Price Prediction, Bike Sharing Analysis, NY Taxi Data Analysis, MovieLens Recommendations

Sparkflows Product Stack
Streaming Data: Kafka, Flume
Data Sources: HIVE/HBase, HDFS/S3, Solr, RDBMS
Apache Spark Cluster: Databricks, AWS, IBM Bluemix, On-Prem, Azure
Data Sinks: HIVE/HBase, HDFS/S3, Solr, RDBMS
Visualizations / Dashboards

Some of the Building Blocks / Nodes
Machine Learning: Classification, Regression, Clustering, Collaborative Filtering, Save/Load Model, Predict, Cross-Validator
NLP: NER, Sentiment
OCR: Tesseract
Visualization: Line Chart, Bar Chart, Pie Chart, Updating Dashboards
File Formats: CSV/TSV, Parquet, JSON, Avro, PDF, Images, Whole Files
Feature Generation: Tokenization, TF, IDF, OneHotEncoder, StringIndexer, Imputer, Scaler
Data Sources/Sinks: HDFS, S3, Kafka, Flume, Twitter, HBase, Solr, Elastic Search
ETL: Joins, Unions, Filter, SQL, Scala, Python, GeoIP, ConcatColumns, Column Filter, Dedup
Languages: SQL, Scala, Jython, Java

Use Cases in < 30 minutes
Self-Serve Big Data Analytics: Do Big Data analytics with drag and drop, using 100+ building blocks.
ETL Pipelines: Build ETL pipelines with ease, and incorporate SQL, Scala, and Jython in them.
NLP: Perform NLP on Big Data with OpenNLP and Stanford CoreNLP.
OCR: Perform OCR on millions of images with Tesseract.
Streaming Analytics: Read from Kafka, perform complex transforms, generate graphs, and write out to Solr, HBase, etc.

Machine Learning: Perform machine learning on huge datasets with drag and drop.
Entity Resolution: Perform large-scale entity resolution on data from multiple channels.
Log Analytics: Build a log analytics platform with Kafka, Spark, Solr/Elastic Search, and Hue.
Format Conversion: Convert Big Data from one format to another.
Load data into Solr, ES, HBase: Easily load data into Solr, Elastic Search, HBase, etc.

Custom Nodes: Create custom nodes and drop them into the Library/Workflow Editor.
Dashboards: Combine various outputs of workflows into a dashboard.

Self-Serve Data Analytics
[Workflow diagram on Spark. Read: CSV, AVRO, JSON, Parquet. Nodes: Row Filter / Rename Col, JOIN, SQL / Scala / Jython, Random Forest. Save: Solr, HBase, Elastic Search, HIVE. Outputs: Graph, Model, Dashboard.]

ETL – Build ETL pipelines with ease
[Workflow diagram on Spark. Sources: CSV (ReadCSV), HIVE (ReadHIVE). Nodes: Filter, Filter, JOIN, SQL. Sinks: Solr (LoadSolr), ES (LoadES), HBase (LoadHBase), HIVE (LoadHIVE).]

ETL – Connect various SQL for powerful pipelines
[Workflow diagram on Spark. Sources: CSV (ReadCSV), HIVE (ReadHIVE). Nodes: four chained SQL nodes. Sinks: Solr (LoadSolr), ES (LoadES), HBase (LoadHBase), HIVE (LoadHIVE).]

NLP – Perform distributed NLP on Big Data
[Workflow diagram on Spark. Sources: PDF (ReadPDF), CSV (ReadCSV). Nodes: NLP, NLP, JOIN. Sinks: Solr (LoadSolr), ES (LoadES), HBase (LoadHBase), HIVE (LoadHIVE).]

OCR – Perform distributed OCR on Big Data
[Workflow diagram on Spark. Source: PDF (ReadPDF, plus extract images). Node: OCR. Sinks: Solr (LoadSolr), ES (LoadES), HBase (LoadHBase), HIVE (LoadHIVE).]

Streaming Analytics – With Kafka & Spark Streaming
[Workflow diagram on Spark. Source: Kafka (ReadKafka). Nodes: Transform (apply various transforms), Graph. Sinks: Solr (LoadSolr), ES (LoadES), HBase (LoadHBase), HIVE (LoadHIVE).]

Machine Learning – With Spark ML
[Workflow diagram on Spark. Source: HIVE. Nodes: Transform (apply various transforms), Split, Logistic Regression, Score, Evaluate.]

Entity Resolution – Applying various distance algorithms & scoring
[Workflow diagram on Spark. Inputs: DataSet 1, DataSet 2. Nodes: Join & Transform, Dedup, Filter low Scores. Sink: HIVE.]
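To make the "distance algorithms & scoring" idea concrete, here is a minimal sketch in plain Python rather than Spark. It assumes Levenshtein edit distance as the distance algorithm and a cutoff of 0.8 for "Filter low Scores"; the slide names neither, and the function names and sample records are hypothetical. The Dedup step is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Turn edit distance into a score in [0, 1]; 1.0 means identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def resolve(dataset1, dataset2, threshold=0.8):
    """Pair every record across the two datasets, score each pair,
    and keep only pairs at or above the (assumed) threshold."""
    matches = []
    for left in dataset1:
        for right in dataset2:
            score = similarity(left, right)
            if score >= threshold:
                matches.append((left, right, round(score, 3)))
    return matches


# Hypothetical sample records, invented for illustration only.
dataset1 = ["Jon Smith", "Acme Corp"]
dataset2 = ["John Smith", "ACME Corp.", "Globex"]
matches = resolve(dataset1, dataset2)
```

The all-pairs loop is quadratic, so at scale the join would normally be restricted to candidate pairs that share a blocking key before scoring.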

Log Analytics
[Workflow diagram on Spark. Source: Apache Logs via Kafka (ReadKafka). Nodes: Parse Apache Logs, IP2Geo, Graph, Save. Sinks: Solr, HBase, Elastic Search, HIVE. Query tools: SQL, HUE.]
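As an illustration of what a "Parse Apache Logs" step does, the sketch below parses one line in Apache Common Log Format with a regular expression. It is plain Python, not the Sparkflows node; the pattern and the `parse_line` helper are assumptions, and the IP2Geo lookup is not shown.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_line(sample)
```

Returning `None` for unparseable lines lets a downstream filter drop or count bad records instead of failing the whole job.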

Small Files Problem
[Workflow diagram on Spark. Read: CSV, HIVE. Node: Coalesce. Save: CSV, HIVE.]
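The idea behind coalescing is to merge many small files into fewer large ones. A minimal sketch in plain Python, standing in for the Coalesce node, which would do this on Spark; the `coalesce_csv` helper and the temporary file layout are hypothetical:

```python
import csv
import tempfile
from pathlib import Path


def coalesce_csv(input_dir: Path, output_file: Path) -> int:
    """Merge every *.csv in input_dir into one file, writing the header once.

    output_file should live outside input_dir so it is not picked up as input.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with output_file.open("w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(input_dir.glob("*.csv")):
            with path.open(newline="") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demo with three tiny files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    small = Path(tmp) / "small"
    small.mkdir()
    for i in range(3):
        (small / f"part-{i}.csv").write_text(f"id,value\n{i},{i * 10}\n")
    merged = Path(tmp) / "merged.csv"
    total = coalesce_csv(small, merged)
    merged_lines = merged.read_text().splitlines()
```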

Format Conversion
[Workflow diagram on Spark. Read: CSV, AVRO, JSON, Parquet. Save: CSV, AVRO, JSON, Parquet.]
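One of the conversions in the diagram, CSV to JSON, can be sketched with the Python standard library alone. This is an illustration rather than the Sparkflows implementation; the `csv_to_json_lines` helper and the sample data are hypothetical, and Avro and Parquet are left out because they need third-party libraries.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text with a header row to JSON Lines (one object per row).

    Values stay strings; no type inference is attempted here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)


# Hypothetical sample data, invented for illustration only.
sample_csv = "id,city\n1,Boston\n2,Austin\n"
json_lines = csv_to_json_lines(sample_csv)
```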

Loading Data into Solr, Elastic Search, HBase, HIVE
[Workflow diagram on Spark. Read: CSV, AVRO, JSON, Parquet. Save: Solr, HBase, Elastic Search, HIVE.]

Custom Nodes – Create & Use Custom Nodes which add custom features
[Workflow diagram on Spark. Inputs: DataSet 1, DataSet 2. Nodes: Join & Transform, Custom Node, Custom Node. Sink: HIVE.]

Dashboards – Combine output of various Workflows/Nodes into a Dashboard
