SlideShare a Scribd company logo
Reducing cost and time-to-market for Big Data Analytics
& Applications by 10X
Self-Service Big Data Analytics & Applications
Cut down from months to hours
Agenda
Problem
Sparkflows Solution
Differentiators
2
3
Data Analysts
Data
Engineers
Data
Scientists
Its challenging for users and get value out of the Data Lake
Data Lake
● Data Analytics, Data Preparation &
Blending
● Machine Learning
● Streaming Applications
● Batch Applications
● Dashboards & Visualization
Needs a lot of coding on Big
Data
4
Machine Learning
Classification
Regression
Clustering
Collaborative Filtering
Save/Load Model
Predict
Cross-Validator
NLP
CoreNLP
StanfordNLP
OCR
Tesseract
File Formats
CSV/TSV
Parquet
JSON
Avro
PDF
Images
Whole Files
Feature
Generation
Tokenization
TF, IDF
OneHotEncoder
StringIndexer
Imputer
Scaler
Data Sources/Sinks
HDFS
S3
Kafka, Flume, Twitter
HBase
Solr
ETL
Joins, Unions
Filter
SQL, Scala, Python
GeoIP
ConcatColumns
Column Filter
Dedup
5
Long time to
Production & Value
Hard to maintain and extend
the pipelines/applications
Very Hard to
Collaborate
Business Data Scientist Data Engineer IT
Very Complex
Deployment
Hard to handover code
Results In
Data Analysts
Data
Engineers
Data
Scientists
Spark
Relational
Batch + Streaming
Hadoop
Workflow / Application
Repository
Nodes Repository
Future
● 100+ Nodes
● Entity Resolution
● Machine Learning
● Data Wrangling / ETL / Drools
● Sentiment Analysis
● Recommendations
● Churn Prediction
● Log Analytics
● Workflow Designer
● Preview Mode
● Execution Engine
● Visualization
+ SQL / Scala / Python
7
Sparkflows Solution
Workflow Editor
How Sparkflows Works
Rich Visualizations &
Dashboards
100’s of Nodes
Batch & Streaming
Engine
Interactive Execution
Easy Deployment &
Configuration
Pre-built Workflows
Telco Churn Pred
Housing Price Pred
Bike Sharing Analysis
NY Taxi Data Analysis
Movie Lens
Recommendations
Confidential Property of Sparkflows.io
Sparkflows Product Stack
Streaming
Data
Kafka
Flume
Data
Sources
HIVE/HBase
HDFS/S3
Solr
RDBMS
Apache Spark Cluster
Databricks AWS
IBM
Bluemix
On
Prem
Azur
e
Visualizations
ETL/NLP/OCR
Model Building
Workflow Execution
Scala/SQL/Python
Data Wrangling
Data Analysis
Data Pipelines
Big Data Analytics /Applications
Visualization
Data Sinks
HIVE/HBase
HDFS/S3
Solr
RDBMS
10
Business Analyst
Data
Scientist
Data Engineer IT
Data Analytics for Business Use
Cases by dragging and
dropping nodes and using
various datasets.
Visualization and deep
understanding of the data
Build predictive models and apply
predictions
Do predictive and analytical
modeling with the drag-and-
drop capabilities
Write custom SQL, Scala, Python
to close the gaps
Blend static and real-time streams
to build complex data
pipelines
Build and deploy complex
pipelines in minutes.
Connect to various sources and
sinks including Kafaka, HDFS,
S3, HBase, Solr.
Build and expose custom nodes
​in Sparkflows for others to
use
Embed SQL, Scala, Python within
the workflow.
Easily configure multi-tenancy
and security for
Sparkflows users
Connect workflow results to
platform of choice ​for
visualization
Provision Hadoop
infrastructure, monitor
workflow jobs, and tune
performance
Why Now?
Big Trend towards building with Templates
11
Streamsets
iPhone Apps
Building Website
nifi
StreamAnalytix
Impetus
Alteryx
Dashboards
12
Combine output of various Workflows into Dashboards
Core Differentiators
13
Easy & Natural to use and Deploy
Deep Integration with Hadoop -
Security/Impersonation/HIVE/HBase/Solr
Custom Nodes - Users can write their own
Nodes and plug into the UI
Schema Propagation
Interactive Execution at Design Time
Rich Application Dashboards
Growing Repository of Workflows for
various Solutions
Building out of Complex Nodes by
Sparkflows - Dedup, Drools,
OpenNLP, StanfordNLP, Tesseract
etc.
Batch & Streaming - Nodes support
both Batch & Streaming workloads
Support for SQL, Scala, Jython as
Nodes of the workflow
Line of Products
14
Data Analytics
(Analytics / Wrangling
/ Machine Learning)
Streaming
Analytics
Applications
15
THANK YOU
Building Big Data Analytics & Applications is very costly & time
consuming
16
Customer
360
Fraud
Detection
Operations
Analytics
Cyber
Security
IoT
Analytics
Analytics
Application
s
Not enough users are able to extract great value from the Data Lake
Needs a lot of coding on Big
Data
17
Data Analytics, Data
Preparation &
Blending
Machine LearningStreaming Applications
Batch Applications
Visualizations

More Related Content

What's hot

Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
Sascha Dittmann
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
Mark Tabladillo
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure Databricks
Databricks
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
Microsoft Tech Community
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Lace Lofranco
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
Databricks
 
Minería de Datos en Sql Server 2008
Minería de Datos en Sql Server 2008Minería de Datos en Sql Server 2008
Minería de Datos en Sql Server 2008
Eduardo Castro
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
Mark Kromer
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat Patterson
Spark Summit
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
Jen Aman
 
Super charged prototyping
Super charged prototypingSuper charged prototyping
Super charged prototyping
Michael Stephenson
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
Databricks
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
Stepan Pushkarev
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
BizTalk360
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
Matt Ingenthron
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Databricks
 
Managing your ML lifecycle with Azure Databricks and Azure ML
Managing your ML lifecycle with Azure Databricks and Azure MLManaging your ML lifecycle with Azure Databricks and Azure ML
Managing your ML lifecycle with Azure Databricks and Azure ML
Parashar Shah
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta Lake
Databricks
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
Databricks
 

What's hot (20)

Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning
 
ETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure DatabricksETL Made Easy with Azure Data Factory and Azure Databricks
ETL Made Easy with Azure Data Factory and Azure Databricks
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
 
Minería de Datos en Sql Server 2008
Minería de Datos en Sql Server 2008Minería de Datos en Sql Server 2008
Minería de Datos en Sql Server 2008
 
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the CloudSQL Saturday Redmond 2019 ETL Patterns in the Cloud
SQL Saturday Redmond 2019 ETL Patterns in the Cloud
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat Patterson
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
 
Super charged prototyping
Super charged prototypingSuper charged prototyping
Super charged prototyping
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
 
Managing your ML lifecycle with Azure Databricks and Azure ML
Managing your ML lifecycle with Azure Databricks and Azure MLManaging your ML lifecycle with Azure Databricks and Azure ML
Managing your ML lifecycle with Azure Databricks and Azure ML
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta Lake
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 

Similar to Sparkflows.io

Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
David Talby
 
Rajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developerRajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developer
Rajeev Kumar
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Mopuru Babu
 
H2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFH2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SF
Sri Ambati
 
Informatica,Teradata,Oracle,SQL
Informatica,Teradata,Oracle,SQLInformatica,Teradata,Oracle,SQL
Informatica,Teradata,Oracle,SQL
sivakumar s
 
H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16
Sri Ambati
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpbigdata sunil
 
Chandan's_Resume
Chandan's_ResumeChandan's_Resume
Chandan's_ResumeChandan Das
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
DataWorks Summit
 
Democratization of Data @Indix
Democratization of Data @IndixDemocratization of Data @Indix
Democratization of Data @Indix
Manoj Mahalingam
 
Denodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data ServicesDenodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data Services
Denodo
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
All Things Open
 
Sureh hadoop 3 years t
Sureh hadoop 3 years tSureh hadoop 3 years t
Sureh hadoop 3 years t
suresh thallapelly
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
Databricks
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
Databricks
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
FarazulHoda2
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann
 

Similar to Sparkflows.io (20)

Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Rajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developerRajeev kumar apache_spark & scala developer
Rajeev kumar apache_spark & scala developer
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
 
H2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFH2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SF
 
Informatica,Teradata,Oracle,SQL
Informatica,Teradata,Oracle,SQLInformatica,Teradata,Oracle,SQL
Informatica,Teradata,Oracle,SQL
 
SivakumarS
SivakumarSSivakumarS
SivakumarS
 
H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16
 
Shrikanth
ShrikanthShrikanth
Shrikanth
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExp
 
Chandan's_Resume
Chandan's_ResumeChandan's_Resume
Chandan's_Resume
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Democratization of Data @Indix
Democratization of Data @IndixDemocratization of Data @Indix
Democratization of Data @Indix
 
Denodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data ServicesDenodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data Services
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
 
Sureh hadoop 3 years t
Sureh hadoop 3 years tSureh hadoop 3 years t
Sureh hadoop 3 years t
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 

Recently uploaded

Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 

Recently uploaded (20)

Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 

Sparkflows.io

  • 1. Reducing cost and time-to-market for Big Data Analytics & Applications by 10X Self-Service Big Data Analytics & Applications Cut down from months to hours
  • 3. 3 Data Analysts Data Engineers Data Scientists Its challenging for users and get value out of the Data Lake Data Lake ● Data Analytics, Data Preparation & Blending ● Machine Learning ● Streaming Applications ● Batch Applications ● Dashboards & Visualization Needs a lot of coding on Big Data
  • 4. 4 Machine Learning Classification Regression Clustering Collaborative Filtering Save/Load Model Predict Cross-Validator NLP CoreNLP StanfordNLP OCR Tesseract File Formats CSV/TSV Parquet JSON Avro PDF Images Whole Files Feature Generation Tokenization TF, IDF OneHotEncoder StringIndexer Imputer Scaler Data Sources/Sinks HDFS S3 Kafka, Flume, Twitter HBase Solr ETL Joins, Unions Filter SQL, Scala, Python GeoIP ConcatColumns Column Filter Dedup
  • 5. 5 Long time to Production & Value Hard to maintain and extend the pipelines/applications Very Hard to Collaborate Business Data Scientist Data Engineer IT Very Complex Deployment Hard to handover code Results In
  • 6. Data Analysts Data Engineers Data Scientists Spark Relational Batch + Streaming Hadoop Workflow / Application Repository Nodes Repository Future ● 100+ Nodes ● Entity Resolution ● Machine Learning ● Data Wrangling / ETL / Drools ● Sentiment Analysis ● Recommendations ● Churn Prediction ● Log Analytics ● Workflow Designer ● Preview Mode ● Execution Engine ● Visualization + SQL / Scala / Python
  • 8. Workflow Editor How Sparkflows Works Rich Visualizations & Dashboards 100’s of Nodes Batch & Streaming Engine Interactive Execution Easy Deployment & Configuration Pre-built Workflows Telco Churn Pred Housing Price Pred Bike Sharing Analysis NY Taxi Data Analysis Movie Lens Recommendations
  • 9. Confidential Property of Sparkflows.io Sparkflows Product Stack Streaming Data Kafka Flume Data Sources HIVE/HBase HDFS/S3 Solr RDBMS Apache Spark Cluster Databricks AWS IBM Bluemix On Prem Azur e Visualizations ETL/NLP/OCR Model Building Workflow Execution Scala/SQL/Python Data Wrangling Data Analysis Data Pipelines Big Data Analytics /Applications Visualization Data Sinks HIVE/HBase HDFS/S3 Solr RDBMS
  • 10. 10 Business Analyst Data Scientist Data Engineer IT Data Analytics for Business Use Cases by dragging and dropping nodes and using various datasets. Visualization and deep understanding of the data Build predictive models and apply predictions Do predictive and analytical modeling with the drag-and- drop capabilities Write custom SQL, Scala, Python to close the gaps Blend static and real-time streams to build complex data pipelines Build and deploy complex pipelines in minutes. Connect to various sources and sinks including Kafaka, HDFS, S3, HBase, Solr. Build and expose custom nodes ​in Sparkflows for others to use Embed SQL, Scala, Python within the workflow. Easily configure multi-tenancy and security for Sparkflows users Connect workflow results to platform of choice ​for visualization Provision Hadoop infrastructure, monitor workflow jobs, and tune performance
  • 11. Why Now? Big Trend towards building with Templates 11 Streamsets iPhone Apps Building Website nifi StreamAnalytix Impetus Alteryx
  • 12. Dashboards 12 Combine output of various Workflows into Dashboards
  • 13. Core Differentiators 13 Easy & Natural to use and Deploy Deep Integration with Hadoop - Security/Impersonation/HIVE/HBase/Solr Custom Nodes - Users can write their own Nodes and plug into the UI Schema Propagation Interactive Execution at Design Time Rich Application Dashboards Growing Repository of Workflows for various Solutions Building out of Complex Nodes by Sparkflows - Dedup, Drools, OpenNLP, StanfordNLP, Tesseract etc. Batch & Streaming - Nodes support both Batch & Streaming workloads Support for SQL, Scala, Jython as Nodes of the workflow
  • 14. Line of Products 14 Data Analytics (Analytics / Wrangling / Machine Learning) Streaming Analytics Applications
  • 16. Building Big Data Analytics & Applications is very costly & time consuming 16 Customer 360 Fraud Detection Operations Analytics Cyber Security IoT Analytics Analytics Application s Not enough users are able to extract great value from the Data Lake
  • 17. Needs a lot of coding on Big Data 17 Data Analytics, Data Preparation & Blending Machine LearningStreaming Applications Batch Applications Visualizations

Editor's Notes

  1. Makes building Big Data Applications Agile, much, much faster and predictable
  2. Benefits: Business Users Can Really Interact with Data & Experiment with Building Applications Rich Dashboards - Make day-to-day operations more efficient and provide insights into data and workflow performance Pre-Built Applications which can be easily extended or changed Use Cases Easy to Visualize and Implement