SlideShare a Scribd company logo
MISTMIST
Serverless proxy to Apache Spark
VADIM CHELYSHOVVADIM CHELYSHOV
github.com/dos65
hydrosphere.io
Scalalaz podcast (RUS)
Scala-digest (RUS)
HYDROSPHERE MISTHYDROSPHERE MIST
1.0.0-RC16
https://github.com/Hydrospheredata/mist
WHAT PROBLEM DOES IT SOLVE?WHAT PROBLEM DOES IT SOLVE?
Getting Big Data Projects to Production Is aGetting Big Data Projects to Production Is a
ChallengeChallenge
Only 15 percent of businesses reported deploying their
big data project to production, effectively unchanged
from last year (14 percent).
https://www.gartner.com/newsroom/id/3466117
Data engineers vs. data scientistsData engineers vs. data scientists
A common starting point is 2-3 data engineers for every
data scientist. For some organizations with more
complex data engineering requirements, this can be 4-5
data engineers per data scientist
https://www.oreilly.com/ideas/data-engineers-vs-
data-scientists
The lack of tooling?The lack of tooling?
The lack of right tooling?The lack of right tooling?
Getting Spark Projects to ProductionGetting Spark Projects to Production
How to
provide a way to run our job?
return a result?
report an error?
run multiply jobs in parallel?
LET'S COMPARE WITH USUALLET'S COMPARE WITH USUAL
THINGSTHINGS
DatabasesDatabases
Postgresql - 1995
CREATE OR REPLACE FUNCTION foo(param VARCHAR)
RETURNS TABLE
...
Web applicationsWeb applications
Java servlet - 1997
public class NewServlet extends HttpServlet {
@Override
protected void doGet(
HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
...
}
}
Big data processingBig data processing
Hadoop - 2005
Apache Spark - 2011
object MyApp {
def main(arg: String[]): Unit = {
val conf = new SparkConf().setAppName(...)
val sc = new SparkContext(conf)
...
}
}
./bin/spark-submit ...
RunRun
Databases SELECT * FROM foo(param)
Web applications curl
Spark Shell + Trigger
Return resultReturn result
Databases built-in protocol
Web applications built-in protocol
Spark Read from fs/storages
Result errorResult error
Databases explicit error information
Web
applications
HTTP CODE + explicit error
infomation
Spark exit code
ParallelismParallelism
Databases by default
Web applications by default
Spark Good luck!
What a hell is going on!
These thing aren't
about big data!
It isn't normal development experinceIt isn't normal development experince
A lot of actions over ssh-session to test, deploy and
run your code
Improssible to run job without direct shell
command
Process more that one job in parallel is difficult
Poor interfaces
Just a special service!Just a special service!
Artifacts/settings CRUD
API for launching jobs and receiving their status
Spark driver launcher
A library do describe operations using Spark
MIST BASICSMIST BASICS
What is Mist?What is Mist?
Serverless proxy to Apache Spark
What is Mist?What is Mist?
Serverless proxy      ?
Apache Spark      ✔
What is Mist?What is Mist?
It's a combinations of
Programming framework
HTTP/Async interface to run deploy and run spark
programms
Spark contexts / driver manager
ConceptsConcepts
Function
 user code with Spark program to be deployed on Mist
Artifact
 file (.jar or .py) that contains a Function
Context
 settings for Mist Worker where Function is being
executed
Job
 a result of the Function execution
Mist instancesMist instances
Mist Worker
 Spark driver application which invokes functions
Mist Master
 exposes http/async api
 stores functions/artifacts/contexts
 run/manage workers
 run functions on workers
Mist Master - Http ApiMist Master - Http Api
/v2/api/functions
/v2/api/artifacts
/v2/api/contexts
/v2/api/jobs
WORD COUNTWORD COUNT
Vanilla SparkVanilla Spark
object WordCount {
def main(arg: Array[String]): Unit = {
val conf = new SparkConf().setAppName(...)
val sc = new SparkContext(conf)
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
}
}
MistFnMistFn
object WordCount extends MistFn {
override def handle: Handle = {
arg[String]("path").onSparkContext((path: String, sc: SparkContext) => {
val counts = sc.textFile(path)
.flatMap(_.split(" "))
.map(w => w -> 1)
.reduceByKey(_ + _ )
// counts.saveAsTextFile("hdfs://...")
counts.collect().toMap
}).asHandle
}
}
Vanilla SparkVanilla Spark
./bin/spark-submit 
--class <main-class> 
--master <master-url> 
--deploy-mode <deploy-mode> 
--conf <key>=<value> 
... # other options
<application-jar> 
[application-arguments]
Mist-cli configMist-cli config
model = Context
name = mycontext
data {
spark-conf {
spark.master = <master url>
}
}
model = Artifact
name = wordcount
data.file-path = "./target/scala-2.11/<application-jar>"
model = Function
name = word-count
data {
path = <artifact_jar>
class-name = "HelloMist$"
context = mycontext
}
Mist-cli apply & runMist-cli apply & run
$ mist-cli -f apply conf
$ curl -X POST -d <json-input> 
'http://localhost:2004/v2/api/functions/word-count-example/jobs'
{"id":"00e5294d-77f4-463b-85b6-4b73185eab1a"}
A lot of actions over ssh-session to test, deploy and
run your code
Improssible to run job without direct shell
command
Process more that one job in parallel is difficult
Poor interfaces
SERVERLESS?SERVERLESS?
Parallelism/MultitenancyParallelism/Multitenancy
Usual 1 req ~= thread
Spark 1 req ~= process
ServerlessServerless
ContextContext
Mist Context describes parameters of Mist worker and
Spark context
model = Context
name = cluster_ctx
data {
precreated = false
spark-conf {
spark.master = yarn
spark.submit.deployMode = cluster
spark.executor.instances = 2
spark.executor.cores = 2
spark.executor.memory = 1G
}
}
Mist instancesMist instances
Mist Worker
 receives req -> invokes functions -> returns result
back to master
Mist Master
 queue requests
 sends them on workers
ApplyApply
RunRun
$ curl -X POST -d <json-input> 
'http://localhost:2004/v2/api/functions/word-count-example/jobs'
{"id":"00e5294d-77f4-463b-85b6-4b73185eab1a"}
RunRun
RunRun
ScalingScaling
model = Context
name = cluster_ctx
data {
precreated = false
max-parallel-jobs = 2
...
}
Worker modesWorker modes
Exclusive
 starts new worker for every request
Shared
 reuses worker instance
model = Context
name = cluster_ctx
data {
precreated = false
max-parallel-jobs = 2
worker-mode = "shared"
// or
worker-mode = "exlsusive"
...
}
Multi clusterMulti cluster
Contexts may be configured to work on different
clusters
model = Context
name = cluster_ctx
data {
precreated = false
spark-conf {
spark.master = yarn
}
}
model = Context
name = cluster_ctx
data {
precreated = false
spark-conf {
spark.master = spark://
Multi clusterMulti cluster
A lot of actions over ssh-session to test, deploy and
run your code
Improssible to run job without direct shell
command
Process more that one job in parallel is difficult
Poor interfaces
SPARK FUNCTIONSPARK FUNCTION
Pi EstimationPi Estimation
def main(arg: Array[String]): Unit = {
val count = sc.parallelize(1 to samples).filter { _ =>
val x = math.random
val y = math.random
x*x + y*y < 1
}.count()
println(s"Pi is roughly ${4.0 * count / samples}")
}
Pi EstimationPi Estimation
What signature is better?
def estimatePi1(samples: Int): Unit
// vs
def estimatePi2(samples: Int): Double
Word count againWord count again
def main(arg: Array[String]): Unit = {
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
}
Word count againWord count again
What signature is better?
def wordCount(from: String, to: String): Unit
// vs
def wordCount(from: String, to: String): Map[String, Int]
// or
sealed trait OutputFile
case class HdfsFile(path: String) extends OutputFile
case class S3File(path: String) extends OutputFile
def wordCount(from: String, to: String): OutputFile
HOW TO REPRESENT A SPARKHOW TO REPRESENT A SPARK
JOB?JOB?
Spark ApplicationSpark Application
def main(args: Array[String]): Unit
Array[String] => Unit
MistFnMistFn
Context is already described
Json input instead of arguments
Json output instead of Unit
Array[String] => Unit
// vs
(Json, SparkContext) => Json
SparkContext typesSparkContext types
SparkContext
SparkSession
StreamingContext
SQLContext
Array[String] => Unit
// vs
(Json, ?) => Json
GeneralizeGeneralize
// developer
// pi estimation: samples => pi
(Int, SparkContext) => Double
// word count - input path => output path
(String, SparkSession) => String
...
(A, ?) => B
(A, B, ?) => C
SPARK FUNCTIONSPARK FUNCTION
A lot of actions over ssh-session to test, deploy and
run your code
Improssible to run job without direct shell
command
Process more that one job in parallel is difficult
Poor interfaces
Future plansFuture plans
Release 1.0.0
AWS EMR integration - on-demand EMR clusters
...

More Related Content

What's hot

今時なウェブ開発をSmalltalkでやってみる?
今時なウェブ開発をSmalltalkでやってみる?今時なウェブ開発をSmalltalkでやってみる?
今時なウェブ開発をSmalltalkでやってみる?
Sho Yoshida
 
apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...
apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...
apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...
apidays
 
A Modest Introduction To Swift
A Modest Introduction To SwiftA Modest Introduction To Swift
A Modest Introduction To Swift
John Anderson
 
Spark hands-on tutorial (rev. 002)
Spark hands-on tutorial (rev. 002)Spark hands-on tutorial (rev. 002)
Spark hands-on tutorial (rev. 002)
Jean-Georges Perrin
 
My Top 5 APEX JavaScript API's
My Top 5 APEX JavaScript API'sMy Top 5 APEX JavaScript API's
My Top 5 APEX JavaScript API's
Roel Hartman
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
Spark Summit
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
Minwoo Kim
 
Reactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz KhambatiReactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz Khambati
Aziz Khambati
 
InterConnect: Java, Node.js and Swift - Which, Why and When
InterConnect: Java, Node.js and Swift - Which, Why and WhenInterConnect: Java, Node.js and Swift - Which, Why and When
InterConnect: Java, Node.js and Swift - Which, Why and When
Chris Bailey
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
Rustam Aliyev
 
H2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangH2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy Wang
Sri Ambati
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Sematext Group, Inc.
 
A Lifecycle Of Code Under Test by Robert Fornal
A Lifecycle Of Code Under Test by Robert FornalA Lifecycle Of Code Under Test by Robert Fornal
A Lifecycle Of Code Under Test by Robert Fornal
QA or the Highway
 
Why You Should Use TAPIs
Why You Should Use TAPIsWhy You Should Use TAPIs
Why You Should Use TAPIs
Jeffrey Kemp
 
Apache spark basics
Apache spark basicsApache spark basics
Apache spark basics
sparrowAnalytics.com
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
Sadayuki Furuhashi
 
API Performance
API PerformanceAPI Performance
API Performance
Faizal Fikhri Zakaria
 
GDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharper
GDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharperGDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharper
GDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharper
granicz
 
Terraform day02
Terraform day02Terraform day02
Terraform day02
Gourav Varma
 
Server Side Swift
Server Side SwiftServer Side Swift
Server Side Swift
Jens Ravens
 

What's hot (20)

今時なウェブ開発をSmalltalkでやってみる?
今時なウェブ開発をSmalltalkでやってみる?今時なウェブ開発をSmalltalkでやってみる?
今時なウェブ開発をSmalltalkでやってみる?
 
apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...
apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...
apidays LIVE Australia 2020 - Building distributed systems on the shoulders o...
 
A Modest Introduction To Swift
A Modest Introduction To SwiftA Modest Introduction To Swift
A Modest Introduction To Swift
 
Spark hands-on tutorial (rev. 002)
Spark hands-on tutorial (rev. 002)Spark hands-on tutorial (rev. 002)
Spark hands-on tutorial (rev. 002)
 
My Top 5 APEX JavaScript API's
My Top 5 APEX JavaScript API'sMy Top 5 APEX JavaScript API's
My Top 5 APEX JavaScript API's
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Reactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz KhambatiReactive Programming - ReactFoo 2020 - Aziz Khambati
Reactive Programming - ReactFoo 2020 - Aziz Khambati
 
InterConnect: Java, Node.js and Swift - Which, Why and When
InterConnect: Java, Node.js and Swift - Which, Why and WhenInterConnect: Java, Node.js and Swift - Which, Why and When
InterConnect: Java, Node.js and Swift - Which, Why and When
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
H2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy WangH2O World - Intro to R, Python, and Flow - Amy Wang
H2O World - Intro to R, Python, and Flow - Amy Wang
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
A Lifecycle Of Code Under Test by Robert Fornal
A Lifecycle Of Code Under Test by Robert FornalA Lifecycle Of Code Under Test by Robert Fornal
A Lifecycle Of Code Under Test by Robert Fornal
 
Why You Should Use TAPIs
Why You Should Use TAPIsWhy You Should Use TAPIs
Why You Should Use TAPIs
 
Apache spark basics
Apache spark basicsApache spark basics
Apache spark basics
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
API Performance
API PerformanceAPI Performance
API Performance
 
GDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharper
GDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharperGDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharper
GDG Almaty Meetup: Reactive full-stack .NET web applications with WebSharper
 
Terraform day02
Terraform day02Terraform day02
Terraform day02
 
Server Side Swift
Server Side SwiftServer Side Swift
Server Side Swift
 

Similar to Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vadim Chelyshov, Software engineer at Provectus

Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
Stepan Pushkarev
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
Chetan Khatri
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Euangelos Linardos
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Vienna Data Science Group
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
Wisely chen
 
Burn down the silos! Helping dev and ops gel on high availability websites
Burn down the silos! Helping dev and ops gel on high availability websitesBurn down the silos! Helping dev and ops gel on high availability websites
Burn down the silos! Helping dev and ops gel on high availability websites
Lindsay Holmwood
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
jeykottalam
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
Sri Ambati
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
Ned Shawa
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
Chris George
 
Writing robust Node.js applications
Writing robust Node.js applicationsWriting robust Node.js applications
Writing robust Node.js applicationsTom Croucher
 
Building Testable PHP Applications
Building Testable PHP ApplicationsBuilding Testable PHP Applications
Building Testable PHP Applications
chartjes
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
Databricks
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
Russell Jurney
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
Benjamin Bengfort
 
Infrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and OpsInfrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and Ops
Mykyta Protsenko
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
Databricks
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 

Similar to Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vadim Chelyshov, Software engineer at Provectus (20)

Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Burn down the silos! Helping dev and ops gel on high availability websites
Burn down the silos! Helping dev and ops gel on high availability websitesBurn down the silos! Helping dev and ops gel on high availability websites
Burn down the silos! Helping dev and ops gel on high availability websites
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
 
Writing robust Node.js applications
Writing robust Node.js applicationsWriting robust Node.js applications
Writing robust Node.js applications
 
Building Testable PHP Applications
Building Testable PHP ApplicationsBuilding Testable PHP Applications
Building Testable PHP Applications
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Infrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and OpsInfrastructure-as-code: bridging the gap between Devs and Ops
Infrastructure-as-code: bridging the gap between Devs and Ops
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 

More from Provectus

Choosing the right IDP Solution
Choosing the right IDP SolutionChoosing the right IDP Solution
Choosing the right IDP Solution
Provectus
 
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
Provectus
 
Choosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare OrganizationsChoosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare Organizations
Provectus
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
Provectus
 
AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and Beyond
Provectus
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
Provectus
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
Provectus
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Provectus
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
Provectus
 
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K..."Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
Provectus
 
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ..."How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
Provectus
 
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky..."Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
Provectus
 
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2..."Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
Provectus
 
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma..."Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
Provectus
 
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ..."Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
Provectus
 
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
Provectus
 
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
Provectus
 
"Integrate your front end apps with serverless backend in the cloud", Sebasti...
"Integrate your front end apps with serverless backend in the cloud", Sebasti..."Integrate your front end apps with serverless backend in the cloud", Sebasti...
"Integrate your front end apps with serverless backend in the cloud", Sebasti...
Provectus
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
Provectus
 
How to implement authorization in your backend with AWS IAM
How to implement authorization in your backend with AWS IAMHow to implement authorization in your backend with AWS IAM
How to implement authorization in your backend with AWS IAM
Provectus
 

More from Provectus (20)

Choosing the right IDP Solution
Choosing the right IDP SolutionChoosing the right IDP Solution
Choosing the right IDP Solution
 
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.
 
Choosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare OrganizationsChoosing the Right Document Processing Solution for Healthcare Organizations
Choosing the Right Document Processing Solution for Healthcare Organizations
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and Beyond
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
 
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...
 
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K..."Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...
 
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ..."How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
 
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky..."Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...
 
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2..."Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...
 
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma..."Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
 
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ..."Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...
 
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019
 
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019
 
"Integrate your front end apps with serverless backend in the cloud", Sebasti...
"Integrate your front end apps with serverless backend in the cloud", Sebasti..."Integrate your front end apps with serverless backend in the cloud", Sebasti...
"Integrate your front end apps with serverless backend in the cloud", Sebasti...
 
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019
 
How to implement authorization in your backend with AWS IAM
How to implement authorization in your backend with AWS IAMHow to implement authorization in your backend with AWS IAM
How to implement authorization in your backend with AWS IAM
 

Recently uploaded

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vadim Chelyshov, Software engineer at Provectus

  • 4. WHAT PROBLEM DOES IT SOLVE?WHAT PROBLEM DOES IT SOLVE?
  • 5. Getting Big Data Projects to Production Is aGetting Big Data Projects to Production Is a ChallengeChallenge Only 15 percent of businesses reported deploying their big data project to production, effectively unchanged from last year (14 percent). https://www.gartner.com/newsroom/id/3466117
  • 6. Data engineers vs. data scientistsData engineers vs. data scientists A common starting point is 2-3 data engineers for every data scientist. For some organizations with more complex data engineering requirements, this can be 4-5 data engineers per data scientist https://www.oreilly.com/ideas/data-engineers-vs- data-scientists
  • 7. The lack of tooling?The lack of tooling?
  • 8.
  • 9. The lack of right tooling?The lack of right tooling?
  • 10. Getting Spark Projects to ProductionGetting Spark Projects to Production How to provide a way to run our job? return a result? report an error? run multiply jobs in parallel?
  • 11. LET'S COMPARE WITH USUALLET'S COMPARE WITH USUAL THINGSTHINGS
  • 12. DatabasesDatabases Postgresql - 1995 CREATE OR REPLACE FUNCTION foo(param VARCHAR) RETURNS TABLE ...
  • 13. Web applicationsWeb applications Java servlet - 1997 public class NewServlet extends HttpServlet { @Override protected void doGet( HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { ... } }
  • 14. Big data processingBig data processing Hadoop - 2005 Apache Spark - 2011 object MyApp { def main(arg: String[]): Unit = { val conf = new SparkConf().setAppName(...) val sc = new SparkContext(conf) ... } } ./bin/spark-submit ...
  • 15. RunRun Databases SELECT * FROM foo(param) Web applications curl Spark Shell + Trigger
  • 16. Return resultReturn result Databases built-in protocol Web applications built-in protocol Spark Read from fs/storages
  • 17. Result errorResult error Databases explicit error information Web applications HTTP CODE + explicit error infomation Spark exit code
  • 18. ParallelismParallelism Databases by default Web applications by default Spark Good luck!
  • 19. What a hell is going on! These thing aren't about big data!
  • 20. It isn't normal development experinceIt isn't normal development experince A lot of actions over ssh-session to test, deploy and run your code Improssible to run job without direct shell command Process more that one job in parallel is difficult Poor interfaces
  • 21. Just a special service!Just a special service! Artifacts/settings CRUD API for launching jobs and receiving their status Spark driver launcher A library do describe operations using Spark
  • 23. What is Mist?What is Mist? Serverless proxy to Apache Spark
  • 24. What is Mist?What is Mist? Serverless proxy      ? Apache Spark      ✔
  • 25. What is Mist?What is Mist? It's a combinations of Programming framework HTTP/Async interface to run deploy and run spark programms Spark contexts / driver manager
  • 26. ConceptsConcepts Function  user code with Spark program to be deployed on Mist Artifact  file (.jar or .py) that contains a Function Context  settings for Mist Worker where Function is being executed Job  a result of the Function execution
  • 27. Mist instancesMist instances Mist Worker  Spark driver application which invokes functions Mist Master  exposes http/async api  stores functions/artifacts/contexts  run/manage workers  run functions on workers
  • 28. Mist Master - Http ApiMist Master - Http Api /v2/api/functions /v2/api/artifacts /v2/api/contexts /v2/api/jobs
  • 30. Vanilla SparkVanilla Spark object WordCount { def main(arg: Array[String]): Unit = { val conf = new SparkConf().setAppName(...) val sc = new SparkContext(conf) val textFile = sc.textFile("hdfs://...") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://...") } }
  • 31. MistFnMistFn object WordCount extends MistFn { override def handle: Handle = { arg[String]("path").onSparkContext((path: String, sc: SparkContext) => { val counts = sc.textFile(path) .flatMap(_.split(" ")) .map(w => w -> 1) .reduceByKey(_ + _ ) // counts.saveAsTextFile("hdfs://...") counts.collect().toMap }).asHandle } }
  • 32. Vanilla SparkVanilla Spark ./bin/spark-submit --class <main-class> --master <master-url> --deploy-mode <deploy-mode> --conf <key>=<value> ... # other options <application-jar> [application-arguments]
  • 33. Mist-cli configMist-cli config model = Context name = mycontext data { spark-conf { spark.master = <master url> } } model = Artifact name = wordcount data.file-path = "./target/scala-2.11/<application-jar>" model = Function name = word-count data { path = <artifact_jar> class-name = "HelloMist$" context = mycontext }
  • 34. Mist-cli apply & runMist-cli apply & run $ mist-cli -f apply conf $ curl -X POST -d <json-input> 'http://localhost:2004/v2/api/functions/word-count-example/jobs' {"id":"00e5294d-77f4-463b-85b6-4b73185eab1a"}
  • 35. A lot of actions over ssh-session to test, deploy and run your code Improssible to run job without direct shell command Process more that one job in parallel is difficult Poor interfaces
  • 39. ContextContext Mist Context describes parameters of Mist worker and Spark context model = Context name = cluster_ctx data { precreated = false spark-conf { spark.master = yarn spark.submit.deployMode = cluster spark.executor.instances = 2 spark.executor.cores = 2 spark.executor.memory = 1G } }
  • 40. Mist instancesMist instances Mist Worker  receives req -> invokes functions -> returns result back to master Mist Master  queue requests  sends them on workers
  • 42. RunRun $ curl -X POST -d <json-input> 'http://localhost:2004/v2/api/functions/word-count-example/jobs' {"id":"00e5294d-77f4-463b-85b6-4b73185eab1a"}
  • 45. ScalingScaling model = Context name = cluster_ctx data { precreated = false max-parallel-jobs = 2 ... }
  • 46.
  • 47. Worker modesWorker modes Exclusive  starts new worker for every request Shared  reuses worker instance model = Context name = cluster_ctx data { precreated = false max-parallel-jobs = 2 worker-mode = "shared" // or worker-mode = "exlsusive" ... }
  • 48. Multi clusterMulti cluster Contexts may be configured to work on different clusters model = Context name = cluster_ctx data { precreated = false spark-conf { spark.master = yarn } } model = Context name = cluster_ctx data { precreated = false spark-conf { spark.master = spark://
  • 50. A lot of actions over ssh-session to test, deploy and run your code Improssible to run job without direct shell command Process more that one job in parallel is difficult Poor interfaces
  • 52. Pi EstimationPi Estimation def main(arg: Array[String]): Unit = { val count = sc.parallelize(1 to samples).filter { _ => val x = math.random val y = math.random x*x + y*y < 1 }.count() println(s"Pi is roughly ${4.0 * count / samples}") }
  • 53. Pi EstimationPi Estimation What signature is better? def estimatePi1(samples: Int): Unit // vs def estimatePi2(samples: Int): Double
  • 54. Word count againWord count again def main(arg: Array[String]): Unit = { val textFile = sc.textFile("hdfs://...") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://...") }
  • 55. Word count againWord count again What signature is better? def wordCount(from: String, to: String): Unit // vs def wordCount(from: String, to: String): Map[String, Int] // or sealed trait OutputFile case class HdfsFile(path: String) extends OutputFile case class S3File(path: String) extends OutputFile def wordCount(from: String, to: String): OutputFile
  • 56. HOW TO REPRESENT A SPARKHOW TO REPRESENT A SPARK JOB?JOB?
  • 57. Spark ApplicationSpark Application def main(args: Array[String]): Unit Array[String] => Unit
  • 58. MistFnMistFn Context is already described Json input instead of arguments Json output instead of Unit Array[String] => Unit // vs (Json, SparkContext) => Json
  • 60. GeneralizeGeneralize // developer // pi estimation: samples => pi (Int, SparkContext) => Double // word count - input path => output path (String, SparkSession) => String ... (A, ?) => B (A, B, ?) => C
  • 62. A lot of actions over ssh-session to test, deploy and run your code Improssible to run job without direct shell command Process more that one job in parallel is difficult Poor interfaces
  • 63. Future plansFuture plans Release 1.0.0 AWS EMR integration - on-demand EMR clusters ...