7. Mist
• Mist is a thin service on top of Spark that makes it possible to execute Scala and Python Spark
jobs from application layers, to get synchronous, asynchronous, and reactive results, and to
expose an API to external clients.
• It implements Spark as a Service and creates a unified API layer for building enterprise solutions
and services on top of a Big Data lake.
www.provectus.com
8. Mist
● HTTP and Messaging (MQTT) API
● Scala & Python Spark job execution
● Works with Standalone, Mesos, and YARN in any Spark configuration
● Support for Spark SQL and Hive
● High Availability and Fault Tolerance
● Persist job state for self-healing
● Async and sync API, JSON job results
Why We Needed Mist
9. Mist
Build the project
git clone https://github.com/hydrospheredata/mist.git
cd mist
./sbt/sbt -DsparkVersion=1.5.2 assembly
Create a configuration file
Run
spark-submit --class io.hydrosphere.mist.Mist \
  --driver-java-options "-Dconfig.file=/path/to/application.conf" \
  target/scala-2.10/mist-assembly-0.2.0.jar
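The application.conf passed via --driver-java-options can start from the settings shown on the configuration slides; a minimal sketch, using the values from those slides:

```
# minimal application.conf, assembled from the configuration slides
mist.spark.master = "local[*]"
mist.settings.threadNumber = 16
mist.http.on = true
mist.http.host = "192.168.10.13"
mist.http.port = 2003
```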
10. Mist
Configuration
# Spark master URL, one of: local, yarn, mesos (local by default)
mist.spark.master = "local[*]"
# number of threads: one thread for one job
mist.settings.threadNumber = 16
# http interface (off by default)
mist.http.on = true
mist.http.host = "192.168.10.13"
mist.http.port = 2003
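With the HTTP interface enabled as above, a client can submit jobs by POSTing JSON to the Mist host. The route ("/jobs") and the payload field names below are assumptions for illustration only, not taken from the Mist docs:

```python
import json
import urllib.request

# The route ("/jobs") and payload field names are hypothetical;
# consult the Mist README for the real request schema.
def build_job_request(path, class_name, parameters):
    """JSON body describing the job to run."""
    return {
        "path": path,                # jar (or .py) containing the job
        "className": class_name,     # object extending MistJob
        "parameters": parameters,    # arguments handed to doStuff
    }

def submit_job(host, port, body):
    """POST the request to the Mist HTTP interface and return the JSON reply."""
    req = urllib.request.Request(
        "http://%s:%d/jobs" % (host, port),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# e.g. submit_job("192.168.10.13", 2003,
#                 build_job_request("mist-assembly-0.2.0.jar",
#                                   "SimpleContext", {"digits": [1, 2, 3]}))
```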
11. Mist
Configuration
# MQTT interface (off by default)
mist.mqtt.on = true
mist.mqtt.host = "192.168.10.33"
mist.mqtt.port = 1883
# mist listens on this topic for incoming requests
mist.mqtt.subscribeTopic = "foo"
# mist publishes results to this topic
mist.mqtt.publishTopic = "foo"
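A request over the MQTT interface is just a message published to the subscribe topic; results come back on the publish topic. A minimal sketch, assuming the third-party paho-mqtt client and a hypothetical payload shape (check the Mist README for the actual schema):

```python
import json

# The payload shape is an assumption for illustration, not Mist's
# documented MQTT request schema.
def build_mqtt_request(path, class_name, parameters):
    return json.dumps({
        "path": path,
        "className": class_name,
        "parameters": parameters,
    })

def send_request(payload):
    # paho-mqtt is an assumed client choice; any MQTT client works.
    import paho.mqtt.publish as publish
    publish.single(
        topic="foo",                  # mist.mqtt.subscribeTopic
        payload=payload,
        hostname="192.168.10.33",     # mist.mqtt.host
        port=1883,                    # mist.mqtt.port
    )
```

To receive results, subscribe to the publishTopic ("foo" above) before sending the request.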
13. Mist
Configuration
# default settings for all contexts
# timeout for each job in context
mist.contextDefaults.timeout = 100 days
# mist can kill context after job finished (off by default)
mist.contextDefaults.disposable = false
# settings for SparkConf
mist.contextDefaults.sparkConf = {
  spark.default.parallelism = 128
  spark.driver.memory = "10g"
  spark.scheduler.mode = "FAIR"
}
14. Mist
Configuration
# settings can be overridden for each context
mist.contexts.foo.timeout = 100 days
mist.contexts.foo.sparkConf = {
  spark.scheduler.mode = "FIFO"
}
mist.contexts.bar.timeout = 1000 second
mist.contexts.bar.disposable = true
# mist can create context on start, so we don't waste time on first request
mist.contextSettings.onstart = ["foo"]
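The override rules above can be sketched in a few lines: per-context values shadow contextDefaults, and the foo context ends up with a FIFO scheduler while keeping every other default. The merge function and its key-by-key sparkConf merging are illustrative, not Mist's actual implementation:

```python
# Illustrative model of how per-context settings shadow contextDefaults.
def effective_context_config(defaults, overrides):
    """Per-context values win; sparkConf maps are merged key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if key == "sparkConf":
            spark_conf = dict(defaults.get("sparkConf", {}))
            spark_conf.update(value)
            merged["sparkConf"] = spark_conf
        else:
            merged[key] = value
    return merged

context_defaults = {
    "timeout": "100 days",
    "disposable": False,
    "sparkConf": {
        "spark.default.parallelism": 128,
        "spark.driver.memory": "10g",
        "spark.scheduler.mode": "FAIR",
    },
}

# The foo context from the slide: only the scheduler mode changes.
foo = effective_context_config(
    context_defaults,
    {"sparkConf": {"spark.scheduler.mode": "FIFO"}},
)
```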
15. Mist
Spark Job at Mist
Mist Scala Spark Job
In order to prepare your job to run on Mist, extend a Scala object from MistJob and implement the abstract method
doStuff:
def doStuff(context: SparkContext, parameters: Map[String, Any]): Map[String, Any] = ???
def doStuff(context: SQLContext, parameters: Map[String, Any]): Map[String, Any] = ???
def doStuff(context: HiveContext, parameters: Map[String, Any]): Map[String, Any] = ???
www.provectus.com
15
16. Mist
Spark Job at Mist
Example:
object SimpleContext extends MistJob {
  override def doStuff(context: SparkContext, parameters: Map[String, Any]): Map[String, Any] = {
    val numbers: List[BigInt] = parameters("digits").asInstanceOf[List[BigInt]]
    val rdd = context.parallelize(numbers)
    Map("result" -> rdd.map(x => x * 2).collect())
  }
}
Building Mist jobs
Add Mist as a dependency in your build.sbt:
libraryDependencies += "io.hydrosphere" % "mist" % "0.2.0"
17. Mist
Spark Job at Mist
Mist Python Spark Job
Import mist and implement the method doStuff.
The following Spark context aliases are provided for convenience:
job.sc = SparkContext
job.sqlc = SQLContext
job.hc = HiveContext
18. Mist
Spark Job at Mist
For example:

import mist

class MyJob:
    def __init__(self, job):
        job.sendResult(self.doStuff(job))

    def doStuff(self, job):
        val = job.parameters.values()
        scala_list = val.head()
        pylist = []
        # capture the size first: tail() returns an ever-shorter list, so
        # comparing count against a shrinking size() would stop early
        size = scala_list.size()
        count = 0
        while count < size:
            pylist.append(scala_list.head())
            count = count + 1
            scala_list = scala_list.tail()
        rdd = job.sc.parallelize(pylist)
        result = rdd.map(lambda s: 2 * s).collect()
        return result

if __name__ == "__main__":
    job = MyJob(mist.Job())
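The while loop above copies a Scala List into a Python list through repeated head/tail calls before parallelizing it. A self-contained model of that traversal, where ScalaListStub is a hypothetical stand-in for the bridged JVM object:

```python
# Plain-Python stand-in for the Scala List proxy seen by the job above.
# ScalaListStub is hypothetical; in Mist the real object arrives from the JVM.
class ScalaListStub:
    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def head(self):
        return self._items[0]

    def tail(self):
        return ScalaListStub(self._items[1:])

def to_pylist(scala_list):
    """The head/tail traversal from the job, written against the stub."""
    pylist = []
    size = scala_list.size()  # fixed size, taken before consuming the list
    count = 0
    current = scala_list
    while count < size:
        pylist.append(current.head())
        count = count + 1
        current = current.tail()
    return pylist
```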
22. Mist
Road Map
● Super-parallel mode: multi-JVM support
● Cluster mode and node framework
● Add logging
● RESTful API
● Support for streaming contexts/jobs
● Apache Kafka support
● AMQP support
● Web UI
Your contributions are very welcome on GitHub!
https://github.com/Hydrospheredata/mist