Apache Spark MLlib
● What is Apache Spark?
● What is MLlib?
Spark – What is it?
● Alternative to MapReduce for certain applications
● A low-latency cluster computing system
● For very large data sets
● Can be up to 100 times faster than MapReduce for in-memory workloads
● Used with Hadoop / HDFS
● Uses in-memory cluster computing
● Memory access faster than disk access
● Has APIs for Scala / Java / Python
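The map and reduce idea behind the bullets above can be illustrated with a word count. The sketch below is a conceptual model in plain Python, not Spark code: the function names `map_phase`, `reduce_by_key` and `word_count` are made up for this example, and the sample lines are invented. It only shows the shape of the computation, with the intermediate pairs held in memory rather than written to disk between stages.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_by_key(pairs):
    """Sum the counts for each distinct word."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(lines):
    # The intermediate (word, 1) pairs stay in a Python list in memory;
    # nothing is written to disk between the two phases.
    return reduce_by_key(map_phase(lines))


if __name__ == "__main__":
    sample = ["Spark uses memory", "MapReduce uses disk"]  # invented sample data
    print(word_count(sample))
```

In Spark's Python API the same shape is usually expressed with the RDD operations `flatMap`, `map` and `reduceByKey`, which distribute the work across the cluster.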
Spark MLlib – What is it?
● Spark Machine Learning Library
● Included with the Spark installation
● Code in Scala / Java / Python
● Contains libraries
– spark.ml (introduced in V1.2)
● Provides common functionality
– classification, regression, clustering
– collaborative filtering, dimensionality reduction
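To make "clustering" concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data in plain Python. It is not MLlib's implementation or API; MLlib ships its own distributed k-means. The function name `kmeans_1d`, the sample points and the starting centers are all assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Run Lloyd's algorithm on 1-D points and return the final centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # invented sample data
    print(kmeans_1d(data, centers=[0.0, 5.0]))
```

A library implementation would also handle multi-dimensional vectors, smarter initialisation and data partitioned across a cluster, which this sketch leaves out.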
● See our Hadoop book from Apress / Springer
– “Big Data Made Easy”
● Look out for our Apache Spark-based book
– from Packt in 2015
Spark Ecosystem
● Feel free to contact us at
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems