Scala and FP: Big Data friends
a.k.a. Noootsab
Proud husband and father
po l'bonLidjeu
Have had to wear glasses since my Maths graduation in '03
Have been learning to dress well since my CS graduation in '05
Lost myself since specializing in Geomatics and GIS
Risking myself in (GIS, Big Data and Scala)
Public interest work: co-founded
Helper and organizer of
Scala	
NextLab
Wajug
Devoxx4Kids
trainer
WHY I mean it
and others do...
Scala has a reputation for being accessible
It eases the maths (mostly [matrix] algebra)
The CS world is changing (fast)
It is shifting from the cloud to analysis
That is, from IT needs to market opportunities
Reused knowledge
Fact: syntax close to Java, C#, Ruby, ...
Cause: Object Oriented
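sealed trait Gender                  // assumed definitions: not shown on the slide
case object Male extends Gender
case object Female extends Gender

case class Person(name: String,
                  first: String,
                  age: Double,
                  gender: Gender,
                  father: Option[Person],
                  mother: Option[Person],
                  children: List[Person] = Nil) {
  def incAge(n: Int): Person = copy(age = age + n)
  // returns the child linked to this parent, and this parent with the new child added
  def newSon(child: Person): (Person, Person) = {
    val newChild = this.gender match {
      case Male   => child.copy(father = Some(this))
      case Female => child.copy(mother = Some(this))
    }
    (newChild, this.copy(children = newChild :: children))
  }
}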
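// Sandrine, Arcangelo and Nadine are other Person values, not shown on the slide
val _Noah = Person("Petrella", "Noah",
                   age = 4, gender = Male,
                   mother = Some(Sandrine),
                   father = None)
val boringNoootsab = Person("Petrella", "Andy",
                            32, Male,
                            father = Some(Arcangelo),
                            mother = Some(Nadine))
// lower-case names: an upper-case name in a pattern is read as an existing constant, not a new binding
val (noah, happyNoootsab) = boringNoootsab.newSon(_Noah)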
Following the wave
Fact: Functional Programming ftw
Cause: Scalable Language
Please, bear with me...
WHO
A lot, mainly data fans
Coursera: 10⁶ online students; PHP → Scala; concurrency primitives; Play; type safety; ecosystem
Twitter: REPL; case classes; productivity gains; concise code; Scala School; tens of open source libs
Netflix: billion devices; historical events; real-time analytics; proper API (Option); async (Try); Scalatra + ScalaTest
And more: AirBnB, Snips (smart cities, ...), Tuplejump (analytic platform), eBay (analytics), BBC (Future Media project), Virdata (IoT analytic platform), Ooyala (video analytic platform), LinkedIn
Functional Programming
in a nutshell
INPUT x → FUNCTION f → OUTPUT f(x)
(source: Wikipedia, http://en.wikipedia.org/wiki/Function_(mathematics))
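As a minimal illustration of that diagram in Scala: a pure function always turns the same input `x` into the same output `f(x)` and touches nothing else, while a function that reads and changes outside state does not. `square`, `impureAdd` and `counter` are hypothetical names chosen for this sketch.

```scala
// A pure function: its output depends only on its input x.
val square: Int => Int = x => x * x

// Same input, same output, so square(4) can always be replaced by its value.
val first  = square(4)
val second = square(4)

// Not pure: it reads and mutates state outside its input,
// so two calls with the same x can give different outputs.
var counter = 0
def impureAdd(x: Int): Int = { counter += 1; x + counter }
```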
Input x
can be a function...
It defines a general process
whose behavior depends on the function it is given
listOfNames map { name => DB.getByName(name) }
listOfPersons flatMap { person => person.friends }
listOfFriends filter { (f: Friend) => f.met moreThan (10 years) }
listOfOldFriends.count(_.person.gender != me.gender)
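The four one-liners above use names the slide never defines (`DB`, `Friend`, `moreThan`). A runnable sketch with a hypothetical in-memory model shows the same chain of `map`, `flatMap`, `filter` and `count`; the `yearsKnown > 10` test is an assumed stand-in for the slide's `f.met moreThan (10 years)`.

```scala
// Hypothetical domain model: the slide's DB, Person and Friend are not defined
// in the talk, so these are illustrative stand-ins.
sealed trait Gender
case object Male extends Gender
case object Female extends Gender

final case class Person(name: String, gender: Gender, friends: List[Friend] = Nil)
final case class Friend(person: Person, yearsKnown: Int)

val alice = Person("Alice", Female)
val bob   = Person("Bob", Male)
val carol = Person("Carol", Female)
val dave  = Person("Dave", Male, List(Friend(alice, 12), Friend(bob, 15)))
val erin  = Person("Erin", Female, List(Friend(carol, 3), Friend(bob, 11)))
val me    = Person("Me", Male)

// Hypothetical in-memory lookup replacing DB.getByName.
val db: Map[String, Person] = List(dave, erin).map(p => p.name -> p).toMap

val listOfNames = List("Dave", "Erin")

// map: one output per input (here, a lookup).
val listOfPersons = listOfNames.map(name => db(name))
// flatMap: each person yields a list of friends; the lists are flattened into one.
val listOfFriends = listOfPersons.flatMap(person => person.friends)
// filter: keep only the friends known for more than 10 years.
val listOfOldFriends = listOfFriends.filter(f => f.yearsKnown > 10)
// count: old friends whose gender differs from mine.
val oppositeGenderOldFriends = listOfOldFriends.count(_.person.gender != me.gender)
```

The same `map`/`flatMap`/`filter` skeleton works for any data; only the functions passed in change.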
Output f(x)
bah... can be a function as well...
It prepares a process
that will be available for later use
def authentication(manager: SecurityManager): User => Authentication
def source(url: String): Authentication => DataRepo => Data
//[...]
val authenticate = authentication(FakeSecurityManager)   // User => Authentication
val settings = source("/settings")                       // Authentication => DataRepo => Data
def request = {
  val user = //...
  val auth = authenticate(user)
  val settingsFetcher = settings(auth)                    // DataRepo => Data
  // and so on
}
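A runnable sketch of the same idea, with hypothetical minimal definitions for `User`, `Authentication`, `DataRepo`, `Data`, `SecurityManager` and `FakeSecurityManager` (which, by assumption, only admits a user named `admin`). Each function returns another function, so configuration happens once and the remaining inputs are supplied later.

```scala
// Hypothetical types standing in for the slide's User, Authentication, DataRepo and Data.
final case class User(name: String)
final case class Authentication(user: User, granted: Boolean)
final case class DataRepo(entries: Map[String, String])
final case class Data(value: Option[String])

// Hypothetical security manager: only "admin" is granted access.
trait SecurityManager { def check(u: User): Boolean }
object FakeSecurityManager extends SecurityManager {
  def check(u: User): Boolean = u.name == "admin"
}

// Returns a function: configure the manager once, authenticate many users later.
def authentication(manager: SecurityManager): User => Authentication =
  user => Authentication(user, manager.check(user))

// Returns a curried function: first the authentication, then the repository.
def source(url: String): Authentication => DataRepo => Data =
  auth => repo => Data(if (auth.granted) repo.entries.get(url) else None)

// Processes prepared ahead of time.
val authenticate = authentication(FakeSecurityManager)
val settings     = source("/settings")

def request(userName: String, repo: DataRepo): Data = {
  val auth            = authenticate(User(userName))
  val settingsFetcher = settings(auth) // still waiting for a DataRepo
  settingsFetcher(repo)
}

val repo = DataRepo(Map("/settings" -> "dark-mode"))
```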
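Show me
def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
  // ·-, ·*, ·^, ·+: and ·*: are element-wise List operators,
  // assumed to be defined in the accompanying code on GitHub (not shown)
  val n = x.size
  val ẍ = x.sum / n                               // mean of x
  val ÿ = y.sum / n                               // mean of y
  val Sp = ((x ·- ẍ) ·* (y ·- ÿ)).sum / (n - 1)   // sample covariance of x and y
  val Sx2 = ((x ·- ẍ) ·^ 2).sum / (n - 1)         // sample variance of x
  val ß1 = Sp / Sx2                               // slope
  val ß0 = ÿ - ß1 * ẍ                             // intercept
  val coefs = (ß0, ß1)
  val predict = (d: Double) => ß0 + ß1 * d
  (coefs, predict)
}
def test(ß0: Double = 18.1d, ß1: Double = 6d, error: Int => List[Double]) = {
  val n = 10000
  val x: List[Double] = (-n to n).map(_.toDouble).toList   // -10000.0 to 10000.0, step 1
  val e = error(2 * n + 1)                                 // one noise value per x
  val y: List[Double] = ß0 ·+: (ß1 ·*: x) ·+: e            // y = ß0 + ß1·x + e
  lm(x, y)
}
val error = rnorm(mean = 0, sigma = 5)   // generates Gaussian numbers (rnorm not shown)
val model = test(103, 7, error)
on GitHub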
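The slide's `lm` relies on element-wise operators that are not shown here. As a self-contained sketch, the same ordinary least-squares fit can be written with plain `zip`, `map` and `sum`; `rnorm` below is a hypothetical seeded Gaussian noise generator standing in for the slide's, and `simulate` plays the role of `test`.

```scala
// Ordinary least squares for y = b0 + b1 * x, using plain collections
// instead of the slide's element-wise operators.
def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
  require(x.size == y.size && x.size > 1, "need at least two paired observations")
  val n    = x.size
  val xBar = x.sum / n
  val yBar = y.sum / n
  // Sample covariance of x and y, and sample variance of x.
  val sp  = x.zip(y).map { case (xi, yi) => (xi - xBar) * (yi - yBar) }.sum / (n - 1)
  val sx2 = x.map(xi => math.pow(xi - xBar, 2)).sum / (n - 1)
  val b1  = sp / sx2
  val b0  = yBar - b1 * xBar
  ((b0, b1), (d: Double) => b0 + b1 * d)
}

// Hypothetical noise generator: returns a function producing `size` Gaussian draws.
def rnorm(mean: Double, sigma: Double, seed: Long = 42L): Int => List[Double] = {
  val rng = new scala.util.Random(seed)
  size => List.fill(size)(mean + sigma * rng.nextGaussian())
}

// Simulates y = b0 + b1 * x + noise on x = -10000..10000 and fits it back.
def simulate(b0: Double, b1: Double, error: Int => List[Double]) = {
  val n = 10000
  val x = (-n to n).map(_.toDouble).toList
  val e = error(2 * n + 1)
  val y = x.zip(e).map { case (xi, ei) => b0 + b1 * xi + ei }
  lm(x, y)
}

val ((fitted0, fitted1), predict) = simulate(103, 7, rnorm(mean = 0, sigma = 5))
```

With 20001 points spread over a wide x range, the fitted slope lands very close to 7 and the intercept close to 103 despite the noise.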
Lazy
yeah yeah... I'll do it
lazy val app: App = initializeApp()
def logDebug(m: => String) = if (LOG.debugEnabled) LOG.debug(m) else ()
Avoid unneeded computations
Delayed initialization
Sooo laaazy
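The two laziness tools on this slide can be checked with a small sketch: a `lazy val` runs its initializer once, on first access, and a by-name parameter (`m: => String`) is only evaluated if the method body uses it. `initializeApp`, `expensiveMessage` and the counters are hypothetical, and this `logDebug` takes the debug flag as a parameter instead of reading it from a `LOG` object.

```scala
// Counters recording how often the "expensive" work actually runs.
var initCount = 0
var built = 0

// Hypothetical expensive initialization.
def initializeApp(): String = { initCount += 1; "app ready" }

// lazy val: nothing runs here; the first access runs initializeApp() once and caches it.
lazy val app: String = initializeApp()

// Hypothetical expensive message construction.
def expensiveMessage(): String = { built += 1; "big state dump" }

// By-name parameter: `m` is evaluated only if debugging is enabled.
def logDebug(debugEnabled: Boolean)(m: => String): Unit =
  if (debugEnabled) println(s"[debug] $m")
```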
Come back... in a potential future
TL;DW
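import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global   // Future combinators need an ExecutionContext

val app: Future[App] = initializeApp()
val http: Future[HttpClient] = app.map(_.http.client)
def isOk(url: String): Future[Boolean] =
  http.flatMap(client => client.get(url))
      .map(_.code == 200)                               // true only for HTTP 200
      .recoverWith {
        case x: CommunicationException => isOk(url)     // retry on network failure (no retry limit)
      }.recover {
        case e: Throwable => false                      // any other failure counts as "not OK"
      }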
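A self-contained variant of `isOk` that can actually run: `FakeHttpClient`, `Response` and `CommunicationException` are hypothetical stand-ins for the slide's types, the fake client fails with a `CommunicationException` for its first few calls, and, unlike the slide, the retry is bounded by an assumed `retries` counter so a permanently unreachable server cannot loop forever.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal
import ExecutionContext.Implicits.global

// Hypothetical stand-ins for the slide's HTTP types.
final case class Response(code: Int)
final class CommunicationException(msg: String) extends Exception(msg)

// Fake client: the first `transientFailures` calls fail, later calls answer from `codes`.
final class FakeHttpClient(codes: Map[String, Int], transientFailures: Int) {
  private val calls = new AtomicInteger(0)
  def get(url: String): Future[Response] = Future {
    if (calls.getAndIncrement() < transientFailures) throw new CommunicationException("timeout")
    codes.get(url) match {
      case Some(code) => Response(code)
      case None       => throw new NoSuchElementException(s"no route for $url")
    }
  }
}

def isOk(http: Future[FakeHttpClient], url: String, retries: Int = 3): Future[Boolean] =
  http
    .flatMap(client => client.get(url))
    .map(_.code == 200)
    .recoverWith {
      // Retry transient errors, but only while retries remain.
      case _: CommunicationException if retries > 0 => isOk(http, url, retries - 1)
    }
    .recover { case NonFatal(_) => false } // anything else means "not OK"

val http: Future[FakeHttpClient] =
  Future.successful(new FakeHttpClient(Map("/up" -> 200, "/down" -> 503), transientFailures = 2))
```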
Code...
now (I promised)
// Lazy[A], the base type with head and tail, is not shown on the slide
class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
  val head = Some(a)
  lazy val tail = t        // t is a by-name parameter, evaluated on first access only
}
def fetch(file: String): Lazy[Future[String]] = {
  val texts = io.Source.fromFile(new java.io.File(file)).getLines()
  def readLine(texts: Iterator[String]): Lazy[Future[String]] = //...
  readLine(texts)
}
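The slide leaves out the `Lazy` base type. A minimal, self-contained sketch of such a lazy list, assuming a `Lazy[+A]` trait with `head: Option[A]` and `tail`, plus a hypothetical `LazyNil` terminator, shows that only the cells actually requested are ever computed; `fromIterator` plays the role of the slide's elided `readLine`, without the `Future`.

```scala
// Assumed base type for the slide's LazyCons (not a library API).
sealed trait Lazy[+A] {
  def head: Option[A]
  def tail: Lazy[A]
  // Returns up to n elements, forcing only the tails it really needs.
  def take(n: Int): List[A] =
    if (n <= 0) Nil
    else head match {
      case Some(a) => a :: (if (n == 1) Nil else tail.take(n - 1))
      case None    => Nil
    }
}

// Hypothetical end-of-list marker.
case object LazyNil extends Lazy[Nothing] {
  val head: Option[Nothing] = None
  def tail: Lazy[Nothing] = this
}

final class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
  val head: Option[A] = Some(a)
  lazy val tail: Lazy[A] = t // evaluated on first access, then cached
}

// An infinite list of integers: safe because each tail is built on demand.
def from(n: Int): Lazy[Int] = new LazyCons(n, from(n + 1))

// Exposes an iterator one cell at a time: an element is pulled only when its cell is built.
def fromIterator[A](it: Iterator[A]): Lazy[A] =
  if (it.hasNext) { val a = it.next(); new LazyCons(a, fromIterator(it)) } else LazyNil
```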
just for fun →
val fibs: Stream[Int] = 0 #:: 1 #:: ((fibs zip fibs.drop(1)) map ((_: Int) + (_: Int)).tupled)
on GitHub
Mashup
A function can either
→ be called on the data (method, sync)
→ be sent to the data (message, async)
A function composes
A function is a delayed computation
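Both claims can be shown with plain Scala functions: `andThen` builds a new function out of existing ones without running anything, and a `() => A` value is a computation that only happens when called. `parse`, `twice` and `describe` are hypothetical steps chosen for this sketch.

```scala
// Three small, hypothetical processing steps.
val parse: String => Int    = _.trim.toInt
val twice: Int => Int       = _ * 2
val describe: Int => String = n => s"value=$n"

// Composition: a new function built from the three; nothing is computed yet.
val pipeline: String => String = parse andThen twice andThen describe

// A delayed computation: the body runs only when delayed() is called.
var evaluations = 0
val delayed: () => String = () => { evaluations += 1; pipeline(" 21 ") }
```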
What if I compose all the computations,
then send the whole shebang to where the data are?
Map/Reduce: a degenerate case
Spark: the generalized case (see the next talks)
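A toy, single-process sketch of the idea (no Spark involved): the data sit in a hypothetical `partitions` vector standing in for nodes, one composed per-record function is shipped to every partition, and only the small partial results are merged. Word count is used because it is the classic Map/Reduce example; `runOnPartitions` is an illustrative helper, not a real API.

```scala
// Data already split across hypothetical "nodes" (one list per partition).
val partitions: Vector[List[String]] = Vector(
  List("to be or", "not to be"),
  List("to see or", "not to see")
)

// The composed per-record computation: split a line into words, then count them.
val splitWords: String => List[String] = _.split("\\s+").toList
val countWords: List[String] => Map[String, Int] =
  _.groupBy(identity).map { case (w, ws) => w -> ws.size }
val perRecord: String => Map[String, Int] = splitWords andThen countWords

// Combines two partial results (the "reduce" side).
def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
  b.foldLeft(a) { case (acc, (w, c)) => acc.updated(w, acc.getOrElse(w, 0) + c) }

// Sends the same function to every partition; only partial results come back.
def runOnPartitions[A, B](parts: Vector[List[A]], zero: B)(f: A => B)(combine: (B, B) => B): B =
  parts
    .map(part => part.map(f).foldLeft(zero)(combine)) // runs "on each node"
    .foldLeft(zero)(combine)                          // merges the partial results

val wordCounts = runOnPartitions(partitions, Map.empty[String, Int])(perRecord)(merge)
```

Map/Reduce fixes the shape to one map followed by one reduce; composing arbitrary functions before shipping them is what lets an engine generalize that shape.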
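Funky code
import scala.concurrent.{ExecutionContext, Future}

trait Data {
  def dependent: List[Double]
  def observed: Matrix                              // Matrix: not shown on the slide
  def bootstrap(proportion: Double): Future[Data]
}
trait Model {
  type Coefs
  def apply(data: Data): Future[(Coefs, List[Double] => Future[Double])]
}
// Aggregation[C] is not shown on the slide; as used below it must behave as a List[C] => C
def bagging(model: Model)(agg: Aggregation[model.Coefs], n: Int)(data: Data)
           (implicit ec: ExecutionContext): Future[model.Coefs] = {
  // one round: fit the model on a 60% bootstrap sample and keep the coefficients
  def exec: Future[model.Coefs] = for {
    sample     <- data.bootstrap(0.6)
    (coefs, _) <- model(sample)
  } yield coefs
  val execs: List[Future[model.Coefs]] = List.fill(n)(exec)       // n independent rounds
  val coefsList: Future[List[model.Coefs]] = Future.sequence(execs)
  val result: Future[model.Coefs] = coefsList map agg              // aggregate the n estimates
  result
}
on GitHub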
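A runnable, deliberately simplified take on the bagging code: here `Data` is just a list of doubles, `Model.apply` returns only the coefficients (the slide's version also returns a predictor), and `MeanModel`, a hypothetical model that estimates the mean, stands in for a real regression. Resampling with replacement is assumed to be what `bootstrap(0.6)` means, and the random generator is passed in explicitly so runs can be seeded.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Random
import ExecutionContext.Implicits.global

// Simplified, hypothetical Data: the observations are a plain list of doubles.
final case class Data(values: List[Double]) {
  // Bootstrap: resample (with replacement) proportion * size values.
  def bootstrap(proportion: Double, rng: Random): Future[Data] = Future {
    val k = math.max(1, (values.size * proportion).toInt)
    Data(List.fill(k)(values(rng.nextInt(values.size))))
  }
}

trait Model {
  type Coefs
  def apply(data: Data): Future[Coefs]
}

// Hypothetical model whose single coefficient is the sample mean.
object MeanModel extends Model {
  type Coefs = Double
  def apply(data: Data): Future[Double] = Future(data.values.sum / data.values.size)
}

// Bagging: fit the model on n bootstrap samples, then aggregate the coefficients.
def bagging(model: Model)(agg: List[model.Coefs] => model.Coefs, n: Int)
           (data: Data, rng: Random): Future[model.Coefs] = {
  val execs: List[Future[model.Coefs]] =
    List.fill(n)(data.bootstrap(0.6, rng).flatMap(sample => model(sample)))
  Future.sequence(execs).map(agg)
}
```

The path-dependent type `model.Coefs` keeps the aggregation function tied to whatever coefficient type the chosen model declares.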
Enough!
Thanks ^_^
Poke me:
→ for Scala training
→ for fun with data
→ with book ideas
