Chris Riddell
10 things your boss wants you to know about Cloud Computing and Big Data

1. What the cloud is
• The Cloud:
o SaaS: Software as a service
o PaaS: Platform as a service
o IaaS: Infrastructure as a service
• Why the cloud?
o Elastic: it scales
o Massive capacity
o Pay by the hour, by the minute, etc.: pay only for what you use
o No datacenter required!
• http://www.hyphenet.com/blog/wp-content/uploads/2014/04/layers-of-cloud-computing.png
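To make "pay only for what you use" concrete, here is a small sketch that compares per-hour and per-minute billing for the same short job. Two things are assumptions made for illustration, not any provider's actual pricing: the hourly rate of 0.60, and the rule that every started billing unit is charged in full.

```python
import math

def billed_cost(runtime_minutes: float, hourly_rate: float, billing_unit_minutes: int) -> float:
    """Cost of one instance run, rounding runtime up to a whole billing unit.

    Assumption (illustrative only): each started billing unit is charged in
    full, e.g. 60 for per-hour billing or 1 for per-minute billing.
    """
    units = math.ceil(runtime_minutes / billing_unit_minutes)
    billed_minutes = units * billing_unit_minutes
    return round(billed_minutes / 60 * hourly_rate, 4)

# Hypothetical job: runs 61 minutes on an instance priced at 0.60 per hour.
per_hour = billed_cost(61, 0.60, billing_unit_minutes=60)   # billed as 2 full hours
per_minute = billed_cost(61, 0.60, billing_unit_minutes=1)  # billed as 61 minutes
print(per_hour, per_minute)
```

The finer the billing granularity, the closer the bill tracks actual usage. This matters most for short or bursty workloads.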

2. How to design for the cloud: Horizontal scaling
• Understand the technology you’re using. Read the documentation!
• Couple everything loosely
o If one element of your system fails, the system shouldn’t fail
o Read about Netflix Chaos Monkey
• OK, how do we loosely couple? Answer: separate concerns
o Work catalogs
o Message queues
o DBs that have read replicas, NOT hosted on the same server as the web app
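Here is a minimal sketch of loose coupling through a message queue. Python's in-process `queue.Queue` stands in for a hosted queue service; that substitution is an assumption, and in the cloud this would be a managed queue. The producer only knows about the queue, so a worker that fails on one message does not bring the producer down. The bad message is set aside instead. The order data is hypothetical.

```python
import queue
import threading

def producer(q: queue.Queue, orders: list) -> None:
    # The web tier only enqueues work; it never calls the worker directly.
    for order in orders:
        q.put(order)
    q.put(None)  # sentinel: no more work

def worker(q: queue.Queue, processed: list, failed: list) -> None:
    while True:
        order = q.get()
        if order is None:
            break
        try:
            if order["qty"] <= 0:
                raise ValueError("bad quantity")
            processed.append(order["id"])
        except ValueError:
            # One bad message is parked for later instead of crashing the system.
            failed.append(order["id"])

q: queue.Queue = queue.Queue()
processed, failed = [], []
# Hypothetical orders; order 2 is deliberately invalid.
orders = [{"id": 1, "qty": 2}, {"id": 2, "qty": 0}, {"id": 3, "qty": 5}]

t = threading.Thread(target=worker, args=(q, processed, failed))
t.start()
producer(q, orders)
t.join()
print(processed, failed)
```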

2. How to design for the cloud: Horizontal scaling (cont)
• Scale OUT, not up
o Add more nodes instead of increasing the size of your nodes
o You can automate that! (See AWS Auto Scaling)
• Put security everywhere
• Know your storage options
o Are flat files best? A NoSQL DB? An SQL DB?
• Design for failure
o Have a backup plan ready
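Scaling out means adding nodes rather than making one node bigger. The sketch below is a back-of-the-envelope calculation of how many nodes a given load needs, plus spare nodes so the fleet survives losing one ("design for failure", often called N+1 redundancy). The per-node capacity and spare count are hypothetical inputs. This is not how AWS Auto Scaling itself computes capacity.

```python
import math

def nodes_needed(requests_per_sec: float, capacity_per_node: float, spare_nodes: int = 1) -> int:
    """Nodes required to serve the load, plus spares to tolerate node failures.

    capacity_per_node is a hypothetical, measured figure for one node.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    base = math.ceil(requests_per_sec / capacity_per_node)
    return max(base, 1) + spare_nodes

# Hypothetical figures: each node handles 250 requests per second.
for load in (100, 900, 5000):
    print(load, nodes_needed(load, capacity_per_node=250))
```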

3. What Big Data is
• Why Big Data?
o 40 zettabytes of data will be created by 2020
§ 2.3 trillion gigabytes per day
o 18.9 billion network connections by 2016
o 400 million tweets per day
o 4.4 million jobs generated to support big data by 2015!
• IBM’s 4 V’s of Big Data
o Volume
o Variety
o Velocity
o Veracity
o http://api.ning.com/files/tRHkwQN7s-V9zyWeGmW9pYmXjhhHYlanslQxjZT53dE40q*P5F5tBhOzSnqCMXhql1ARnq0dAQAp4VR9lOy2ik8tDQqI3FHh/dv2.jpg
o http://www-01.ibm.com/software/data/bigdata/images/4-Vs-of-big-data.jpg

3. What Big Data is (cont)
• Before cloud computing existed, you had to have access to a datacenter to analyze Big Data
• Pro Tip: Collect and store all the data points you can in your apps!
o Pro Tip Advanced: Set up centralized logging. This is great for error logging and alerts, and also for logging and keeping data that might be useful in the future.
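One way to start on centralized logging is to emit every log record as a single JSON line. Log shippers and central log stores can ingest such lines and query them later. This sketch uses only Python's standard `logging` module. The field names and the `data` payload passed through `extra=` are assumptions. In practice the handler would point at your central log service rather than an in-memory stream.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keep any extra data points attached via extra={"data": {...}} (assumed convention).
        entry.update(getattr(record, "data", {}))
        return json.dumps(entry)

stream = io.StringIO()  # stand-in for a handler that ships to a central store
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("checkout")  # hypothetical service name
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("order placed", extra={"data": {"order_id": 42, "total": 19.99}})
log.error("payment declined", extra={"data": {"order_id": 43}})

for line in stream.getvalue().splitlines():
    print(line)
```

The same stream serves both purposes from the slide. Alerts can filter on `level`, and the extra data points are kept for later analysis.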

4. SQL vs NoSQL
• SQL:
o Schema
o Relational
o Complex SQL functions and queries
o Fully indexed (query on any column)
• NoSQL:
o Schema-less
o Non-relational
o Can usually only query on one column
o Normally only 1 column indexed
o Usually need a MapReduce job to process (next slide)
o Joins are expensive
• http://smartdatacollective.com/sites/smartdatacollective.com/files/RDBMSvsNoSQL.jpeg
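The sketch below illustrates the contrast. With SQL you can filter on any column. A plain Python dict stands in for a simple NoSQL table indexed only on its key. It answers key lookups directly, but filtering on any other attribute needs a full scan, which is the kind of job MapReduce is for. The example uses Python's built-in `sqlite3`, and the table and data are hypothetical.

```python
import sqlite3

# Hypothetical user records: (id, name, plan)
rows = [("u1", "Alice", "pro"), ("u2", "Bob", "free"), ("u3", "Cara", "pro")]

# SQL: fixed schema, and you can query on any column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, plan TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
sql_pro = [r[0] for r in conn.execute(
    "SELECT name FROM users WHERE plan = ? ORDER BY id", ("pro",))]

# Key-value "NoSQL" stand-in: schema-less items, indexed only by key.
store = {uid: {"name": name, "plan": plan} for uid, name, plan in rows}
by_key = store["u2"]["name"]  # fast: direct key lookup
# Filtering on a non-key attribute means scanning every item.
scan_pro = [item["name"] for _, item in sorted(store.items()) if item["plan"] == "pro"]

print(sql_pro, by_key, scan_pro)
```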

5. How to process Big Data
• Big Data is often stored in a NoSQL database, or may be completely unstructured (e.g. log files)
• MapReduce!
o 2 operations: Map and Reduce
o Usually a multi-node service
o http://bigdatanerd.files.wordpress.com/2011/11/mapreduce.gif
• Hive
o You can use SQL(-ish) queries on NoSQL datasets!
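The two MapReduce operations can be sketched on a single machine with a word count over log lines. The map step emits (key, 1) pairs, a shuffle step groups them by key, and the reduce step sums each group. A real MapReduce service (e.g. Hadoop) runs these phases across many nodes. This sketch only shows the data flow, and the log lines are made up.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in every line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: combine each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical, unstructured log lines.
logs = ["ERROR disk full", "INFO user login", "ERROR timeout"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["error"], counts["login"])
```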

6. I lied. There are solutions “between” NoSQL and SQL
• AWS Redshift
o A highly available, relational data warehouse. It DOES HAVE a SCHEMA!
o New technology! A relational data warehouse that operates on a cluster of nodes
o Costs a fraction of traditional solutions
o Integrates with pre-built BI engines (Tableau, Jasper, MicroStrategy)
• AWS SimpleDB
o A “simplified” SQL that scales very far
o Still indexed
o Only stores strings!
o SQL queries are limited to the very basics
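SimpleDB stores every value as a string, so comparisons are lexicographic. Numbers therefore have to be encoded if they are to sort correctly. A common approach is zero-padding to a fixed width, plus an offset if negative values are needed. This sketch only demonstrates the string-ordering problem locally; it does not call SimpleDB. The 10-digit width is an arbitrary assumption.

```python
def encode_int(value: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order.

    width=10 is an arbitrary choice for this sketch.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    if len(str(value)) > width:
        raise ValueError("value too wide for the chosen padding")
    return str(value).zfill(width)

prices = [9, 100, 25]
naive = sorted(str(p) for p in prices)           # lexicographic, not numeric
padded = sorted(encode_int(p) for p in prices)   # lexicographic == numeric
print(naive)
print([int(s) for s in padded])
```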

7. How to make big data useful
• So we have cut up our data into something more interesting and consumable.
o Visualize it! Explore! Really understand your customer’s problem
o Fun exercise: Check out Tableau. Seriously! http://www.tableausoftware.com/beginners-data-visualization
• So many tools already exist to do this for web streams
o Google Analytics, etc.
• Many startups have their own custom-built KPI dashboards
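Before building or buying a dashboard, it helps to be clear about what each KPI actually computes. This sketch computes one common KPI, daily active users, from raw event records. The event shape and field names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events, one per user action.
events = [
    {"user": "a", "ts": "2014-05-01T09:15:00"},
    {"user": "b", "ts": "2014-05-01T11:02:00"},
    {"user": "a", "ts": "2014-05-01T17:40:00"},  # repeat visit, counted once
    {"user": "c", "ts": "2014-05-02T08:00:00"},
]

def daily_active_users(events: list[dict]) -> dict[str, int]:
    """Count distinct users per calendar day."""
    users_by_day: dict[str, set[str]] = defaultdict(set)
    for e in events:
        day = datetime.fromisoformat(e["ts"]).date().isoformat()
        users_by_day[day].add(e["user"])
    return {day: len(users) for day, users in sorted(users_by_day.items())}

print(daily_active_users(events))
```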

8. How to combine Big Data and the Cloud
• Actually, it’s pretty hard not to these days
o To analyze huge datasets, you could easily spend hundreds of thousands on powerful enough infrastructure (servers, etc.)
o Then there is the ongoing cost of managing that infrastructure
• Your data streams are likely to come from the cloud, so keep them there

10. How to make it work. And fast.
• Leverage other people’s work
o Don’t re-invent the wheel: Use libraries and PaaS services
• Understand what the end-user needs.
o Understand the problem. Feel the pain. Confirm the solution.
o Example: Does it need to be real time?
§ Well, there is no such thing as “real” time. So define it.
• 1 hour?
• 1 day?
• Get a feel for what technology works where
o Practice, practice, practice
• Cloud computing is all about agility. Run a test and see if it works!
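Once you have defined what "real time" means, that definition can become a parameter. This sketch groups timestamped events into fixed, non-overlapping (tumbling) windows. The window length is the agreed latency, for example 1 hour or 1 day. The tumbling-window approach and the event timestamps are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def window_counts(timestamps: list[datetime], window: timedelta) -> dict[datetime, int]:
    """Count events per fixed-length window, keyed by each window's start time."""
    epoch = datetime(1970, 1, 1)
    counts: Counter = Counter()
    for ts in timestamps:
        # Floor the timestamp to the start of its window (timedelta // timedelta -> int).
        offset = (ts - epoch) // window
        counts[epoch + offset * window] += 1
    return dict(sorted(counts.items()))

# Hypothetical events.
events = [
    datetime(2014, 5, 1, 9, 15),
    datetime(2014, 5, 1, 9, 50),
    datetime(2014, 5, 1, 10, 5),
    datetime(2014, 5, 2, 8, 0),
]
hourly = window_counts(events, timedelta(hours=1))  # "real time" defined as 1 hour
daily = window_counts(events, timedelta(days=1))    # "real time" defined as 1 day
print(hourly)
print(daily)
```

Changing the definition of "real time" only changes the `window` argument. This makes it cheap to run a test with each option and see which one works.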
