Twitter: @BDaaSmeetup
Hashtag: #BDaaS
Our	Sponsor	
Big-Data-as-a-Service.		
On-Prem,	Cloud,	or	Hybrid.	It’s	BDaaS.
BDaaS	Meetup	
•  Welcome and Introductions
•  Presentation by Nanda Vijaydev: Distributed Data Science, DevOps, and Docker
•  Q&A and Discussion
Nanda	Vijaydev	
•  Data scientist and director of solutions at BlueData
•  Prior to BlueData, was a solutions architect at Silicon Valley Data Science
•  More than 10 years of experience in data management and data science
•  Has worked with dozens of organizations to deploy Hadoop, Spark, & data science environments using Docker containers
Distributed Data Science, DevOps, and Docker
BDaaS	Meetup	
June	9,	2017	
	
Nanda	Vijaydev	
	
	
				@NandaVijaydev
Outline
•  Evolution of Data Science Operations
•  Distributed	Data	Science	on	Docker	
•  Challenges	and	Key	Requirements	
•  Demo	
•  Key	Takeaways	
•  Q	&	A
[Figure: data science workflow cycle with the stages Understand Business Problem, Acquire/Collect, Analyze/Model, Reflect/Evaluate, Deploy/Disseminate]
A pretty picture (ideal workflow)
A not so pretty picture (workflow in reality)
Data Science Tasks and Roles
[Figure: tasks mapped to roles: Data Engineer / Data Scientist, Core Data Scientist, Statisticians / Data Scientists, Data Analyst]
Preferred Language of Analytics Pros
Which do you prefer to use: SAS, R, or Python?
Source: “SAS, R, or Python Survey 2016: Which Tool Do Analytics Pros Prefer?”, Burtch Works, July 2016
Evolution of Data Science Operations
Traditional Data Science & Analytics: Sampling; Modeling & Tuning; Reports (e.g. credit card offer)
Distributed Data Science & Analytics: Distributed Systems; Acquire Data; Model; Tune; Deploy
What Often Happens …
Faulty Assumptions
•  IT team thinks they understand requirements and use cases
•  Assumes infrastructure & systems will work for most use cases
•  Assumes all data scientists will use similar toolsets
•  Build the infrastructure first, then onboard the data scientists …
A More Realistic Journey
	
	
[Figure: timeline of onboarding data scientists alongside continuous infrastructure provisioning]
•  Established use cases: Base-R, SQL, Python, Java
•  Need to analyze higher data volumes: SparkR, PySpark, spark-sql, H2O, Zeppelin
•  Additional modules for Python users: NumPy, SciPy, NLTK, JupyterHub, with Spark
•  R user base is adopting more Big Data: RStudio, Shiny Server with Spark + H2O
Use cases, requirements, & tools will continue to evolve over time
DevOps for Data Science Operations
Source:	Rob	Nendorf,	Allstate,	“DevOps	for	Data	Science”
What Data Science Teams Need:
•  Access	to	data	with	full	fidelity	
•  Ability	to	quickly	iterate	&	validate	findings	
•  Access	to	necessary	tools	and	models	
•  Ability	to	scale	environments	on-demand	
•  Ability	to	share	models	and	code	
•  Ability to deploy and integrate the solution
Data Science – Usage Scenarios
1.  End-to-end analysis on local laptops / workstations using RStudio, Jupyter
2.  Preprocess on Hadoop/Spark, download and analyze locally using RStudio/Jupyter
3.  Preprocess and analyze on Hadoop/Spark
Scaling Options for Data Science
Single node laptops / workstations:
•  Using single node instances with more resources
•  Projects like ff, bigmemory for R

Distributed processing:
•  SparkR/sparklyr with RStudio and Spark cluster
•  Jupyter/Zeppelin notebook with PySpark and Spark cluster
•  Hadoop clusters
•  Sandbox that can be scaled on demand
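As a rough illustration of the single-node option, the sketch below aggregates a CSV stream one row at a time, so memory use stays flat regardless of input size. It is only a plain-Python analogy for the out-of-core idea behind R packages such as ff and bigmemory, not a description of how those packages work. The column names (`region`, `amount`), the function names, and the sample data are all made up for the example.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, TextIO


def streaming_mean(rows: Iterable[Dict[str, str]], key: str, value: str) -> Dict[str, float]:
    """Compute a per-group mean while holding only running totals in memory."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:  # one row at a time; the full dataset is never materialized
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


def mean_from_csv(handle: TextIO, key: str, value: str) -> Dict[str, float]:
    """Read a CSV stream lazily and aggregate it."""
    return streaming_mean(csv.DictReader(handle), key, value)


# Hypothetical sample data standing in for a file too large to load at once.
sample = io.StringIO("region,amount\nwest,10\neast,4\nwest,20\neast,6\n")
result = mean_from_csv(sample, key="region", value="amount")
print(result)
```

Once the running totals themselves no longer fit on one machine, or a single pass takes too long, the distributed options listed above become the more natural choice.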
Accessing Aggregate Data from HDFS
•  Preprocessed or partitioned data can be stored in HDFS
•  Can be accessed directly from RStudio/Jupyter using RHadoop client for aggregation/modeling
Distributed Data Science on Docker
R Environment with RStudio Server
•  Install local Spark if not already available
•  Connect to Spark cluster
•  Set appropriate Spark configurations for optimal performance
Spark with sparklyr from RStudio
Python Environment with Jupyter
•  Users work in their familiar notebooks
•  BlueData provisions multi-node Spark clusters
PySpark with Anaconda from Jupyter
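From a Jupyter notebook, connecting to a Spark cluster and setting Spark configurations might look roughly like the sketch below. The slides do not show the actual notebook code, so treat this as an assumption-laden outline: the helper names `spark_conf` and `connect` are hypothetical, the property values are illustrative rather than tuned, and the master URL in the comment is a placeholder. The PySpark import is done lazily so the configuration helper can be used on a machine without PySpark installed.

```python
from typing import Dict


def spark_conf(executor_instances: int, executor_memory: str, executor_cores: int) -> Dict[str, str]:
    """Collect a few standard Spark properties as strings."""
    return {
        "spark.executor.instances": str(executor_instances),
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
    }


def connect(master_url: str, app_name: str, conf: Dict[str, str]):
    """Create (or reuse) a SparkSession against the given master."""
    # Imported lazily so spark_conf() works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master(master_url).appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Illustrative sizing only; appropriate values depend on the cluster.
conf = spark_conf(executor_instances=4, executor_memory="4g", executor_cores=2)

# In a notebook with PySpark available, one would then call something like:
#   spark = connect("spark://<master-host>:7077", "notebook-session", conf)
# where the master address is a placeholder for your own cluster.
```

Keeping the configuration in a plain dictionary makes it easy to version alongside the notebook and to reuse the same settings when the cluster is resized.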
Environments: Scale Up vs. Scale Out
[Figure: layered comparison of scale-up and scale-out environments; layers are UI / Notebooks, Frameworks, Compute, Data]
Scale Up:
•  R Packages, Python, SQL
•  Local Compute (Laptops)
Scale Out:
•  UI / Notebooks: RStudio + R + Spark + sparklyr; Jupyter + Python + Spark; Zeppelin + R + Python + Spark
•  Frameworks: Spark (SQL, Scala, Python, Java, MLlib) + H2O; Spark (SQL, SparkR, Scala, Java, MLlib) + H2O
•  Compute: multiple Spark nodes, some paired with R
Scalable Data Science: Challenges
•  How do you keep up with the constant evolution of new versions and tools?
–  The data science ecosystem is evolving very quickly (e.g. rapid pace of new Spark versions)
–  Related tools (e.g. RStudio, Jupyter, Zeppelin) have to keep pace to support new features
–  New versions of Spark and other tools require different versions of libraries and packages
•  One monolithic cluster won’t cut it … how do you support the variations?
–  Different use cases & users need different options, versions, packages
–  Workloads change … adding new packages or scaling clusters up and down is cumbersome
Scalable Data Science: Challenges
•  How do you make it easy for your data scientists to get what they need?
–  Data scientists are comfortable with their desktop tools, not distributed computing
–  They need on-demand environments with instant access to their preferred tools and data
•  How do you manage user access for IDEs / notebooks and data sources?
–  Given the different layers of the stack, this can be complex and challenging for enterprises
•  And more … repeatability, elasticity, scalability, security, performance ...
Platform for Scalable Data Science
BlueData EPIC™ Software Platform
•  IOBoost™ - Extreme performance and scalability
•  ElasticPlane™ - Self-service, multi-tenant clusters
•  DataTap™ - In-place access to data on-prem or in the cloud
Users: Data Scientists, Developers, Data Engineers, Data Analysts
Tools: BI/Analytics Tools, Bring-Your-Own
Compute: On-Premises, Public Cloud (EC2)
Storage: NFS, HDFS, S3
Multi-Tenant Environments
Distributed	clusters	with	
Jupyter	&	Zeppelin		
notebooks	
Links	to	available	services	
and	notebooks
Pre-Integrated Docker-Based Images
DOCKER-BASED IMAGES OF YOUR CHOICE: SAME FOR ON-PREM, AWS, OR ANY CLOUD
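One way to picture "the same image everywhere" is to describe an environment once and render it to a Dockerfile that can be built on-prem or in any cloud. The sketch below is an assumption for illustration and not how BlueData builds its images: `ImageSpec` and `render_dockerfile` are hypothetical names, and the base image and package list are example values. A real Spark image would also need a Java runtime, which is left out here to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical description of one data science image."""
    base_image: str                      # e.g. a public Python base image
    pip_packages: List[str] = field(default_factory=list)
    expose_port: int = 8888              # Jupyter's default port


def render_dockerfile(spec: ImageSpec) -> str:
    """Render the spec as Dockerfile text using standard instructions."""
    lines = [f"FROM {spec.base_image}"]
    if spec.pip_packages:
        # Sort so the same spec always yields byte-identical output.
        packages = " ".join(sorted(spec.pip_packages))
        lines.append(f"RUN pip install --no-cache-dir {packages}")
    lines.append(f"EXPOSE {spec.expose_port}")
    return "\n".join(lines) + "\n"


# Illustrative values only; pin whatever versions your team has validated.
notebook_image = ImageSpec(
    base_image="python:3.11-slim",
    pip_packages=["pyspark", "notebook"],
)
dockerfile_text = render_dockerfile(notebook_image)
print(dockerfile_text)
```

Because the rendered text depends only on the spec, the same spec produces the same Dockerfile regardless of where it is built.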
On-Demand Spark + R Environments
Just a few mouse clicks to a fully configured cluster (e.g. with Spark + RStudio Server)
Scalable Data Science with R & Python
Deploy on-demand Spark clusters with RStudio (sparklyr), Zeppelin, or Jupyter
Spark (via Zeppelin Notebook)
Turnkey Spark clusters on Docker, with Zeppelin, Jupyter, and SparkR pre-integrated
Scale to Production (Compute + Data)
Compute: Add worker nodes
Data: Point analytics to storage (HDFS, S3, NFS)
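The two knobs on this slide, worker count and data location, can be sketched as parameters of a single job submission. The helper below is hypothetical: it only builds a `spark-submit` argument list and does not run anything. It assumes NFS is mounted locally and addressed with `file://`, and that S3 is reached through the `s3a://` scheme. Every host, script name, and path is a placeholder, and `spark.executor.instances` only takes effect on cluster managers that honor it (it is ignored in local mode).

```python
from typing import List
from urllib.parse import urlparse

# URI schemes matching the storage options named on the slide.
# Assumption: NFS is mounted locally and addressed with file://.
SUPPORTED_SCHEMES = {"hdfs", "s3a", "file"}


def submit_command(master: str, script: str, workers: int, data_uri: str) -> List[str]:
    """Build a spark-submit argument list for a given scale and data location."""
    if workers < 1:
        raise ValueError("need at least one worker")
    scheme = urlparse(data_uri).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return [
        "spark-submit",
        "--master", master,
        "--conf", f"spark.executor.instances={workers}",
        script,
        data_uri,  # passed through to the job as its first argument
    ]


# Same job at two scales; every name and path below is a placeholder.
sandbox = submit_command("local[2]", "train.py", 1, "file:///mnt/nfs/sample.csv")
production = submit_command("yarn", "train.py", 20, "hdfs:///data/full")
print(" ".join(production))
```

The job script stays the same in both calls. Only the scale and the storage URI change, which is the point the slide makes.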
Distributed Data Science Operations
[Figure: self-service launch of data science environments]
Users: Data Scientists, Data Analysts, Data Engineers
Environments (each with a Launch action): Spark 2.0 + Jupyter Notebook; Spark 1.6.1 + Zeppelin Notebook; JupyterHub; RStudio; TensorFlow; Hadoop (Hive, M/R); Datameer; Bring Your Own
Common layers: Shared Data, Code, and Results; Users & Security; Orchestration & Mgmt
Deployment: On-Premises, Cloud
Comprehensive management of secure, scalable, & reproducible data science environments
DEMO
Distributed Data Science: Takeaways
•  Operationalizing distributed data science is hard work
–  Unique requirements for access to data, models, tools, etc.
•  Need to bring a DevOps approach to data science operations
–  Support for fast, iterative prototyping and reproducibility
–  Requires ultimate flexibility as tools evolve and new options emerge
•  Leverage a turnkey purpose-built platform (e.g. BlueData EPIC)
–  Bring DevOps agility to distributed data science, powered by Docker
–  Provide ability to share code, models, & data with secure multi-tenancy
–  Enable on-demand environments with a choice of data science tools
Thank You
TRY BLUEDATA EPIC ON AWS
For more information:
www.bluedata.com	
sales@bluedata.com	
www.bluedata.com/aws
Q&A
www.bluedata.com
Wrap-Up	
•  We’ll share the slides and video recording
•  Suggestions for future meetups?
•  Next meeting TBD – we’ll keep you posted
Thank you for attending!
Thanks to our sponsor:
www.bluedata.com

Distributed Data Science, DevOps, and Docker

Editor's Notes

  1. Nanda Vijaydev, Director of Solutions, BlueData. Nanda has more than 10 years of experience in Data Management and Data Science. At BlueData, Nanda works with Hadoop, Spark, and related technologies to build software solutions for Big Data analytics use cases. She has worked on multiple Data Science and Big Data projects for large enterprises in the healthcare, media, telecommunications, and other industries. Prior to BlueData, she was a principal solutions architect at Silicon Valley Data Science and director of solutions engineering at Karmasphere. She has an in-depth understanding of Data Management tools including data integration, ETL, data warehousing, reporting, Hadoop, and Spark.
  2. The role of “data” has changed. Once an asset critical to monitoring and managing business operations (think of the world of BI and reports), it has become a source of competitive advantage and a strategic enterprise asset that will be the source of new products and services for all organizations. With this change, combined with Big Data technologies and the ability to use more types of data to refine the analytics, the traditional waterfall model has quickly evolved into an iterative, closed-loop cycle with an emphasis on continuous improvement. The faster organizations can bring data and teams together and iterate, the more likely they are to gain competitive advantage and create disproportionate value. So how does one bring about this analytics agility and velocity? If you want to train a statistical model on very large amounts of data, you'll need three things:
     •  a storage platform capable of holding all of the training data,
     •  a computational platform capable of efficiently performing the heavy-duty mathematical computations required, and
     •  a statistical computing language with algorithms that can take advantage of the storage and computation power.
     Traditional: waterfall/slow(er), small(ish) data, single server, static results. Distributed: iterative/ongoing, big/fast data, multiple servers, results are ‘big’.
  3. Assumes there is an inflection point and the need for previous infrastructure goes away
  4. User requirements are on-going and infrastructure need is continuous
  5. Data Science is not a one-time process. Build and evaluate in a sandbox, then evaluate and deploy at scale. Minimize recoding of the models; it’s not sustainable for continuous evaluation. Run environments should mimic build environments at a larger scale.
  6. RStudio cluster provisioned on Spark 2.1 in BlueData
  7. RStudio cluster provisioned on Spark 2.1 in BlueData
  8. Our vision for BlueData EPIC is to provide a single software platform for Big-Data-as-a-Service that supports both on-prem and cloud deployments for Big Data.
  9. Rapid provisioning of data science environments at scale (cloud or on-prem). Sharing of data, code, and results (this is a big deal!). Easily customize environments. Reproducibility (environments, results).
  10. Leverage a turnkey DevOps platform for Data Science (e.g. BlueData EPIC)
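
Note 5 above says that run environments should mimic build environments at a larger scale. A minimal sketch of that idea, under the assumption that an environment can be described by a small record, is shown below. The `Environment` class and both helper functions are hypothetical and are not part of any BlueData API. The Spark version and notebook values echo notes 6 and 7, while the worker counts and memory size are invented for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a data science environment."""
    spark_version: str
    notebook: str
    workers: int
    memory_per_worker_gb: int


def scale_out(env: Environment, workers: int) -> Environment:
    """Return a copy with more workers and every tool version unchanged."""
    if workers < env.workers:
        raise ValueError("scale_out expects at least the current worker count")
    return replace(env, workers=workers)


def same_tooling(a: Environment, b: Environment) -> bool:
    """True when two environments use the same Spark version and notebook."""
    return (a.spark_version, a.notebook) == (b.spark_version, b.notebook)


# Build in a small sandbox, then derive the run environment from it.
sandbox = Environment(spark_version="2.1", notebook="RStudio", workers=1, memory_per_worker_gb=8)
production = scale_out(sandbox, workers=16)
print(production, same_tooling(sandbox, production))
```

Deriving the production record from the sandbox record, rather than writing it by hand, is one way to keep models from needing to be recoded when they move to scale.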