Data Strategy & Tactics
Sanjay Sabnis: Enterprise Data Architect
Technology Innovation Summit
High-Level Goals
•  Data Ingestion
•  Data Discovery
•  Data Analytics
•  Data Visualization
Data
•  Transactional
–  Payments, Reservations
•  Non-Transactional
–  Clickstream, Mobile
•  Social
–  Twitter, Facebook, Disqus, Quora
•  Application Data
–  Logs, Metrics …
•  External Data Sets
–  Demographics …
Values
•  Audience Creation
•  Market Analytics
•  Search
–  Keyword, Ad Hoc
•  Predictive and Prescriptive Analytics
–  Recommendation, Look-Alike, Discriminant Analysis
•  Location-Aware Analytics
–  Where, When, What
•  Correlation Between Data Sets
–  Transactional, Non-Transactional, Social …
•  Application Performance KPIs
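Look-alike modeling is listed above among the predictive analytics values. One common way to score look-alikes, shown here as a minimal sketch rather than the approach used in this deck, is to average the feature vectors of a seed audience and rank other users by cosine similarity to that centroid. The feature vectors, user IDs, and the 0.9 threshold are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the seed audience's feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def look_alike(seed: list[list[float]], candidates: dict[str, list[float]],
               threshold: float) -> list[str]:
    """Candidate IDs whose similarity to the seed centroid meets the threshold,
    ordered from most to least similar."""
    c = centroid(seed)
    scored = [(cosine(c, vec), uid) for uid, vec in candidates.items()]
    return [uid for score, uid in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical features per user, e.g. normalized spend, visit frequency, mobile share
seed_audience = [[1.0, 0.8, 0.1], [0.9, 0.7, 0.2]]
candidates = {"u1": [0.95, 0.75, 0.15], "u2": [0.1, 0.2, 0.9], "u3": [0.8, 0.9, 0.1]}
print(look_alike(seed_audience, candidates, threshold=0.9))  # ['u1', 'u3']
```

In practice the features would be drawn from the transactional, non-transactional, and social sources listed earlier, and the threshold tuned against campaign results.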
Features
•  Metadata Management
•  Data Governance
•  Data Lineage
•  Data Access
•  Security
•  Data Democratization
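Data lineage usually means being able to answer "where did this dataset come from?". A minimal sketch, assuming lineage is stored as a simple mapping from each dataset to its direct parents (all dataset names here are hypothetical), walks that graph breadth-first to list every upstream source exactly once:

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from
LINEAGE: dict[str, list[str]] = {
    "audience_segments": ["clickstream_sessions", "payments_daily"],
    "clickstream_sessions": ["raw_clickstream"],
    "payments_daily": ["raw_payments"],
    "raw_clickstream": [],
    "raw_payments": [],
}


def upstream(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over parent links; returns every ancestor once, nearest first."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # a dataset can be reached through more than one path
            continue
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


print(upstream("audience_segments", LINEAGE))
# ['clickstream_sessions', 'payments_daily', 'raw_clickstream', 'raw_payments']
```

Reversing the edges of the same graph would give downstream impact analysis, i.e. which derived datasets are affected when a raw source changes.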
Infrastructure
•  High Memory Footprint
•  Elastic Computing Power
•  Storage: Long Term and Short Term
•  Network Bandwidth
Architecture
•  API Microservices/Containers
•  Streaming
•  Encryption
•  HA/DR (High Availability/Disaster Recovery)
Simple Reference Architecture: For Lightweight API Microservices
[Diagram. Components shown: file streaming and file upload; streaming applications and streaming infrastructure; Apache NiFi; real-time data ingestion services; Spark stream processing; real-time streaming/batch ETL (Lambda architecture); Hadoop/YARN; HDFS; NoSQL; raw data and computed data; OLAP; MLlib; Apache Zeppelin; data analysis; UI portals/dashboards; portal access and API access; API gateways; authentication & authorization; data encryption. Both real-time and batch paths are shown.]
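The diagram labels its processing path as real-time streaming/batch ETL in a Lambda architecture. The pure-Python sketch below illustrates only the Lambda idea: a batch layer recomputes a view from the full master dataset, a speed layer counts events that arrived since the last batch run, and queries merge the two. It does not use Spark, HDFS, or NiFi, and the page-view counting use case is hypothetical.

```python
from collections import Counter


def batch_view(master_events: list[dict]) -> Counter:
    """Batch layer: recompute the whole view from the immutable master dataset."""
    return Counter(e["page"] for e in master_events)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self) -> None:
        self.view: Counter = Counter()

    def ingest(self, event: dict) -> None:
        self.view[event["page"]] += 1

    def reset(self) -> None:
        # Called once a new batch run has absorbed these events
        self.view.clear()


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the batch and real-time views at query time."""
    return batch[page] + speed.view[page]


master = [{"page": "home"}, {"page": "search"}, {"page": "home"}]
batch = batch_view(master)
speed = SpeedLayer()
speed.ingest({"page": "home"})  # arrives after the last batch run
print(query("home", batch, speed))  # 3
```

The design trade-off is that the batch layer stays simple and correct by recomputing from raw data, while the speed layer only has to be right for the short window between batch runs.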
Tactical Approach
•  Areas: Analyze, Integrate, Strategy
•  Activities
–  Architecture Use Case
–  Create Roadmap: Short Term and Long Term
–  Set Value and ROI Targets
–  Use Case PoC/PoV
–  Analytics Model
–  Metrics & Scorecards
–  Assess Value & Effectiveness
–  Architecture & Framework Selection
–  Tool Evaluation, Setup
–  Data Modeling/Design
–  Data Source/Mapping
–  Integration: ETL/Streaming
–  Automation/Scheduling
–  Monitoring
Policy
•  Regulatory Compliance
•  Audit
•  Lineage
•  Metadata Management
•  Data Encryption at Rest and in Motion
•  Access Management
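For the access management and audit items, a minimal role-based sketch could check each request against a role-to-permission table and record every decision, allowed or denied, in an audit log. The roles, datasets, and in-memory log are hypothetical; a production setup would more likely rely on the platform's own authorization and audit services.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission table: (dataset, action) pairs per role
ROLE_PERMISSIONS: dict[str, set[tuple[str, str]]] = {
    "analyst": {("clickstream", "read")},
    "data_engineer": {("clickstream", "read"), ("clickstream", "write"), ("payments", "read")},
}


@dataclass
class AccessController:
    user_roles: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def check(self, user: str, dataset: str, action: str) -> bool:
        roles = self.user_roles.get(user, set())
        allowed = any((dataset, action) in ROLE_PERMISSIONS.get(r, set()) for r in roles)
        # Every decision, allowed or denied, is recorded for later audit
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "action": action,
            "allowed": allowed,
        })
        return allowed


ac = AccessController(user_roles={"alice": {"analyst"}, "bob": {"data_engineer"}})
print(ac.check("alice", "payments", "read"))  # False
print(ac.check("bob", "payments", "read"))    # True
print(len(ac.audit_log))                      # 2
```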
Revenue Models
•  Open API
•  Subscription-Based API
•  Data-as-a-Service (DaaS)
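Subscription-based APIs typically need per-tier usage limits. One common technique, sketched here as an assumption rather than anything the deck specifies, is a token bucket: each tier gets a refill rate and a burst capacity. The tier names and numbers are hypothetical, and the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable

# Hypothetical tiers: (requests per second, burst capacity)
TIERS: dict[str, tuple[float, int]] = {"open": (1.0, 5), "subscriber": (20.0, 100)}


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def bucket_for(tier: str, clock: Callable[[], float] = time.monotonic) -> TokenBucket:
    rate, capacity = TIERS[tier]
    return TokenBucket(rate, capacity, clock)


t = [0.0]  # fake clock for a deterministic demo
b = bucket_for("open", clock=lambda: t[0])
print([b.allow() for _ in range(6)])  # [True, True, True, True, True, False]
t[0] = 1.0
print(b.allow())  # True: one token refilled after one second
```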
Culture
•  DevOps: Build Automation
•  Environments to Quickly Build PoCs
•  Sufficient Resources for Data Wrangling
•  Agile Development Methodology
•  Talent
Trending Technologies
•  Computing
–  GPU-Based
–  In-Memory
–  Edge Computing
•  In-Memory Computing
•  New Data Science Models
–  Age/Sex Prediction Based on Name and Time
Community Engagement
•  Open Data API to tap into community talent to build models
•  To bring in new talent and ideas
•  To learn from others and from industry trends
My Recent Experience
•  Cluster set up to run cutting-edge open-source in-memory technologies to meet low-latency needs.
•  Can securely ingest more than 100 million events per minute in real time.
•  Network set up to accommodate more than 1,000 nodes to meet analytics processing demand.
•  Cluster is a polyglot of databases and tools to meet the needs of the big data ecosystem.
•  15k cores, 150 TB memory, 15 PB storage
•  On-premises and AWS experience
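To put the figures above in more familiar units, a quick back-of-the-envelope calculation converts the ingestion rate to events per second and the cluster totals to per-core amounts. Decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) are assumed, and per-core averages say nothing about how resources are actually distributed across nodes.

```python
# Figures from the slide; decimal units assumed
events_per_minute = 100_000_000
cores = 15_000
memory_tb = 150
storage_pb = 15

events_per_second = events_per_minute / 60
memory_gb_per_core = memory_tb * 1000 / cores
storage_tb_per_core = storage_pb * 1000 / cores

print(f"{events_per_second:,.0f} events/s")   # 1,666,667 events/s
print(f"{memory_gb_per_core} GB per core")    # 10.0 GB per core
print(f"{storage_tb_per_core} TB per core")   # 1.0 TB per core
```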
Thank You
