The First Step in Information Management
www.firstsanfranciscopartners.com
MONTHLY SERIES
April 6, 2017
Building	a	Flexible	and	Scalable	Analytics	Architecture
Polling	Question	
§ Where	is	your	organization	in	its	readiness	to	develop	a
formal	Big	Data	and	analytics	architecture?
− We	have	no	plans	or	architecture	for	analytics,	but	want	to	
have	one.
− We	have	a	strategy	and	are	planning	an	architecture.	
− We	have	started	to	implement	a	planned	architecture	for	
analytics.
− None	of	the	above.
pg 2© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
Typical	Problems
§ Analytics	teams	are	set	up	outside	of	any	formal	
set	of	guardrails	
− They	do	good	work,	for	a	while	
− Then	they	start	to	ask	“Why	isn’t	this	more	
organized?”
§ A	CIO	decides	that	a	company	needs	to	do	better	
with	data,	and	acquires	$15	million	in	Big	Data	
technology	and	sets	up	a	data	lab
− A	few	sponsors	start	to	use	the	lab,	but	costs	of	
operation	seem	to	exceed	the	benefits	from	their	
efforts
− Someone	asks	“Why	did	we	do	this?”	
§ Both	were	missing	a	clear	plan	manifested	in	a	
formal	architecture	
Topics	For	Today’s	Webinar
§ What	is	a	big	data	and	analytics	architecture?	
§ When	should	big	data	and	analytics	
architectures	be	employed?
§ An	architecture	for	big	data	systems:
key	components
§ Best	practices	
§ Q&A
What	is	a	Big	Data	and	Analytics	Architecture?
A	Definition	of	Architecture
§ The	art	and	discipline	of	designing	buildings	and	structures,	from	the	macro-level	of	
urban	planning	to	the	micro-level	of	creating	furniture	and	machine	parts.	
§ The	design	of	any	complex	object	or	system.	It	may	refer	to	the	implied	architecture	of	
abstract things	such	as	music	or	mathematics,	the	apparent	architecture	of	natural	
things	such	as	geological	formations	or	living	things,	or	explicitly	planned	architecture	
of	human-made	things	such	as	buildings,	machines,	organizations,	processes,	software	
and	databases.	
§ The	organized	arrangement	of	component	elements	to	optimize	the	function,	
performance,	feasibility,	cost	and/or	aesthetics	of	an	overall	structure.	
From	The	DAMA	Guide	to	the	Data	Management	Body	of	Knowledge
Definition	of	Architectures	for	Big	Data	and	Analytics	
§ Therefore,	the	Big	Data	and	Analytics	
architecture	is	an	arrangement	of	
elements	that	are	used	to	manage	and	
leverage	enormous	amounts	of	data	to	
perform	analytics.		
Considerations	for	Architectures	for	Big	Data	and	Analytics	
§ Avoid	a	Winchester	house		
− Complicated	with	many	permutations	and	
variables
− It	is	additive
− Making	a	mistake	can	get	expensive	if	you	bolt	
on	an	incompatible	set	of	elements	
§ Ensure	you	need	Big	Data	for	Analytics
§ Consider	characteristics	that	optimize	the	
function,	performance,	feasibility	and	cost
Rooms: 160; Doorways: 467; Doors: 950; Fireplaces: 47 (gas,
wood, coal); Bedrooms: 40
Constructed 1884 – 1922 (38 continuous years); Cost: $5.5M
Blueprints: Never made; Individual rooms sketched out by Sarah
Winchester on paper or other media (e.g., tablecloths)
All design – no architecture
Elements	of	a	Big	Data	and	Analytics	Architecture
Columns: Organization Elements, Functional Elements, Technology Elements
Rows: Data Consumption, Data Supply Chain/Logistics, Data Management
§ Elements shown: Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Glossary Management; Machine Learning/AI; Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Competency Centers; Self-Service/Data Citizens; ETL/Virtualization; Distributed Processing; Metadata; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Streaming; Data Glossary; Data Lake Management; Taxonomy/Ontology; Web Services; Policy and Process; Data Analysts and Scientists; Collaboration, Decision-Making; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Applications; Analytics (Descriptive, Predictive, Prescriptive); Business/Tech. Planning; Security, Privacy; Business Continuity
Two	Lenses	to	Derive	an	Effective	Architecture	
§ Form: Developing the architecture so all stakeholders can actually understand and develop it
§ Progression: Develop architectures that are best fit for purpose and effective, no matter how simple or complex
Forms	of	Architectures	for	Big	Data	and	Analytics	
§ Architecture	forms:	
1. Abstract – Enable	and	convey	insight	so	it	can	be	considered	and	adopted
2. Apparent – Obvious	structure	so	it	can	be	used	to	manage	data	as	well	as	
interface	with	people	and	processes
3. Explicitly Planned – Must be comprehensive, not just a technology stack and
a bunch of abstract arrows, so you can manage and sustain the environment
§ Your	Big	Data	and	Analytics	architecture	needs	to	consider	all	three	forms.
Progression	of	Architecture	
§ Stages shown: Isolated, Scattered Analytics; Recognized; Data-Driven; Insight-Driven; Embedded
§ Diagram annotations: Value has limited audience; Audience expands; Support tactical operations; Monetization of data
Your	architecture	will	not	be	static.
Presenting	only	the	ultimate	future	state	is	not	practical.
When	Should	Big	Data	and	Analytics	
Architectures	be	Employed?
Factors That Trigger the Need for Formal Architecture
These will affect architecture and progression:
§ Veracity (the 4th V) – Scattered, isolated, reactive
§ Variety – Consolidating content, insight from more than just rows and columns
§ Volume – Consolidated content, tactical uses, monetization
§ Velocity – Business velocity, not just data velocity
§ “Net New” – Generation skipping, balance alongside meeting traditional needs
First	Progression	– Isolated,	Scattered	Analytics	(Abstract)	
Columns: Organization Elements, Functional Elements, Technology Elements
Rows: Data Consumption, Data Supply Chain/Logistics, Data Management
§ Elements shown: Landing/Staging; Data Operations; ETL; Data Analysts; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Analytics (Descriptive, Predictive, Prescriptive); HDFS, Columnar and Graph
Isolated,	Scattered	Analytics
[Diagram labels: Legacy; Usage; EDW; Predictive Analytics; Claims; Customer; Client Data; Hadoop; BI and Reporting; ETL; Ingest; Data Scientist; Spark; Analyst]
Second	Progression	– Recognized	Value	(Abstract)	
Columns: Organization Elements, Functional Elements, Technology Elements
Rows: Data Consumption, Data Supply Chain/Logistics, Data Management
§ Elements shown: Landing/Staging; Glossary Management; Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Competency Centers; ETL/Virtualization; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Glossary; Data Lake Management; Policy and Process; Data Analysts and Scientists; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Analytics (Descriptive, Predictive, Prescriptive); Security, Privacy
Example	– B2B	Insurance	Company	and	Data	Monetization		
§ Insight driven, monetizing data as separate line of business
− Data lake, Hadoop; dedicated, isolated data science area; isolated monetization area
[Diagram: Hybrid Data Architecture. Labels: Legacy; Usage; New Data Products; EDW; Predictive Analytics; Claims; Customer; Client Data; Hadoop Lake; Ingest, pedigree; BI and Reporting; Governance, Data Management; ETL; Spark]
Third	Progression	– Data	Driven	(Abstract)	
Columns: Organization Elements, Functional Elements, Technology Elements
Rows: Data Consumption, Data Supply Chain/Logistics, Data Management
§ Elements shown: Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Glossary Management; Machine Learning/AI; Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Competency Centers; Self-Service/Data Citizens; ETL/Virtualization; Distributed Processing; Metadata; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Streaming; Data Glossary; Data Lake Management; Taxonomy/Ontology; Web Services; Policy and Process; Data Analysts and Scientists; Collaboration, Decision-Making; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Applications; Analytics (Descriptive, Predictive, Prescriptive); Business/Tech. Planning; Security, Privacy; Business Continuity
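How each progression extends the previous one can be made concrete by comparing the element lists on the three progression slides. The Python sketch below is illustrative only. The element names are transcribed from the slides; the set names, the `added` helper, and the decision to treat "ETL" and "Data Analysts" as broadened into "ETL/Virtualization" and "Data Analysts and Scientists" (and "Applications" as its own element) are assumptions made for this example.

```python
"""Compare the element sets shown on the three progression slides."""

# First progression: isolated, scattered analytics.
FIRST = {
    "Landing/Staging", "Data Operations", "ETL", "Data Analysts",
    "Access: Publish, Subscribe, Notify", "Access Tools: BI, Analytics",
    "Analytics: Descriptive, Predictive, Prescriptive",
    "HDFS, Columnar and Graph",
}

# Second progression: recognized value. Two first-stage elements are
# assumed to be broadened rather than kept alongside their successors.
SECOND = (FIRST - {"ETL", "Data Analysts"}) | {
    "ETL/Virtualization", "Data Analysts and Scientists",
    "Glossary Management", "Data Governance", "Data Ingestion",
    "Reference and Master Data", "Competency Centers",
    "Data Quality/Hygiene", "Lake, Pond, Warehouse", "Data Glossary",
    "Data Lake Management", "Policy and Process", "Security, Privacy",
}

# Third progression: data driven. Every second-stage element is retained.
THIRD = SECOND | {
    "Pedigree and Preparation", "Model/Metrics Management",
    "Data Reduction", "Machine Learning/AI", "Self-Service/Data Citizens",
    "Distributed Processing", "Metadata", "Data Streaming",
    "Taxonomy/Ontology", "Web Services", "Collaboration, Decision-Making",
    "Applications", "Business/Tech. Planning", "Business Continuity",
}


def added(previous: set[str], current: set[str]) -> list[str]:
    """Return the elements present in `current` but not in `previous`."""
    return sorted(current - previous)


if __name__ == "__main__":
    stages = [("Second", FIRST, SECOND), ("Third", SECOND, THIRD)]
    for name, previous, current in stages:
        new_elements = added(previous, current)
        print(f"{name} progression adds {len(new_elements)} elements:")
        for element in new_elements:
            print(f"  - {element}")
```

Listing the additions per stage is one way to present an intermediate state rather than only the ultimate future state: governance, glossary and data quality elements arrive with the second progression, while streaming, machine learning and self-service arrive with the third.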
Example – “Net New”; Generation-Skipping, Prescriptive
[Diagram: Logical Data Warehouse. Labels: New Applications; Usage; Exploration & Discovery; EDW; Predictive Analytics; Applications; Streaming IoT; Social; Digital Content; eMail, Docs; Hadoop Data Lake; Ingest, pedigree; BI and Reporting; Governance, Data Management; Spark; Data Products; Citizen Data Scientist; Storm; Pre-processing, validation; Hadoop connector]
Example	– Data	and	Analytics	Technology	Stack	Apparent		
[Diagram: technology stack from Sources to Insights. Labels: Data Intake; Data Preparation (Permissions, Dictionary, Indexing, Pedigree); Data Landing Zone (Data Lake); Data Transformation, Reduction; Analytical Data Assets; Analytical Computing Infrastructure; BI/Reporting Assets; Model Server, Data Access; BI Tool; Analytics Tool; BI and Reports; Analytics Results; Monetized Data Results; Data Portal]
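The flow named on this slide (intake, preparation with pedigree, landing zone, then transformation and reduction) can be sketched as a chain of small functions. This is a minimal Python illustration under stated assumptions, not the presenters' implementation: the `Record` class, the function names, the pedigree fields and the sample claim row are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """One row of data plus its pedigree (hypothetical structure)."""
    payload: dict
    pedigree: dict = field(default_factory=dict)


def intake(source_name: str, rows: list[dict]) -> list[Record]:
    """Data Intake: wrap raw rows and note which source they came from."""
    return [
        Record(payload=dict(row), pedigree={"source": source_name})
        for row in rows
    ]


def prepare(records: list[Record]) -> list[Record]:
    """Data Preparation: stamp each record's pedigree before it lands."""
    prepared_at = datetime.now(timezone.utc).isoformat()
    for record in records:
        record.pedigree["prepared_at"] = prepared_at
    return records


def land(landing_zone: list[Record], records: list[Record]) -> None:
    """Data Landing Zone: keep the full, unreduced records."""
    landing_zone.extend(records)


def reduce_for_analytics(landing_zone: list[Record], keep: list[str]) -> list[dict]:
    """Data Transformation, Reduction: keep only the requested fields."""
    return [
        {
            **{key: r.payload[key] for key in keep if key in r.payload},
            "_source": r.pedigree["source"],  # carry pedigree downstream
        }
        for r in landing_zone
    ]


if __name__ == "__main__":
    lake: list[Record] = []
    claims = [{"claim_id": 1, "amount": 250.0, "notes": "water damage"}]
    land(lake, prepare(intake("claims_system", claims)))
    print(reduce_for_analytics(lake, keep=["claim_id", "amount"]))
```

Keeping the landing zone unreduced and deriving the analytical assets from it means a different reduction can be produced later without going back to the source systems.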
Reference Data Architecture with Services
[Diagram: layered data stores with supporting services]
§ Data Stores: Transactional Application Data Layer; Integrated Data Layer; Data Warehouse Layer; Data Mart Layer; Analytics Data Layer; Archived Data Layer; Metadata Layer
§ Content: External; Internal
§ Data Management Services: Data Integration Services; Data Movement Services; Data Quality Services; Data Access Services
§ Enterprise Services: Environment Management Services; Security Services
§ Store contents shown: Master Data; Event Data; Integrated Master Data; Integrated Event Data; Conformed Dimensions; Atomic Facts; Derived Facts; History; Operational; Conformed Master Data; Analytic; Derived Facts + History; Cubes (Multi-Dimensional Analytics); Advanced Analytics (Statistical Analysis, Data Mining, etc.); Ontologies/Dictionaries; Business Rules; Operational Metadata; Technical; Master Data Archive
An	Architecture	for	Big	Data	Systems:
Key	Components
FSFP	Reference	Architecture	– Abstract	Type		
§ Like an I-beam, the data architecture needs to take the load of meeting business objectives, and distribute that load to supportive structures
[Diagram: DATA INSIGHT ARCHITECTURE. Layers: Business Strategy; Data Access Layer; Management Layer; Wrangling Layer]
FSFP	Reference	Architecture	– Abstract
[Diagram: DATA INSIGHT ARCHITECTURE]
§ Business Strategy
§ Data Access Layer: BI/Reporting, Analytics, Mobile
§ Management Layer: Metadata, Lineage, Work Flow, Models, Reference Data, Rules, Canonical Data
§ Vintage Area: Legacy applications and data structures, traditional methods. Mission: To Serve and Protect
§ Contemporary Area: New apps and data structures, Agile methods. Mission: Flexible, Responsive
§ Also shown: Data Movement/Logistics; Cross-Generation Abstraction Processes and Mapping; Data Virtualization, Services
FSFP	Reference	Architecture	– Apparent
[Diagram: DATA INSIGHT ARCHITECTURE. Labels: Business Strategy; Alignment; Data Usage; Management; Data Life Cycles; Vintage Area; Contemporary Area; Legacy BI and Reporting; Data Warehouse, ODS, Mart; ETL, EAI, Replication; RDBMS, SQL, In-Memory Appliance; Data Lake, Pond; NoSQL (HDFS, Graph); Advanced Analytics; Unstructured Data; Data Monetization; Visualization; Data Wrangling; Mobile; Logical DW; Metadata; Lineage; Reference Data]
FSFP	Reference	Architecture	– Explicit	
[Diagram: DATA INSIGHT ARCHITECTURE]
§ Business Strategy
§ Data Access Layer: BI/Reporting, Analytics, Mobile
§ Management Layer: Metadata, Lineage, Work Flow, Models, Reference Data, Rules, Canonical Data
§ Vintage Area and Contemporary Area
§ Also shown: Data Movement/Logistics; Cross-Generation Abstraction Processes and Mapping; Vintage Views; Vintage Apps; Agile Apps; Future Apps; DBMS; RDBMS; EDW; ETL; NoSQL; Lake; DM; IoT; Pre-process; Ingestion, pedigree; External Data; Unstructured Data; Web Services; Distributed Processing; Data Virtualization; Monetization ($)
FSFP Reference Architecture – Data Access Focus
[Diagram: DATA INSIGHT ARCHITECTURE]
§ Business Strategy
§ Data Access Layer: Portals; Report, BI, Query; Workbenches; Labs; Mobile; Web Services, Data Virtualization
§ Management Layer
§ Vintage Area and Contemporary Area
§ Data Supply: Life Cycles and Supply Chains – Movement/Logistics
Best	Practices
§ Apply different lenses
− Consider Forms
− Consider Progressions
§ Reconcile old to new
− Understand business needs
− Reconcile current-state technology and future-state technology
§ Apply the “I-beam”
− Address Vintage and Contemporary systems
§ Have a plan
− Establish priorities
− Identify where you start
− Identify who is affected
Have	a	Methodology	
§ Establish a Sandbox, PoC (but with a defined architecture)
§ Define	the	Vision	of	value	and	return	
§ Perform	Alignment	
§ Assess	the	V’s,	culture	and	organization	
readiness	
§ Define	long-term	requirements	for	use	
§ Define	operating	models	
§ Design	the	Analytics	Architecture
§ Develop	a	realistic	roadmap	
§ Transition	to	a	sustainable	architecture	
[Diagram: methodology phases. Labels: Initiation; Discovery; Vision and Alignment; Assessment; Requirements; Architecture and Design; Operating Model; Roadmap; Implementation and Operation; Measurement and Sustaining; Strategy; Action]
Copyright: First San Francisco Partners, 2017
Questions?
Thank	you!
See	you	Thursday,	May	4	for	our	next	DIA	webinar,
The	Role	of	a	Data	Scientist	(Interview	with	a	CDS)
John	Ladley			@jladley
john@firstsanfranciscopartners.com
Kelle	O’Neal			@kellezoneal
kelle@firstsanfranciscopartners.com
Need Definitions, Explanations – Not Just Picture
Characteristics by layer (Transactional Application, Data Warehouse, Data Mart):
§ Purpose
− Transactional Application: Data produced via the automation of business processes
− Data Warehouse: View of data across the enterprise. Supports dissemination, derivation of knowledge and history
− Data Mart: Data structured and filtered to support specific information needs of small groups of users
§ Data Life Cycle
− Transactional Application: All base (non-derived) data originates here
− Data Warehouse: Derivations (including aggregations) produced here, and history is inferred
− Data Mart: Data from Warehouse is transformed to support specific reporting
§ Data Operations
− Transactional Application: Create / Source / Read / Update / Delete / Archive
− Data Warehouse: Extract / Transform / Load / Derive / Publish / Archive
− Data Mart: Subscribe / Transform / Archive
§ Data Model
− Transactional Application: Normalized to 3NF
− Data Warehouse: Subject Oriented / Snowflaked / Conformed Dimensions
− Data Mart: Information Requirement Oriented / Snowflaked / Conformed Dimensions
• Much more is needed than the above
• Definitions are a technical reference; explanations help stakeholders to understand the reference architecture
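One way to turn the layer definitions above into a usable technical reference is to hold them as data that can be queried. The Python sketch below covers only the Data Operations row. The operation names come from the slide; the `LAYER_OPERATIONS` mapping, the helper names, and the reading of each list as "operations that belong in this layer" are assumptions made for illustration.

```python
# Data Operations listed per layer on the slide; insertion order follows
# the slide's column order.
LAYER_OPERATIONS: dict[str, tuple[str, ...]] = {
    "Transactional Application": (
        "Create", "Source", "Read", "Update", "Delete", "Archive",
    ),
    "Data Warehouse": (
        "Extract", "Transform", "Load", "Derive", "Publish", "Archive",
    ),
    "Data Mart": ("Subscribe", "Transform", "Archive"),
}


def is_listed(layer: str, operation: str) -> bool:
    """Return True if the slide lists `operation` for `layer`."""
    return operation in LAYER_OPERATIONS[layer]


def layers_listing(operation: str) -> list[str]:
    """Return every layer whose definition lists `operation`."""
    return [
        layer
        for layer, operations in LAYER_OPERATIONS.items()
        if operation in operations
    ]


if __name__ == "__main__":
    # Derivations are produced in the warehouse, not in the mart.
    print(is_listed("Data Mart", "Derive"))
    print(layers_listing("Transform"))
```

A structure like this can sit alongside the written explanations: the definitions stay precise enough to check a design against, while the prose helps stakeholders understand why each layer is limited to its operations.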

DI&A Webinar: Building a Flexible and Scalable Analytics Architecture