Copyright © 2016 Splunk Inc.
SAP SuccessFactors
Alfred Wan
Director, Service Delivery Operations
HCM Suite Service Owner
Agenda
• Splunk @ SAP SuccessFactors
• ITSI Proof of Concept
• Creative Projects w/ Splunk
Splunk @ SAP SuccessFactors
About SAP SuccessFactors
• Market leader in cloud-based human resources applications
• Supports 5,000 customers and 50 million registered end users
• 10 global data centers across the US, Europe and APJ
• 10,000+ VMs/bare-metal servers in production
About Me / Our Team
• HCM Suite Service Owner (performance engineering background)
• Team of 90 people: San Francisco, Reston, Budapest, Bangalore, Shanghai
• Charter: application management for our HCM suite
• Focus: application troubleshooting, performance tuning, capacity management, root cause analysis, long-term proactive measures
• Handle everything that impacts the applications, involving the responsible teams
Before Splunk: Manual, Inefficient
• Had to analyze logs manually
• Used various scripts (shell, Perl, etc.)
• Could only view one server at a time
• No aggregated view
• Little visibility or analytics
• Relied heavily on a few experienced individuals
Choosing Splunk
• Initially purchased Splunk for log management
– Saved 100+ log request tickets per week
• Attending SplunkLive! expanded my view of what Splunk can do:
– Dashboards and reporting
– Various visualizations
– Correlation
• Started using it more extensively and gained multiple additional advantages:
– Reduced MTTR
– Business analysis
– Environment certification
“We still haven’t unleashed the full power of Splunk, but we’re getting better at it. We now see it as much more than a log catching & indexing tool.”
Splunk Benefits
• Increased efficiency
– Automated previously manual processes
– Consolidated useful tools and retired unnecessary ones
• Higher availability and performance
– Achieving 99.9% availability
– Better performance experience for customers
• More time for productive work
– Freed up engineering resources
– Eliminated redundant troubleshooting effort
• Information discovery and visibility
– Centralized, more precise log management
– Aggregated views help us understand navigation usage, outages, etc.
“Sharing the clues and findings about major activities that we discover in Splunk with executives & peers helps our business.”
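For scale, the 99.9% availability figure under "Higher availability and performance" corresponds to the following downtime budget. This is plain arithmetic, assuming a 365-day year and a 30-day month; it says nothing about how SAP SuccessFactors actually measures availability.

```latex
\text{downtime}_{\text{year}} = (1 - 0.999) \times 365 \times 24\ \text{h} = 8.76\ \text{h}
\qquad
\text{downtime}_{\text{month}} = (1 - 0.999) \times 30 \times 24 \times 60\ \text{min} = 43.2\ \text{min}
```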
Splunk at SAP SuccessFactors
• Splunk Enterprise
• 5 TB/day indexing license
• 10 data centers globally, multiple environments
• Example: US data center in Ashburn
– 8 indexers
– 2 syslog servers, 1 job server, 1 search head
– 1,500+ forwarders
Logs Indexed at SAP SuccessFactors
• Syslog
• F5 load balancer logs
• Switch logs
• Firewall logs
• Blade server logs
• OS-level audit logs
• Web server logs
• Application server logs
• Database logs
• Internal APM metrics
• …
Day-to-Day Operations
• Centralize 90+ logs from various tiers and applications in Splunk
• Real-time troubleshooting capability
• Transparent, one-stop view
• Connect the dots to reveal the comprehensive status of the application suite
• Monitor key components from dashboards
• Splunk alerts via email or API
ITSI Proof of Concept
New Site Availability Goals
• Reduce MTTR by 90%
• Optimize the troubleshooting/investigation process
• Leverage existing tools as much as possible
• All within 3 months!
Challenges with Existing Solutions
CHALLENGES
• Multiple point solutions: Pingdom, Zabbix, OpTier, SolarWinds
• Minimal indication of root cause
• Repeated escalations and war rooms
• Rapid growth across 10 data centers
PAIN POINTS
• CONCERNED: complexity, teams operating in silos, massive infrastructure
• ESCALATED: long resolution times
• STRESSED: resource drain and missed project deadlines
Splunk IT Service Intelligence
Machine Learning-Powered, Analytics-Driven IT Operations
• Simplify service operations
– Unify siloed monitoring: combine events & metrics across silos with ease, flexibility & scale in days
– Leverage machine learning to detect anomalies & highlight events that matter
• Prioritize incidents with context
– Deliver business & service context to prioritize incident investigation & action
• Redefine the role of IT
– Support decisions & communicate results with powerful service-level insights
Personalized visualizations of our HCM service
• Visualize contextual interrelationships across each layer of the stack, from web and application through database and network
• Illustrate business and service activity using indicators aligned with strategic goals
• Once all the data was in Splunk, we were able to build the KPIs and create glass tables within a few days
• The goal is to create synthetic service health scores from weighted KPIs to enable early alerting and fast resolution
Organized View of Performance Indicators
Our operations team will be able to:
• Build out deep dives to correlate critical KPIs for each layer of the stack
• Example: we can now track high load on servers and tie it back to Java exceptions and long-running SQL queries
• Compare performance over time and in real time to understand trends and identify systemic issues
Real-Time View of Service and KPI Health Scores
Our site operations team will be able to:
• Get early warning of emerging incidents with a heat map of service health and KPI scores, metrics, sparklines and alerts
• Drill down into service and entity details for in-depth triage
Tweak KPI Weight for Health Score
The sliding-bar approach makes the process very intuitive
Simulated severity helps users understand how the weight affects the health score
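To make the effect of KPI weights concrete, here is a minimal Python sketch of a weighted health score with a what-if simulation. It assumes the score is a weighted average of per-KPI severity scores; the severity-to-score mapping, KPI names and weights are hypothetical, and ITSI's own health score calculation may differ.

```python
from dataclasses import dataclass, replace

# Hypothetical mapping of KPI severity to a 0-100 score.
SEVERITY_SCORE = {"normal": 100, "low": 75, "medium": 50, "high": 25, "critical": 0}


@dataclass(frozen=True)
class Kpi:
    name: str
    weight: int    # importance chosen with the slider (hypothetical scale)
    severity: str  # current severity level, a key of SEVERITY_SCORE


def health_score(kpis):
    """Weighted average of KPI severity scores; a higher weight means more influence."""
    total = sum(k.weight for k in kpis)
    if total == 0:
        raise ValueError("at least one KPI needs a non-zero weight")
    return sum(k.weight * SEVERITY_SCORE[k.severity] for k in kpis) / total


def simulate(kpis, name, severity):
    """What-if: the health score if KPI `name` were at `severity`."""
    return health_score([replace(k, severity=severity) if k.name == name else k for k in kpis])


kpis = [
    Kpi("web_response_time", 8, "normal"),
    Kpi("java_exceptions", 5, "normal"),
    Kpi("db_long_queries", 2, "normal"),
]
# The same critical severity lowers the score more for a heavily weighted KPI.
print(simulate(kpis, "web_response_time", "critical"))
print(simulate(kpis, "db_long_queries", "critical"))
```

Running the simulation per KPI mirrors the idea behind simulated severity: before committing a weight, you can see how far a single degraded KPI would pull the service score down.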
Learn What’s Normal and Abnormal
Baseline normal operations and alert on anomalous conditions
Identify abnormal trends and patterns in KPI data
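Baselining can be illustrated with a minimal Python sketch: each new KPI value is compared with the mean and standard deviation of a trailing window and flagged when it deviates by more than `k` standard deviations. The window length, `k` and the sample data are hypothetical, and this simple z-score rule is a stand-in, not ITSI's machine-learning anomaly detection.

```python
from statistics import mean, stdev


def anomalies(values, window=12, k=3.0):
    """Return indices of values more than k standard deviations away
    from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat baselines, where the z-score is undefined.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Hypothetical response times (ms): steady around 100 with one spike.
response_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 250, 100]
print(anomalies(response_ms))
```

In this sketch a flagged spike stays in the trailing window and inflates the baseline for the following points; a production version might exclude flagged values from the history.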
Baseline Trends to Adapt Thresholds
Use statistics to dynamically adapt KPI thresholds based on time
Maintain and preserve learned thresholds to monitor KPI and service behavior
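Time-based adaptive thresholds can be sketched in Python as follows: a separate normal band (mean ± `k` population standard deviations) is learned for each hour of the day, so the same value can be normal during the afternoon peak and abnormal at night. Grouping by hour of day, `k = 2` and the sample data are hypothetical choices; ITSI configures thresholds through its own time policies and may use other statistical methods.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_thresholds(samples, k=2.0):
    """Learn a (low, high) band per hour of day from (hour, value) samples."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    thresholds = {}
    for hour, values in by_hour.items():
        mu, sigma = mean(values), pstdev(values)
        thresholds[hour] = (mu - k * sigma, mu + k * sigma)
    return thresholds


def breaches(thresholds, hour, value):
    """True if value falls outside the learned band for that hour."""
    low, high = thresholds[hour]
    return value < low or value > high


# Hypothetical history (requests per second): quiet at 03:00, busy at 14:00.
history = [(3, 10), (3, 12), (3, 14), (14, 100), (14, 110), (14, 120)]
learned = learn_thresholds(history)
print(breaches(learned, 3, 100), breaches(learned, 14, 100))
```

Persisting the learned dictionary, rather than recomputing it on every check, corresponds to maintaining and preserving learned thresholds.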
Creative Projects w/ Splunk
Long-term APM Data Retention
Integrated Splunk with our in-house APM solution via its OData APIs for longer data retention.
• Identify usage patterns
• Capacity planning for “surge” periods
• Insights to improve the overall business
• Proactively detect and remediate anomalies
• Business analysis with historical data
• Machine learning on historical data to project usage growth rigorously
Log Verbosity Analysis & Log Fingerprinting
• Analyze where the most redundant logs are coming from
• Reduce the verbosity of the top contributors
• Develop log fingerprints for highly redundant and heavy, but necessary, log entries
• Replace heavy raw log entries with a lightweight fingerprint entry
• Build a Splunk lookup table to map each fingerprint back to its raw log entry
• The goal is to improve logging efficiency so that we can do more with Splunk
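The verbosity analysis and fingerprinting steps can be sketched in Python. Variable parts of each line (hex IDs, numbers) are masked to obtain a template, the template is hashed into a short fingerprint, templates are ranked by byte volume to find the top contributors, and a CSV table maps each fingerprint back to its template for use as a Splunk CSV lookup. The masking rules, fingerprint length and sample lines are hypothetical; real rules would have to be tuned to the application's log formats, and restoring a full raw entry would also require logging the masked values alongside the fingerprint.

```python
import csv
import hashlib
import io
import re
from collections import Counter

# Hypothetical masking rules, applied in order: hex IDs first, then other numbers.
MASKS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def template_of(line):
    """Mask variable parts so lines that differ only in values share a template."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line.strip()


def fingerprint(line):
    """Short, stable ID derived from the line's template (not a security hash)."""
    return hashlib.sha1(template_of(line).encode("utf-8")).hexdigest()[:12]


def top_contributors(lines, n=3):
    """Rank templates by the total bytes their raw lines contribute."""
    volume = Counter()
    for line in lines:
        volume[template_of(line)] += len(line.encode("utf-8"))
    return volume.most_common(n)


def lookup_csv(lines):
    """CSV lookup table mapping each fingerprint to its template."""
    rows = {fingerprint(line): template_of(line) for line in lines}
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["fingerprint", "template"])
    for fp, template in sorted(rows.items()):
        writer.writerow([fp, template])
    return buf.getvalue()


logs = [
    "Session 42 loaded in 120 ms",
    "Session 7 loaded in 95 ms",
    "Cache miss for key 0x1f",
]
print(top_contributors(logs, 1))
print(lookup_csv(logs))
```

Ranking by bytes rather than by line count matches the stated target: heavy entries that repeat often are where a lightweight fingerprint saves the most indexing volume.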
Thank You
y.wan@sap.com

SAP SuccessFactors Customer Presentation