2. Agenda
• Scotiabank – Background
• My Department – PCM – What Do We Do
• Why Did We Introduce Splunk
• Where Do We Utilize Splunk
• Future Uses of Splunk
• Final Comments
3. Scotiabank - What We Offer - Our Goal
• Scotiabank is one of North America’s premier financial institutions and Canada’s most international bank, conducting business in over 55 countries.
• Scotiabank and its affiliates offer a broad range of products and services, including retail, commercial, corporate and investment banking, to more than 21 million customers around the world.
• Our goal is to be the best Canadian-based international financial services company.
• Strategic focus – diversification, by both geography and business.
4. Scotiabank - Canada’s Most International Bank
5. My Background and Role
• My Role – Director of PCM (Performance & Capacity Management)
• Some of my key accountabilities:
– Manage application performance and availability across critical bank platforms
– Mainframe Application Performance Focus and Resource Usage
– Mainframe Chargeback Reporting and Process
– Historical Trending and Analysis of Key Application Performance
• Our Department has deployed and manages the following key technologies:
– Splunk
– Dell Quest Application Performance Management Tools
– SAS Analytics
• We have a team of 7 doing the above work
• I have been working at Scotiabank for 41 years
6. The Good Old Days – When I Started in IT!
[Images: A Punch Card For Computer Input; IBM Card Reader]
7. Why Did PCM Deploy Splunk?
• Reduce MTTR (Mean Time To Resolution)
– Improve Application Availability
• Reduce Application Coding Issues Recovery Time
– Improve developer productivity and reduce costs in correcting problems.
• Operational Intelligence
– Developed custom monitoring for technologies which did not have robust monitoring solutions:
  • DataPower is an example
– Enhance Operational view of Business & Operational Metrics
  • Improve trending of System and Application errors
8. Splunk Helps Reduce MTTR
MTTI – Mean Time To Identify => problem or degradation
MTTK – Mean Time To Know => root cause established
MTTF – Mean Time To Fix => create fix or bypass to incident & implement
MTTV – Mean Time To Verify => confirm service restored successfully
[Image: Splunk Console]
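The four phases above can be turned into a simple calculation. The sketch below assumes that MTTR is treated as the sum of the identify, know, fix and verify phases, averaged over incidents; the slide does not state this formula explicitly. The class name, field names and sample durations are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class IncidentTimings:
    """Minutes spent in each phase of one incident (hypothetical structure)."""
    identify_min: float  # MTTI phase: problem or degradation detected
    know_min: float      # MTTK phase: root cause established
    fix_min: float       # MTTF phase: fix or bypass created and implemented
    verify_min: float    # MTTV phase: service restoration confirmed

    def resolution_min(self) -> float:
        # Assumption: time to resolution is the sum of the four phases.
        return self.identify_min + self.know_min + self.fix_min + self.verify_min


def mean_time_to_resolution(incidents):
    """Average resolution time, in minutes, across a list of incidents."""
    return sum(i.resolution_min() for i in incidents) / len(incidents)


# Made-up sample data, not taken from the presentation.
incidents = [
    IncidentTimings(identify_min=10, know_min=30, fix_min=45, verify_min=5),
    IncidentTimings(identify_min=20, know_min=60, fix_min=30, verify_min=10),
]
print(mean_time_to_resolution(incidents))
```

Tracking the phases separately shows where the time goes: faster log search mainly shortens the identify and know phases.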
16. Splunk Reporting of Commercial Internet Active Users
On Feb. 20 at 08:40 am EST.
Splunk Integrated with Google Maps
17. Alerting: Sample Alerts
• OpenSSO Errors and Degraded Response Time
• Investment Platform Application Failed Initialization
• HR Web (HORIZON) Application Cannot Connect to HR Database
• Intralink IMSConnect Transaction Failures (identifies IMS issues)
18. Real Time View of Business Volumes
• Wire Payment Volumes
• Incoming FTP Connection Counts
• Account Transfer Volumes
19. Splunk For IBM DataPower Devices
• IBM WebSphere DataPower Integration Appliance is a non-disruptive network device that provides common message transformation, integration and routing functions on a single platform. The Bank implemented this hardware to eliminate the obsolete SNA protocol and simplify communication protocols between MVS Host IMS and mid-tier platforms at the Bank.
• PCM developed a comprehensive monitoring solution using Splunk to monitor both the physical appliance and the application services, as IBM does not provide an out-of-the-box monitoring solution.
21. Current Projects - Application Health Check Solution with Dawn InfoTek
[Diagram: AQA Health Check. Labels: AQA Health Check Application; easyNotification add-on; Logs; easyNotification; Pager; Email; Voice; SMS; Mobile Application; Applications; Web; Web Services; Synthetic Transactions; Specific JVMs; Dev; PCM; Manager; Operation]
22. Current Projects - Building a DB2 on AS400 Monitoring Solution With Datavail
[Diagram: AS400 Metrics to Splunk. RPG/SQL, Flat File >>> FTP >>> Metrics Indexed]
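The flat-file step of this pipeline can be illustrated with a short sketch. It assumes the flat file is comma-separated with a header row, which the slide does not specify, and converts each row into a `timestamp key="value"` event line, a layout that Splunk's automatic key-value field extraction can typically pick up at search time. The column names (`timestamp`, `job`, `cpu_pct`, `db2_locks`) and the sample row are hypothetical.

```python
import csv
import io


def rows_to_events(flat_file_text):
    """Convert CSV rows into 'timestamp key="value" ...' event lines.

    Assumes a header row containing a 'timestamp' column (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(flat_file_text))
    events = []
    for row in reader:
        timestamp = row.pop("timestamp")
        # Emit the remaining columns as key="value" pairs, in header order.
        pairs = " ".join(f'{key}="{value}"' for key, value in row.items())
        events.append(f"{timestamp} {pairs}")
    return events


# Made-up sample content standing in for the file produced on the AS400.
sample = (
    "timestamp,job,cpu_pct,db2_locks\n"
    "2000-01-01T08:40:00,NIGHTLY01,37.5,4\n"
)
for line in rows_to_events(sample):
    print(line)
```

Putting the timestamp first and keeping one event per line makes the file straightforward to index once it has been transferred.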
25. Splunk Has Become Our Swiss Army Knife For Building Critical Monitoring Solutions
• We have yet to find a situation where we could not address a monitoring gap with Splunk.
• You can’t manage what you don’t measure!
– Catch LDAP update errors
– Report on Workflow Queue Depth trends
– Alert When FTP Service Goes Down
– Report on ETL Errors For Finance
– Alert on Active Directory Signon Failures
– JVM Health Check
– iWay Batch Performance Monitoring
– Consolidated Dashboard For Sr Management