Supporting Enterprise System Rollouts with Splunk
Clint Sharp, Cricket Communications

Supporting Enterprise System Rollouts with Splunk

At Cricket Communications, Splunk started as a way to correlate all of our data into one view to help our operations team keep processes humming. Then we gave secured access to our developers, and now they’re addicted. In fact, Splunk is critical in helping us speed up deployment of new systems (like our recent multi-million dollar billing system implementation). Learn how we use Splunk to display key metrics for the business, track overall system health, track transactions, optimize license usage, and support capacity planning.

Published in: Technology
  • Splunk has a single integrated software architecture that provides all the capabilities needed to access, index, analyze, and secure IT data, and it scales from a laptop to terabytes a day across the enterprise. APIs and commands include WMI, Registry, OPSEC LEA, DBI, JMS, VMware API, and other APIs.
  • How can you leverage Splunk?
  • This snapshot is taken from an outage. If you guessed that the line on top with the big bow in the middle is the cause, that would be a reasonable guess, but it’s a trick. Correlation does not equal causation: in this graph, 199.38.44.8 is spiking in traffic *because* we had an outage. This particular source has really terrible retry logic that blasts us with traffic any time we have an outage.
  • Supporting Enterprise System Rollouts with Splunk

    1. 1. Supporting Enterprise System Rollouts with Splunk Clint Sharp, Cricket Communications
    2. 2. Overview
        • Introduction
            • Who Am I?
            • Who is Cricket?
        • Trial & Rollout
            • Why Splunk?
            • Expansion
        • New Use Cases
            • What Data?
            • Why That Data?
            • Bigger Picture
        • The Big Project (“Billing”)
        • Conclusion/Questions
    3. 3. Who Am I?
        • Clint Sharp, Director of Application Operations, Cricket Communications
        • Application Operations
        • IT Operations Center (ITOC)
        • Configuration Management
        • ETL and BI Operations
        • 15 Employees/50 Contractors
    4. 4. Who is Cricket?
        • The Company
        • $2 billion in Revenue
        • 5 million Subscribers
        • Facilities Based
        • 4300 Employees
        • IT
        • 200+ Employees
        • 200+ Splunk Users
        • Large Custom Development Shop
            • Primarily Integrators
            • Custom CRM
    5. 5. Cricket’s Architecture Service Gateway EMS Business Works Tibco Enterprise Services Bus (ESP) CID RSR CID ISR CID CSR MyCricket SOAP API Web Logic Biller Oracle Database SOAP API App Server Point of Sale Oracle DB SOAP API Data-Guard Biller Replica & Back-Office Systems
    6. 6. Trial and Rollout
    7. 7. Why Splunk?
        • Needed visibility into middleware platform
        • Providing logs to developers was resource-intensive
        • Huge tool potential
    8. 8. First Use Case, In-house Middleware Performance
        2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=Requests val=1 newval=109083 oldval=109082
        2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=GetCustomerInformation type=ResponseTime val=1142 newval=1142 oldval=1318
        2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=UpdateActivationPayment type=Successful val=3 newval=103334 oldval=103331
        2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=RequestsOneMinuteCount val=1 newval=1 oldval=0
        2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=PostPaygoPayment type=Successful val=6 newval=178006 oldval=178000
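The statistics lines on this slide follow a simple `key=value` layout, which is part of why they are easy to search. As a minimal sketch (not the deck's own tooling), the Python below parses a few of those lines into dictionaries. The helper name `parse_line` and the regular expression are assumptions made for illustration; the sample lines are copied from the slide.

```python
import re

# Three of the sample lines shown on the slide.
SAMPLE = """\
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=Requests val=1 newval=109083 oldval=109082
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=GetCustomerInformation type=ResponseTime val=1142 newval=1142 oldval=1318
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=UpdateActivationPayment type=Successful val=3 newval=103334 oldval=103331
"""

# Matches key=value pairs; the leading date and time contain no "=" and are skipped.
KV = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Hypothetical helper: turn one stats line into a dict of fields."""
    fields = dict(KV.findall(line))
    # Convert the numeric fields so they can be compared and summed.
    for key in ("val", "newval", "oldval"):
        if key in fields:
            fields[key] = int(fields[key])
    # The first two whitespace-separated tokens are the date and the time.
    fields["timestamp"] = " ".join(line.split()[:2])
    return fields


events = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
```

In these samples, `val` equals `newval - oldval` for the counter types (`Requests`, `Successful`), while for `ResponseTime` it equals `newval`; that reading comes from the sample values only and is not stated in the deck.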
    9. 9. Expansion of Forwarders
        • Initial return on invested time was stellar
            • Slow roll-out would seem standard
            • To maximize our investment, we decided to move quickly
        • More data gives us more visibility
        • Quickly became addicted to being able to find data across all different points in the architecture
    10. 10. Expansion of Forwarders, cont’d Round 1 Splunk Server Test Middleware Server Log Server
    11. 11. Expansion of Forwarders, cont’d Round 2 Initial Deployment QA Environment Trial
    12. 12. Expansion of Forwarders, cont’d Full Deployment
    13. 13. New Use Cases
    14. 14. What Data?
        • HTTP Logs
        • Exception Logs
        • CRM Logs
        • Business Events Logs
        • POS Logs
        • F5 Logs
        • Operational Statistics
        • IT Data
        • Business Data
        • New Data!
    15. 15. Why That Data?
        • Solving User Problems
            • User reports error, searching across multiple systems’ log data saves huge amount of time
    16. 16. Finding Errors
    17. 17. Bigger Picture
        • Systems Monitoring
            • Web Services Operational Statistics
        • Business Data
            • Activations and Sales
            • Work Orders by Type
        • Found Data
            • Call Center Calls
            • Website Text Messages to Customers
    18. 18. Web Services Operational Stats
    19. 19. Activations and Sales
    20. 20. Work Orders By Type
    21. 21. Found Data
        • Call Center Calls
            • To see if any particular market was experiencing problems
            • Determine which customers were calling
        • Website Text Messages to Customers
            • Inappropriate messages
            • Who sent that?
    22. 22. New Data
        • F5 Logs
            • F5 gives us visibility into virtually every major application
                • Most critical applications are load balanced
                • A simple iRule can gather source IP, destination IP, response time, node and pool
                • Expanded iRule finds the API call in the XML and logs that too
                • Splunk provides an excellent way to slice and dice that data
    23. 23. F5 VIP by Source IP
    24. 24. The Big Project (“Billing”)
    25. 25. Billing
        • Nine-figure Cost
        • 600+ People
        • 2 year implementation
    26. 26. Splunk Contributions to Billing
        • Ease developer access to log files from all environments
            • Simplifies collaboration with configuration management teams who run pre-production
            • Developer and operator time lost to emailing pre-production logs around would have been extremely costly
        • F5 logs assist greatly in performance testing
            • Can tell when the load test scripts are failing to generate the prescribed load
        • Operational Readiness Testing
            • Helps to identify exceptions in logs to monitor for in transition to production
            • Alerts built in Splunk in pre-production and production to monitor new applications
    27. 27. Billing Trial
        • Vendor has performance issues, first Friday after Trial Launch
            • Unable to diagnose vendor problem from our middleware
        • Full launch expands API traffic volumes by an estimated 10x to 20x over Trial
        • No way to definitively know that system will perform
            • Extensive Joint Performance Testing with vendor, but as with all testing based on models, unsure if production will match testing in any way
        • A need defined to be able to isolate vendor performance from Cricket’s middleware performance
            • A tap between Cricket and the vendor is needed
            • Aha! moment. Insert F5 VIP in front of their load balancer, with only one node, their VIP, simply to capture log data so Splunk can chart it
    28. 28. F5 VIP Tap Point Service Gateway Tibco Enterprise Services Bus (ESP) Web Logic Biller Oracle Database App Server Point of Sale Oracle DB Data-Guard Biller Replica & Back-Office Systems Tap F5 VIP EMS Business Works CID RSR CID ISR CID CSR MyCricket SOAP API SOAP API SOAP API
    29. 29. F5 iRule
        when HTTP_REQUEST {
            # Open a high-speed logging handle to the syslog pool
            set hsl [HSL::open -proto UDP -pool syslog]
            # Only POST requests carry the XML payload we want to inspect
            if { [HTTP::method] ne "POST" } {
                event disable all
                return
            }
            # Skip requests whose body length is unknown or chunked
            if { (not [HTTP::header exists "Content-Length"]) or ([HTTP::header "Transfer-Encoding"] contains "chunked") } {
                event HTTP_REQUEST disable
                event HTTP_REQUEST_DATA disable
                event HTTP_RESPONSE disable
            } else {
                set http_request_time [clock clicks -milliseconds]
                set service [getfield [HTTP::uri] "/" 4]
                HTTP::collect [HTTP::header Content-Length]
            }
        }
        when HTTP_REQUEST_DATA {
            # Pull the API call name out of the collected payload
            set call [findstr [HTTP::payload] "-WSDL|" 6 ","]
            HTTP::release
        }
        when HTTP_RESPONSE {
            set response "POOL=[LB::server pool] NODE=[LB::server addr] STATUS=[HTTP::status] RESPONSETIME=[expr [clock clicks -milliseconds] - $http_request_time]"
            if { [info exists service] and [info exists response] } {
                HSL::send $hsl "DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=[IP::client_addr] DST=[IP::local_addr] URI=$service CALL=$call $response\n"
            }
        }
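The two extraction steps in the iRule, `getfield [HTTP::uri] "/" 4` and `findstr [HTTP::payload] "-WSDL|" 6 ","`, are easier to follow with concrete input. The Python below is a rough emulation of those two iRule commands, written for illustration only: it assumes `getfield` uses 1-based field numbers and that `findstr` skips the given number of characters from the start of the match, then reads up to the terminator. The sample URI path and payload are made up; only the resulting values (`CricketPaymentPort`, `queryBucketBalances_1`) appear in the deck's log data.

```python
def getfield(text, sep, index):
    """Approximate iRule getfield: split on sep, return the 1-based field."""
    parts = text.split(sep)
    return parts[index - 1] if 1 <= index <= len(parts) else ""


def findstr(text, search, skip=0, terminator=None):
    """Approximate iRule findstr, handling a string terminator only.

    Finds `search`, skips `skip` characters from the start of the match,
    and returns the text up to (not including) `terminator`.
    """
    start = text.find(search)
    if start == -1:
        return ""  # no match: empty result
    rest = text[start + skip:]
    if terminator is None:
        return rest
    end = rest.find(terminator)
    return rest if end == -1 else rest[:end]


# Hypothetical inputs, shaped so the outputs match values seen in the logs.
uri = "/services/v1/CricketPaymentPort"
payload = "<op>Payment-WSDL|queryBucketBalances_1,more</op>"

service = getfield(uri, "/", 4)
call = findstr(payload, "-WSDL|", 6, ",")
```

Because the URI starts with `/`, the first field is empty, so field 4 is the third path segment. The skip count of 6 matches the length of `-WSDL|`, which is why the returned text begins right after the marker.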
    30. 30. F5 Log Data
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.199 DST=10.3.0.22 URI=CricketAccountPort CALL=manageAccountArrangement_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=232
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.197 DST=10.3.0.22 URI=CricketPaymentPort CALL=queryBucketBalances_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=87
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.199 DST=10.3.0.22 URI=CricketAccountPort CALL=getBillCycleDates_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=70
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.200 DST=10.3.0.22 URI=OMOrderPort CALL=searchOrder_2 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=75
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.200 DST=10.3.0.22 URI=CricketPaymentPort CALL=queryBucketBalances_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=84
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.197 DST=10.3.0.22 URI=CricketSystemPort CALL=getTimestamp_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=69
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CSP3 COMMENT=TCP SRC=10.72.0.133 SRCPORT=56759 DST=10.0.4.36 DSTPORT=80 OPEN=1195
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CSP3 COMMENT=XML SRC=10.72.0.133 DST=10.0.4.36 URI=/csp-contentportal/CPortal CALL=debitAccount POOL=sandapcsp-80 NODE=10.0.24.106 STATUS=500 RESPONSETIME=1094
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.198 DST=10.3.0.22 URI=CricketPaymentPort CALL=queryBucketBalances_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=82
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CIDISR COMMENT=TCP SRC=96.17.70.167 SRCPORT=42971 DST=10.12.0.78 DSTPORT=443 OPEN=2233
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CIDISR COMMENT=TCP SRC=96.17.70.155 SRCPORT=38332 DST=10.12.0.78 DSTPORT=443 OPEN=654
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.195 DST=10.3.0.22 URI=CricketSystemPort CALL=getTimestamp_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=69
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.199 DST=10.3.0.22 URI=CricketPaymentPort CALL=createAccountAdjustment_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=102
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.196 DST=10.3.0.22 URI=CricketAccountPort CALL=queryDetailedInfo_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=124
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CSP3 COMMENT=TCP SRC=10.72.0.133 SRCPORT=56761 DST=10.0.4.36 DSTPORT=80 OPEN=1080
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CSP3 COMMENT=XML SRC=10.72.0.133 DST=10.0.4.36 URI=/csp-contentportal/CPortal CALL=getAccountInfo POOL=sandapcsp-80 NODE=10.0.24.149 STATUS=200 RESPONSETIME=977
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CIDISR COMMENT=TCP SRC=96.17.70.155 SRCPORT=38326 DST=10.12.0.78 DSTPORT=443 OPEN=1307
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CIDISR COMMENT=TCP SRC=96.17.70.155 SRCPORT=38325 DST=10.12.0.78 DSTPORT=443 OPEN=1396
        Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CIDISR COMMENT=TCP SRC=64.211.145.189 SRCPORT=44340 DST=10.12.0.78 DSTPORT=443 OPEN=67
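To show the kind of slicing these F5 log fields allow, here is a small Python sketch that averages `RESPONSETIME` per `APP` and lists calls that returned a 5xx `STATUS`. It is an illustration of the idea only, not the Splunk searches behind the deck's dashboards; the function names `parse_f5` and `summarize` are hypothetical, and the four sample lines are copied from the log data on this slide.

```python
import re
from collections import defaultdict

# Four of the log lines shown on the slide (two IRB calls, one TCP open, one error).
SAMPLE = """\
Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.197 DST=10.3.0.22 URI=CricketPaymentPort CALL=queryBucketBalances_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=87
Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=IRB COMMENT=XML SRC=10.12.18.199 DST=10.3.0.22 URI=CricketAccountPort CALL=getBillCycleDates_1 POOL=cvg-prod-irb-80 NODE=155.90.153.62 STATUS=200 RESPONSETIME=70
Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CSP3 COMMENT=TCP SRC=10.72.0.133 SRCPORT=56759 DST=10.0.4.36 DSTPORT=80 OPEN=1195
Jul 29 08:28:55 10.3.0.61 DEVICE=SANPRODF5 ENV=PROD APP=CSP3 COMMENT=XML SRC=10.72.0.133 DST=10.0.4.36 URI=/csp-contentportal/CPortal CALL=debitAccount POOL=sandapcsp-80 NODE=10.0.24.106 STATUS=500 RESPONSETIME=1094
"""

KV = re.compile(r"(\w+)=(\S+)")


def parse_f5(line):
    """Hypothetical helper: extract the KEY=value pairs from one log line."""
    return dict(KV.findall(line))


def summarize(lines):
    """Average response time per APP, plus (APP, CALL) pairs with 5xx status."""
    times = defaultdict(list)
    errors = []
    for line in lines:
        event = parse_f5(line)
        # Only COMMENT=XML events carry STATUS and RESPONSETIME.
        if event.get("COMMENT") != "XML":
            continue
        times[event["APP"]].append(int(event["RESPONSETIME"]))
        if int(event["STATUS"]) >= 500:
            errors.append((event["APP"], event["CALL"]))
    averages = {app: sum(vals) / len(vals) for app, vals in times.items()}
    return averages, errors


averages, errors = summarize(SAMPLE.splitlines())
```

Grouping by `APP` is just one choice; the same loop could key on `CALL`, `SRC`, or `NODE` to answer different questions from the same events.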
    31. 31. Biller Dashboard
    32. 32.
        • Splunk usage will grow organically, let it
        • More Data = More Benefit
        • Interesting data can come from unintended sources
        • Always look for new sources of data, even write code to generate it if necessary
        August 15, 2011
    33. 33. Questions? Clint Sharp, Cricket Communications
