Splunk as a Big Data Platform for Developers (SpringOne 2GX)
Presentation Transcript

  • A Big Data Platform for Developers. Damien Dallimore, Developer Evangelist at Splunk. © 2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.
  • About me
    • Developer Evangelist at Splunk since July 2012
    • Splunk Community Member
      • Splunk for JMX
      • SplunkJavaLogging
      • SplunkBase: Apps and Answers
    • Splunk Architect and Administrator
    • Coder
      • Been paying my mortgage developing Enterprise Java solutions for most of my career
    • Kia Ora
      • I do not have a speech impediment; I am from Aotearoa, so please restrain all your sheep, Lord of the Rings and Kim Dotcom heckles until beer o'clock!!
  • Agenda
    • Overview of the Splunk platform
    • Splunk for Developers
      • Custom Visualization Demo
    • Splunk Java SDK
    • Spring Integration Splunk Extensions
      • Integration Adaptors Demo
    • Some other JVM/Java related tools
      • SplunkJavaLogging
      • Splunk for JMX
    • Questions
  • What is Splunk?
  • So What is Splunk, Exactly?
    • Splunk is an engine for machine data
    • It's software: download and install it in 5 minutes ("freemium" model)
    • Provides visibility, reporting and search across all your IT systems and infrastructure
    • Runs on all modern platforms
    • Doesn't lock you into a fixed schema
    • Open and extensible architecture
  • Indexes any Machine Data
    • Capture events from logs in real time
    • Run scripts to gather system metrics, connect to APIs and databases
    • Listen to syslog, raw TCP/UDP, gather Windows events
    • Universally indexes any data format so it doesn't need adapters: "schema on the fly"
    • Stream in data directly from your application code
    • Decode binary data and feed in
    • Data sources by platform:
      • Windows: Registry, Event logs, File system, sysinternals
      • Linux/Unix: Configurations, Syslog, File system, ps, iostat, top
      • Virtualization: Hypervisor, Guest OS, Guest Apps
      • Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
      • Databases: Configurations, Audit/query logs, Tables, Schemas
      • Network: Configurations, syslog, SNMP, netflow
  • Centralizes Data Across the Environment
    • Splunk Universal Forwarder sends data to the Splunk Indexer from remote systems
    • Uses minimal system resources, easy to install and deploy
    • Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
    • Diagram: Splunk Forwarders sending data to an Indexing/Search Server
  • Scales to TBs/day and Thousands of Users
    • Automatic load balancing linearly scales indexing
    • Distributed search and MapReduce linearly scale search and reporting
  • Provides Strong Machine Data Governance
    • Provides comprehensive controls for data security, retention and integrity
    • Single sign-on integration enables pass-through authentication of user credentials
  • Splunk and Apache Hadoop MR/HDFS
    • Splunk is an implementation of the MapReduce algorithmic approach
    • It is not the Apache Hadoop MapReduce (MR) product
    • Splunk is not agnostic of its underlying data source; it is optimized for Splunk index files
    • Real time vs. batch jobs
    • Optimal for time-series based data
    • End-to-end integrated Big Data solution
    • Fine-grained protection of access and data using role-based permissions
    • Data retention and aging controls
    • Users can submit "MapReduce" jobs without needing to know how to code a job
      • Splunk Search Language vs. Pig/Sawzall
    • But why not get the best of both worlds?
      • Splunk Hadoop Ops
      • Splunk Hadoop Connect
      • Shuttl (archiving to HDFS / S3)
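To make the "MapReduce algorithmic approach" concrete, here is a minimal, library-free Java sketch of how a `stats count by` style search can be split across indexers: each indexer counts its own events (map) and the search head merges the partial counts (reduce). The class and method names are hypothetical, and this is an illustration of the pattern, not Splunk's internal implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of map/reduce over partitioned event data.
final class DistributedCount {

    // "Map" phase: runs on one indexer, over only the events that indexer holds.
    static Map<String, Long> mapPartition(List<String> statusCodes) {
        Map<String, Long> partial = new HashMap<>();
        for (String code : statusCodes) {
            partial.merge(code, 1L, Long::sum);
        }
        return partial;
    }

    // "Reduce" phase: runs on the search head, merging every indexer's partial result.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> indexerA = mapPartition(List.of("200", "500", "200"));
        Map<String, Long> indexerB = mapPartition(List.of("404", "200"));
        System.out.println(reduce(List.of(indexerA, indexerB)));
    }
}
```

Only the small partial maps need to travel to the search head, so adding indexers spreads the map work; that is the intuition behind the claim that distributed search scales out with the number of indexers.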
  • Splunk Has Four Primary Functions
    • Searching and Reporting (Search Head)
    • Indexing and Search Services (Indexer)
    • Local and Distributed Management (Deployment Server)
    • Data Collection and Forwarding (Forwarder)
    • A Splunk install can be one or all roles…
  • Getting Data into Splunk
    • Agent and agent-less approach for flexibility
    • Agent-less data input
      • syslog over TCP/UDP: syslog-compatible hosts and network devices
      • Mounted File Systems (hostname mount)
      • WMI: Event Logs, Performance, Active Directory (Windows hosts)
    • Splunk Forwarder
      • Local File Monitoring: log files, config files, dumps and trace files (Unix, Linux and Windows hosts)
      • Windows Inputs: Event Logs, performance counters, registry monitoring, Active Directory monitoring (Windows hosts)
      • Scripted Inputs: shell scripts, custom parsers, batch loading (custom apps and scripted API connections)
  • Universal Data Forwarder
    • Forward data without negatively impacting production performance
    • Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
    • Extends the Splunk data fabric to large-scale private cloud and desktop environments
    • Uses minimal system resources, easy to install and deploy
      • Less than half the memory and footprint of Splunk 4.1; <1% of a single core
    • Central deployment management
    • Monitor files, changes and the system registry; capture metrics and status
    • Diagram: Universal Forwarder deployment collecting logs, messages, configurations, metrics and scripts
  • Horizontal Scaling
    • Load-balanced search and indexing for massive, linear scale-out
    • Diagram: Distributed Search across indexers; Forwarder Auto Load Balancing
  • Multiple Datacenters
    • Index and store locally. Distribute searches to datacenters, networks and geographies.
    • Diagram: Distributed Search from Headquarters to London, Hong Kong, Tokyo and New York
  • Send Data to Other Systems
    • Route raw data in real time or send alerts based on searches
    • Targets: Service Desk, Event Console, Problem Investigation, SIEM
  • High Availability / DR
    • Combine auto load balancing and data replication
    • Diagram: Splunk Forwarders auto load balancing to a Primary Cluster and a Secondary Cluster (Data Clone), with Distributed Search across both
  • Integrate External Data
    • Extend search with lookups to external data sources: LDAP, AD, Watch Lists, CMDB, CRM/ERP
    • Correlate IP addresses with locations, accounts with regions
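A lookup behaves like a join performed at search time: a field value from each event is used as a key into an external table, and the matching columns are added to the event. The sketch below assumes events have already been parsed into field maps, and it uses a caller-supplied, hypothetical IP-to-location table in place of a real CSV, LDAP or CMDB source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical search-time lookup: enrich events with columns from an external table.
final class LookupEnricher {

    // Stand-in for an external source such as a CSV lookup file or a CMDB query.
    private final Map<String, Map<String, String>> table;

    LookupEnricher(Map<String, Map<String, String>> table) {
        this.table = table;
    }

    // Returns a copy of each event with the lookup columns added when its key matches.
    List<Map<String, String>> enrich(List<Map<String, String>> events, String keyField) {
        return events.stream().map(event -> {
            Map<String, String> enriched = new HashMap<>(event);
            String key = event.get(keyField);
            Map<String, String> row = key == null ? null : table.get(key);
            if (row != null) {
                enriched.putAll(row); // lookup columns win over same-named event fields
            }
            return enriched;
        }).collect(Collectors.toList());
    }
}
```

Because the original events are copied rather than modified, the same indexed data can be enriched differently by different searches, which fits the "multiple views into the same data" theme.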
  • Integrate Users and Roles
    • Integrate authentication with LDAP and Active Directory
    • Map LDAP & AD groups to flexible Splunk roles
      • Capabilities & filters for users and groups, e.g. Manage Indexes, Share Searches, Save Searches, Manage Users
    • Define any search as a filter, e.g. NOT tag=PCI, App=ERP
  • Centralized Licensing Management
    • Groups, stacks, and pools for enterprise deployments
  • Deployment Monitoring
    • Keep tabs on your Splunk Enterprise deployment: licenses, sourcetypes, indexers, forwarders
  • Real-time Search
    • Diagram: how data flows through Splunk
      • Inputs: Monitor Input, TCP/UDP Input, Scripted Input
      • Parsing Queue and Parsing Pipeline: source and event typing, character set normalization, line breaking, timestamp identification, regex transforms
      • Index Queue, feeding both the Real-time Buffer / Real-time Search Process and the Indexing Pipeline, which writes raw data and index files to the index
  • Real-time Alerting
    • Example alert search: source="/var/log/secure.log" "BAD SU"
    • Diagram: the same input, parsing and indexing pipeline as the Real-time Search slide, with the alert search running in the Real-time Search Process
  • New Approach to Heterogeneous Data
    • Universal Indexing
      • No data normalization
      • Automatically handles timestamps
      • Parsers not required
      • Index every term & pattern "blindly"
      • No attempt to "understand" up front
    • Search-time Knowledge
      • Knowledge applied at search time
      • No brittle schema to work around
      • Multiple views into the same data
      • Splunk helps find transactions, patterns and trends
    • Flexibility and Fast Time to Value
      • Normalization as it's needed
      • Faster implementation
      • Easy search language
      • Multiple views into the same data
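The idea of indexing every term "blindly" can be illustrated with a toy inverted index: every token of every raw event is recorded without any schema, and a search such as `error OR exception` becomes the union of the matching posting lists. This is a hypothetical sketch; Splunk's real index also deals with segmentation rules, time-based buckets and compression.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: terms are indexed with no up-front schema.
final class ToyIndex {
    private final List<String> events = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index every term in the raw event text, lower-cased for case-insensitive search.
    void add(String rawEvent) {
        int id = events.size();
        events.add(rawEvent);
        for (String term : rawEvent.toLowerCase(Locale.ROOT).split("[^a-z0-9_.]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Equivalent of "term1 OR term2 OR ...": union of the posting lists, in event order.
    List<String> searchAny(String... terms) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : terms) {
            ids.addAll(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(events.get(id));
        }
        return hits;
    }
}
```

Nothing about the event format is declared in advance; any meaning (fields, types, normalization) is left to search time.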
  • Inside Universal Indexing
    • Automatic event boundary identification and automatic timestamp normalization enable accurate searching and trending by time across all data
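Event boundary identification (line breaking) is what lets a multi-line Java stack trace arrive in Splunk as one event rather than dozens. A simple heuristic, assumed here purely for illustration, is to start a new event only on lines that begin with a timestamp and to append every other line to the current event; Splunk's own line breaking and timestamp recognition are configurable and far more thorough.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical line-merging heuristic: a new event starts at a line beginning with a timestamp.
final class EventBreaker {
    // Matches e.g. "2012-10-16 10:00:01" at the very start of a line.
    private static final Pattern TIMESTAMP_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    static List<String> breakEvents(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (current == null || TIMESTAMP_START.matcher(line).find()) {
                if (current != null) {
                    events.add(current.toString()); // close the previous event
                }
                current = new StringBuilder(line);
            } else {
                // Continuation line, e.g. "\tat com.example.Foo.bar(Foo.java:42)".
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }
}
```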
  • Inside Search-time Knowledge Extraction
    • Automatically discovered fields and user-defined fields enable statistics and precise search on specific fields
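Search-time extraction means fields are pulled out of the raw text when a search runs, not when the data is indexed. The sketch below assumes the common `key=value` convention (one of the patterns Splunk can discover automatically) and uses a hypothetical regex; it then computes a `stats count by <field>` style aggregation on an extracted field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical search-time extraction of key=value pairs from raw event text.
final class FieldExtractor {
    // Matches key=value or key="quoted value".
    private static final Pattern KV = Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    static Map<String, String> extract(String rawEvent) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = KV.matcher(rawEvent);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            fields.put(m.group(1), value);
        }
        return fields;
    }

    // Roughly what "... | stats count by <field>" computes once fields are extracted.
    static Map<String, Integer> countBy(List<String> rawEvents, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : rawEvents) {
            String value = extract(raw).get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Changing the regex changes the fields without re-indexing anything, which is the practical benefit of a late-binding schema.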
  • Inside Search-time Knowledge Extraction
    • Searches saved as event types, plus tagging of event types, hosts and other fields, enable normalized reporting, knowledge sharing and granular access control
  • Splunk for Developers
  • Splunk & Developers
    • Three ways developers use Splunk
      • Accelerate development & testing
      • Integrate data from Splunk into your existing IT environment for operational visibility
      • Build custom solutions to deliver real-time business insights from Big Data
    • Access layers over the Big Data Engine: Splunk UI (Splunk Apps), SDKs and the REST API, used by custom and existing applications and fed by machine data
    • Through these you can: search, chart and graph; save and schedule searches as alerts; export search results; manage inputs and indexes; add & remove users and roles
  • Splunk in the Developer Community
    • Over 1,000 unique visitors per week to dev.splunk.com
    • Over 500 followers on Twitter @splunkdev
    • Over 350 enterprise developer trial licenses granted
  • Accelerate development & testing
  • How does Splunk Accelerate Dev/Test?
    • Splunk frees you from upfront database design for analytics
      • Late-binding schema
    • Developers and QA/test engineers don't have to ask IT/Ops to get logs off machines
      • Role-based access to all data within one console, without having to log into production systems
    • All events are indexed and accessible in real time in one place
      • Ad hoc real-time monitoring and historical investigation, searchable from one place
      • Correlations and insights across multiple tiers
    • Splunk lets you find issues quickly, so you can fix them quickly
    • Integrate Splunk search results into testing assertions
  • StubHub & Splunk
    • "Splunk filled a vacuum we didn't know we had." (Nathan Pratt, Tech Lead, Tools & Automation, StubHub)
    • Started with Site Operations to resolve issues
    • Grew to engineers, QA and upper management in technology
      • Engineering uses Splunk to investigate bugs
      • QA uses it during dev cycles
    • Release requirement: projects are required to certify that all logs are Splunk-friendly
    • Screenshot: high-level view of application errors, used by site operations, engineering and upper management
  • Integrate Splunk into your IT environment
  • Integration into existing IT tools
    • The Splunk development platform is optimized for core enterprise developer skills
    • Your application communicates directly with a Splunk instance (splunkd) for search, management and admin, through the Splunk UI (Splunk Apps), the SDKs or the REST API
    • REST API
      • Provides full control to the developer
      • Use any language or tool that supports HTTP
    • SDKs provide broad coverage of the REST API in popular languages
      • Log directly to Splunk from any app
      • Build a UI on any web stack
      • Integrate into existing infrastructure
  • Splunk REST API
    • Exposes an API method for every feature in the product
      • Whatever you can do in the UI, you can do through the API
      • Run searches
      • Manage Splunk configurations
    • API is RESTful
      • Endpoints are served by splunkd
      • Requests use the GET, POST and DELETE HTTP methods
      • Responses are Atom XML feeds
      • JSON coming in 5.0
      • Search results can be output in CSV/JSON/XML/Raw
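Because every feature is exposed over HTTP, a search can be run with nothing more than an HTTP client. The sketch below only builds the pieces of a POST to the `/services/search/jobs/export` endpoint on the management port (8089 in the SDK examples); `search`, `earliest_time`, `latest_time` and `output_mode` are standard Splunk search parameters, while the host name and class name are placeholders. Actually sending the request, authenticating and handling the TLS certificate are left out so the example stays self-contained.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds (but does not send) a request for Splunk's streaming export endpoint.
final class ExportRequest {

    static String url(String host, int managementPort) {
        return "https://" + host + ":" + managementPort + "/services/search/jobs/export";
    }

    // application/x-www-form-urlencoded body for the POST.
    static String formBody(String query, String earliest, String latest, String outputMode) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("search", query);
        params.put("earliest_time", earliest);
        params.put("latest_time", latest);
        params.put("output_mode", outputMode); // e.g. csv, json, xml or raw
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```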
  • Developer Platform SDKs
    • We want to make it as easy as possible for developers to build Big Data apps on top of the Splunk platform
    • Software Development Kits (SDKs) in several languages
      • JavaScript, Java, Python, PHP, C# (private), Ruby (private)
    • All Splunk functionality is accessible via our SDKs
      • Get data into Splunk
      • Execute Splunk searches, get data out of Splunk
      • Manage Splunk
      • Customized user interfaces
  • Comcast & Splunk
    • Content browsed, purchased and watched, all tracked by time + MAC address / device
    • Customer profile and MAC address assignments
    • Correlate usage and profile data to analyze customer behavior:
      • Revenues driven by content browsed
      • Improving local content mix
      • Better search results
      • Tailor content promotion
  • Bosch & Splunk
    • Healthcare management: evidence-based telehealth, cardiac rhythm monitoring
    • Splunking data sent from ARM-based devices
    • Uses the Java SDK to send data to Splunk
  • Splunk as an integrated, enterprise-ready Big Data platform
  • Splunk = Integrated, Enterprise-ready Big Data Platform
    • No need to write MapReduce jobs: just get data into Splunk and analyze it
    • Splunk delivers real-time insight, like clickstream analysis, IT early-warning systems, security and fraud protection
    • Late-binding schema allows for faster, more flexible data insight gathering
    • Data collection is integrated
    • Distributed architecture offers scale-out capabilities with access control
    • Out-of-the-box reporting and analytics capabilities
    • SDKs cover over 170 REST API endpoints
  • Socialize & Splunk
    • "Splunk eliminates the need to write large MapReduce jobs to get meaningful information out of our data. This means we can get powerful stats and information to our key stakeholders in a fraction of the time." (Isaac Mosquera, CTO, Socialize)
  • Visualizing Splunk with the SDKs
    • Splunkweb has rich, but sometimes limited, visualization options
    • You can use the SDKs to extract data from Splunk using a search, and visualize it
    • Real-time searches can be especially powerful
    • Using the JavaScript SDK you can integrate with third-party charting libraries like Google Charts & D3
  • Realtime Twitter Visualization Demo
    • Twitter feeds being "firehosed" into Splunk and searched over in real time
    • Uses the Splunk JavaScript SDK to stream the real-time search results from Splunk into a totally customized web-based user interface
    • Visualization of the most popular hashtags with an interactive pie chart, word cloud and geo heatmap using D3
    • Diagram: Splunk, JavaScript SDK, Browser
  • Realtime Twitter Demo
  • Splunk Java SDK (Software Development Kit)
  • Get the Java SDK
    • Open sourced under the Apache v2.0 license
    • Clone from GitHub: git clone https://github.com/splunk/splunk-sdk-java.git
    • Project-level support for the Eclipse and IntelliJ IDEs
    • Prerequisites
      • JRE 6+
      • Ant (Maven support is in the works)
      • Splunk installed
    • Loads of code examples
      • Project examples folder
      • Unit tests
      • http://dev.splunk.com
      • http://gist.github.com/damiendallimore
    • Comprehensive coverage of the REST API
  • Java SDK Class Model
    • Diagram: HTTPService, Service, Resource, ResourceCollection, Entity, EntityCollection, Application, Index, Input, InputCollection, SavedSearchCollection
    • Collections use a common mechanism to create and remove entities
    • Entities use a common mechanism to retrieve and update property values, and access entity metadata
    • Service is a wrapper that facilitates access to all Splunk REST endpoints
  • Key Java SDK Use Cases
    • Connect and Authenticate
    • Manage
    • Input Events
    • Search
  • Connect and Authenticate

        import com.splunk.*; // Service, ServiceInfo, Entity, Receiver, Args, Job, ...
        import java.util.HashMap;
        import java.util.Map;

        public static Service connectAndLoginToSplunkExample() {
            Map<String, Object> connectionArgs = new HashMap<String, Object>();
            connectionArgs.put("host", "somehost");
            connectionArgs.put("username", "spring");
            connectionArgs.put("password", "integration");
            connectionArgs.put("port", 8089);
            connectionArgs.put("scheme", "https");
            // will login and save the session key which gets put in the HTTP Authorization header
            Service splunkService = Service.connect(connectionArgs);
            return splunkService;
        }
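Under the hood, logging in and reusing the session key looks like this at the HTTP level. The sketch assumes the standard `/services/auth/login` endpoint, which takes `username` and `password` form fields and returns a `sessionKey`, and the `Authorization: Splunk <sessionKey>` header sent on subsequent requests. It only builds the request pieces; the class name and the session key value in the test are made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// HTTP-level view of Splunk session-key authentication (request construction only).
final class SplunkAuth {

    static final String LOGIN_PATH = "/services/auth/login";

    // Form body POSTed to /services/auth/login.
    static String loginBody(String username, String password) {
        return "username=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Header sent with every later request, once the sessionKey has been parsed
    // out of the login response.
    static Map<String, String> authHeader(String sessionKey) {
        return Map.of("Authorization", "Splunk " + sessionKey);
    }
}
```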
  • Manage

        public static void getServerInfoExample() {
            Service splunkService = connectAndLoginToSplunkExample();
            ServiceInfo info = splunkService.getInfo();
            System.out.println("Info:");
            for (String key : info.keySet())
                System.out.println("  " + key + ": " + info.get(key));
            Entity settings = splunkService.getSettings();
            System.out.println("\nSettings:");
            for (String key : settings.keySet())
                System.out.println("  " + key + ": " + settings.get(key));
        }
  • Input Events

        public static void logEventToSplunkExample() {
            Service splunkService = connectAndLoginToSplunkExample();
            // Get a Receiver object
            Receiver receiver = splunkService.getReceiver();
            // Set the source and sourcetype
            Args logArgs = new Args();
            logArgs.put("source", "http-rest");
            logArgs.put("sourcetype", "spring-example");
            // Log an event into the spring index
            receiver.log("spring", logArgs, "SpringOne 2GX rocks");
        }

    • Other input transports
      • HTTP REST streaming
      • Raw TCP oneshot & streaming
      • Raw UDP & syslog
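At the REST level, submitting a single event like this is a POST of the raw event text to the `/services/receivers/simple` endpoint, with `index`, `source` and `sourcetype` passed as query-string parameters. The sketch builds that URL with the values from the slide; the base URL and class name are placeholders, and the request body (the event text itself) is not shown.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds the URL for submitting one raw event via Splunk's simple receiver endpoint.
final class SimpleReceiverUrl {

    static String build(String baseUrl, String index, String source, String sourcetype) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("index", index);
        params.put("source", source);
        params.put("sourcetype", sourcetype);
        StringJoiner query = new StringJoiner("&", "?", "");
        params.forEach((k, v) -> query.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return baseUrl + "/services/receivers/simple" + query;
    }
}
```

One HTTP request per event is fine for low volumes; for sustained streams the slide's other transports (HTTP streaming, raw TCP) avoid that per-request overhead.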
  • Search
    • Search query: a set of commands and functions you use to retrieve events from an index or a real-time stream, e.g. "search index=spring error OR exception | head 10"
    • Saved search: a search query that has been saved to be used again; it can be set up to run on a regular schedule
    • Search job: an instance of a completed or still-running search operation. Using a search ID you can access the results of the search when they become available. Job results are saved on the server for a period of time and can be retrieved.
    • Search modes
      • Normal: asynchronous; poll the job for status and results
      • Realtime: same as normal, but the stream is kept open and results are streamed in real time
      • Blocking: synchronous; a job handle is returned when the search is completed
      • Oneshot: synchronous; no job handle is returned, results are streamed
      • Export: synchronous; not a search per se, doesn't create a job, results are streamed oldest to newest
  • Blocking Searches

        import java.io.InputStream;

        public static void exportSearchExample() {
            Service splunkService = connectAndLoginToSplunkExample();
            String searchQuery = "search error OR exception | head 10";
            Args queryArgs = new Args();
            queryArgs.put("earliest_time", "-1d@d");
            queryArgs.put("latest_time", "now");
            // perform the export, blocks here
            InputStream stream = splunkService.export(searchQuery, queryArgs);
            processInputStream(stream); // helper (not shown) that reads the results
        }

        public static void simpleSearchExample() {
            Service splunkService = connectAndLoginToSplunkExample();
            String searchQuery = "search error OR exception | head 10";
            Args queryArgs = new Args();
            queryArgs.put("earliest_time", "-3d@d");
            queryArgs.put("latest_time", "-1d@d");
            // perform the search, blocks here
            InputStream stream = splunkService.search(searchQuery, queryArgs);
            processInputStream(stream);
        }
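The `earliest_time`/`latest_time` values use Splunk's relative time modifiers: `-1d@d` means "go back one day, then snap to the start of that day", so `-1d@d` to `now` covers yesterday from midnight up to the present. The sketch below resolves a small, assumed subset of that syntax (one `-N<unit>` offset with an optional `@<unit>` snap, units `s`, `m`, `h`, `d`, plus `now`) against a fixed reference time; real Splunk supports many more forms, and the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Resolves a tiny subset of Splunk relative time modifiers, e.g. "-1d@d", "-5m", "now".
final class RelativeTime {
    private static final Pattern MODIFIER = Pattern.compile("-(\\d+)([smhd])(?:@([smhd]))?");

    static LocalDateTime resolve(String modifier, LocalDateTime now) {
        if (modifier.equals("now")) {
            return now;
        }
        Matcher m = MODIFIER.matcher(modifier);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported modifier: " + modifier);
        }
        // Step 1: apply the offset, e.g. -1d.
        LocalDateTime t = now.minus(Long.parseLong(m.group(1)), unit(m.group(2)));
        // Step 2: optionally snap down to the start of the unit, e.g. @d = midnight.
        if (m.group(3) != null) {
            t = t.truncatedTo(unit(m.group(3)));
        }
        return t;
    }

    private static ChronoUnit unit(String u) {
        switch (u) {
            case "s": return ChronoUnit.SECONDS;
            case "m": return ChronoUnit.MINUTES;
            case "h": return ChronoUnit.HOURS;
            default:  return ChronoUnit.DAYS;
        }
    }
}
```

Under these rules, `simpleSearchExample` above (`-3d@d` to `-1d@d`) selects two whole days ending at midnight yesterday.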
  • Non-Blocking Search

        public static void searchJobExample() {
            Service splunkService = connectAndLoginToSplunkExample();
            String outputMode = "csv"; // xml, json, csv
            // submit the job
            Job job = splunkService.getJobs().create("search index=spring error OR fatal | head 10");
            // poll until the job has finished
            while (!job.isDone()) {
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // don't swallow the interrupt
                    return;
                }
            }
            Args outputArgs = new Args();
            outputArgs.put("output_mode", outputMode);
            InputStream stream = job.getResults(outputArgs);
            processInputStream(stream, outputMode); // uses xml stream, opencsv and gson
        }
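The polling loop in `searchJobExample` waits with no upper bound, so a stuck job would block the caller indefinitely. A more defensive pattern, sketched here with a generic `BooleanSupplier` so it runs without a Splunk server, adds a deadline. In real code the supplier would wrap the SDK's job status check (for example `job::isDone`; whether a `job.refresh()` call is also needed depends on the SDK version, so treat that as an assumption to verify). The class name is hypothetical.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Generic "poll until done or give up" helper for asynchronous (normal mode) search jobs.
final class JobPoller {

    // Polls isDone every interval until it returns true or the timeout elapses.
    // Returns true if the job finished, false on timeout or interruption.
    static boolean awaitDone(BooleanSupplier isDone, Duration interval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of spinning forever
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

Returning a boolean leaves the caller to decide what a timeout means, for example cancelling the job or reporting partial results.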
  • Realtime Search

        public static void realTimeSearchExample() {
            Service splunkService = connectAndLoginToSplunkExample();
            Args queryArgs = new Args();
            queryArgs.put("earliest_time", "rt-5m");
            queryArgs.put("latest_time", "rt");
            // submit the job
            Job job = splunkService.getJobs().create("search index=spring exception OR error", queryArgs);
            …
        }
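A real-time search with `earliest_time=rt-5m` and `latest_time=rt` keeps a window over the last five minutes of arriving events. The sketch below is a simplified, in-memory stand-in for that idea, not how splunkd implements real-time search: it counts events whose timestamps fall inside a sliding window and evicts older ones as time advances. The class name is hypothetical, and it assumes events arrive roughly in time order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified sliding-window counter, analogous to an "rt-5m" to "rt" window.
final class SlidingWindowCount {
    private final Duration window;
    private final Deque<Instant> timestamps = new ArrayDeque<>();

    SlidingWindowCount(Duration window) {
        this.window = window;
    }

    // Record a matching event (assumes roughly time-ordered arrival).
    void onEvent(Instant eventTime) {
        timestamps.addLast(eventTime);
    }

    // Number of events in (now - window, now], evicting anything older.
    int countAt(Instant now) {
        Instant cutoff = now.minus(window);
        while (!timestamps.isEmpty() && !timestamps.peekFirst().isAfter(cutoff)) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }
}
```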
  • Alternate JVM Languages
    • Scala, Groovy, Clojure, JavaScript (Rhino), JRuby, PHP (Quercus), Ceylon, Kotlin, Jython
    • We don't need SDKs for these languages; we can just use the Java SDK!
  • Groovy

        import com.splunk.Service
        import com.splunk.ServiceInfo

        class SplunkJavaSDKWrapper {
            static main(args) {
                // connect and login
                def connectionParameters = [host: "somehost", username: "spring", password: "integration"]
                Service service = Service.connect(connectionParameters)
                // get Splunk Server info
                ServiceInfo info = service.getInfo()
                def splunkInfo = [:]
                for (key in info.keySet())
                    splunkInfo.put(key, info.get(key))
                printSplunkInfo(splunkInfo)
            }

            static printSplunkInfo(splunkInfo) {
                println "Info"
                splunkInfo.each { key, value -> println key + " : " + value }
            }
        }
  • Scala

        import com.splunk.Service._
        import scala.collection.mutable.HashMap
        import scala.collection.JavaConversions._

        object SplunkJavaSDKWrapper {
          def main(args: Array[String]) = {
            // connect and login
            val connectionArgs = HashMap[String, Object]("host" -> "somehost", "username" -> "me", "password" -> "foo")
            val service = connect(connectionArgs)
            // get Splunk Server info
            val info = service.getInfo
            // Scala/Java conversion
            val javaSet = info.keySet
            val scalaSet = javaSet.toSet
            // print out Splunk Server info
            for (key <- scalaSet) println(key + ":" + info.get(key))
          }
        }
  • Spring Integration Splunk Extensions
    • Special thanks to Jianwei Li (Jarred) & Mark Pollack for creating this!
  • Spring Integration
    • Spring Integration is an extension to core Spring
    • Based on the "Enterprise Integration Patterns" model
    • Messaging model and declarative adaptors
    • Makes it easier to build integration solutions
  • Spring Integration Splunk Adaptors
    • The Splunk Java SDK makes it easier to use the REST API
    • Building on this, the Spring Integration Adaptors make it easier for Spring/Java developers to declaratively build data integration solutions and utilize the power of the Splunk platform
    • https://github.com/SpringSource/spring-integration-extensions
    • Inbound Adaptor
      • Search and export the data from Splunk and push it into message channels
      • Filter, transform, export to other destinations
    • Outbound Adaptor
      • Can consume data acquired by other integration adaptors (Twitter, JDBC…) and push it into Splunk for indexing, searching and visualization
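The inbound/outbound adaptor pair follows the Enterprise Integration Patterns channel-adapter model: an inbound adapter turns external data into messages on a channel, and an outbound adapter consumes messages from a channel and writes them to an external system. The plain-Java sketch below models that flow with a `BlockingQueue` as the channel; it uses no Spring Integration classes, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Plain-Java model of a channel-adapter flow (not the Spring Integration API).
final class ChannelFlow {

    // The "message channel": decouples producers from consumers.
    private final BlockingQueue<String> channel = new LinkedBlockingQueue<>();

    // Inbound adapter: pulls a batch from an external source (e.g. a search) onto the channel.
    void pollInbound(Supplier<List<String>> source) {
        channel.addAll(source.get());
    }

    // Outbound adapter: drains the channel, transforms each message, and hands it to a sink
    // (e.g. something that would send the event to Splunk for indexing).
    int drainOutbound(UnaryOperator<String> transform, Consumer<String> sink) {
        List<String> batch = new ArrayList<>();
        channel.drainTo(batch);
        batch.stream().map(transform).forEach(sink);
        return batch.size();
    }
}
```

In Spring Integration the channel, poller and adapters are declared in configuration rather than wired by hand, which is what the XML on the following slides does.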
  • Spring Integration Splunk Inbound Adaptor
    • Blocking, non-blocking, saved & realtime searches
    • Exporting
  • Spring Integration Splunk Outbound Adaptor
    • HTTP REST input
    • TCP input
  • XML Configuration
    • Common Splunk settings

        <int-splunk:server id="splunkServer" host="somehost" port="8089"
                           userName="damien" password="foobar"/>

    • Searching/exporting from Splunk

        <int-splunk:inbound-channel-adapter id="splunkInboundChannelAdapter"
            auto-startup="true"
            search="search index=spring error OR exception"
            splunk-server-ref="splunkServer"
            channel="inputFromSplunk"
            mode="blocking"
            initEarliestTime="-1d">
            <int:poller fixed-rate="5" time-unit="SECONDS"/>
        </int-splunk:inbound-channel-adapter>

    • Inputting events to Splunk

        <int-splunk:outbound-channel-adapter id="splunkOutboundChannelAdapter"
            auto-startup="true"
            order="1"
            channel="outputToSplunkWithMessageStore"
            splunk-server-ref="splunkServer"
            pool-server-connection="true"
            index="spring"
            sourceType="twitter-feed"
            source="spring-integration-httprest"
            ingest="submit">
        </int-splunk:outbound-channel-adapter>
  • Spring Integration Splunk Twitter Demo
  • SplunkJavaLogging
  • SplunkJavaLogging
    • A logging framework that lets developers integrate Splunk best-practice logging semantics into their code as seamlessly as possible and transport events directly to Splunk
    • Custom handler/appender implementations (REST and raw TCP) for the 3 most prevalent Java logging frameworks in play. Splunk events directly from your code.
      • LogBack
      • Log4j
      • java.util.logging
    • Better handling of stack traces
    • All code and examples are on GitHub
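Splunk-friendly logging generally means a clear timestamp and explicit `key=value` pairs that Splunk can extract at search time, with a stack trace kept inside a single event. The sketch below is not SplunkJavaLogging's API; it is a hypothetical `java.util.logging.Formatter` that renders records in that style using only the JDK, so the idea can be tried without any extra dependency.

```java
import java.time.Instant;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Hypothetical JUL formatter producing one-line, key=value, Splunk-friendly events.
final class KeyValueFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        StringBuilder sb = new StringBuilder();
        sb.append(Instant.ofEpochMilli(record.getMillis())) // ISO-8601 UTC timestamp first
          .append(" level=").append(record.getLevel().getName())
          .append(" logger=\"").append(record.getLoggerName()).append('"')
          .append(" message=\"").append(escape(formatMessage(record))).append('"');
        Throwable t = record.getThrown();
        if (t != null) {
            // Keep exception details on the same line so they stay within one event.
            sb.append(" exception_class=\"").append(t.getClass().getName()).append('"')
              .append(" exception_message=\"").append(escape(String.valueOf(t.getMessage()))).append('"');
            StackTraceElement[] frames = t.getStackTrace();
            if (frames.length > 0) {
                sb.append(" exception_top_frame=\"").append(frames[0]).append('"');
            }
        }
        return sb.append(System.lineSeparator()).toString();
    }

    // Escape backslashes, quotes and newlines so the key=value structure survives.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

Attaching it to any JUL `Handler` via `setFormatter` is enough to try it; shipping the lines to Splunk is then a matter of monitoring the log file or using a network handler.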
  • Splunk for JMX
  • Splunk for JMX
    • SplunkBase App for monitoring JVM applications
    • Out-of-the-box dashboards for JVM-level monitoring (java.lang domain)
      • Memory, threading, GC, CPU etc.
    • Very simple configuration to wire up monitoring of any MBeans from applications (Tomcat, JBoss, Cassandra, Coherence etc.)
    • HotSpot, JRockit, IBM J9, OpenJDK
    • Poll JMX attributes and operations, index data over time, correlate with other data
    • Supports large-scale deployments of JVMs
    • Extensible and customizable
    • Many connectivity options
      • RMI, IIOP
      • Direct process attachment
      • MX4J Hessian, Burlap and SOAP
    • Freely available download from SplunkBase & all code is on GitHub
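The java.lang-domain dashboards are built from the JVM's platform MXBeans. As a rough illustration of what one poll cycle gathers, the hypothetical sketch below reads heap usage, thread count and GC totals from the local JVM via `java.lang.management.ManagementFactory` and renders them as a single `key=value` event. Splunk for JMX also connects to remote JVMs (RMI, IIOP, process attachment and so on), which this in-process sketch does not attempt.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.time.Instant;

// One poll of local JVM metrics from the java.lang domain, formatted as a key=value event.
final class JvmMetricsPoller {

    static String pollOnce() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collectors report -1 when a value is unavailable; treat that as zero.
            gcCount += Math.max(gc.getCollectionCount(), 0);
            gcTimeMs += Math.max(gc.getCollectionTime(), 0);
        }
        return Instant.now()
                + " heap_used=" + heap.getUsed()
                + " heap_committed=" + heap.getCommitted()
                + " thread_count=" + threads
                + " gc_count=" + gcCount
                + " gc_time_ms=" + gcTimeMs;
    }
}
```

Calling `pollOnce()` on a schedule and indexing each line is what turns point-in-time MBean values into the "index data over time, correlate with other data" capability the slide describes.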
  • Learn More. Stay Connected.
    • At SpringOne 2GX:
      • Come by our booth
      • Splunk demos, Q & A
      • SDK code
      • Tee shirts!!
    • Web:
      • Developer Platform: http://dev.splunk.com
      • SplunkBase: http://splunk-base.splunk.com
      • Twitter: @splunkdev, @damiendallimore
      • Email: devinfo@splunk.com, ddallimore@splunk.com
      • Blog: http://blogs.splunk.com/dev
      • GitHub: http://github.com/splunk
      • Splunk Live! events and online videos at http://www.splunk.com
  • Thanks for coming.