2. Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the
expected performance of the company. We caution you that such statements reflect our current expectations and
estimates based on factors currently known to us and that actual events or results could differ materially. For important
factors that may cause actual results to differ from those contained in our forward-looking statements, please review
our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time
and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or
accurate information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change at any
time without notice. It is for informational purposes only and shall not be incorporated into any contract or other
commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include
any such feature or functionality in a future release.
3. Agenda
Damien’s Section
What is messaging
JMS + Demo
AMQP + Demo
Kafka + Demo
Custom message handling
Architecting for scale
Nimish’s Section
Using ZeroMQ
Using JMS for underutilized computers
Question time
8. What is messaging?
Messaging infrastructures facilitate the sending/receiving of messages between distributed systems
Messages can be encoded in one of many available protocols
A common paradigm involves producers and consumers exchanging via topics or queues
Topics (publish subscribe)
Queues (point to point)
TOPIC
QUEUE
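The difference between the two delivery styles can be sketched in a few lines of plain Java. This is only an in-memory illustration, not a real broker API: the class and method names (`DeliveryDemo`, `publishToTopic`, `sendToQueue`) are made up for this example, and the round-robin hand-off in the queue case is an assumption chosen to keep the output deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory model of the two delivery styles; not a real broker API.
class DeliveryDemo {

    // Topic (publish/subscribe): every subscriber gets its own copy of each message.
    static List<List<String>> publishToTopic(List<String> messages, int subscribers) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < subscribers; i++) {
            inboxes.add(new ArrayList<>(messages));
        }
        return inboxes;
    }

    // Queue (point to point): each message is taken by exactly one consumer.
    // Consumers take turns (round robin) purely to keep the sketch deterministic.
    static List<List<String>> sendToQueue(List<String> messages, int consumers) {
        Queue<String> queue = new ArrayDeque<>(messages);
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            inboxes.add(new ArrayList<>());
        }
        int next = 0;
        while (!queue.isEmpty()) {
            inboxes.get(next).add(queue.poll());
            next = (next + 1) % consumers;
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> messages = List.of("m1", "m2", "m3", "m4");
        System.out.println("topic: " + publishToTopic(messages, 2));
        System.out.println("queue: " + sendToQueue(messages, 2));
    }
}
```

With two receivers, the topic delivers all four messages to both of them, while the queue splits the four messages between them.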
9. Why are messaging architectures used?
Integrating Legacy Systems
Integrating Heterogeneous Systems
Distributed Applications
Cluster Communication
High Performance Streaming
11. The data opportunity
Easily tap into a massive source of valuable in-flight data flowing around the veins
Don’t need to access the application directly; pull data off the messaging bus
I cannot think of a single industry vertical that does not use messaging
12. Getting this data into Splunk
Many different messaging platforms and protocols
JMS (Java Message Service)
AMQP (Advanced Message Queuing Protocol)
Kafka
Nimish will also cover some more use cases
13. JMS
Not a messaging protocol, but a programming interface to many different
underlying message providers
WebsphereMQ, Tibco EMS, ActiveMQ, HornetQ, SonicMQ etc.
Very prevalent in the enterprise software landscape
DEMO
14. AMQP
RabbitMQ
Supports AMQP 0.9.1, 0.9, 0.8
Common in financial services and environments that need high performance
and low latency
DEMO
15. Kafka
Cluster centric design = strong durability and fault tolerance
Scales elastically
Producers and Consumers communicate via topics in a Kafka node cluster
Very popular with open source big data / streaming analytics solutions
DEMO
16. Custom message handling
These Modular Inputs can be used in a multitude of scenarios
Message bodies can be anything: JSON, XML, CSV, unstructured text, binary
Need to give the end user the ability to customize message processing
So you can plug in your own custom handlers
Need to write code, but it is really easy, and there are examples on GitHub
I’m a big fan of data pre-processing
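As a rough idea of what a pluggable handler could look like, here is a minimal sketch. The `MessageHandler` interface and both implementations are hypothetical names invented for this illustration; the actual Modular Inputs define their own handler contracts, so check the GitHub examples for the real signatures. The CSV field names are also assumed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical handler contract for illustration; the real Modular Inputs define their own.
interface MessageHandler {
    String handle(byte[] body);
}

// Default behaviour: treat the body as UTF-8 text and pass it through unchanged.
class PassThroughHandler implements MessageHandler {
    public String handle(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }
}

// Custom pre-processing: turn "name,status,millis" CSV into key=value pairs.
// The field names are assumptions made up for this example.
class CsvToKeyValueHandler implements MessageHandler {
    private static final String[] FIELDS = {"name", "status", "millis"};

    public String handle(byte[] body) {
        String[] values = new String(body, StandardCharsets.UTF_8).trim().split(",");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < FIELDS.length && i < values.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(FIELDS[i]).append('=').append(values[i].trim());
        }
        return out.toString();
    }
}

class HandlerDemo {
    public static void main(String[] args) {
        byte[] body = "checkout,OK,42".getBytes(StandardCharsets.UTF_8);
        System.out.println(new PassThroughHandler().handle(body));
        System.out.println(new CsvToKeyValueHandler().handle(body));
    }
}
```

The point of the design is that the input only ever sees raw bytes, so the same hook can cope with text and binary payloads alike and can reshape the event before it is indexed.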
20. Achieving desired scale
AMQP Mod Input
AMQP Queue
Single Splunk Instance
With 1 Modular Input instance, only so much performance / throughput can be achieved
You’ll hit limits with JVM heap, CPU, OS STDIN/STDOUT buffer, Splunk indexing pipeline
21. So go Horizontal
AMQP Queue
Splunk Indexer Cluster
Universal Forwarders
AMQP Broker
AMQP Mod Input AMQP Mod Input
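The horizontal pattern relies on several consumers competing for messages on one queue. A minimal sketch of that idea, assuming a `BlockingQueue` as a stand-in for the AMQP queue and threads as stand-ins for the forwarders running the Mod Input (the class `CompetingConsumers` and its `drain` method are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in: a BlockingQueue plays the AMQP queue, threads play the forwarders.
class CompetingConsumers {

    // Returns how many times each message was processed, keyed by message.
    static Map<String, Integer> drain(int messageCount, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messageCount; i++) {
            queue.add("msg-" + i);
        }
        Map<String, Integer> seen = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                String msg;
                // poll() removes the message, so no other worker can receive it.
                while ((msg = queue.poll()) != null) {
                    seen.merge(msg, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Integer> seen = drain(1000, 4);
        boolean exactlyOnce = seen.values().stream().allMatch(n -> n == 1);
        System.out.println(seen.size() + " messages, each processed once: " + exactlyOnce);
    }
}
```

Adding workers spreads the load without duplicating events, because a queue hands each message to only one consumer.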
23. About Me
• Principal Systems Engineer at Splunk in the Northeast
• Session Speaker at all past Splunk .conf user conferences
• Catch me on the Splunk Blogs
24. Problem with Getting Business Data from JMS
The goal is to index the business message contents into Splunk
Message Uncertainty Principle:
If you de-queue the message to look at it, you have affected the TXN
If you use various browse APIs for content, you may miss it
– Message may have already been consumed by TXN
Suggestion: Use a parallel queue to log the message
– Suggestion: Try ZeroMQ
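The parallel-queue suggestion can be sketched as follows. This is an in-memory illustration only: two `BlockingQueue` instances stand in for the business queue and the logging channel (which could be ZeroMQ, as suggested above), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the producer writes each message twice,
// once for the business transaction and once for logging.
class ParallelLogQueue {
    final BlockingQueue<String> businessQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> logQueue = new LinkedBlockingQueue<>();

    void send(String message) {
        businessQueue.add(message);
        logQueue.add(message); // copy destined for indexing
    }

    // The business consumer de-queues as usual; the transaction is untouched by logging.
    String consumeBusiness() {
        return businessQueue.poll();
    }

    // Browsing the business queue only sees what has not been consumed yet.
    String browseBusiness() {
        return businessQueue.peek();
    }

    // The logger drains its own queue, independent of the business consumer.
    List<String> drainLog() {
        List<String> out = new ArrayList<>();
        logQueue.drainTo(out);
        return out;
    }

    public static void main(String[] args) {
        ParallelLogQueue q = new ParallelLogQueue();
        q.send("order-1");
        System.out.println("consumed: " + q.consumeBusiness());
        System.out.println("browse after consume: " + q.browseBusiness());
        System.out.println("log copy: " + q.drainLog());
    }
}
```

Once the business consumer has taken the message, browsing finds nothing, but the copy on the parallel queue is still there to be indexed.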
25. Why use ZeroMQ
Light Weight
Multiple Client language support (Python, C++, Java, etc)
Multiple design patterns (Pub/Sub, Pipeline, Request/Reply, etc)
Open Source with community support
31. Getting Events out of Splunk
Splunk SDK
Use Cases:
– In-depth processing of Splunk events in a queued manner
– Use as pivot point to drop off events into a Complex Event Processor
– Batch Processing of Splunk events outside of Splunk
Divide and Conquer Approach as seen in last slide
32. Java Example using SDK to load ZeroMQ
String query = search;
Job job = service.getJobs().create(query, queryArgs);
// Poll until the search job has finished
while (!job.isDone()) {
    Thread.sleep(100);
    job.refresh();
}
// Get Query Results and store in String str… (Code Omitted)
// Assuming single line events
StringTokenizer st = new StringTokenizer(str, "\n");
while (st.hasMoreTokens()) {
    String temp = st.nextToken();
    sock.send(temp.getBytes(), 0);   // send one event per ZeroMQ message
    byte[] response = sock.recv(0);  // wait for the reply before sending the next event
}
36. Applications for Distributing Work
Application Server would free up computing resources
Work could be pushed to underutilized computers
Examples:
– Massive Mortgage Calculation Scenarios
– Linear Optimization Problems
– Matrix Multiplication
– Compute all possible paths for combinatorics
38. Algorithm
Application servers push requests to queues, which may include data
in the request object called a Unit of Work
JMS client implements doWork() interface to work with data
Message Driven Bean receives finished work and implements
doStore() interface
What does this have to do with Splunk?
– Time Series results can be stored in Splunk for further or historical analytics
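The flow above can be sketched with plain Java queues. Everything here is an assumption for illustration: `UnitOfWork`, `Worker`, and `ResultStore` are hypothetical types modelled on the `doWork()` and `doStore()` names from the slide, in-memory queues replace JMS, and the work runs sequentially in one JVM rather than on remote machines. The example splits a matrix multiplication into one unit of work per row.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical request object: one row of A, to be multiplied by matrix B.
class UnitOfWork {
    final int rowIndex;
    final double[] row;
    final double[][] b;
    double[] result;

    UnitOfWork(int rowIndex, double[] row, double[][] b) {
        this.rowIndex = rowIndex;
        this.row = row;
        this.b = b;
    }
}

// Hypothetical contracts named after the slide's doWork() and doStore().
interface Worker {
    UnitOfWork doWork(UnitOfWork unit);
}

interface ResultStore {
    void doStore(UnitOfWork unit);
}

class WorkDistributionDemo {
    static double[][] multiply(double[][] a, double[][] b) {
        BlockingQueue<UnitOfWork> requests = new LinkedBlockingQueue<>();
        BlockingQueue<UnitOfWork> finished = new LinkedBlockingQueue<>();

        // Application server role: push one unit of work per row onto the request queue.
        for (int i = 0; i < a.length; i++) {
            requests.add(new UnitOfWork(i, a[i], b));
        }

        // Client role: compute one result row for each unit taken off the queue.
        Worker worker = unit -> {
            double[] out = new double[unit.b[0].length];
            for (int j = 0; j < out.length; j++) {
                for (int k = 0; k < unit.row.length; k++) {
                    out[j] += unit.row[k] * unit.b[k][j];
                }
            }
            unit.result = out;
            return unit;
        };
        UnitOfWork u;
        while ((u = requests.poll()) != null) {
            finished.add(worker.doWork(u));
        }

        // Receiver role: store each finished row in its place.
        double[][] c = new double[a.length][];
        ResultStore store = unit -> c[unit.rowIndex] = unit.result;
        while ((u = finished.poll()) != null) {
            store.doStore(u);
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```

In a real deployment the worker loop would run on the underutilized machines, and `doStore()` is the natural place to write timestamped results out for indexing.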
From Auckland
Dev evangelist, ex-customer
5th Conf
Make apps, cut code
Through messaging background, a lot of integration work in many different industries, particularly in the enterprise Java space.
Everything 100% open source: use, reuse, whatever.
Collaborate
Community
answers.splunk.com for support is best
Enterprise Service Buses
Multi-tier apps, async processing
Apache Storm
That pretty broadly covers most enterprise software scenarios.
Interoperability not guaranteed
Message producers and consumers may be implemented differently
You “plug in” the underlying message provider implementation
Wire-level protocol, hence better interoperability than JMS and better performance
Usual messaging features such as flow control, guaranteed delivery, quality of service etc.
JP Morgan Chase
1.0 is an entirely different protocol, any demand for this?
SwiftMQ
Apache Apollo
Apache Qpid
Manage access to the cluster with Apache Zookeeper
Data streams can be partitioned over multiple machines in the cluster
Apache Storm spout
If you have the opportunity to get the data into an optimal format for Splunk, do it.
Handle custom payloads, even binary
Efficient use of license
Pre-compute some values that might not be best suited to the Splunk search language
Inputting the setting into stanza
Send message
Show reversed output
Your only limits are going to be your ability to provision Splunk nodes.
Same pattern applies to other Mod Inputs
Works with queues, not pub/sub topics (you’ll get duplicates)