This document provides an agenda and overview for a presentation on application logging best practices. It discusses the reality of growing volumes of event log data from various sources. While event logs currently have some structure, the structure is inconsistent and non-standard. The document argues that with proper interpretation, event logs can provide real business value by offering insights into operations, security, business intelligence, and customer experience. It presents an approach using Splunk to gain intelligence from application logs quickly through late structure binding and schema-less searching versus traditional analytics with early structure binding. The document also discusses liberating application data through purposeful semantic logging with clear key-value pairs.
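The "purposeful semantic logging with clear key-value pairs" idea can be sketched in a few lines. This is a minimal illustration of the pattern, not Splunk's API: the `format_event`/`log_event` helpers and the `orders` logger name are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("orders")

def format_event(event, **fields):
    # Render one event as explicit key=value pairs so a schema-on-read
    # tool (Splunk included) can extract the fields later with no custom parser.
    pairs = " ".join(f'{k}="{v}"' for k, v in sorted(fields.items()))
    return f'event="{event}" {pairs}'

def log_event(event, **fields):
    # One event per line, late structure binding: the schema lives in the
    # event itself, not in the consumer.
    log.info(format_event(event, **fields))

log_event("order_placed", user="alice", order_id=1234, amount_usd=42.50, status="ok")
```

Because every field is self-describing, a later search can filter on `status="ok"` or sum `amount_usd` without anyone having agreed on a schema up front.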
Best Practices for Building Robust Data Platform with Apache Spark and Delta – Databricks
This talk focuses on the journey of technical challenges, trade-offs, and ground-breaking achievements involved in building performant and scalable pipelines, drawn from experience working with our customers.
Building a data pipeline to ingest data into Hadoop in minutes using Streamse... – Guglielmo Iozzia
Slides from my talk at the Hadoop User Group Ireland meetup on June 13th 2016: building a data pipeline to ingest data from sources of different nature into Hadoop in minutes (and no coding at all) using the Open Source Streamsets Data Collector tool.
This is a run-through at a 200 level of the Microsoft Azure Big Data Analytics for the Cloud data platform based on the Cortana Intelligence Suite offerings.
Cloud Experience: Data-driven Applications Made Simple and Fast – Databricks
A complex real-time data workflow implementation is very challenging. This session will describe the architecture of a data platform that provides a single, secure, high-performance system that can be deployed in a hybrid cloud architecture. We will present how to support simultaneous, consistent, and high-performance access through multiple industry open-source and cloud-compatible standards of streaming, table, TSDB, object, and file APIs. A new serverless technology is also used in the architecture to support dynamic and flexible implementations. The presenter will also outline how the platform was integrated with the Spark ecosystem, including AI and ML tools, to simplify the development process.
Machine-generated data is one of the fastest-growing and most complex areas of big data. It's also one of the most valuable, containing a definitive record of all user transactions, customer behavior, machine behavior, security threats, fraudulent activity and more. Join us as we explore the basics of machine data analysis and highlight techniques to help you turn your organization's machine data into valuable insights. This introductory workshop includes a hands-on (bring your laptop) demonstration of Splunk's technology and covers use cases both inside and outside IT. Learn why more than 13,000 customers in over 110 countries use Splunk to make business, government, and education more efficient, secure, and profitable.
Machine-generated data is one of the fastest-growing and most complex areas of big data. It's also one of the most valuable, containing some of the most important insights: where things went wrong, how to optimize the customer experience, the fingerprints of fraud. Join us as we explore the basics of machine data analysis and highlight techniques to help you turn your organization's machine data into valuable insights—across IT and the business. This introductory workshop includes a hands-on (bring your laptop) demonstration of Splunk's technology and covers use cases both inside and outside IT. Learn why more than 13,000 customers in over 110 countries use Splunk to make their organizations more efficient, secure, and profitable.
To view a recording of this webinar, use the URL below:
http://wso2.com/library/webinars/2016/06/analytics-in-your-enterprise/
Big data spans many fields and brings together technologies like distributed systems, machine learning, statistics and Internet of Things (IoT). It has now become a multi-billion dollar industry with use cases ranging from targeted advertising and fraud detection to product recommendations and market surveys.
Some use cases, such as urban planning, can be slower (done in batch mode), while others, such as the stock market, need results in milliseconds (done in a streaming fashion). Different technologies are used for each case: MapReduce for batch analytics, complex event processing for real-time analytics, and machine learning for predictive analytics. Furthermore, the type of analysis ranges from basic statistics to complicated prediction models.
This webinar will discuss the big data landscape, including:
Concepts, use cases and technologies
Capabilities and applications of the WSO2 analytics platform
WSO2 Data Analytics Server
WSO2 Complex Event Processor
WSO2 Machine Learner
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen... – confluent
In this talk we’ll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We’ll debunk some of the myths around event sourcing. We’ll look at the inevitability of event-driven programming in the serverless space, and we’ll see how stream processing links these two concepts together with a single ‘database for events’. As the story unfolds we’ll dive into some use cases, examine the practicalities of each approach (particularly the stateful elements), and finally extrapolate how their future relationship is likely to unfold. Key takeaways include: the different flavors of event sourcing and where their value lies; the difference between stream processing at the application and infrastructure levels; the relationship between stream processors and serverless functions; and the practical limits of storing data in Kafka and stream processors like KSQL.
Presentation by Smart ERP Solutions providing a hands-on deep dive into the PeopleSoft Alert Framework. The Alerts feature, which is a PeopleSoft Enterprise Component, enables you to alert your organization to errors, changes, and stalled transactions. It is a tool that is not limited to developers: if you can write a PeopleSoft Query, you can create an Alert. With alerts, you can scan PeopleSoft tables and receive alerts when exceptions are found. These alerts can include a link to the PeopleSoft page where you can review or correct the issue. In this session, we take a detailed look at how to set up alerts, how to take advantage of some of the different options, and provide real-world examples of how alerts can help you be proactive in your business.
This session takes an in-depth look at:
- Trends in stream processing
- How streaming SQL has become a standard
- The advantages of Streaming SQL
- Ease of development with streaming SQL: Graphical and Streaming SQL query editors
- Business value of streaming SQL and its related tools: Domain-specific UIs
- Scalable deployment of streaming SQL: Distributed processing
Implement a Universal Data Distribution Architecture to Manage All Streaming ... – Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming Data
Cloudera Partner SkillUp
Tim Spann
Principal Developer Advocate in Data In Motion for Cloudera
tspann@cloudera.com
Using Apache NiFi, Apache Kafka and Apache Flink in a hybrid environment
Cloudera DataFlow
Cloudera Streams Messaging Manager
Cloudera SQL Stream Builder
SplunkLive! Frankfurt 2018 - Data Onboarding Overview – Splunk
Presented at SplunkLive! Frankfurt 2018:
Splunk Data Collection Architecture
Apps and Technology Add-ons
Demos / Examples
Best Practices
Resources and Q&A
“Lights Out” Configuration using Tivoli Netcool AutoDiscovery Tools – Antonio Rolle
Review why a CMDB is essential to and is the foundation of your BSM strategy
Outline the known challenges that require planning at the outset of a CMDB initiative
Drill down into the approach and lessons learned in the initial stages of a CMDB rollout for one of the largest financial institutions in North America
Learn how you can automate your offline IT asset management processes so you can ensure data security, efficiency, standardized processes and more!
Learn about our tape management solutions at www.bandl.com/solutions/tape-management/
Learn more about our offline IT asset management solution at www.bandl.com/solutions/assetaware/
Kalix: Tackling the Cloud to Edge Continuum – Jonas Bonér
Read this blog for an overview of Kalix:
https://www.kalix.io/blog/kalix-move-to-the-cloud-extend-to-the-edge-go-beyond
Abstract:
What will the future of the Cloud and Edge look like for us as developers? We have great infrastructure nowadays, but that only solves half of the problem. The Serverless developer experience shows the way, but it’s clear that FaaS is not the final answer. What we need is a programming model and developer UX that takes full advantage of new Cloud and Edge infrastructure, allowing us to build general-purpose applications, without needless complexity.
What if you only had to think about your business logic, public API, and how your domain data is structured, not worry about how to store and manage it? What if you could not only be serverless but become “databaseless” and forget about databases, storage APIs, and message brokers?
Instead, what if your data just existed wherever it needed to be, co-located with the service and its user, at the edge, in the cloud, or in your own private network—always there and available, always correct and consistent? Where the data is injected into your services on an as-needed basis, automatically, timely, efficiently, and intelligently.
Services, powered with this “data plane” of application state—attached to and available throughout the network—can run anywhere in the world: from the public Cloud to 10,000s of PoPs out at the Edge of the network, in close physical proximity to its users, where the co-location of state, processing, and end user ensures ultra-low latency and high throughput.
Sounds exciting? Let me show you how we are making this vision a reality building a distributed real-time Data Plane PaaS using technologies like Akka, Kubernetes, gRPC, Linkerd, and more.
Federal Webinar: Improve IT Service Management and help meet Federal Standards – SolarWinds
The Federal Sales Engineering team discussed how service management can be improved by leveraging our integrated help desk and remote support solutions. They also reviewed and demonstrated our powerful, budget-friendly tools for mapping, network troubleshooting, syslog management, and more.
During this interactive webinar, attendees learned about:
Connecting to remote computers directly from help desk trouble tickets, with easy access to integrated IT asset information for faster troubleshooting, using Web Help Desk® and Dameware® Remote Support
Using Network Topology Mapper to discover the IT assets on your network, including Layer 2 and Layer 3 topology data
Improving compliance with log retention policies using Kiwi Syslog® Server
Automatically backing up and performing configuration changes with Kiwi CatTools®
Troubleshooting a range of network issues with Engineer’s Toolset™ (such as IP address and DHCP scope monitoring, port scanning, etc.)
Securely managing file transfers with Serv-U® MFT
Generating a custom Ruby SDK for your web service or Rails API using Smithy – g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Kubernetes & AI - Beauty and the Beast!?! @KCD Istanbul 2024 – Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply it to our own infrastructure from an enterprise perspective. I will give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a button – DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, backed by an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
2. Agenda
§ Reality of Event Logging
§ Liberating Application Data
§ Operational Best Practices
§ Data Enrichment | Other Data Sources
§ More Developer Tools
4. The Accelerating Pace of Data: Volume | Velocity | Variety | Variability
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
Machine data is the fastest growing, most complex, most valuable area of big data
5. Event Logs Suck
Sources: Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID (On-Premises | Private Cloud | Public Cloud)
§ They have some structure
§ Structure is not consistent
§ Structure is non-standard
§ Keys can be stored separately
§ High volume, growing every day
§ Hard to access
§ Take up tons of space
§ Clog up the network
6. Event Logs Suck
050818 16:19:31 2 Query UPDATE xar_session_info SET xar_vars='XARSVuid|i:2;XARSVrand|i:343223999;XARSVuaid|s:2:"29";XARSVbrowsername|s:9:"Netscape6";XARSVbrowserversion|s:3:"5.0";XARSVosname|s:7:"Unknown";XARSVosversion|s:7:"Unknown";XARSVnavigationLocale|s:11:"en_US.utf-8";SPLUNKAPP_IP|N;', xar_lastused = 1124407171 WHERE xar_sessid = 'll7joq442223fl6h07v3f3vpd2'
10 Query UPDATE xar_session_info SET xar_vars='XARSVuid|i:2;XARSVrand|i:89426315;XARSVuaid|s:2:"29";XARSVbrowsername|s:9:"Netscape6";XARSVbrowserversion|s:3:"5.0";XARSVosname|s:7:"Unknown";XARSVosversion|s:7:"Unknown";XARSVnavigationLocale|s:11:"en_US.utf-8";SPLUNKAPP_IP|N;', xar_lastused = 1124407193 WHERE xar_sessid = 't2idg584t1co0scgj40qnnm'
31 Connect caveuser@web2.int.splunk.com on cave
Jun 2 13:36:50 DEBUG[1826]: Setting NAT on RTP to 0
Jun 2 13:36:50 DEBUG[1826]: Check for res for 5008office
Jun 2 13:36:50 DEBUG[1826]: Call from user '5008office' is 1 out of 0
Jun 2 13:36:50 DEBUG[1826]: build_route: Contact hop: <sip:5008office@10.1.1.132:5060>
Jun 2 13:36:50 VERBOSE[10887]: -- Executing Macro("SIP/5008office-dfbd", …
8. Gold Mine of Information
• Ensure system security
• Meet compliance mandates
• Customer behavior and experience
• Product and service usage
• End-to-end transaction visibility
A definitive record of activity and behavior; important insight for IT and the business.
Example (User, IP, Action, Login Result):
10.2.1.44 - [25/Sep/2009:09:52:30 -0700] type=USER_LOGIN msg=audit(1253898008.056:199891): user pid=25702 uid=0 auid=4294967295 msg='acct="TAYLOR": exe="/usr/sbin/sshd" (hostname=?, addr=10.2.1.48, terminal=sshd res=failed)'
Example (User, IP, Product, Category):
10.2.1.80 - - [25/Jan/2010:09:52:30 -0700] "GET /petstore/product.screen?product_id=AV-CB-01 HTTP/1.1" 200 9967 "http://category.screen?category_id=BIRDS" "Mozilla/5.0 (Linux)" "JSESSIONID=xZDTK81Gjq9gJLGWnt2NXrJ2tpGZb1
11. The Mighty Application Log
Operations
§ How many transactions are failing?
§ Which specific transactions are failing?
§ Is system performance falling behind?
Security
§ Who is accessing the app? When?
§ What activity looks suspicious?
§ Is the application behaving as expected?
Business Intelligence
§ What is the purchase volume over time?
§ How do purchases compare to last month?
§ How are customers affected by app issues?
Social/Mobile
§ How is the customer experience?
§ Are transactions taking too long?
§ Where are transactions happening?
12. Traditional Analytics: Early Structure Binding
Structure → Data
§ Schema created at design time
§ Queries understood at design time
§ Homogenous
§ Data must fit into tables or be converted to tables
§ Data must match constraints
SELECT customers.* FROM customers
WHERE customers.customer_id NOT IN
  (SELECT customer_id FROM orders
   WHERE year(orders.order_date) = 2004)
13. Analytics with Splunk: Late Structure Binding
Data → Structure
§ Schema-less
§ Structure created at search time
§ Queries executed ad hoc
§ Heterogeneous
§ Constantly changing
§ No conversion required
§ No constraints
14. Gain Intelligence Quickly
Early Structure Binding: decide the question to ask → design the schema → normalize data + write DB insertion code → create SQL & feed into analytics tool
§ Days – Weeks – Months
§ Destructive
Late Structure Binding: write semantic events → collect → create searches, reports & graphs
§ Minutes
§ Non-Destructive
16. Current State
§ You have no control over other systems' events
§ You have full control over events that YOU write
§ Most events are written by developers to help them debug
§ Some events are written to form an audit trail
17. Logging with Purpose
§ Logging for Debugging
  § Troubleshoot application problems
  § Identify trends
  § Categorize issues
§ Semantic Logging
  § Record the state of business processes
  § Examples: web clicks, financial trades, cell phone connections, audit trails, etc.
void submitPurchase(purchaseId) {
  log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId)
  // These calls throw an exception on failure:
  submitToCreditCard(...)
  generateInvoice(...)
  generateFulfillmentOrder(...)
  log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId)
}
18. Liberating Log Data – In a Nutshell
§ Use clear key-value pairs
§ Create events humans can read
§ Use developer-friendly formats
§ Use timestamps for every event
§ Use unique identifiers (IDs)
§ Log in text format
§ Log more than debug events
§ Use categories
§ Identify the source
§ Minimize multi-line events
19. Use Clear Key-Value Pairs
§ Create structure from unstructured data
  § Use space- or comma-delimited pairs
  § Wrap values containing spaces in quotes
§ Automatic field extraction
  § Self-describing; does not require regular expressions to parse
  § Keys are stored alongside field values
  § No additional configuration work for the Splunk Admin or Knowledge Manager
Example (Good):
log.debug("orderstatus=error, errorcode=454, user=%d, transactionid=%s", userId, transId)
Example (Bad):
log.debug("error %d 454 - %s", userId, transId)
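A small helper makes the key-value convention easy to apply everywhere. This is a sketch in Python (the deck's examples are pseudocode); the function name `kv_event` is an assumption, not part of the original.

```python
def kv_event(**fields):
    """Render fields as comma-delimited key=value pairs, wrapping
    any value that contains a space in quotes so automatic field
    extraction keeps the whole value together."""
    parts = []
    for key, value in fields.items():
        value = str(value)
        if " " in value:
            value = '"%s"' % value
        parts.append("%s=%s" % (key, value))
    return ", ".join(parts)
```

For example, `kv_event(orderstatus="error", errorcode=454)` yields `orderstatus=error, errorcode=454`, and a value like `"Jane Doe"` comes out quoted.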
20. Create Human-Readable Events
§ Use ASCII format
  § Avoid complex encodings
  § Avoid formats that require arbitrary code to decipher
§ Use consistent formatting
  § Separate events with different formats into individual files
21. Create Human-Readable Events
§ Avoid binary data
  § Binary data is compressed, but requires decoding and does not segment
  § Splunk cannot meaningfully search or analyze binary data
§ If data must be in binary format:
  § Provide a tool to easily convert it to ASCII
  § Create a custom Splunk search command to decode binary segments inline
§ Place textual metadata in the event
  § For example, do not log the binary data of a JPG file, but do log its image size, creation tool, username, camera, GPS location, etc.
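The "textual metadata instead of bytes" advice can be sketched as follows; this is an illustrative Python helper (the name `upload_event` and its fields are assumptions), recording only searchable metadata about a binary file.

```python
import os

def upload_event(path, username):
    """Describe a binary upload with textual metadata (filename and
    size) instead of embedding the bytes themselves in the log."""
    return "action=fileUpload, user=%s, filename=%s, bytes=%d" % (
        username, os.path.basename(path), os.path.getsize(path))
```

The event stays ASCII, segments cleanly, and still answers questions like "who uploaded the largest files?"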
22. Use Developer-Friendly Formats
§ JSON and XML are readable by humans and machines
§ Seamless parsing by most programming languages, right in the browser
§ Useful for capturing hierarchy or membership, and self-describing
§ Easily interpreted by Splunk's spath command
{"widget": {
    "text": [
        {"data": "Click here", "size": 36},
        {"data": "Learn more", "size": 37},
        {"data": "Help", "size": 38}
    ]
}}
date       size data
---------- ---- ----------
2014-08-12 36   Click here
           37   Learn more
           38   Help
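Emitting such events from code is a one-liner with the standard `json` module. A minimal sketch, assuming one self-describing JSON object per line (the function name `json_event` is an assumption):

```python
import json

def json_event(**fields):
    # One JSON object per line keeps events easy to split, and the
    # nested structure can be walked with Splunk's spath command.
    # sort_keys makes the output deterministic for testing and diffing.
    return json.dumps(fields, sort_keys=True)
```

For example, `json_event(data="Click here", size=36)` produces `{"data": "Click here", "size": 36}`.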
23. Use Timestamps
§ Time is a first-class citizen
  § Timestamps are critical to understanding the sequence of events for debugging, analytics, and deriving transactions
  § Timestamps are automatically detected, but it is best to use an intelligent format
§ Timestamp dos
  § Use the most verbose granularity possible (microseconds), since events can otherwise become orphaned from the originating event
  § Place timestamps at the beginning of the event
  § Include a four-digit year
  § Include a time zone
§ Timestamp do nots
  § Do not use a time offset
Example (Good):
08/12/2014:09:16:35.842 GMT INFO key1=value1 key2=value2
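These dos can be encoded once in a formatting helper. An illustrative Python sketch (the name `event_prefix` is an assumption) with a four-digit year, microsecond granularity, and an explicit zone offset:

```python
from datetime import datetime, timezone

def event_prefix(now=None):
    """Render a timestamp for the start of an event: four-digit
    year, microsecond granularity, explicit time-zone offset."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%d %H:%M:%S.%f %z")
```

For a fixed instant, `event_prefix(datetime(2014, 8, 12, 9, 16, 35, 842000, tzinfo=timezone.utc))` returns `2014-08-12 09:16:35.842000 +0000`.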
24. Use Unique Identifiers (IDs)
§ More power for debugging and analytics
  § Examples: transaction IDs, user IDs
  § Used to find exact transactions
§ Carry unique IDs through multiple touch points
  § Avoid changing the format between modules or systems
  § Include transitive closures
Transaction:
transid=abcdef, ...
transid=abcdef, otherid=qrstuv, ...
otherid=qrstuv, ...
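The "transitive closure" shown above is simply one bridging event that carries both IDs, so searches can join the two halves of the transaction. A minimal sketch in Python (the function name `bridge_event` is an assumption):

```python
def bridge_event(transid, otherid):
    """When a downstream system assigns its own ID mid-transaction,
    log one event carrying both IDs so events tagged only with
    otherid can be joined back to the original transid."""
    return "transid=%s, otherid=%s" % (transid, otherid)
```

For example, `bridge_event("abcdef", "qrstuv")` yields `transid=abcdef, otherid=qrstuv`, matching the slide's diagram.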
25. Unique IDs Through Multiple Touch Points
Sources: Order Processing | Twitter Care | IVR | Middleware Error
IDs carried across them: Order ID, Customer ID, Product ID, Twitter ID, Company's Twitter ID, Customer's Tweet, Time Waiting On Hold
26. Minimize Multi-Line/Value Events
§ Multi-line/value events are less efficient
  § More difficult for software to parse
  § Generate many segments, which affects indexing/search speed and disk compression
§ Break multi-line events into separate events
§ Break multi-value fields into separate events for easier manipulation
Example (Good):
<TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx
<TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
Example (Bad):
<TS> phonenumber=333-444-4444, app=angrybirds,facebook
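Turning the bad form into the good form is a simple fan-out: emit one event per value. An illustrative Python sketch (the name `install_events` and its fields are assumptions):

```python
def install_events(ts, phonenumber, apps):
    """One event per installed app, rather than a single event with
    a comma-separated multi-value app field."""
    return ["%s phonenumber=%s, app=%s" % (ts, phonenumber, app)
            for app in apps]
```

For example, `install_events("<TS>", "333-444-4444", ["angrybirds", "facebook"])` returns the two single-value events from the slide.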
27. Log More Than Debug Events
§ Log anything that can add value when aggregated and/or visualized
  § user actions
  § timing
  § transactions
  § audit trails
§ Log a category
  § Severity levels can aid navigation and baselining
§ Identify the source
  § Use the class, function, or filename
29. Operational Best Practices
§ Log locally to log files
  § Provides a local buffer
  § Non-blocking during network failures
  § Use syslog-ng or rsyslog + a Splunk forwarder for syslog data
§ Implement rotation policies
  § Logs take up space
  § Many compliance regulations require years of archival storage
  § Decide on destroying or backing up logs
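As one concrete way to get local buffering plus rotation, Python's standard library ships a size-based rotating handler. A minimal sketch; the file name and the 10 MB / five-archive numbers are placeholders, and real retention should follow your compliance and backup policy:

```python
import logging
from logging.handlers import RotatingFileHandler

# Rotate the local log file at roughly 10 MB, keeping five archives.
handler = RotatingFileHandler("app.log", maxBytes=10 * 1024 * 1024,
                              backupCount=5)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
app_log = logging.getLogger("app")
app_log.addHandler(handler)
app_log.setLevel(logging.INFO)
```

Writing locally first means a network outage never blocks the application; the forwarder picks the file up when connectivity returns.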
30. Operational Best Practices
§ Use Splunk forwarders
  § Data collection in real time
  § Tracks and maintains state
§ Enable collection of data over many channels: HTTP | queues | multicast | web services | databases
§ Collect events from everything, everywhere
  § Application logs | database logs | network data | configuration files | performance data | time-based data
  § More data captured = more visibility
32. Creating Value With Structured Data
DB Connect provides reliable, scalable, real-time integration between Splunk and traditional relational databases.
§ Enrich search results with additional business context
§ Easily import data into Splunk for deeper analysis
§ Integrate multiple DBs concurrently
§ Simple set-up, non-invasive and secure
Components: Java Bridge Server | JDBC | Database Lookup | Database Query | Connection Pooling
Databases: Microsoft SQL Server | Oracle Database | other databases
33. It's Hard to Turn Raw Data Into Refined Insights
§ Hadoop and NoSQL offer simple storage but hard analytics: difficult to explore, analyze, and visualize
§ Hard-to-staff skills: require months of labor by specialists with rare and expensive skill sets
§ Inflexible approaches: must predefine fixed schemas or program MapReduce jobs
A wide range of open source projects for analytics and data visualization sits on top of Hadoop (MapReduce & HDFS), YARN, and NoSQL data stores: Hive, Pig, Mahout, Sqoop, DataFu, Azkaban.
34. Integrated Analytics Platform for Diverse Data Stores
§ Full-featured, integrated product
§ Fast insights for everyone
§ Works with what you have today
Explore | Analyze | Visualize | Dashboards | Share
Bi-directional integration with Hadoop clusters (Hadoop client libraries, streaming resource libraries), NoSQL, and other data stores.
36. The Splunk Enterprise Platform
§ Core engine: collection, indexing, Search Processing Language, core functions
§ Content: inputs, apps, other content, SDK content
§ User and developer interfaces: web framework, REST API
37. What's Possible with the Splunk Enterprise Platform?
§ Power mobile apps
§ Log directly
§ Extract data
§ Customer dashboards
§ Integrate BI tools
§ Integrate platform services
Developer Platform
38. Powerful Platform for Enterprise Developers
Developers can customize and extend Splunk through:
§ REST API
§ Web framework: Simple XML, JavaScript, Django
§ SDKs: Ruby, C#, PHP
§ Data models
§ Search extensibility
§ Modular inputs
39. Splunk Software for Developers
§ Gain application intelligence
§ Build Splunk apps
§ Integrate and extend Splunk