How to integrate Splunk with any data solution

Julian Hyde, Software Developer at Google, Apache Software Foundation
Sep. 14, 2012

Editor's Notes

  1. The obligatory “big data” definition slide. What is “big data”? It's not really about “big”. We need to access data from different parts of the organization, when we need it (which often means we don't have time to copy it), and the performance needs to be reasonable. If the data is large, it is often larger than the disks one can fit on one machine. It helps if we can process the data in place, leveraging the CPU and memory of the machines where the data is stored. We'd rather not copy it from one system to another. It needs to be flexible, to deal with diverse systems and formats. That often means that open source is involved. Some systems (e.g. reporting tools) can't easily be changed to accommodate new formats. So it helps if the data can be presented in standard formats, e.g. SQL.
  2. Demo connecting to Splunk via the Optiq driver. We are using sqlline as the shell (it works with any JDBC driver): select "source" from "splunk"."splunk" where "sourcetype" = 'mysqld-4'; In the generated Java on the screen, note how the sourcetype filter is pushed down to Splunk. (A sketch of this connection appears after these notes.)
  3. The wrong way to execute the query is for Splunk to send all of the data to Optiq: Splunk does more work than it needs to, it doesn't use any indexes, the network carries too much data, and Optiq does too much work.
  4. The right way to execute the query is to pass the filter down to Splunk. This lets Splunk use its indexes, so it does less work, passes less data over the network, and the query finishes faster. This is just a simple example, but a lot of problems can be solved by "pushing down" expressions, filters, and computation of summaries: do the work, and reduce the volume of data, as early in the process as possible. (A toy pushdown sketch appears after these notes.)
  5. Demo connecting to Splunk via the Optiq driver. We are using sqlline as the shell (it works with any JDBC driver): select "source" from "splunk"."splunk" where "sourcetype" = 'mysqld-4'; In the generated Java on the screen, note how the sourcetype filter is pushed down to Splunk.
  6. It's much more efficient if we push filters and aggregations down to Splunk. But the user writing SQL shouldn't have to worry about that. This is not about processing data; it is about processing expressions, reformulating the question. The question is the parse tree of a query, and the parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees), built up of the basic relational operators. Think of the SQL SELECT, WHERE, JOIN, GROUP BY, and ORDER BY clauses. (A toy tree that renders as both SQL and a Splunk pipeline appears after these notes.)
  7. It's much more efficient if we push filters and aggregations down to Splunk. But the user writing SQL shouldn't have to worry about that. This is not about processing data; it is about processing expressions, reformulating the question. The question is the parse tree of a query, and the parse tree is a data flow. In Splunk, a data flow looks like a pipeline of Linux commands. SQL systems have pipelines too (sometimes they are dataflow trees), built up of the basic relational operators. Think of the SQL SELECT, WHERE, JOIN, GROUP BY, and ORDER BY clauses.
  8. To recap. Optiq is not a database. It does as little of the database processing as it can get away with. Ideally, nothing at all. But what is it?
  9. Optiq is not a database... it is more like a telephone exchange. Applications can get the data they need, quickly and efficiently.
  10. A conventional database has an ODBC/JDBC driver, a SQL parser, data sources, an expression tree, expression transformation rules, and an optimizer. For NoSQL databases, the language may not be SQL, and the optimizer may be less sophisticated, but the picture is basically the same. For frameworks, such as Hadoop, there is no planner; you end up writing code (e.g. MapReduce jobs).
  11. In Optiq, the query optimizer (we modestly call it the planner) is central. The JDBC driver/server and SQL parser are optional; skip them if you have another language. Plug-ins provide metadata (the schema), planner rules, and runtime operators. There are built-in relational operators and rules, and there are built-in operators implemented in Java. But to access data, you need to provide at least one operator. (A hypothetical plug-in shape is sketched after these notes.)
  12. It needs to be said. Optiq is not a database. It looks like a database to your applications, and that's great. But when you want to integrate data from multiple sources, in different formats, and have those systems talk to each other, it doesn't force you to copy the data around. It gets out of your way. You configure Optiq by writing Java code. Therefore it is a framework, like Spring and, yes, like Hadoop. Optiq masquerades as a really, really smart JDBC driver. It has a SQL parser and JDBC driver. And actually you can embed it into another data management system, with a language other than SQL.
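
Code sketches

The demo in notes 2 and 5 is, underneath sqlline, just a JDBC connection. Here is a minimal sketch; the driver class name and the jdbc:splunk: connect-string format are assumptions based on the Optiq Splunk adapter of this era, so check the Optiq source tree for the exact spelling.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SplunkDemo {
      public static void main(String[] args) throws Exception {
        // Driver class and connect-string format are assumptions;
        // adjust them to match your Optiq build.
        Class.forName("net.hydromatic.optiq.impl.splunk.SplunkDriver");
        Connection c = DriverManager.getConnection(
            "jdbc:splunk:url=https://localhost:8089;user=admin;password=changeme");
        Statement stmt = c.createStatement();
        // The demo query: the planner should push the sourcetype filter
        // into the Splunk search rather than filtering rows in Java.
        ResultSet rs = stmt.executeQuery(
            "select \"source\" from \"splunk\".\"splunk\" "
            + "where \"sourcetype\" = 'mysqld-4'");
        while (rs.next()) {
          System.out.println(rs.getString("source"));
        }
        rs.close();
        stmt.close();
        c.close();
      }
    }

sqlline does the same thing interactively: point it at the same driver and connect string and type the query at the prompt.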
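
Notes 3 and 4 contrast the two execution strategies. The toy sketch below is not Optiq's API (all names are hypothetical); it only makes the difference concrete. The naive plan ships every event and filters on the client; the pushed-down plan folds the predicate into the Splunk search string, so filtering happens at the source, against Splunk's indexes.

    import java.util.List;
    import java.util.stream.Collectors;

    /** Toy illustration of filter pushdown; all names are hypothetical. */
    public class PushdownSketch {
      record Event(String source, String sourcetype) {}

      /** Wrong way: fetch everything, then filter on the client. */
      static List<Event> naive(List<Event> allEvents) {
        // Search sent to Splunk: "search *" -- every event crosses the wire.
        return allEvents.stream()
            .filter(e -> e.sourcetype().equals("mysqld-4"))
            .collect(Collectors.toList());
      }

      /** Right way: fold the predicate into the search itself. */
      static String pushedDownSearch() {
        // Splunk evaluates this against its indexes; only matching
        // events are ever returned.
        return "search sourcetype=mysqld-4 | fields source";
      }

      public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("/var/log/mysqld.log", "mysqld-4"),
            new Event("/var/log/httpd/access_log", "access_combined"));
        System.out.println("naive:       " + naive(events));
        System.out.println("pushed down: " + pushedDownSearch());
      }
    }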
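
Notes 6 and 7 say the parse tree of a query is a data flow that can be rendered either as SQL clauses or as a Splunk-style pipeline. This toy tree (hypothetical classes, not Optiq's operators) renders the demo query both ways:

    /** Toy data-flow tree, rendered as SQL and as a Splunk pipeline. */
    public class DataFlowSketch {
      interface Node {
        String toSql();
        String toSplunk();
      }

      record Scan(String index) implements Node {
        public String toSql() { return "select * from \"" + index + "\""; }
        public String toSplunk() { return "search index=" + index; }
      }

      record Filter(Node input, String sqlPred, String splunkPred)
          implements Node {
        public String toSql() { return input.toSql() + " where " + sqlPred; }
        public String toSplunk() { return input.toSplunk() + " " + splunkPred; }
      }

      record Aggregate(Node input, String key) implements Node {
        public String toSql() {
          return "select " + key + ", count(*) from (" + input.toSql()
              + ") as t group by " + key;
        }
        public String toSplunk() {
          return input.toSplunk() + " | stats count by " + key;
        }
      }

      public static void main(String[] args) {
        Node plan = new Aggregate(
            new Filter(new Scan("splunk"),
                "\"sourcetype\" = 'mysqld-4'", "sourcetype=mysqld-4"),
            "source");
        System.out.println("SQL:    " + plan.toSql());
        System.out.println("Splunk: " + plan.toSplunk());
      }
    }

Pushing a filter down is then just a tree rewrite: move the Filter node to sit directly on the Scan, where the source system can absorb it.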
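
Note 11 lists what a plug-in must supply: metadata (the schema), planner rules, and at least one runtime operator. A hypothetical shape for that contract (illustrative only, not Optiq's real interfaces):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    /** Hypothetical shape of an Optiq-style plug-in; not the real API. */
    public class AdapterSketch {
      /** A planner rule matches a pattern in the plan and rewrites it,
       * e.g. folding a filter into the source's native query. */
      interface PlannerRule {
        boolean matches(Object node);
        Object apply(Object node);
      }

      /** The one operator every plug-in must provide: produce rows. */
      interface TableScan {
        Iterator<Object[]> scan();
      }

      /** Trivial in-memory scan, standing in for a real data source. */
      static TableScan inMemoryScan(List<Object[]> rows) {
        return rows::iterator;
      }

      public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"/var/log/mysqld.log", "mysqld-4"});
        inMemoryScan(rows).scan().forEachRemaining(
            row -> System.out.println(row[0] + ", " + row[1]));
      }
    }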