This document provides an overview of Hunk, an analytics platform for exploring, analyzing, and visualizing data stored in Hadoop. It discusses how Hunk allows users to connect to HDFS and MapReduce, create virtual indexes, use MapReduce as an orchestration framework, and search data in Hadoop. The document also highlights how Hunk provides an easier, more flexible workflow for business users compared to traditional Hadoop approaches.
During the course of this presentation, forward-looking statements were made regarding Splunk's expected performance, and legal notices were provided. The presentation discussed using Splunk to analyze large amounts of data stored in Hadoop by moving computation to the data through MapReduce jobs, while supporting the Splunk Processing Language and maintaining schema on read. Optimization techniques such as partition pruning were covered to improve performance, along with best practices, troubleshooting tips, and resources for using Hunk.
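To make schema on read and partition pruning concrete, here is a minimal SPL sketch; the virtual index name hunk_web and the status field are hypothetical. Because the time range is bounded, Hunk can skip HDFS directories whose path-encoded timestamps fall outside the window, and the status field is extracted only at search time rather than at ingest:

    index=hunk_web earliest=-24h@h latest=now
    | stats count by status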
The document provides an overview of Hunk, a product from Splunk that allows users to explore, analyze and visualize data stored in Hadoop. Some key points:
- Hunk uses virtual indexes to enable searching of data in Hadoop using Splunk's interface and capabilities without needing to move the data. It handles MapReduce jobs behind the scenes.
- It provides an interactive interface for business users to explore and query data in Hadoop in an easy and flexible way, with the ability to preview results while MapReduce jobs are running.
- Integration with Hadoop is done through Hadoop client libraries, requiring only read access to data stored in HDFS. Hunk supports various Hadoop distributions and operating systems.
Hunk - Unlocking The Power of Big Data Breakout Session (Splunk)
This document discusses Splunk's Hunk product and how it allows users to analyze data stored in Hadoop using Splunk. Hunk runs natively in Hadoop using MapReduce, supports mixed mode searching that allows previewing data, and auto-deploys Splunk components to Hadoop data nodes for real-time indexing. It also provides role-based security and supports connecting to data in NoSQL databases and SQL databases through Splunk's DB Connect product.
This document provides an overview of Splunk Hunk, which allows users to run Splunk analytics on data stored in Hadoop. Some key points:
- Hunk uses "virtual indexes" to make data in Hadoop look and feel like Splunk indexes, allowing seamless use of the Splunk interface and search processing language.
- It supports running Splunk searches either through an interactive "streaming" mode or an efficient batch "reporting" mode using MapReduce (see the sketch after this list).
- The indexing and search pipelines apply the same field extraction, event breaking, and other processing as standard Splunk to enable flexible searches.
- A processing library allows plugging in custom data preprocessors when ingesting data into Hunk.
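As a rough illustration of the two modes (the virtual index hunk_web and the field names are hypothetical), consider these two separate searches. The first is an event (streaming) search, whose matching raw events can be previewed as early MapReduce splits complete; the second is a reporting search, whose stats aggregation Hunk pushes down into the MapReduce job:

    index=hunk_web sourcetype=access_combined error

    index=hunk_web sourcetype=access_combined | stats count by status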
Explore, Analyze and Visualize Data in Hadoop and NoSQL. Make massive quantities of machine data accessible, usable and valuable for the people who need it, at the speed they need it. Use Hunk to turn underutilized data into valuable insights in minutes, not weeks or months.
Monitoring a Database Driven System Utilizing Splunk's DB Connect (Splunk)
This document discusses how Cerner uses Splunk's DB Connect tool to monitor a complex database-driven system that processes over 180 million real-time eligibility transactions annually between healthcare partners and payers. By connecting Splunk to their Oracle database, Cerner is now able to create near real-time dashboards and alerts to more proactively monitor performance, identify issues, and improve processes. This represents a transformation from a previously reactive, manual approach accessible only to technical analysts.
BlueData Hunk Integration: Splunk Analytics for Hadoop (BlueData, Inc.)
Hunk is a Splunk analytics tool that allows users to explore, analyze, and visualize raw big data stored in Hadoop and NoSQL data stores. It can interactively query raw data, accelerate reporting, create charts and dashboards, and archive historical data to HDFS. BlueData's EPIC platform enables running Hunk jobs on Hadoop clusters while accessing data from any storage system, such as HDFS, NFS, Gluster, and others. Hunk supports ingesting large amounts of data and provides pre-packaged analytics functions and intuitive visualization of results.
Splunk's Hunk: A Powerful Way to Visualize Your Data Stored in MongoDB (MongoDB)
This document discusses Splunk Hunk, which enables users to combine time series event data stored in MongoDB with Splunk's data visualization and search capabilities. It provides an overview of Splunk Hunk's components and architecture, describes how to install and configure the MongoDB virtual index app to integrate MongoDB data with Splunk, and demonstrates how to query and analyze MongoDB data using Splunk.
Splunk is an American software company that allows users to search and analyze real-time data across various sources to generate reports, visualizations, alerts and dashboards. Splunk Enterprise is Splunk's main product that makes it easy to collect, analyze and act on big data from different sources for dashboard creation and searches. The key features of Splunk include interactive searching, subsearching, customizable dashboards, searching by field, and saved searches. Case studies show that Splunk improves operational efficiencies by providing real-time data analytics.
This document provides an overview of Splunk, including:
- Splunk's main functionality is real-time log collection, indexing, and analytics of time series data through search queries and data exploration/visualization capabilities.
- Reasons to use Splunk include its proven success in the field, flexible and user-friendly interface, and ability to handle large volumes of data from various sources through infinite scaling.
- Splunk uses a MapReduce-based architecture to index and search large volumes of data across multiple servers.
Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters (DataWorks Summit)
This document discusses enabling exploratory analytics of data in shared-service Hadoop clusters using Hunk. It describes how Hunk allows users to visually browse and analyze data in HDFS through an interactive search interface without needing to understand the data schema. The document provides examples of how Hunk has been used at Yahoo to gain operational insights from Hadoop cluster metrics and optimize performance. It demonstrates how Hunk can create visualizations and dashboards for analyzing jobs, queues, NameNode usage and more.
SplunkSummit 2015 - A Quick Guide to Search Optimization (Splunk)
This document provides an overview and tips for optimizing searches in Splunk. It discusses how to scope searches more narrowly through techniques like limiting the time range and including specific indexes, sourcetypes, and fields. This helps reduce the amount of data that needs to be scanned to find search results. The document also recommends using inclusionary search terms rather than exclusionary ones when possible to improve performance. Additional optimization strategies covered include using smarter search modes and defining fields on segmented boundaries.
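For example, a narrowly scoped search along these lines (the index, sourcetype, host, and field names are hypothetical) lets Splunk discard most events before they are scanned, whereas a bare keyword search over all time would have to scan every index:

    index=web sourcetype=access_combined host=web01 status=503 earliest=-4h
    | stats count by uri_path

Note the inclusionary term status=503 rather than an exclusionary NOT status=200, which would force Splunk to examine far more events.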
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc... (Agile Testing Alliance)
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Processing, by "Sampat Kumar" from "Harman". The presentation was given at the #doppa17 DevOps++ Global Summit 2017. All copyrights are reserved by the author.
SplunkLive! Presentation - Data Onboarding with Splunk (Splunk)
- The data onboarding process involves systematically bringing new data sources into Splunk to make the data instantly usable and valuable for users
- The process includes pre-boarding activities like identifying the data, mapping fields, and building index-time and search-time configurations (a minimal configuration sketch follows this list)
- It also involves deploying any necessary infrastructure, deploying the configurations, testing and validating the data, and getting user approval before the process is complete
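A minimal, hypothetical props.conf sketch of what such index-time and search-time configurations typically cover; the sourcetype name, timestamp settings, and field extraction are illustrative only and should be checked against the props.conf documentation for your Splunk version:

    [acme:app:log]
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)
    TIME_PREFIX = ^\[
    TIME_FORMAT = %Y-%m-%d %H:%M:%S %z
    MAX_TIMESTAMP_LOOKAHEAD = 30
    EXTRACT-status = status=(?<status>\d{3})

The line-breaking and timestamp settings apply at index time, while the EXTRACT- entry defines a search-time field extraction.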
Splunk Ninjas: New Features, Pivot and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters (Brett Sheppard)
The document discusses Hunk, a self-service analytics platform for exploring, visualizing, and analyzing data stored in Hadoop clusters and other data stores. Hunk allows users to rapidly interact with data through an interactive search interface and preview results without waiting for full queries to finish. It provides integrated visualization of data through built-in graphs and charts. Hunk deployment is fast, requiring under 60 minutes to connect to Hadoop clusters and begin searching data.
Power of Splunk Search Processing Language (SPL) ... (Splunk)
This session will unveil the power of the Splunk Search Processing Language (SPL). See how to use Splunk's simple search language for searching and filtering through data, charting statistics and predicting values, converging data sources and grouping transactions, and finally data science and exploration. We'll begin with basic search commands and build up to more powerful advanced tactics to help you harness your Splunk Fu!
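By way of illustration only (the index, sourcetype, and field names are hypothetical), a search can start as a simple filter and grow into charting and prediction within one pipeline:

    index=web sourcetype=access_combined status>=500
    | timechart span=1h count as errors
    | predict errors

Each pipe hands the results of one command to the next, so the filter, the hourly chart, and the forecast are expressed in a single statement.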
Splunk is a scalable software that indexes and searches logs and IT data in real time. It can analyze data from any application, server, or device. Splunk uses a server component and forwarders to collect and index streaming data, and provides a web interface for searching, reporting, monitoring and alerting on the data.
This document provides an overview of data models in Splunk:
- A data model maps raw machine data onto a hierarchical structure to encapsulate domain knowledge and enable non-technical users to interact with data via pivot reports.
- There are three root object types: events, searches, and transactions. Objects have constraints, attributes, and inherit properties from parent objects.
- Data models are built using the UI or REST API. Pivot reports leverage data models by generating optimized search strings from the model.
- Data model acceleration improves performance of pivot reports by pre-computing searches on disk. Only the first event object and descendants are accelerated by default.
Data models provide a hierarchical structure for mapping raw machine data onto conceptual objects and relationships. They encapsulate domain knowledge needed to build searches and reports. Data models allow non-technical users to interact with data via a pivot interface without understanding the underlying data structure or search syntax. When reports are generated from a data model, the search strings are automatically constructed based on the model. Model acceleration can optimize searches by pre-computing search results.
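As a hedged example of querying a data model directly (assuming an accelerated data model named Web with CIM-style field names; the names are illustrative), a tstats search runs against the pre-computed summaries instead of raw events:

    | tstats count from datamodel=Web where Web.status=404 by Web.src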
SplunkLive! London: Splunk ninjas - new features and search dojo (Splunk)
The document discusses new features and enhancements in Splunk 6.4, including improvements to reduce storage costs through TSIDX reduction, enhance platform security and management through features like improved DMC and new SSO options, and new interactive visualizations. It also covers search commands like eval, stats, eventstats, streamstats, and transaction that can solve most data analysis problems, and provides examples of using these commands. Finally, it discusses some tips and tricks for Splunk searches.
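A small, hypothetical example of how those commands combine (the index and field names are illustrative): eval normalizes a response time, eventstats attaches a per-host average to every event, streamstats adds a ten-event moving average for context, and where keeps only the outliers:

    index=web sourcetype=access_combined
    | eval resp_s = round(response_time / 1000, 2)
    | eventstats avg(resp_s) as avg_resp by host
    | streamstats window=10 avg(resp_s) as moving_avg by host
    | where resp_s > 2 * avg_resp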
This document provides an agenda and overview for a Splunk TechDay event focused on Splunk Ninja skills. The agenda includes refreshers on search language and structure, examples of SPL commands for searching, charting, and exploring data, and custom commands for extending SPL capabilities. The overview sections explain key aspects of SPL like its large command set, syntax based on Unix pipelines and SQL, and uses for data searching, filtering, and manipulation. Examples are provided for various SPL techniques including search/filter, evaluating/modifying fields, statistics, and charting.
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016 (StampedeCon)
This document discusses building a data pipeline using tools from the Apache Hadoop ecosystem. It begins with an introduction to the speaker and why Hadoop is useful for data pipelines. It then provides a matrix comparing the different Hadoop distributions and their included components. It outlines the various tiers of projects in the Hadoop ecosystem and disclaims any completeness. It also presents the typical data lifecycle of capture, enrichment, analysis, presentation, reporting, archival and removal. The document concludes with a reference to demo code and soliciting questions.
SplunkLive! Analytics with Splunk Enterprise (Splunk)
Splunk provides analytics capabilities through data models and pivot reporting. Data models encapsulate domain knowledge about data sources and allow non-technical users to interact with and report on data. Pivot provides a query builder interface for creating reports based on data models without using the Splunk search language. Data models define objects that map to events, searches, or groups of events/searches with constraints and attributes. Pivot reports generate optimized search strings from the data model objects.
Splunk Ninjas: New features, pivot, and search dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
This document outlines an agenda for a Splunk getting started user training workshop. The agenda includes introducing Splunk functionality like search, alerts, dashboards, deployment and integration. It also covers installing Splunk, indexing data, search basics, field extraction, saved searches, alerting and reporting dashboards. The workshop aims to help users get started with the core Splunk features.
Real World Use Cases: Hadoop and NoSQL in Production (Codemotion)
"Real World Use Cases: Hadoop and NoSQL in Production" by Tugdual Grall.
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Thread Detection, Datawarehouse optimization, Marketing Efficiency, Biometric Database are some examples exposed during this presentation.
(1) The document discusses challenges of managing large and complex datasets for interdisciplinary research projects. It presents Hadoop and the Etosha data catalog as solutions.
(2) Etosha aims to publish and link metadata about datasets to enable discovery and sharing across distributed research clusters. It focuses on descriptive, structural and administrative metadata rather than just technical metadata.
(3) Etosha's architecture includes a distributed metadata service and context browser that can query metadata from different Hadoop clusters to support federated querying and subquery delegation.
SplunkSummit 2015 - Real World Big Data Architecture (Splunk)
This document discusses big data architectures using Splunk, Hadoop, and relational databases. It begins with an overview of Splunk's scalability and real-time analytics capabilities. It then discusses Hunk, an analytics platform for Hadoop that provides self-service analytics. The document also examines using structured data in Splunk and connecting to relational databases. A case study examines challenges with the open source Hadoop ecosystem. Finally, it outlines a real-world customer architecture that uses Splunk for machine data, Hadoop for storage, Hunk for analytics, and connects to relational databases.
What is Splunk? At the end of this session you’ll have a high-level understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll see practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
Getting Started with Splunk Breakout Session (Splunk)
This presentation provides an overview of Splunk Enterprise for getting started. It discusses how Splunk fits into the big data landscape, highlighting its capabilities for real-time indexing of machine data from various sources. Key differentiators of Splunk like role-based access control and centralized access management are covered. The presentation demonstrates Splunk's components for data collection, indexing, and presentation and provides a demo of basic search functionality. Resources for learning more about Splunk like documentation, books, and the Splunk community are also mentioned.
Splunk Announces Beta Version of Hunk: Splunk Analytics for Hadoop
New Software Product to Explore, Analyze and Visualize Data in Hadoop
HADOOP SUMMIT NORTH AMERICA 2013, SAN JOSE – June 26, 2013 - Splunk Inc. (NASDAQ: SPLK), the leading software platform for real-time operational intelligence, today announced the beta version of Hunk: Splunk® Analytics for Hadoop. Hunk (beta) is a new software product from Splunk that integrates exploration, analysis and visualization of data in Hadoop. Building upon Splunk’s years of experience with big data analytics technology deployed at thousands of customers, Hunk drives dramatic improvements in the speed and simplicity of interacting with and analyzing data in Hadoop without programming, costly integrations or forced data migrations. Watch the Hunk video to learn more.
This summary provides an overview of a presentation about Splunk:
1. The presentation introduces Splunk, an enterprise software platform that allows users to search, monitor, and analyze machine-generated big data for security, IT and business operations.
2. Key components of Splunk include universal forwarders for data collection, indexers for data storage and search heads for data visualization. Splunk supports data ingestion from various sources like servers, databases, applications and sensors.
3. A demo section shows how to install Splunk, ingest sample data, perform searches, set up alerts and reports. It also covers dynamic field extraction, the search command language and Splunk applications.
Here’s your chance to get hands-on with Splunk for the first time! Bring your modern Mac, Windows, or Linux laptop and we’ll go through a simple install of Splunk. Then, we’ll load some sample data, and see Splunk in action – we’ll cover searching, pivot, reporting, alerting, and dashboard creation. At the end of this session you’ll have a hands-on understanding of the pieces that make up the Splunk Platform, how it works, and how it fits in the landscape of Big Data. You’ll experience practical examples that differentiate Splunk while demonstrating how to gain quick time to value.
What's new in Splunk Enterprise 6.5 (Splunk)
Guided ML
Splunk Enterprise
Splunk Cloud
Splunk Light
Splunk Analytics for Hadoop
Splunk User Behavior Analytics
Splunk IT Service Intelligence
Splunk Security Essentials
Splunk App for AWS
Splunk App for Cisco
Splunk App for VMware
Splunk App for Microsoft
Splunk App for PCI
Splunk App for ServiceNow
Splunk App for SAP
Splunk App for Oracle
Splunk App for Salesforce
Splunk App for Workday
Splunk App for Marketo
Intel IT empowers business units to easily make rapid, impactful business decisions. Ingesting a variety of internal/external data sources has challenges. This slideset covers how Intel IT overcame the issues with Hadoop and Gobblin. Learn more at http://www.intel.com/itcenter
This presentation was given at Integration Developer News Big Data in Technology Summit on July 23 2015. See how Pepperdata's unique patented technology helps organizations gain up to 50% more throughput in their clusters. We are the only technology that can help organizations GUARANTEE SLAs in Hadoop production environments.
This document summarizes a presentation by Pepperdata about improving Hadoop performance. It discusses challenges companies face using Hadoop, such as wasted capacity and inability to prioritize jobs. Pepperdata provides fine-grained visibility into resource usage, total predictability through SLA enforcement, and 30-50% greater throughput by reclaiming wasted capacity. It allows mixed workloads on a single cluster through dynamic resource allocation based on user-defined policies. Q&A is provided where Pepperdata's CEO discusses how it installs agents, helps with mixed workloads, and is different than YARN. More information links are also included.
Level Up – How to Achieve Hadoop Acceleration (Inside Analysis)
The Briefing Room with Robin Bloor and HP Vertica
Live Webcast on August 26, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=3dd6d1b068fe395f665c75adb682ac41
Hadoop has long passed the point of being a nascent technology, but many users have found that when left to its own devices, Hadoop can be a one trick pony. To get the most out of Hadoop, organizations need a flexible platform that empowers analysts and data managers with a complete set of information lifecycle management and analytics tools without a performance tradeoff.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he outlines Hadoop’s role in a big data architecture. He’ll be briefed by Walt Maguire of HP Vertica, who will showcase his company’s big data solutions, including HAVEn and the HP Big Data Platform. He will demonstrate how HP Vertica acts as a complement to Hadoop, and how the combination of the two provides a versatile and highly performant solution.
Visit InsideAnalysis.com for more information.
Enrich a 360-degree Customer View with Splunk and Apache Hadoop (Hortonworks)
What if your organization could obtain a 360 degree view of the customer across offline, online and social and mobile channels? Attend this webinar with Splunk and Hortonworks and see examples of how marketing, business and operations analysts can reach across disparate data sets in Hadoop to spot new opportunities for up-sell and cross-sell. We'll also cover examples of how to measure buyer sentiment and changes in buyer behavior. Along with best practices on how to use data in Hadoop with Splunk to assign customer influence scores that online, call-center, and retail branches can use to customize more compelling products and promotions.
SplunkLive! Analytics with Splunk Enterprise - Part 1 (Splunk)
This document discusses analytics using Splunk Enterprise software. It provides an overview and context for Splunk analytics capabilities including search, data modeling, pivot reporting, and the analytics store. The agenda outlines discussing the big picture of analytics, examples of operational intelligence across the enterprise, data models, and a question and answer session. Legal notices are also included, discussing forward-looking statements, roadmap information, and trademarks.
This document summarizes a presentation about using Hadoop as an analytic platform. It discusses how Actian has added seven key ingredients to Hadoop to unlock its full potential for analytics. These include high-speed data integration, a visual framework for data science and modeling, open-source analytic operators, high-performance data processing engines, vector-based SQL processing natively on HDFS, an extremely fast parallel analytics engine, and a next-generation big data analytics platform. The goal is to transform Hadoop from merely a data reservoir to a fully-featured analytics platform.
This document discusses Oracle Data Integration solutions for tapping into big data reservoirs. It begins with an overview of Oracle Data Integration and how it can improve agility, reduce risk and costs. It then discusses Oracle's approach to comprehensive data integration and governance capabilities including real-time data movement, data transformation, data federation, and more. The document also provides examples of how Oracle Data Integration has been used by customers for big data use cases involving petabytes of data.
The document discusses how Staples uses Splunk for operational support, application insights, and business intelligence across their infrastructure. Staples relies on Splunk for real-time visibility into the health of their Advantage website and business/operational analytics. Splunk provides comprehensive insights into Staples' infrastructure and helps map application performance to user experience. It has saved Staples numerous times by quickly detecting issues. Adoption of Splunk at Staples has grown organically as more teams see its benefits.
ASUG83511 - Accelerate Digital Transformation at General Mills.pdf (SreeGe1)
General Mills and SAP presented on using SAP Data Hub to accelerate General Mills' digital transformation. The presentation provided General Mills' data integration journey and challenges with their existing enterprise data warehouse. It outlined the key capabilities desired in a data integration solution, including real-time replication, federated analytics, data governance, and data science capabilities. Finally, it reviewed SAP Data Hub's capabilities and General Mills' proof of concept, and discussed SAP's roadmap to address additional requirements.
An overview of Splunk Enterprise 6.3. Presented by Splunk's Jim Viegas at GTRI's Splunk Tech Day, December 8, 2015.
Visit http://www.gtri.com/ for more information.
The document discusses an upcoming webinar on Big Data and SQL. It provides details on the webinar topics, speakers, and the HP Vertica Analytics Platform. The webinar will explore how HP Vertica allows users to navigate and analyze data stored in Hadoop using SQL, avoiding complex ETL processes. It will also discuss how the platform handles both query and analytical workloads and enables exploration of semi-structured data through its Flex Zone.
Splunk Webinar: Turn your data into valuable insights - Machine Lear... (Georg Knon)
This document provides an overview of machine learning presented at a Splunk webinar. It begins with disclaimers about forward-looking statements and product roadmaps. It then discusses why machine learning is needed to make use of both historical and real-time data. The rest of the document covers the basics of machine learning, including the main types (supervised, unsupervised, reinforcement learning) and algorithms. Example use cases for machine learning in IT operations, security, and business analytics are presented. The document concludes with information about Splunk's Machine Learning Toolkit and links to resources.
Splunk Webinar: Searching, transforming, and ... machine data with Splunk SPL (Georg Knon)
This document provides examples of SPL commands for searching, filtering, modifying, visualizing, and exploring data in Splunk. It discusses commands for searching and filtering data, modifying or creating new fields, calculating statistics and charting them over time, converging different data sources, identifying transactions and anomalies, and exploring data relationships. Examples are provided for commands like eval, stats, timechart, lookup, appendcols, transaction, anomalydetection, cluster, correlate, and others.
SplunkLive! Zürich 2016 - Use Case Swisscom (Georg Knon)
Swisscom uses Splunk to gain operational intelligence and visibility into its cloud infrastructure and services. Splunk aggregates data from various systems to provide monitoring, troubleshooting, and license management across Swisscom's complex cloud environment. This centralization with Splunk improves customer experience by enabling faster issue resolution. Going forward, Swisscom aims to leverage Splunk further for predictive analytics and make more operational data accessible to the wider business.
Splunk Webinar: Splunk for Application Management (Georg Knon)
The document discusses how Splunk can be used for application management. It begins with an introduction of the speaker and agenda. It then discusses challenges in application management like availability, response time, planning capacity and reducing mean time to repair. It shows how traditionally there were infrastructure and application silos with low visibility. With Splunk, it provides a platform to index and analyze data across the technology stack. Splunk can complement application performance monitoring for complete visibility. It then demonstrates Splunk and discusses trying Splunk for free.
Splunk Webinar: IT Operations Demo for Troubleshooting & Dashboarding (Georg Knon)
This document provides an overview of Splunk's IT operations software. It discusses the challenges facing IT operations, including siloed tools and reactive problem solving. It presents Splunk as a solution, with its ability to index and analyze machine data from any source in real-time. Key benefits highlighted include faster troubleshooting to reduce downtime, proactive monitoring to address issues before they become problems, and increased operational visibility across the IT environment. The document concludes with a demonstration of Splunk's IT service intelligence capabilities.
Splunk for IT Operations Breakout Session (Georg Knon)
This document discusses how IT complexity is a challenge for CIOs due to siloed technologies, disconnected point solutions, and time spent maintaining rather than innovating. It presents Splunk as a solution that provides comprehensive visibility across infrastructure, applications, databases, and more through centralized data collection and analysis. Splunk reduces problem resolution time by 67% and escalations by 90% by enabling "first responders" to search across all IT data from a single interface. The document also outlines how Splunk apps can provide insights by role and technology and its capabilities for various IT functions like virtualization, storage, and operating systems.
Getting started with Splunk - Breakout Session (Georg Knon)
This document provides an overview and getting started guide for Splunk. It discusses what Splunk is for exploring machine data, how to install and start Splunk, add sample data, perform basic searches, create saved searches, alerts and dashboards. It also covers deployment and integration topics like scaling Splunk, distributing searches across data centers, forwarding data to Splunk, and enriching data with lookups. The document recommends resources like the Splunk community for further support.
Webinar: Using Big Data for real-time fraud detection in eBanking with Splunk ... (Georg Knon)
In this webinar we show how fraud detection works in this environment:
- Real-time monitoring service
- New insights into business activity
- Open interface for internal and external systems
- Automated response to irregularities
- Suspicious IP addresses can be blocked
- Affected transactions can be cancelled immediately
- Affected accounts and transactions can be locked, and the end customer can be informed about the incident
Splunk Webinar: Turn data silos into Operational Intelligence (Georg Knon)
This document provides an overview and agenda for a Splunk presentation on operational intelligence. It introduces Matthias Maier and Rene Siekermann as today's speakers and includes a safe harbor statement. The agenda covers an overview of operational intelligence, a live demo, use case, and roadmap. It also provides a company overview of Splunk including its products, customers, and ability to collect and analyze machine data from various sources to provide insights.
5 ways to improve your security (Georg Knon)
Splunk Enterprise Security can improve organizations' security posture in 5 ways:
1. Detect external, advanced threats by finding abnormal access to sensitive data or signs of data exfiltration.
2. Detect insider threats by monitoring for terminated employee accounts being used or active employee accounts when those employees are on vacation.
3. Use free, external threat intelligence from sources like Emerging Threats and SANS, integrating threat indicators like bad IP addresses.
4. Accelerate incident investigations using Splunk's incident review framework, investigation timeline and journaling capabilities.
5. Perform advanced analytics and visualizations to detect anomalies through correlation of disparate security data sources.
The document provides an overview of Splunk IT Service Intelligence (ITSI). Some key points:
- ITSI makes Splunk "service-aware" and provides insights into IT services to help accelerate customers' path to operational intelligence.
- ITSI provides search-based KPIs, full-fidelity service health monitoring, and leverages Splunk's universal data platform to provide a data-driven approach.
- Core concepts in ITSI include services, KPIs, health scores, service analyzers for monitoring services, glass tables dashboards, and deep dives for investigation.
- Notable events are also generated by correlation searches to indicate service degradation.
Data models and pivot with Splunk breakout session (Georg Knon)
Here are the key points about data model acceleration in Splunk:
- Data model acceleration optimizes searches that use data models by pre-processing constraints and attribute definitions at search time. This can significantly improve search performance.
- Acceleration only applies to the first "event" object in the data model tree and its descendant objects. Searches against other object types like "search" or "transaction" do not benefit from acceleration.
- The more filtering/extraction done in the data model objects, the more acceleration can improve performance by reducing the number of events earlier in the search pipeline. Simply defining fields may not yield huge gains.
- Acceleration is most helpful for reports that run the same search repeatedly, like scheduled reports; a minimal example follows below.
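A hedged sketch of a search that explicitly uses those summaries (assuming an accelerated data model named Web; summariesonly=true restricts tstats to the pre-computed summary data instead of falling back to raw events):

    | tstats summariesonly=true count from datamodel=Web by _time span=1h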
Splunk IT Service Intelligence is a solution that provides end-to-end service visibility, reduces time to problem resolution, and allows for proactive management of IT health. It introduces a data-centric approach to service monitoring and analytics built on the Splunk platform. Key benefits include unified data insights across IT silos, easy access to actionable troubleshooting information through dynamic service models and customizable visualizations, and early warning on deviations through correlated KPI monitoring.
Splunk Internet of Things Roundtable 2015 (Georg Knon)
This document contains an agenda and presentation materials for an Internet of Things Day event by Splunk. The presentation provides an overview of Splunk as a company, its machine data platform for collecting and analyzing data from IoT devices, and use cases from customers across various industries utilizing Splunk for IoT applications. Examples include using machine data from manufacturing equipment to optimize energy usage and enable predictive maintenance, and aggregating data from vending machines for diagnostics and insights into customer behavior.
Webinar: Splunk Cloud - SaaS platform for Operational Intelligence (Georg Knon)
This document discusses Splunk Cloud, a platform for collecting, analyzing, and visualizing machine data from any source. Some key points:
- Splunk Cloud can handle any amount and type of machine data from various online services, applications, devices, and systems, regardless of location.
- It offers universal indexing without needing to filter or schema data beforehand.
- The cloud portfolio includes apps for AWS, ServiceNow, and Salesforce, as well as deploying Splunk Enterprise as a service and analyzing data stored in cloud services.
- Splunk Cloud provides instant access, security, reliability with 100% uptime, and hybrid capabilities to search data across on-premises, private cloud, and public cloud environments.
Splunk App for Stream - Insights into your network traffic (Georg Knon)
The document discusses the Splunk App for Stream, which enables real-time insights into private, public and hybrid cloud infrastructures by capturing and analyzing critical events from wire data not found in logs or with other collection methods. It provides an overview of the app, what's new, important features, architecture and deployment, customer success examples, and FAQs.
Webinar: Vulnerability Management made easy – with Splunk and Qualys (Georg Knon)
This document discusses how Splunk and Qualys can be used together for vulnerability management. It provides an overview of Splunk and how it is used across IT and business operations, including for security use cases. It then discusses Qualys' vulnerability management and security solutions. The remainder consists of an agenda, demos of Qualys data in Splunk, and benefits of correlating Qualys and Splunk data for improved security posture monitoring and risk visibility.
3. Agenda
1. What is Hunk?
2. Powerful Developer Platform
3. Preparation
4. Connect Hunk to HDFS and MapReduce
5. Create Virtual Indexes
6. MapReduce as the Orchestration Framework
7. Search Data in Hadoop
8. Flexible, Iterative Workflow for Business Users
4. Explore, Analyze, Visualize Data in Hadoop
No fixed schema to search unstructured data
Preview results while MapReduce jobs start
Easier app development than in raw Hadoop
Unlock business value of data in Hadoop
Fast to learn – no scarce MapReduce skills required
Integrated – explore, analyze and visualize
6. Connect to HDFS and MapReduce
Connect to Apache HDFS and MapReduce, or your choice of Hadoop distribution
(Diagram: Hadoop Cluster 1)
7. Unmet Needs for Hadoop Analytics

OPTION 1 – “Do it yourself” Hadoop / Pig
Problems:
• Scarce skill sets to hire
• Need to know MapReduce
• Wait for slow jobs to finish
• No results preview
• No built-in visualization
• No granular authentication
• Slow time to value

OPTION 2 – Hive or SQL on Hadoop
Problems:
• Pre-defined fixed schema
• Need knowledge of data
• Miss data that “doesn’t fit”
• No results preview
• No built-in visualization
• Scarce skill sets to hire
• Slow time to value

OPTION 3 – Extract to in-memory store
Problems:
• Data too big to move
• Limited drill down to raw data
• No results preview
• Another data mart
• Expensive hardware
8. Hadoop in Real Life
MapReduce job for Hadoop (excerpt):
public class WordCount extends Configured implements Tool {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    static enum Counters { INPUT_WORDS }

    private Text word = new Text();
    private boolean caseSensitive = true;
    private Set<String> patternsToSkip = new HashSet<String>();
    private long numRecords = 0;
    private String inputFile;

    public void configure(JobConf job) {
      caseSensitive = job.getBoolean("wordcount.case.sensitive", true);
      inputFile = job.get("map.input.file");
      if (job.getBoolean("wordcount.skip.patterns", false)) {
        Path[] patternsFiles = new Path[0];
        try {
          patternsFiles = DistributedCache.getLocalCacheFiles(job);
        } catch (IOException ioe) {
          System.err.println("Caught exception while getting cached files: "
              + StringUtils.stringifyException(ioe));
        }
        for (Path patternsFile : patternsFiles) {
          parseSkipFile(patternsFile);
        }
      }
    }
    // ... the map() method, the Reduce class, and the job setup are omitted here
Using Hunk, the same word count is just:
  index=Hadoop
  | wc usestopwords=f
  | stats sum(count) by word
9. Integrated Analytics Platform for Hadoop Data
Full-featured, Integrated Product
Insights for Everyone
Works with What You Have Today
(Diagram: Explore, Analyze, Visualize, Dashboards, Share – on top of Hadoop (MapReduce & HDFS))
10. What Hunk Does Not Do
1. Hunk does not replace your Hadoop distribution
2. Hunk does not replace or require Splunk Enterprise
3. Interactive, but not real time
4. No data ingest management (that’s Flume or Sqoop)
5. No Hadoop operations management
11. Product Portfolio
(Diagram: the Splunk product portfolio)
• Splunk Enterprise – real-time indexing and real-time search
• Splunk Apps – IT Ops, Security & Compliance, Web Intelligence, App Dev & App Mgmt., Business Analytics
• Splunk Hadoop Connect and DB Connect
• Hunk – ad hoc analytics of historical data in Hadoop; developers building big data apps on top of Hadoop
• Use cases: 360° Customer View, Complete Security Analytics, Product and Service Analytics
• Vibrant and passionate developer community
12. Powerful Developer Platform with Familiar Tools
API and SDKs: JavaScript, Java, Python, PHP, C#, Ruby
• Add new UI components
• Integrate into existing systems
• Use known languages and frameworks
14. MapReduce as the Orchestration Framework
(Diagram: Hunk Search Head, HDFS, and TaskTrackers 1–3)
1. Copy the splunkd binary (.tgz) from the Hunk Search Head to HDFS
2. Each TaskTracker copies the .tgz from HDFS
3. Expand it in the specified location on each TaskTracker
4. TaskTrackers not involved in the first search receive the binary in subsequent searches
15. Data Processing Pipeline
(Diagram: Raw data (HDFS) → Custom processing → Indexing pipeline → Search pipeline)
• Custom processing (MapReduce/Java): you can plug in data preprocessors, e.g. Apache Avro or format readers
• Indexing pipeline (splunkd/C++, fed via stdin): event breaking, timestamping
• Search pipeline: event typing, lookups, tagging, search processors
16. Hunk Applies Schema on the Fly
Hunk applies schema for all fields – including transactions – at search time
• Structure applied at search time
• No brittle schema to work around
• Automatically find patterns and trends
17. Mixed-mode Search
Streaming: transfers the first several blocks from HDFS to the Hunk Search Head for immediate processing
Reporting: pushes computation to the DataNodes and TaskTrackers for the complete search
• Hunk starts the streaming and reporting modes concurrently
• Streaming results show until the reporting results come in
• Allows users to search interactively by pausing and refining queries
18. Flexible, Iterative Workflow for Business Users
(Workflow: Explore → Analyze → Model → Pivot → Visualize → Share)
Interactive Analytics
• Preview results
• Normalization as it’s needed
• Faster implementation and flexibility
• Easy search language + data models & pivot
• Multiple views into the same data
This session is designed for audiences who have seen an introduction to Hunk and would like a more comprehensive understanding of how Hunk works. I’ll cover each of these eight topics.
Hunk is a new product for organizations deploying Hadoop and is priced and packaged separately from Splunk Enterprise; a Splunk Enterprise license is not required to run Hunk. Hunk is the integrated analytics platform for data in Hadoop. It supports business use cases that unlock the value of data stored in Hadoop: data analytics to launch and optimize products and services; synthesis of data from all customer touch points; comprehensive security analytics for modern threats; and easier app development than in raw Hadoop, with tools and frameworks that developers already know. It is easy to use for any business or IT user, versus the scarce skills needed to manually write MapReduce jobs or define Hive data schemas. It is a fully integrated analytics product: explore, analyze, visualize, create dashboards, create data models, pivot, and share. There is no fixed schema to search raw and unstructured data, you can preview results while MapReduce jobs start, and app development is easier than in raw Hadoop.
Hunk is essentially the Splunk Enterprise technology stack sitting on top of Hadoop, with some limitations (no real time, and several functions in the Splunk processing language that do not apply to virtual indexes). Hunk is a high-performance, scalable software server written in C/C++ and Python. It indexes and searches logs and other big data stored in the Hadoop Distributed File System (HDFS), or MapR’s proprietary variant of HDFS. Hunk works with machine data generated by any application, server or device. The Splunk Developer API is accessible via REST, SOAP or the command line. After downloading Hunk, installing it on your choice of 64-bit Linux operating system, and starting it, you'll find two Hunk server processes running on your host: splunkd and splunkweb. splunkweb is a Python-based application server providing the Splunk Web user interface. It allows users to search and navigate machine data virtually indexed by Hunk servers and to manage your Hunk deployment through the browser. splunkd is a distributed C/C++ server that creates a virtual index from machine data and handles search requests. An ODBC driver (in beta as of September 2013) will provide integration with 3rd-party data visualization software.
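Because the developer API is the standard Splunk REST interface on the management port, a Hunk search can be driven from any HTTP client. The sketch below is illustrative only: the host name, credentials and virtual index name are placeholders, and it assumes the usual search-jobs endpoints on port 8089.

  import time
  import requests

  BASE = "https://hunk.example.com:8089"   # placeholder host; 8089 is the default management port
  AUTH = ("admin", "changeme")             # placeholder credentials

  # Create a search job against a (hypothetical) virtual index.
  resp = requests.post(
      BASE + "/services/search/jobs",
      auth=AUTH,
      data={"search": "search index=hadoop_weblogs | stats count by sourcetype",
            "output_mode": "json"},
      verify=False,                        # the management port often uses a self-signed certificate
  )
  sid = resp.json()["sid"]

  # Poll until the MapReduce-backed job finishes, then fetch the merged results.
  while True:
      job = requests.get(BASE + "/services/search/jobs/" + sid,
                         auth=AUTH, params={"output_mode": "json"}, verify=False).json()
      if job["entry"][0]["content"]["isDone"]:
          break
      time.sleep(2)

  results = requests.get(BASE + "/services/search/jobs/" + sid + "/results",
                         auth=AUTH, params={"output_mode": "json"}, verify=False).json()
  for row in results["results"]:
      print(row)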
Connect Hunk to your Hadoop cluster as an external results provider. The external results provider is a search-time helper process responsible for: accessing the external system (Hadoop); translating or interpreting the search request; and pushing as much of the computation as possible to the external system. Connect to the Hadoop Distributed File System (HDFS) and MapReduce from Apache downloads or from your choice of Hadoop distribution, including the option for Cloudera, Hortonworks, MapR or Pivotal. Hunk only requires basic Hadoop: HDFS and MapReduce. You can continue to use additional projects and subprojects with your Hadoop cluster but what’s required by Hunk is just MapReduce and HDFS (or MapR’s proprietary variant of HDFS).
Connect Hunk to multiple Hadoop clusters.
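To make the connection concrete, the sketch below shows roughly what the provider and virtual index definitions look like in indexes.conf on the Hunk search head. Treat it as an illustrative sketch rather than a copy-paste configuration: the host names, paths and index name are placeholders, and the exact vix.* setting names can vary by Hunk version and Hadoop distribution.

  [provider:hadoop-cluster-1]
  vix.family = hadoop
  # Placeholder paths to the local Java and Hadoop client libraries
  vix.env.JAVA_HOME = /usr/lib/jvm/java-7-openjdk
  vix.env.HADOOP_HOME = /opt/hadoop
  # Placeholder NameNode and JobTracker addresses for the cluster
  vix.fs.default.name = hdfs://namenode.example.com:8020
  vix.mapred.job.tracker = jobtracker.example.com:8021
  # HDFS scratch space where Hunk writes its binary and interim results
  vix.splunk.home.hdfs = /user/hunk/workdir

  [hadoop_weblogs]
  vix.provider = hadoop-cluster-1
  # "..." makes Hunk search the directory tree recursively
  vix.input.1.path = /data/weblogs/...

A second provider stanza pointing at another NameNode and JobTracker is all that is needed to search a second cluster from the same search head.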
There are significant challenges with these approaches to asking and answering questions of data in Hadoop. Not shown is a less common option, spreadsheet-like interfaces, which raises its own problems: these are batch job builders with no interactive engine, and they use “spreadsheet-like” interfaces, not Microsoft Excel or Apple Numbers.
Hunk (Splunk Analytics for Hadoop) is a full-featured, integrated product that delivers interactive data exploration, analysis and visualization for Hadoop. Full-featured, integrated product: delivers interactive data exploration, analysis and visualization for Hadoop. Insights for everyone: empowers broader user groups to derive actionable insights from raw data in Hadoop. Works with what you have today: works with leading Hadoop distributions to maximize enterprise technology investments.
Hunk does not replace your Hadoop distribution: Hunk coexists with your Apache HDFS & MapReduce downloads or your Hortonworks, Cloudera, or MapR distribution. Hunk does not replace or require Splunk Enterprise: Hunk is a separate product designed for new use cases involving data in Hadoop. Iterative search, but not real time or needle-in-the-haystack searches: that’s Splunk Enterprise. No data ingest management: that’s handled by tools from Apache Hadoop or your Hadoop distribution vendor, or by Hadoop connectors from enterprise software or business intelligence vendors. Note: needle in a haystack means one-in-a-million searches.
Splunk Enterprise is a standalone solution and the industry-leading platform for machine data with all of Splunk’s core use cases. For customers who are storing historical data in Hadoop, we offer Hunk to run analytics on data stored natively in Hadoop. Hunk targets new use cases, including: data analytics for new product and service launches; synthesis of data from all customer touch points; comprehensive security analytics for modern threats; and easier big data app development than in raw Hadoop. Furthermore, you can use Splunk Hadoop Connect to send data between Splunk Enterprise and Hadoop. Many accounts may decide to buy Splunk Enterprise for real-time monitoring and real-time search together with Hunk for exploratory analytics of historical data stored in Hadoop. With this combination, you can run searches across native indexes in Splunk Enterprise and Hunk virtual indexes for data in Hadoop.
Hunk offers a rich developer platform and tool chain that includes a robust API and software development kits in Java, JavaScript, Python, PHP, C# and Ruby, enabling developer teams to rapidly build powerful big data applications. Activity on DEV.SPLUNK.COM highlights a strong developer community.
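As a concrete illustration of the SDKs, here is a minimal sketch using the Splunk SDK for Python (splunklib) to run a oneshot search against a virtual index; the connection details and index name are placeholders.

  import splunklib.client as client
  import splunklib.results as results

  # Connect to the Hunk search head's management port (placeholder host and credentials).
  service = client.connect(host="hunk.example.com", port=8089,
                           username="admin", password="changeme")

  # Run a blocking "oneshot" search over a hypothetical virtual index.
  stream = service.jobs.oneshot(
      "search index=hadoop_weblogs status=500 | stats count by host",
      earliest_time="-24h", latest_time="now")

  # Iterate over the results as they are parsed from the response stream.
  for event in results.ResultsReader(stream):
      if isinstance(event, dict):
          print(event)

The same search could be issued from the Java, JavaScript, PHP, C# or Ruby SDKs, or directly against the REST API shown earlier.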
What you’ll need to get started:
• Data in Hadoop to analyze
• Hadoop client libraries – from your Hadoop distribution vendor or from http://archive.apache.org/dist/hadoop/core/
• Hadoop access rights – Hunk requires permission to read from HDFS and run MapReduce jobs
• Java 1.6+
• HDFS scratch space – the amount depends on the size of the interim results; between 10 and 20 GB is common
• DataNode local temp disk space – at most 5 GB per DataNode
On the first search, MapReduce auto-populates the Splunk binaries. The orchestration process begins when Hunk copies the Hunk binary .tgz file to HDFS. Hunk supports both the MapReduce JobTracker and the YARN MapReduce Resource Manager. Each TaskTracker (called an ApplicationContainer in YARN) fetches the binary. The binary files expand in the specified location on each TaskTracker; the default location is configurable. TaskTrackers not involved in the first search will receive the Hunk binary in a subsequent search that involves those TaskTrackers. This process is one example of why Hunk needs some scratch space in HDFS and in the local file system of the TaskTrackers/DataNodes. Background on Hadoop: typically a Hadoop cluster has a single master and multiple worker nodes. The master node (also referred to as the NameNode) coordinates the reads and writes to worker nodes (also referred to as DataNodes). HDFS reliability is achieved by replicating the data across multiple machines; by default the replication factor is 3 and the chunk size is 64 MB. The JobTracker dispatches tasks to worker nodes (TaskTrackers) in the cluster. Priority is given to nodes that host the data upon which a task will operate; if the task cannot be run on that node, priority goes next to neighboring nodes (to minimize network traffic). Upon job completion, each worker node writes its own results locally and HDFS ensures replication across the cluster. HDFS = NameNode + DataNodes; MapReduce engine = JobTracker + TaskTrackers.
Search execution: the Hunk search head lists the contents of the directories in the virtual index. The search head filters directories and files based on the search and time range (partition pruning). The NameNode and JobTracker (MapReduce Resource Manager in YARN) read data from the MapReduce framework and feed it to the search process. The process computes file splits, then constructs and submits the MapReduce jobs. Hunk streams a few file splits from HDFS and processes them in the search head to provide quick previews. The search head consumes and merges the MapReduce results (providing incremental previews) while the MapReduce jobs kick off. The data nodes run a copy of splunkd to process the jobs and write the results to a working directory in HDFS. Final results are stored on the Hunk search head. Hunk utilizes the Splunk Search Processing Language, the industry-leading method for interactive data exploration across large, diverse data sets. There is no requirement to "understand" data up front. For customers of Splunk Enterprise, reuse your Search Processing Language knowledge and skill set on data stored in Hadoop. Any command whose output depends on the event input order would yield different results – Splunk Enterprise guarantees events to be delivered in descending time order; Hunk doesn’t. This is the reason transaction and localize do not work. We can see the results from the intermediate Hadoop map jobs streamed into the Splunk UI even before all the map jobs are finished, and once all the Hadoop maps are done processing, Splunk displays the full results. In essence, Splunk acts as the Hadoop reduce phase, and there is no need to use Hadoop for that phase.
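Partition pruning works because Hunk can be told how to read a time range out of the directory layout itself. As a hedged sketch (the et/lt settings below follow the documented vix.input.N.* pattern, but verify names and formats against your Hunk version), a virtual index over hourly directories such as /data/weblogs/2013/11/05/14/ might be configured like this, so directories outside the search time range are skipped before any MapReduce task is launched:

  [hadoop_weblogs]
  vix.provider = hadoop-cluster-1
  vix.input.1.path = /data/weblogs/...
  # Earliest time for each directory, extracted from the path components
  vix.input.1.et.regex = /data/weblogs/(\d+)/(\d+)/(\d+)/(\d+)/
  vix.input.1.et.format = yyyyMMddHH
  vix.input.1.et.offset = 0
  # Latest time is one hour (3600 seconds) after the extracted timestamp
  vix.input.1.lt.regex = /data/weblogs/(\d+)/(\d+)/(\d+)/(\d+)/
  vix.input.1.lt.format = yyyyMMddHH
  vix.input.1.lt.offset = 3600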
Before data is processed by Hunk, you can plug in your own data preprocessor. Preprocessors have to be written in Java and can transform the data before Hunk gets a chance to. Data preprocessors vary in complexity from simple translators (say, Avro to JSON) to full image/video/document processing. Hunk translates Avro to JSON; these translations happen on the fly and are not persisted.
Hunk applies structure at search time. It is designed for data exploration across large datasets – preview data and iterate quickly. There is no requirement to understand the data upfront, no limit to the number of results returned by Hadoop or the number of searches, and no brittle schema to maintain or update. Find patterns and trends across disparate data sets in a “grab bag” Hadoop cluster. Use the Search Processing Language or create data models and pivot. Unlike Splunk Enterprise, Hunk applies schema for all fields – including transactions and localizations – at search time.
MapReduce considerations: stats/chart/timechart/top and similar commands work well in a distributed environment – they MapReduce well. Time- and order-dependent commands don’t work well in a distributed environment – they don’t MapReduce well. For large summary indexes, consider a dedicated “summarizer” instance with plenty of CPU to execute search jobs; summary jobs won’t interfere with user searches, and it aggregates and stores the results away from the indexers. Report acceleration is not supported by Hunk 6.0 but may be supported in a future release.
Hunk starts the streaming and reporting modes concurrently. Streaming results show until the reporting results come in. This allows users to search interactively by pausing and refining queries. It is a major, unique advantage of Hunk compared to alternative approaches such as Hive or SQL on Hadoop, which require a fixed schema in an effort to speed up searches, while Hunk retains the combination of schema on the fly with results preview.
Pause or stop jobs in progress and revise queries interactively. We’re mindful of the resources we use in Hadoop. Pause in Hunk pauses in the search head; Hadoop jobs keep running until the TCP header runs out. If you abandon a search for more than 30 seconds, Hunk will kill the search.
There’s no one path to explore data. Preview results and refine your queries. Hunk applies normalization as it’s needed, for faster implementation and flexibility. Hunk supports the easy-to-use Splunk search processing language along with data models and pivot to provide multiple views into the same data. Find insights following a flexible, iterative workflow, going back and forth across the components at the speed of thought. I’ll touch on each of the components of the data workflow. Explore: explore and search data from one place with the powerful Search Processing Language (SPL), designed for data exploration across large datasets; preview data and iterate quickly, with no fixed schema and no requirement to “understand” data upfront. Analyze: easy-to-use interactive analytics for deep analysis, pattern detection, and finding anomalies, with over 100 statistical commands. Model: make unstructured data more valuable; a data model describes how underlying machine data is represented and accessed, defines hierarchical relationships, and enables a single authoritative view of the underlying raw data. Pivot: powerful analytics anyone can use; a drag-and-drop interface to easily build complex queries and reports, click to visualize chart types, and reports dynamically update. Visualize: interactive reporting and visualization of data; an interactive reports view, rapidly build advanced graphs and charts, generate visualizations on the fly, drill down to raw data in Hadoop, and an ODBC connector to 3rd-party data visualization software. Share: build, personalize and share custom dashboards and PDFs; combine multiple charts, views, reports and external data; set role and group access security for web dashboards; view and edit on any desktop, tablet or mobile device. And do all of this from one integrated platform for data in Hadoop.