4. The cowboy metaphor
4
Data wrangling / lassoing (capturing)
Data needs harnessing (bring under control for analysis)
Data might need a little grooming (clean, filter)
Data might need branding (categorizing / labeling / enrichment)
Data corralling (correlation)
Data stabling (securing)
Data needs to go to the rodeo (a platform)
Make data useful = be a data cowboy
11. Operational Technology
Energy Oil & Gas Process Buildings Mfg
Transport-
ation
Medical
Devices
Telecom
Consumer
Technology
Smart
Home
Wearables Media
The landscape is much, much vaster
Industrial Data Producing Assets
12. Succeeding with IOT data
12
IOT data is already being generated
And we are already capturing this data
The key challenge will be in turning this into something genuinely
useful. This is the opportunity.
Enable the developers & data domain experts
Give them the platforms and tools to be productive
This leads to ECOSYSTEM
14. Splunk can help you become an IOT data cowboy
14
Wrangle – Collect the data
Harness – Search over the data / Correlate
Show at the Rodeo – Visualize the data/Alerting
Provide a platform for Developers to build IOT Apps
15. Platform for machine data
15
Splunk storage Other Big Data stores
Developer
Platform
Data collection
and indexing
Report
and
analyze
Custom
dashboards
Monitor
and alert
Ad hoc
search
20. Kepware Industrial Data Forwarder for Splunk
20
Real-time streaming data collection from 150+ industrial protocolsProprietary and legacy data translation
24. Some data from a work order
05/27/2014T10:24:17GMT applicationId="safetyObs" eventType="safety" assetID="CV1002384-1045"
employeeId="114635" jobSite="PLEC-2014-GC" observationId="184568-451124-256" observation="Control Valve handle
extracted to manual position. No lockout/tagout or other tag visible. Process is running." observationCriticality="5"
imageId="PLEC-2014-GC-184568-451124-256" imageUri="https://mybucket.s3.amazonaws.com/PLEC-2014-GC-184568-
451124-256.png"
1543541, workorder, bsic, 78544, pipefitting, CV1002384, "install manual bleed bypass", 04/13/2014, 05/21/2014, 25663,
complete
05/22/2014 03:17:31 asset_id="CV1002384-1045" process_id="batch transfer starting" alarm="control valve failed to open"
05/22/2014 04:21:45 asset_id="CV1002384-1045" process_id="batch transfer starting" alarm="control valve failed to open"
05/22/2014 06:35:39 asset_id="CV1002384-1045" process_id="batch transfer starting" alarm="control valve failed to open"
05/22/2014 07:40:29 asset_id="CV1002384-1045" process_id="batch transfer starting" alarm="control valve failed to open"
CMMS (Work Order)
Application
29. Splunk options
29
Splunk> Enterprise : Free to download and use. Index 500 MB/day.
Splunk> Cloud : Premium, cloud hosted. Full Enterprise stack.100% uptime.
Splunk> Sandbox : Spin up a cloud instance in minutes. Load in data.
Hunk> : Splunk for data in Hadoop HDFS, MongoDB , other datastores (Neo4J)
IOT is the buzzword of the moment. And it is warranted. An internet of things is actually starting to exist and is growing at an exponential pace. But this brings with it new and exciting challenges.And by that I mean how are we going to capture the huge volume and variety of IOT data and turn it into something genuinely useful for the end user.So in this talk I'll put my cowboy hat on and and show a few ways in which you can use Splunk to harness all this data and create compelling IOT apps . Yee Ha !
Cowboy metaphor is very applicable to data , it’s about the end to end iterative lifecycle of data.
This is what we are going to talk about today , a few marketing slides , we are sponsors
Actually show how to make something useful with IOT data , yes we will make something !!!
Lets find some ways to become data cowboys today.
Stampede , Spurring , Ranch , Saddle , Reins , sure there are other metaphors , email your suggestions
[We hear about Big Data from IT, but scope is ever-widening as more devices become internet-enabled]
Data is generated everywhere
It’s not just the servers, applications, network devices, security devices in your datacenters.
Planes, trains, cars – all generate data.
The mobile device in your pocket – is continuously generating data even as you sit there.
Industrial control system (ICS), which had little resemblance to traditional information technology (IT) systems (isolated systems running proprietary control protocols using specialized hardware and software) are being replaced by Internet-connected devices – opening up opportunity and the the possibility of cyber security vulnerabilities and incidents.
Staggering scale=staggering analytics
The scale of all of this is staggering, and it is going to take machine data platforms for those who are responsible for the production, maintenance, management of these devices to handle this type of scale without giving up the additional insight real-time analytics provides.
2014 IDC report – “Universe of data is doubling in size every two years – by 2020 containing nearly as many digital bits as there are stars in the universe. 44 trillion gigabytes”
New challenges , new approaches needed
We are not living in an ETL landscape anymore , the lay of the land is different and we need different approaches.
[This machine data is high volume, diverse and very valuable, containing a record of all activity and behavior – it’s hard or impossible to process and analyze for traditional approaches or in a timely manner]
Machine data is one of the fastest growing, most complex and most valuable segments of data.
Why? it contains a categorical record of activity and behavior – of your customers, your applications, servers, sensors, devices, and so on.
Characteristics of machine data – the four V’s - the last two are the most interesting / challenging.
Want to talk a bit about this particular source of data, Im’ most interested in this
Devices are now joining the network at pace we have not seen before , not necessarily emiiting more data , just more devices
Growing at a massive pace and represents the greatest domain of opportunity
Just a natural evolution of the delivery of information over the internet
What are folks saying and doing
Devices / data volume
Investment
Opportunity for business
Apis for developers
A vibrant ecosystem of opportunity
From Cisco:
A jet engine generates 1TB of data per flight.
A large refinery generates 1TB of raw data per day.
As cars get smarter, the number of sensors is projected to reach as many as 200 per car.
Sensors of all types will generate immense amounts of data. In fact, analysts estimate that by 2020, 40 percent of all data will come from sensors.
Not just your fitbit , washing machine or connected bed
There are two distinct segments of the IoT—the largely siloed, enterprise-managed industrial systems, consisting of legacy applications and heavy equipment (e.g., oil and gas, power distribution and manufacturing systems); and consumer-focused, cloud-connected devices like connected vending machines, wearables and smarthome appliances.
Whether in industrial systems or the consumer-focused IoT, the constant processes, applications, interactions, sensor readings, geolocation information and streams of data can be easily managed and analyzed using Splunk software.
Smart buildings , elevators
Smart Car
Healthcare devices
But the challenges are stil the same
There is no point in just capturing and storing IOT data , that is not being productive or pragmatic
You have to unleash it into something of use
Create compelling apps
A thriving developer ecosystem will evolve ideas and uses cases for you , you just have to give them the platform to suceed.
Other challenges like security and scalability and timely processing
Platform for collecting data from any source , in any format
Search over it , correlate , look for insights
Visualize it , build apps for your domain use case
Splunk’s flagship product is Splunk Enterprise. Splunk Enterprise is a fully featured, powerful platform for collecting, searching, monitoring and analyzing machine data.
Splunk collects machine data securely and reliably from wherever it’s generated. It stores and indexes the data in real time in a centralized location and protects it with role-based access controls. You can even leverage other data stores. Splunk lets you search, monitor, report and analyze your real-time and historical data. Now you have the ability to quickly visualize and share your data, no matter how unstructured, large or diverse it may be.
Troubleshoot problems and investigate security incidents in minutes (not hours or days). Monitor your end-to-end infrastructure to avoid service degradation or outages. Gain real-time visibility and critical insights into customer experience, transactions and behavior. Use Splunk and make your data accessible, usable and valuable across the enterprise.
Splunk collects and indexes any machine data from virtually any source, format or location in real time. This includes data streaming from packaged and custom applications, app servers, web servers, databases, networks, virtual machines, telecoms equipment, OS’s, sensors, and much more. There’s no requirement to “understand” the data upfront. Just point Splunk at your data or deploy Splunk forwarders to reliably stream data from remote systems at scale. Splunk immediately starts collecting and indexing, so you can start searching and analyzing.
No more armies of consultants, or a DBA to make it work.
Connection to industrial devices and applications and translation of proprietary industrial protocols is outside the scope of our technology
Some customers and partners have developed custom connectors or have leveraged existing middleware applications, but this approach has a high barrier to entry and does not allow fast time to value or much flexibility after the initial integration
We need to align with technology that is proven and aligns with our own scalability, security, and flexibility.
Through our partnership with Kepware, Splunk can now ingest data from thousands of devices communicating over more than 150 proprietary and legacy industrial protocols.
Let’s take a peek at what machine data looks like.
Safety is #1 priority for you and in this example, we see here excerpts from 3 typically siloed operational technology systems:
Safety Observation Reporting System
Computerized Maintenance and Management System (CMMS)
Alarm Logs from the Plant Control or SCADA system.
Some machine data contains human-generated content. For example this logfile from a safety app on a technicians mobile device. The application allows them to easily report either positive, negative or neutral safety observations on a regular basis.
(logfile – timestamp, app_id, etc.)
Here we see fields from a work order application. This is relational data from a relational database backed work order management system like MAXIMO. Fields include work_order_id, the technical_id that performed the work and the date the work order was created and completed.
Here is a repeating alarm from the SCADA system – oftentimes repeating alarms such as this are considered a nuisance without the context of … <next slide>
Alerted in Splunk by a level 5 “Critical” safety observation entering the safety observation system.
Splunk is able to extract the asset identification automatically from the observation, and using the Splunk app DBConnect, correlate information from the CMMS (Workorder) application’s relational database back end, which can be used to supplement the observation with information such as the technician who performed the work (for mitigation through additional training), and the date the work was completed.
This complete date can then be used to automatically search Splunk’s historical record of alarms and events for this particular asset ID, and show that operations were negatively impacted by the issue.
In the future, repeating alarms of this type could automatically trigger an investigation using the details provided in the observation, and troubleshooting could be greatly accelerated. In fact, using scripted alerts in Splunk, the workorder to investigate the valve handle position could be automatically created and dispatched.
Splunk provides a single pane of glass view into comprehensive facility operations unmatched by other specialized facility or factory ops systems.
BUILD SPLUNK APPS
The Splunk Web Framework makes building a Splunk app looks and feels like building any modern web application.
The Simple Dashboard Editor makes it easy to BUILD interactive dashboards and user workflows as well as add custom styling, behavior and visualizations. Simple XML is ideal for fast, lightweight app customization and building. Simple XML development requires minimal coding knowledge and is well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. Simple XML also lets the developer “escape” to HTML with one click to do more powerful customization and integration with JavaScript.
Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Splunk Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI. Developers can choose to program their Splunk app using Simple XML, JavaScript or Django (or any combination thereof).
EXTEND AND INTEGRATE SPLUNK
Splunk Enterprise is a robust, fully-integrated platform that enables developers to INTEGRATE data and functionality from Splunk software into applications across the organization using Software Development Kits (SDKs) for Java, JavaScript, C#, Python, PHP and Ruby. These SDKs make it easier to code to the open REST API that sits on top of the Splunk Engine. With almost 200 endpoints, the REST API lets developers do programmatically what any end user can do in the UI and more. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#. Developers can easily manage HTTP access, authentication and namespaces in just a few lines of code.
Developers can use the Splunk SDKs to:
- Run real-time searches and retrieve Splunk data from line-of-business systems like Customer Service applications
- Integrate data and visualizations (charts, tables) from Splunk into BI tools and reporting dashboards
- Build mobile applications with real-time KPI dashboards and alerts powered by Splunk
- Log directly to Splunk from remote devices and applications via TCP, UDP and HTTP
- Build customer-facing dashboards in your applications powered by user-specific data in Splunk
- Manage a Splunk instance, including adding and removing users as well as creating data inputs from an application outside of Splunk
- Programmatically extract data from Splunk for long-term data warehousing
Developers can EXTEND the power of Splunk software with programmatic control over search commands, data sources and data enrichment.
Splunk Enterprise offers search extensibility through:
- Custom Search Commands - developers can add a custom search script (in Python) to Splunk to create own search commands. To build a search that runs recursively, developers need to make calls directly to the REST API
- Scripted Lookups: developers can programmatically script lookups via Python.
- Scripted Alerts: can trigger a shell script or batch file (we provide guidance for Python and PERL).
- Search Macros: make chunks of a search reuseable in multiple places, including saved and ad hoc searches.
Splunk also provides developers with other mechanisms to extend the power of the platform.
- Data Models: allow developers to abstract away the search language syntax, making Splunk queries (and thus, functionality) more manageable and portable/shareable.
- Modular Inputs: allow developers to extend Splunk to programmatically manage custom data input functionality via REST.