Introducing Splunk – The Big Data Engine


Published in: Technology

  1. Copyright © 2012 Splunk Inc. Introducing Splunk – The Big Data Engine. 5th Big Data Usergroup Meeting, Zurich, 21.01.2012
  2. Splunk – The Big Data Company. Company (NASDAQ: SPLK): founded 2004, first software release in 2006; HQ: San Francisco; regional HQs: London, Hong Kong; over 600 employees, based in 12 countries; FY2012: $120 million, +83% year-over-year. Customers: 5,000+ customers in over 80 countries; 54 of the Fortune 100; largest license: 100 terabytes per day.
  3. Over 3,000 Customers in 70+ Countries: Cloud and Online Services; Education; Energy and Utilities; Financial Services and Insurance; Government; Healthcare; Manufacturing; Media; Retail; Technology; Telecommunications; Travel and Leisure
  4. Some Splunk Big Data Customers. Daily data volume per customer: 12 TB, 6 TB, 4 TB, 1.2 TB, 900 GB, 800 GB
  5. Big Data Comes from Machines. Volume | Velocity | Variety | Variability. Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data. Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
  6. Big Data Technologies. Products: Aster Data, Cassandra, Greenplum, HBase, MongoDB, Hadoop. Approaches: single RDBMS, bigger RDBMS, sharding, SQL and NoSQL, Map/Reduce. Data types: relational database (highly structured); key/value, tables or other (semi-structured); temporal, unstructured, heterogeneous. Axis: time
  7. Splunk: the Platform for Machine Data. Innovative, Easy to Use and Powerful. Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform. Data collection and indexing. Splunk storage; other Big Data stores
  8. Apps and Solutions: Application Monitoring, IT Operations, Web Intelligence, Business Analytics, Security, Compliance. User Interface, APIs, SDK. Core Functions: Access Controls, Stats/Analytics, Alerts, Reports, Dashboards, Search, Indexing, Collection
  9. Scales to TBs/day and Thousands of Users. Automatic load balancing linearly scales indexing. Distributed search and MapReduce linearly scales search and reporting.
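The distributed, MapReduce-style search named on slide 9 can be illustrated with a minimal sketch. It assumes each indexing node holds its own shard of events and that a coordinating node merges partial results; the shard data and the names `map_phase` and `reduce_phase` are hypothetical and are not Splunk APIs.

```python
from collections import Counter
from functools import reduce

# Hypothetical event shards, one list per indexing node (illustrative data only).
SHARDS = [
    ["status=200", "status=500", "status=200"],
    ["status=404", "status=200"],
    ["status=500", "status=500"],
]


def map_phase(events):
    """Runs on each indexing node: count status values in its local events."""
    return Counter(event.split("=", 1)[1] for event in events)


def reduce_phase(partials):
    """Runs on the coordinating node: merge the per-node partial counts."""
    return reduce(lambda left, right: left + right, partials, Counter())


totals = reduce_phase(map_phase(shard) for shard in SHARDS)
print(totals)
```

Because every node only scans its own shard and ships back a small partial count, adding nodes spreads the scan work, which is the intuition behind the linear-scaling claim on the slide.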
  10. What Does Machine Data Look Like? Sources: Order Processing, Middleware Error, Care IVR, Twitter
  11. Machine Data Contains Critical Insights. Sources and the fields they contain: Order Processing (Customer ID, Order ID, Product ID); Middleware Error (Order ID, Customer ID); Care IVR (Time Waiting On Hold, Customer ID); Twitter (Twitter ID, Customer’s Tweet, Company’s Twitter ID)
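The insight on slide 11 comes from joining events across sources on shared identifiers such as the customer ID. A minimal sketch follows, assuming events have already been reduced to dictionaries; the field names, the sample values and the function `correlate_by_customer` are hypothetical.

```python
from collections import defaultdict

# Hypothetical events from three of the sources named on the slide.
EVENTS = [
    {"source": "order_processing", "customer_id": "C42", "order_id": "O1001", "product_id": "P7"},
    {"source": "middleware_error", "customer_id": "C42", "order_id": "O1001"},
    {"source": "care_ivr", "customer_id": "C42", "hold_seconds": 540},
    {"source": "order_processing", "customer_id": "C99", "order_id": "O1002", "product_id": "P3"},
]


def correlate_by_customer(events):
    """Group the sources that mention each customer, preserving event order."""
    journeys = defaultdict(list)
    for event in events:
        journeys[event["customer_id"]].append(event["source"])
    return dict(journeys)


journeys = correlate_by_customer(EVENTS)
print(journeys["C42"])
```

For the hypothetical customer `C42`, the grouped view shows an order, a middleware error on the same order, and a call to the care line, which is the kind of cross-source story the slide describes.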
  12. What do we do? Collect and index Machine Data. Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data. Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data. Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets. Windows: Registry, Event logs, File system, sysinternals. Linux/Unix: Configurations, syslog, File system, ps, iostat, top. Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud. Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts. Databases: Configurations, Audit/query logs, Tables, Schemas. Networking: Configurations, syslog, SNMP, netflow
  13. What do we do? Collect and index Machine Data (same data sources as slide 12). Any amount, any location, any source: No upfront schema; No custom connectors; No RDBMS; No need to filter/forward
  14. Inside Universal Indexing. Automatic event boundary identification and automatic timestamp normalization enable accurate searching and trending by time across all data.
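The two steps on slide 14 can be sketched as follows. This is a simplified illustration, not Splunk's actual logic: it assumes only two hypothetical timestamp formats, treats any line that starts with a timestamp as the start of a new event, and assumes all timestamps are UTC.

```python
import re
from datetime import datetime, timezone

# Two hypothetical timestamp formats; real machine data has many more.
FORMATS = [
    (re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"^(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2})"), "%d/%b/%Y:%H:%M:%S"),
]


def parse_timestamp(line):
    """Return epoch seconds if the line starts with a known timestamp, else None."""
    for pattern, fmt in FORMATS:
        match = pattern.match(line)
        if match:
            parsed = datetime.strptime(match.group(1), fmt)
            # Assumption: timestamps without a zone are treated as UTC.
            return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    return None


def split_events(raw):
    """Start a new event at each timestamped line; append other lines to the current event."""
    events = []
    for line in raw.splitlines():
        epoch = parse_timestamp(line)
        if epoch is not None or not events:
            events.append({"time": epoch, "raw": line})
        else:
            events[-1]["raw"] += "\n" + line
    return events


RAW = """2012-01-21 10:15:00 ERROR order failed
  at OrderService.process
  at Main.run
21/Jan/2012:10:15:05 GET /cart 200"""

events = split_events(RAW)
for event in events:
    print(event["time"], repr(event["raw"]))
```

The multi-line stack trace stays attached to its error event, and both events end up with timestamps on the same epoch scale even though the raw formats differ, which is what makes trending by time across sources possible.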
  15. Inside Universal Indexing. Segmentation and dense indexing of every term enable Boolean search on anything in the original event.
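Indexing every term, as slide 15 describes, is commonly done with an inverted index: a map from each term to the set of events containing it, so that Boolean search becomes set intersection and union. The sketch below shows that general technique under simplifying assumptions (lowercase alphanumeric segmentation, in-memory sets); it is not a description of Splunk's index format, and all names and sample events are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical raw events.
EVENTS = [
    "ERROR payment failed user=alice",
    "INFO login ok user=bob",
    "ERROR login failed user=bob",
]


def segment(event):
    """Split an event into lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", event.lower()))


def build_index(events):
    """Map every term to the set of event ids that contain it."""
    index = defaultdict(set)
    for event_id, event in enumerate(events):
        for term in segment(event):
            index[term].add(event_id)
    return index


def search_and(index, *terms):
    """Events containing all of the terms (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Events containing at least one of the terms (Boolean OR)."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


index = build_index(EVENTS)
print(search_and(index, "error", "bob"))
print(search_or(index, "payment", "info"))
```

Because no term is skipped at index time, any word in the original event can later be used in a query without having been anticipated in a schema.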
  16. Inside Search-time Knowledge Extraction. Automatically discovered fields and user-defined fields enable statistics and precise search on specific fields.
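Search-time extraction, as on slide 16, means fields are pulled out of the raw event only when a query runs. A minimal sketch follows, assuming `key=value` pairs stand in for automatically discovered fields and a caller-supplied regular expression stands in for a user-defined field; the function names and sample events are hypothetical.

```python
import re
from collections import Counter

# Hypothetical raw events, stored unparsed.
RAW_EVENTS = [
    "2012-01-21 10:15:00 action=purchase status=200 user=alice",
    "2012-01-21 10:15:02 action=purchase status=500 user=bob",
    "2012-01-21 10:15:09 action=view status=200 user=alice",
]

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw, extra_patterns=None):
    """Discover key=value fields, then apply any user-defined patterns."""
    fields = dict(KV_PATTERN.findall(raw))
    for name, pattern in (extra_patterns or {}).items():
        match = re.search(pattern, raw)
        if match:
            fields[name] = match.group(1)
    return fields


def count_by(events, field, where=None, extra_patterns=None):
    """Count values of one field over events whose fields match the filter."""
    counts = Counter()
    for raw in events:
        fields = extract_fields(raw, extra_patterns)
        if where and any(fields.get(key) != value for key, value in where.items()):
            continue
        if field in fields:
            counts[fields[field]] += 1
    return counts


# Precise search on a discovered field, then a statistic over another field.
ok_by_user = count_by(RAW_EVENTS, "user", where={"status": "200"})
# A user-defined field ("hour") added at query time without re-indexing.
by_hour = count_by(RAW_EVENTS, "hour", extra_patterns={"hour": r" (\d{2}):\d{2}:\d{2}"})
print(ok_by_user, by_hour)
```

The `hour` field did not exist when the events were stored; defining it at query time is the flexibility that a fixed upfront schema would not allow.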
  17. New Approach to Heterogeneous Data. Universal Indexing: No data normalization; Automatically handles timestamps; Parsers not required; Index every term & pattern “blindly”; No attempt to “understand” up front. Search-time Knowledge: Knowledge applied at search-time; No brittle schema to work around; Multiple views into the same data; Splunk helps find transactions, patterns and trends. Flexibility and Fast Time to Value: Normalization as it’s needed; Faster implementation; Easy search language; Multiple views into the same data
  18. Splunk Used Across IT and the Business: Application Management; Operations Management; Security & Compliance; Web and Business Analytics
  19. Provides Strong Machine Data Governance. Provides comprehensive controls for data security, retention and integrity. Single sign-on integration enables pass-through authentication of user credentials.
  20. Splunk Big Data Strategy. Deliver ease of use, real-time analytics and enterprise capabilities. Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform. Data collection and indexing. Splunk storage; Other Stores
  21. Deploying New Technologies is a Challenge
  22. Splunk-Hadoop: Co-existence use cases. Side by Side: Real-time Analytics; ETL / recommendation system. Splunk in front of Hadoop: Collect, Visualize, Report; ETL, Archival, Long Running Queries. Combine: Splunk visualize and secure Hadoop Data; Splunk Index Hadoop Data
  23. Splunk: Enabling the Big Data Ecosystem. Real-time Collection and Analysis; Dashboards, Reports, Access Controls. Splunk Hadoop Connect: Reliable Data Export; Import Hadoop Data. Splunk App for HadoopOps: End-to-end monitoring, troubleshooting, analysis of Hadoop environment
  24. Splunk Hadoop Connect. Delivers reliable integration between Splunk and Hadoop: Export events to Hadoop; Explore and Browse Hadoop directories; Import and Index Hadoop data into Splunk
  25. Splunk App for HadoopOps. Monitoring the full Hadoop environment – Hadoop, Switch, OS, AS, and Database. Splunk HadoopOps Forwarder Package on every host. Splunk HadoopOps: Dashboards, alerts and notifications; Rich UI powered by Splunk search. Framework: Collect & Index Data; Add Knowledge; Distributed Search; Monitor & Alert. Host, Operating System, Infrastructure
  26. Splunk and Big Data. Product-based Solution: Easy to download and deploy; Pre-integrated, end-to-end functionality; Enterprise-grade features. Integrated and End-to-end: Collects data from tens of thousands of sources; Advanced real-time and historical analysis of data; Fast, custom visualizations for IT and business users; Developer API, SDKs. Performance at scale: Proven at multi-terabyte scale per day; Upwards of PB under management; Thousands of enterprise customers
  27. Thank You