Implementing Big Data at the Speed of Business
Published in Technology
  • Let’s examine for a second one of the fastest growing, most complex and most valuable segments of big data: machine data. All the web servers, applications and network devices, all of the technology infrastructure running your enterprise, generate massive streams of data in an array of unpredictable formats that are difficult to process and analyze by traditional methods or in a timely manner. Why is this “machine data” valuable? Because it contains a trace, a categorical record, of user behavior, cyber-security risks, application behavior, service levels, fraudulent activity and customer experience. For Splunk, the last two of the four Vs (volume, velocity, variety, variability) are the most important: variety of data and variability of data (changes in format, for example new fields being added to a log file).
  • Why is this “machine data” valuable? Because it contains a trace, a categorical record, of user behavior, cyber-security risks, application behavior, service levels, fraudulent activity and customer experience. In the example: Order Processing = an order for a product; Middleware Error = a WebLogic Application Server error; Care IVR = a telephone call to complain about the error; Twitter = comments on the bad experience. Parsing this data for consumption by a database is hard and time consuming. It is hard to normalize because of the last two Vs: variety of data and variability of data (changes in format, for example new fields being added to a log file).
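The variability problem described in this note (new fields appearing in a log format) is one reason to extract fields when the data is read rather than when it is loaded into a fixed schema. The following is a minimal sketch of that idea, assuming a hypothetical key=value log layout; the field names and the regular expression are illustrative and are not Splunk's actual extraction logic.

```python
import re

# Hypothetical key=value log lines; the second line has gained a new field (region).
LINES = [
    'ts=2013-04-01T10:00:00 customer_id=42 status=200',
    'ts=2013-04-01T10:00:05 customer_id=42 status=500 region="eu west"',
]

# A pair is a word, an equals sign, and either a quoted value or a bare token.
PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def extract_fields(line):
    """Return every key=value pair found in the line as a dict."""
    fields = {}
    for key, raw, quoted in PAIR.findall(line):
        # Use the text inside the quotes when the value was quoted.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


events = [extract_fields(line) for line in LINES]
for event in events:
    print(event)
```

Because nothing about the field list is fixed in advance, the extra `region` field in the second line is picked up without any schema change.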
  • Example of a Customer ID that Splunk can correlate across the sources: Order Processing -> Application Server Error -> customer calling to complain about the issue -> Twitter record showing that the customer gave up on waiting.
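The correlation described in this note can be sketched as a group-by on a shared key followed by a sort on time. This is only an illustration: the event dictionaries, the field names, and the assumption that the Twitter record has already been mapped to a customer ID are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-parsed events from the four sources named in the note.
EVENTS = [
    {"source": "twitter", "time": 4, "customer_id": "C100", "tweet": "gave up waiting"},
    {"source": "order_processing", "time": 1, "customer_id": "C100", "order_id": "O9"},
    {"source": "care_ivr", "time": 3, "customer_id": "C100", "hold_seconds": 600},
    {"source": "middleware_error", "time": 2, "customer_id": "C100", "order_id": "O9"},
    {"source": "order_processing", "time": 5, "customer_id": "C200", "order_id": "O10"},
]


def correlate(events, key="customer_id"):
    """Group events by the shared key and order each group by time."""
    groups = defaultdict(list)
    for event in events:
        groups[event[key]].append(event)
    return {k: sorted(v, key=lambda e: e["time"]) for k, v in groups.items()}


journeys = correlate(EVENTS)
path = [event["source"] for event in journeys["C100"]]
print(" -> ".join(path))
```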
  • Splunk is the platform for machine data, optimized for real-time, low-latency, interactive use. It reliably collects and indexes all the streaming data from IT systems and technology devices in real time: tens of thousands of sources in unpredictable formats and types. The Splunk platform indexes the data, making it available for searching, monitoring, analysis and visualization. It enables you to interact with your data and gain operational intelligence from it: 1. Find and fix problems dramatically faster. 2. Automatically monitor to identify issues, problems and attacks. 3. Gain end-to-end visibility to track and deliver on IT KPIs and make better-informed IT decisions. 4. Gain real-time insight from operational data to make better-informed business decisions.
  • Both IT and business professionals can analyze machine data to get real-time visibility and operational intelligence. With our data engine and our customers' machine data, organizations can meaningfully improve their performance in a wide range of areas, e.g. meet service levels, reduce costs, mitigate security risks, maintain compliance and gain insights.
  • Splunk can be divided into four logical functions. First, from the bottom up, is forwarding. Splunk forwarders come in two packages: the full Splunk distribution or a dedicated “Universal Forwarder”. The full Splunk distribution can be configured to filter data before transmitting, execute scripts locally, or run SplunkWeb. This gives you several options depending on the footprint size your endpoints can tolerate. The universal forwarder is an ultra-lightweight agent designed to collect data in the smallest possible footprint. Both flavors of forwarder come with automatic load balancing, SSL encryption and data compression, and the ability to route data to multiple Splunk instances or third-party systems. To manage your distributed Splunk environment, there is the Deployment Server. The Deployment Server helps you synchronize the configuration of your search heads during distributed searching, as well as that of your forwarders, so you can centrally manage your distributed data collection. Of course, Splunk has a simple flat-file configuration system, so feel free to use your own config management tools if you’re more comfortable with what you already have. The core of the Splunk infrastructure is indexing. An indexer does two things. It accepts and processes new data, adding it to the index and compressing it on disk. It also services search requests, looking through the data it holds via its indexes and returning the appropriate results to the searcher over a compressed communication channel. Indexers scale out almost limitlessly and with almost no degradation in overall performance, allowing Splunk to scale from single-instance small deployments to truly massive Big Data challenges. Finally, the Splunk most users see is the search head. This is the web server and app-interpreting engine that provides the primary, web-based user interface. 
Since most of the data interpretation happens as needed at search time, the role of the search head is to translate user and app requests into actionable searches for its indexer(s) and display the results. The Splunk web UI is highly customizable, either through our own view and app system, or by embedding Splunk searches in your own web apps via includes or our API.
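The division of labor between indexers and the search head described in this note can be sketched as a fan-out and merge. The `Indexer` and `SearchHead` classes below are toy stand-ins invented for illustration, not Splunk components or APIs; the keyword match and newest-first ordering are assumptions.

```python
import heapq


class Indexer:
    """Toy indexer: stores (timestamp, raw_event) pairs and answers keyword searches."""

    def __init__(self):
        self.events = []

    def index(self, timestamp, raw):
        self.events.append((timestamp, raw))

    def search(self, keyword):
        # Each indexer returns only its own matches, newest first.
        return sorted((e for e in self.events if keyword in e[1]), reverse=True)


class SearchHead:
    """Toy search head: sends the search to every indexer and merges the results."""

    def __init__(self, indexers):
        self.indexers = indexers

    def search(self, keyword):
        partial_results = [indexer.search(keyword) for indexer in self.indexers]
        # Inputs are already sorted newest first, so a streaming merge is enough.
        return list(heapq.merge(*partial_results, reverse=True))


a, b = Indexer(), Indexer()
a.index(1, "GET /cart 200")
a.index(3, "error: timeout")
b.index(2, "error: db down")
b.index(4, "GET / 200")

results = SearchHead([a, b]).search("error")
print(results)
```

Adding another indexer only adds one more sorted input to the merge, which is the intuition behind scaling out by adding indexers.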
  • Splunk uses commodity servers to scale. Splunk customers use the product to harness multiple TB of data per day. Thousands of forwarders send data to the indexers, and search heads in front of the indexers support hundreds or thousands of users all accessing the data.
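As a rough illustration of how many forwarders can spread their data over a small set of indexers, the sketch below assigns batches to indexers in round-robin order. Round-robin per batch is an assumption chosen for simplicity; the notes do not say which balancing strategy Splunk forwarders use, and the indexer names are hypothetical.

```python
from itertools import cycle


def distribute(batches, indexer_names):
    """Assign each batch to the next indexer in round-robin order."""
    assignment = {name: [] for name in indexer_names}
    targets = cycle(indexer_names)
    for batch in batches:
        assignment[next(targets)].append(batch)
    return assignment


plan = distribute(range(10), ["idx1", "idx2"])
for name, batches in plan.items():
    print(name, batches)
```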
  • Open source software such as Hadoop and Cassandra requires development cycles of six months or more and specialized development resources.
  • Splunk DB Connect enables you to enrich and combine machine data with database data. Database queries and lookups can be configured in minutes via the Splunk Enterprise user interface, and the app provides connection pooling as well as flexible search commands for querying database tables.
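The enrichment idea in this note, adding business context from a relational database to machine events, can be sketched with Python's built-in `sqlite3` module. This is not DB Connect itself: the `customers` table, its columns, and the event fields are hypothetical, and an in-memory SQLite database stands in for Oracle or SQL Server.

```python
import sqlite3

# In-memory database standing in for a production relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, tier TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("C100", "Alice", "gold"), ("C200", "Bob", "basic")],
)

# Hypothetical machine events that carry only an ID.
events = [
    {"customer_id": "C100", "status": "500"},
    {"customer_id": "C999", "status": "200"},
]


def enrich(event, connection):
    """Return a copy of the event with name and tier added when the ID is known."""
    row = connection.execute(
        "SELECT name, tier FROM customers WHERE customer_id = ?",
        (event["customer_id"],),
    ).fetchone()
    enriched = dict(event)
    if row is not None:
        enriched["name"], enriched["tier"] = row
    return enriched


enriched_events = [enrich(event, conn) for event in events]
for event in enriched_events:
    print(event)
```

Events whose ID has no match are passed through unchanged rather than dropped, so the lookup never loses machine data.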
  • The Splunk App for HadoopOps provides several specialized features to monitor Hadoop. Monitoring nodes on the cluster: displays a complete view of all of the servers in the cluster. This gives the Hadoop administrator a view into the health of the cluster, tracking disk usage, CPU and RAM from a single view rather than opening multiple consoles. Cluster visualization can display a rack-specific or node-specific failure. Monitoring MapReduce jobs: displays information on the Map and Reduce tasks, delivering real-time as well as historical statistics on how the individual tasks are operating and how they are working together. This information is used to troubleshoot MapReduce performance issues by comparing similar jobs and drilling down from JobIDs to TaskIDs. It also correlates used core slots with MapReduce jobs and pinpoints the MapReduce attempts that are using them. Monitoring Hadoop services: displays information about the health of the Name node, Secondary Name node and Data node. The view covers HDFS I/O, HDFS capacity per user and HDFS size, as well as the CPU and memory of the HDFS daemons. This information is used for monitoring load and capacity, which can help justify hardware and software acquisitions. View Hadoop configuration: displays information about the configuration of each node and each daemon in the Hadoop cluster. Hadoop is highly dependent on the hardware and network it uses, so any change to the Hadoop configuration can create a service disruption. 
The information indexed by Splunk allows Hadoop administrators to view configurations from HDFS, MapReduce and the entire surrounding environment, which can lead to faster resolution times. Search logs: Splunk distributed search and indexing allows real-time display of information from all Hadoop, Linux, database and network log files to further enhance end-to-end debugging of issues. Headlines and alert notifications: Splunk alerts can be triggered by a single event as well as by a group of events. Per-result alerting gives users granular control over the notifications received when one of the Hadoop nodes, MapReduce tasks or HDFS daemons is failing.
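The two alert styles mentioned in this note, triggering on a single event versus on a group of events, can be sketched as follows. The event fields, the predicate, and the sliding-window rule are assumptions made for illustration and are not how Splunk defines its alert conditions.

```python
def per_result_alerts(events, predicate):
    """Per-result alerting: every matching event produces its own alert."""
    return [event for event in events if predicate(event)]


def threshold_alert(events, predicate, threshold, window):
    """Group alerting: True if at least `threshold` matches fall within `window` seconds."""
    times = sorted(event["time"] for event in events if predicate(event))
    start = 0
    for end in range(len(times)):
        # Shrink the window from the left until it spans at most `window` seconds.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


# Hypothetical daemon health events; time is in seconds.
EVENTS = [
    {"time": 0, "node": "datanode-1", "state": "failed"},
    {"time": 10, "node": "datanode-2", "state": "failed"},
    {"time": 15, "node": "datanode-3", "state": "ok"},
    {"time": 20, "node": "datanode-4", "state": "failed"},
    {"time": 100, "node": "datanode-5", "state": "failed"},
]


def is_failed(event):
    return event["state"] == "failed"


single = per_result_alerts(EVENTS, is_failed)
burst_of_three = threshold_alert(EVENTS, is_failed, threshold=3, window=30)
burst_of_four = threshold_alert(EVENTS, is_failed, threshold=4, window=30)
print(len(single), burst_of_three, burst_of_four)
```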
  • More than 4,800 users in over 85 countries have purchased the enterprise license of Splunk. This includes a majority of the Fortune 100. Enterprises, service providers and government agencies in 80 countries use Splunk to improve service levels, reduce IT operations costs, mitigate security risks and drive new levels of operational visibility. As they gain new visibility into their real-time and historical machine data, Splunk’s customers are finding answers and solving the most challenging issues facing IT and the business.


  • 1. Copyright © 2013 Splunk Inc. Big Data at the Speed of Business. Raanan Dagan, Big Data PM, Splunk. Maciej Jagiellowicz, Monitoring and Response Senior Specialist, Allegro
  • 2. What We’ll Talk About • What is Splunk? • Real-Time Monitoring and Alerts at Allegro • Integration Platform with Splunk Applications • Archiving Big Data at Allegro • Q&A
  • 3. Splunk: • Company (NASDAQ: SPLK): founded 2004, first software release in 2006; HQ: San Francisco, CA • 5,200+ Enterprise Customers • #1 Big Data Innovator* • #1 Big Data, Pure Play Vendor**. Allegro: • Online transaction platform • Was formed in 1999 • E-commerce leader in Central and Eastern Europe, a group of companies managing 129 platforms in over 23 countries • More than 12.5 million users. (* Fast Company’s Most Innovative Companies Issue, March 2013; ** Forbes/Wikibon, Feb 2013) • Web site:
  • 4. Big Data Comes from Machines. Volume | Velocity | Variety | Variability. Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data. Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
  • 5. What Does Machine Data Look Like? Sources: Order Processing, Middleware Error, Care IVR, Twitter
  • 6. Machine Data Contains Critical Insights. Sources: Order Processing (Customer ID, Order ID, Product ID); Middleware Error (Order ID, Customer ID); Care IVR (Time Waiting On Hold, Customer ID); Twitter (Twitter ID, Customer’s Tweet, Company’s Twitter ID)
  • 7. Splunk: The Platform for Machine Data. Machine Data; Splunk Index; Operational Intelligence: Insight and Visualizations for Executives, Statistical Analysis, Proactive Monitoring, Search and Investigation
  • 8. Serves Needs Across IT and Business. Use cases: IT Operations Management, Web Intelligence, Application Management, Business Analytics, Security and Compliance. Users: Customer Support, LOB Owners/Executives, Operations Teams, Website/Business Analysts, System Administrator, Application Developers, IT Executives, Security Analysts, Auditors
  • 9. Splunk for Real-Time Monitoring and Alerts
  • 10. Why do we like Splunk … • Meets strategic needs across IT • Scales from laptop to datacenter to cloud • For all types of users • Users want to use it
  • 11. Where do we use Splunk • Real-time monitoring: web servers, app servers, Active Directory, security devices • Post-incident log analysis: historical data analysis • Application debugging: real-time log analysis
  • 12. Splunk Architecture • Concurrent Users = 250 • Search Heads = 5 • Indexers = 2 • Forwarders = 1500 • Total Data Processed Per Day = 100GB
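To put the deployment figures on this slide in proportion, the arithmetic below derives per-component averages. It assumes that load is spread evenly across components and that 1 GB = 1000 MB; neither assumption comes from the slides, so the results are rough averages rather than measured values.

```python
# Figures taken from slide 12.
DAILY_VOLUME_GB = 100
INDEXERS = 2
FORWARDERS = 1500
SEARCH_HEADS = 5
CONCURRENT_USERS = 250

# Assumptions (not from the slides): even distribution, 1 GB = 1000 MB.
SECONDS_PER_DAY = 24 * 60 * 60

gb_per_indexer = DAILY_VOLUME_GB / INDEXERS
mb_per_forwarder = DAILY_VOLUME_GB * 1000 / FORWARDERS
users_per_search_head = CONCURRENT_USERS / SEARCH_HEADS
avg_ingest_mb_per_s = DAILY_VOLUME_GB * 1000 / SECONDS_PER_DAY

print(f"{gb_per_indexer:.0f} GB/day per indexer")
print(f"{mb_per_forwarder:.1f} MB/day per forwarder")
print(f"{users_per_search_head:.0f} concurrent users per search head")
print(f"{avg_ingest_mb_per_s:.2f} MB/s average ingest rate")
```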
  • 13. Visualizing Real-Time Data in Splunk. Real-time monitoring: • Transactions with financial institutions and banks • Monitoring of key referrals to web site • Monitoring of application JMS queues • Top areas of application errors • Business transactions • Monitoring of SMS and mobile device communications
  • 14. Key Functions • Searching and Reporting (Search Head) • Indexing and Search Services (Indexer) • Local and Distributed Management (Deployment Server) • Data Collection and Forwarding (Forwarder). A Splunk install can be one or all roles…
  • 15. Splunk Components and Scalability • Distributed analysis • Automatic load balancing linearly scales indexing • Role-based security. Search Heads: offload search load to Splunk Search Heads. Indexers: auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day. Forwarders: send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
  • 16. Splunk Real-time Analytics. Data inputs: Monitor Input, TCP/UDP Input, Scripted Input. Parsing Pipeline: • Source, event typing • Character set normalization • Line breaking • Timestamp identification. Real-time Search. Splunk Index: Raw data, Index Files
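The parsing stages listed on this slide (source typing, character set normalization, line breaking, timestamp identification) can be sketched in a few lines. This is only a toy pipeline: the input encoding, the one-event-per-line rule, and the timestamp pattern are assumptions and do not describe Splunk's actual parsing rules.

```python
import re
from datetime import datetime

# Assumed timestamp layout: YYYY-MM-DD HH:MM:SS anywhere in the line.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def parse(raw_bytes, source, encoding="latin-1"):
    """Turn raw bytes from one source into a list of event dicts."""
    text = raw_bytes.decode(encoding)  # character set normalization
    events = []
    for line in text.splitlines():  # line breaking: one event per line
        if not line.strip():
            continue
        match = TIMESTAMP.search(line)  # timestamp identification
        timestamp = (
            datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S") if match else None
        )
        events.append({"source": source, "time": timestamp, "raw": line})
    return events


raw = "2013-04-01 10:00:00 café order ok\n\nno timestamp here\n".encode("latin-1")
parsed = parse(raw, source="app.log")
for event in parsed:
    print(event)
```

Events without a recognizable timestamp are kept with `time` set to `None` so that nothing is silently dropped.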
  • 17. Splunk Delivers Big Data in Days or Weeks. Product-based Solution: easy to download and deploy; pre-integrated, end-to-end functionality; enterprise-grade features. Real-time Platform: collects data from tens of thousands of sources; advanced real-time and historical analysis of data; fast, custom visualizations for IT and business users. Performance at scale: proven at multi-terabyte scale per day; upwards of PB under management; thousands of enterprise customers
  • 18. Do You Hadoop?
  • 19. Splunk: A Platform for Big Data Integration. Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform. Splunk Dev Platform • API and SDKs to build Big Data apps. Splunk DB Connect • Real-time integration to relational DBs (SQL). Splunk Hadoop Connect • Reliable bi-directional integration to Hadoop
  • 20. Splunk Hadoop Connect: delivers reliable integration between Splunk and Hadoop
  • 21. Splunk DB Connect: reliable, scalable, real-time integration between Splunk and traditional relational databases. Java Bridge Server: Database Lookup, Connection Pooling, Database Query; JDBC to Oracle Database, Microsoft SQL Server, Other Databases. Benefits: enrich search results with additional business context; easily import data into Splunk for deeper analysis; integrate multiple DBs concurrently; simple set-up, non-invasive and secure
  • 22. Splunk Developer Platform. 1. Accelerate Dev & Test 2. Integrate with IT Infrastructure 3. Build Real-time Data Applications. Developer Platform (REST API, SDKs): enables enterprise developers to extend the power of Splunk Enterprise with a robust API and Java, JavaScript and Python SDKs
  • 23. Splunk Hadoop Monitoring. Splunk HadoopOps Forwarder Package on every host. Splunk HadoopOps dashboards, alerts and notifications, powered by Splunk search. Add Knowledge; Collect & Index Data; Distributed Search; Monitor & Alert; Rich UI Framework. Host; Operating System; Infrastructure
  • 24. Archiving Big Data in Hadoop
  • 25. Hadoop Components • Hive • Flume • Mahout • MapReduce
  • 26. Hadoop Cluster
  • 27. Why and Where do we Use Hadoop • Big Data archive • Web services statistics • Mail flow statistics
  • 28. Where we do not use Hadoop • Not for Visualization • Not for Analytics • Not for Real-time • Not for Access Control
  • 29. Where we are today and where do we want to be tomorrow
  • 30. Splunk 5,200+ Licensed Customers: Cloud and Online Services, Education, Energy and Utilities, Financial Services and Insurance, Government, Healthcare, Manufacturing, Media, Retail, Technology, Telecommunications, Travel and Leisure
  • 31. Splunk Big Data Platform: Product-based solution | Real-time Platform | Performance at scale. Visit Splunk Booth
  • 32. Copyright © 2013 Splunk Inc. Thank