Making the Elephant Fly: Real-time Operational Intelligence on Hadoop with Streaming SQL
 

Streaming SQL, Hadoop and Real-time Operational Intelligence.

Accelerating Hadoop to process live, high velocity unstructured data streams delivers the low latency, streaming operational intelligence demanded by today's real-time businesses. Hadoop has been the driving force behind Big Data Analytics but as the technology hits the mainstream, many industries are seeking to take a step further and eliminate latency from their business completely. With the SQL language emerging as the key enabler for the mainstream adoption of Hadoop, executing streaming SQL queries over Hadoop extends the platform out to the edge of the network, making it possible to query unstructured log file, sensor and network machine data sources on the fly and in real-time.  

This presentation, given by SQLstream CEO Damian Black at IE Group's Big Data Innovation Summit in San Francisco, presents a 100% standards-compliant streaming SQL language for Hadoop. It delivers real-time operational intelligence from streaming SQL queries that use the Hadoop/MapReduce infrastructure for high-velocity, high-volume scalability.

  • SQLstream transforms your streaming log file, sensor and other machine-generated data into business value in real-time, improving profitability and business and process efficiency while reducing the cost and complexity of developing and enhancing your applications and data processing. SQLstream is the leading stream computing platform, providing a real-time link between streaming data and your ability to react and respond to the continuous, complex analyses of the data. This gives deep insights into what is going on in your business, all in real-time, while allowing you to share the streams of answers to those ongoing analyses with any number of downstream applications, at massive scale.
    Data volumes are growing exponentially, making it too costly to analyze them with conventional technologies where you have to store all of the data, even if most of the data has a very limited ‘shelf life’. The costs come from needing specialized data warehousing systems to handle such large and ever-growing data volumes, and those license fees are almost always based on the volume of data stored. So why store everything if you are only really concerned with the results of analyses? At the same time, businesses are having to become nimbler and more agile. They need to consume and analyze data faster than their competitors, and fast enough to hold the attention of their customers while they are interacting with their product, service, systems or personnel. Finally, automating such real-time processes is hard to do in a scalable and effective manner.
    Getting raw data into a data warehouse in “cooked form” is the job of ETL (Extract, Transform, Load) tools. However, today’s ETL tools operate as a sequence of processing stages, with each stage having to complete before the next stage can start. First, raw records are collected and poured into staging tables. Then the records are processed, aggregated and cleaned. Then they are ready for querying, and reports are finally generated and delivered to awaiting users, often the next day.
    In contrast, SQLstream provides a continuous, overlapped parallel processing alternative: records stream in continuously, are directed into cleaning and aggregation pipelines, and are then piped immediately into sets of waiting analytics queries that stream out results for immediate delivery to apps and users. The result is minimal end-to-end latency and maximum throughput, non-stop, with massive scale and parallel processing.
  • SQL was developed to elegantly process massive quantities of stored data. It works just as well in processing massive volumes of streaming data. It has proven scalability and sophisticated query optimizers, and it enables rapid application development: a few SQL rules have immense power, and SQL skills are readily available in the marketplace. SQL also allows easy migration of queries and logic between databases, data warehouses and SQLstream.
  • We are one of only two closed source solutions within Mozilla. We power their real-time analytics. See our “Powered by SQLstream” logo on the bottom right of their web display. Search YouTube for “Mozilla Glow”. SQLstream processes all of the log files from Mozilla download servers in real-time: parsing the files, streaming the data, mapping IP addresses to longitude and latitude, finding the nearest town, city or village, and performing a range of analytics on the streams. The results feed a Hadoop cluster (HBase) that displays historical information, complemented by SQLstream’s real-time analytics.
  • Put simply, there is a new real-time data challenge facing many enterprises today: data volumes are growing too fast to store and analyze everything with conventional technologies, businesses need to analyze data faster than their competitors, and automating real-time processes at scale is hard.
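The notes above contrast staged ETL with a continuous, overlapped pipeline, and describe the Mozilla deployment as parsing log files, mapping IP addresses to places and analyzing the resulting stream. As a minimal sketch of that idea (not SQLstream's implementation), the Python generators below chain three stages so that each record flows through all of them before the next record is read. The log line layout, the `PLACE_BY_PREFIX` table and all function names are assumptions made for illustration; the IP addresses come from reserved documentation ranges.

```python
from collections import Counter

# Hypothetical IP-prefix-to-place table (illustrative values only).
PLACE_BY_PREFIX = {
    "192.0.2": "San Francisco",
    "198.51.100": "London",
}


def parse(lines):
    """Stage 1: turn raw 'timestamp ip path' lines into records, skipping malformed ones."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            yield {"ts": parts[0], "ip": parts[1], "path": parts[2]}


def enrich(records):
    """Stage 2: tag each record with a place looked up from its IP prefix."""
    for rec in records:
        prefix = rec["ip"].rsplit(".", 1)[0]
        yield {**rec, "place": PLACE_BY_PREFIX.get(prefix, "unknown")}


def running_counts(records):
    """Stage 3: emit the updated download count for a place after every record."""
    counts = Counter()
    for rec in records:
        counts[rec["place"]] += 1
        yield rec["place"], counts[rec["place"]]


def pipeline(lines):
    """Chain the stages lazily: no stage waits for the previous one to finish."""
    return running_counts(enrich(parse(lines)))
```

Iterating over `pipeline(lines)` yields a `(place, count)` pair as soon as each valid line arrives, which is the overlapped behavior the notes describe; a staged ETL job would instead finish parsing every line before enrichment could start.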

Presentation Transcript

  • Turbo-charging Hadoop for Real-time, High Velocity Machine Data Analytics
    SQLstream: BIG DATA ON TAP™
    Damian Black, CEO of SQLstream
    +1 415 326 4261
    damian@sqlstream.com
    Big Data Innovation Summit, San Francisco, April 2013
    Copyright © SQLstream Inc.
  • Big Data Innovation
    Presentation Title: Streaming SQL, Hadoop and Real-time Operational Intelligence
    Sub-title: Turbo-charging Hadoop for real-time, high velocity machine data analytics.
    Presentation Description: Accelerating Hadoop to process live, high velocity unstructured data streams delivers the low latency, streaming operational intelligence demanded by today's real-time businesses. Hadoop has been the driving force behind Big Data Analytics but as the technology hits the mainstream, many industries are seeking to take a step further and eliminate latency from their business completely. With the SQL language emerging as the key enabler for the mainstream adoption of Hadoop, executing streaming SQL queries over Hadoop extends the platform out to the edge of the network, making it possible to query unstructured log file, sensor and network machine data sources on the fly and in real-time.
    This session presents a 100% standards-compliant streaming SQL language for Hadoop, delivering real-time operational intelligence from streaming SQL queries that utilize the Hadoop / Map Reduce infrastructure for high velocity, high volume scalability.
    Copyright © 2013 | Big Data on Tap™ | Damian Black | +1 415.326.4261 | damian@sqlstream.com
  • Streaming SQL + Hadoop for Real-time Operational Intelligence
    Let’s answer the following questions:
    ➔ What is Operational Intelligence and why care?
    ➔ What is streaming Big Data and streaming SQL?
    ➔ How do you use streaming SQL with Hadoop?
    ➔ How do they work?
    ➔ Why is this approach so compelling?
  • What is Operational Intelligence?
    “As we move toward a real-time business environment, the capability to process data flows swiftly and flexibly will become increasingly important. SQLstream leads the industry in this kind of capability.” (Robin Bloor, Chief Analyst for Bloor Group)
    “Aberdeen’s research has shown that Best-in-Class organizations are demanding access to actionable intelligence faster than ever. This is precisely the growing demand that SQLstream is meeting with their Streaming Big Data Engine, while continuing to bring other attractive features like full Hadoop integration.” (Nathaniel Rowe, Leading Analyst for Aberdeen Group)
    Business Intelligence: post-hoc analysis, data warehousing, strategic insights
    Operations: transactions, promotions, machine data
    Bridge the chasm between Operations and BI
    Operational Intelligence: real-time, integrated view of business and operations; actionable insights in real-time; combines operations data with BI data continuously
  • From Machine Data to Operational Intelligence
    [Figure labels: PROACTIVE, REACTIVE]
  • Machine Data: Where is the intelligence?
    Transaction Log Details:
    TRANS,2013-02-17-15:30:22,3458783,2347897953,128.56.0.253,STATUS:-15, DE69975, 4157588342
    Web Server Logs:
    [Sun Feb 17 15:30:49 2013] [notice] srv-sfo-08 caught SIGTERM, shutting down
    [Sun Feb 17 15:30:49 2013] [notice] Apache/2.2.21 -- resuming normal operations
    CDR Records:
    TERMINATE,ctl09gsx,01299796304,GMT-08:00,02-17-13,15:21:00,9,387,64ms,02-17-13,15:30:55,0005,IP-TO-IP,4157588342,8775715775,1,0,4157588342,RD_AXY_NN0_001,SFR01AAG34,40.50.245.60,234.234.60.75,65678,411,399,SIP,SANFRANCISCO,0x4B1698,0x0005E,0x49768,4157588342,0198873465
    Smartphone GPS Updates:
    <id>1597831220</id><deviceid>0198873465</deviceid><lat>lat=47.643957</lat><lon>lon=-122.3269</lon><time>2013-02-17T15:37:26Z</time><bearing>223.4535</bearing>
    <id>1597865781</id><deviceid>0198873465</deviceid><lat>lat=47.645982</lat><lon>lon=-122.327500</lon><time>2013-02-17T15:37:26Z</time><bearing>200.6138</bearing>
    <id>1597940125</id><deviceid>0198873465</deviceid><lat>lat=47.647381</lat><lon>lon=-122.326501</lon><time>2013-02-17T15:37:26Z</time><bearing>87.4357</bearing>
    Twitter:
    {"created_at:Thu Feb 17 15:30:55 +0000 2013,id:304612775055998976,id_str:304612775055998976,text:@MyServiceProvider today sucks, keeps dropped!,source:u006ca href=http:www.url.com rel=nofollow,followers_count:147,friends_count:10142,location: San Francisco, time_zone: Pacific, geo_enabled:true, location:u00dcT: -6.1987552,106.8661953,screen_name:APerson
    Fields highlighted across the samples: Timestamp, Mobile #, Customer, Server, Device ID, Term Reason, Location, Service Provider, Fail Code
  • Streaming Analytics over High-velocity Big Data
    Historical queries and enrichment
    Materialize and store OI views for future access
    Real-time alerts, action and visualization
    • Continuous and Historical Queries
    • Analyze and enhance unstructured and structured data
    • Predictive analytics, actions, alerts and real-time visualization
  • The Power of Streaming SQL
    SELECT STREAM ROWTIME, url, numErrorsLastMinute
    FROM (
        SELECT STREAM ROWTIME, url, numErrorsLastMinute,
            AVG(numErrorsLastMinute) OVER lastMinute AS avgErrorsPerMinute,
            STDDEV(numErrorsLastMinute) OVER lastMinute AS stdDevErrorsPerMinute
        FROM ServiceRequestsPerMinute
        WINDOW lastMinute AS (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
    ) AS S
    WHERE S.numErrorsLastMinute > S.avgErrorsPerMinute + 2 * S.stdDevErrorsPerMinute;
    BUSINESS NEED: Predict run-away applications before resource consumption becomes an issue.
    BLAZING SPEED: Process millions of records per second on low-end servers.
  • Why SQL in a NoSQL World?
    ➔ Queries both unstructured and structured data:
      » Binary, text, natural language, XML payloads
      » Built-in functions to process payload types
      » Extract structure and tag records
    ➔ Process the tagged data using the full power of SQL
      » Filters, joins, aggregations performed continuously, in parallel, in real-time
      » Automatic query optimization
      » High performance, low-latency
    ➔ Examples of SQL on Hadoop
      » SQLstream, Google BigQuery, Cloudera Impala, and Hadoop Hive
  • Streaming SQL Views over Hadoop
    COLLECT, ANALYZE, SHARE
    Components: Offline Map Reduce; HDFS and HBase; SQLstream Streaming SQL; Impala / HIVE; SQLstream RT Dashboard
    Streaming Hadoop Operations Explained
    Stream Recorder: Materialize and persist raw and derived data streams in Hadoop HDFS.
    Stream Playback: Replay stream data through SQLstream to find temporal and spatial patterns. Use Impala or Hive as the Stream Retrieval API.
    Streaming ETL: Continuously aggregate data streams into HBase. Continuously store materialized streams in HDFS.
  • s-Streaming Product Portfolio
    s-Server: Distributed Management Platform for Streaming Big Data
    s-Analyzer: Real-Time Visualization for Streaming Operational Intelligence
    s-Transport: Geo-Analytics for Location-based Applications
    s-Visualizer: Advanced Visualization
    s-Cloud: s-Server EC2 AMI Deployment
  • Streaming over Hadoop: Hadoop Streaming Integration Options
    Hadoop Data Integration Options | Stream Record API | Stream Playback API | Comments
    Custom Streaming Queries over HDFS or HBase | Flume API | SQL MED for Impala/Hive | Retrieve time-ordered data and then perform streaming time/spatial analytics. Goes beyond basic Impala/Hive querying.
    Restreaming Queries over HDFS | Flume API | Adapter for Hadoop HDFS | Replay complete history of stream then perform time/spatial analytics. Scenario evaluation and testing.
    HBase Data Enrichment | Flume API | Adapter for Hadoop HBase | Fast look-ups to enrich streaming data. Also useful for filtering and aggregation.
    Hadoop Adhoc Historical Queries | Flume API | JDBC SQL (Impala/Hive, HDFS or HBase) | Custom queries over historical operational data. Non-streaming, much higher latency.
  • Real-time Operational Intelligence: Non-Streaming versus Streaming Comparison
    Application Requirement | Operational Intelligence, Conventional Non-streaming | Operational Intelligence, With Streaming using SQLstream
    Time Series Analytics | Simplistic answers without time series. | Comprehensive time series support.
    Complex Analysis | Simple pattern matching and statistics. | Elegantly solves hardest problems.
    Join & Correlate | Does not combine or join streams. | Joins data streams in real-time.
    Enrich & Integrate | Does not enrich or integrate data. | Gives rich answers in real-time.
    Big Data Scalability | No parallel processing; limited scalability. | Massively parallel, auto-optimizing.
    Painless, low TCO | Very expensive, proprietary, with only basic visualization. | Low TCO, ANSI/ISO standard queries, rich real-time visualization.
  • Real-World Example: Download Monitoring
    Mozilla (Google: “Youtube Mozilla Glow”)
    Real-time monitoring across all download web servers across the world simultaneously.
    ➔ Collect: Remote agents transform log files into real-time streams
    ➔ Analyze: Real-time analysis & aggregation by location
    ➔ Share: Continuous ETL into Hadoop HBase; Internet ‘Glow’ app for real-time visualization
    Diagram labels: Web Server Log Files (Remote); Streaming collection, real-time analysis and continuous integration by location; Hadoop HBase
  • Operational Intelligence: Other Industries
    Environment & Healthcare; Location-based Services; Telecoms & Financial; Smart Grid & Energy; Sensors & M2M; Security, Social & Internet
  • Any Questions?
    Damian Black, CEO SQLstream
    +1 415 326 4261
    damian@sqlstream.com
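To make the logic of the streaming SQL query on the "Power of Streaming SQL" slide concrete, the Python sketch below reproduces its rule outside any streaming engine: for each url, keep a trailing one-minute window of error counts and flag a row when its count exceeds the window average plus two standard deviations. This is an illustration under stated assumptions, not SQLstream code: the function name and the `(rowtime, url, errors)` tuple layout are hypothetical, timestamps are plain seconds, the window includes the current row, and population standard deviation (`statistics.pstdev`) stands in for SQL `STDDEV`, whose exact definition in the product is not given in the slides.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def detect_error_spikes(rows, window_seconds=60):
    """Yield (rowtime, url, errors) for rows that exceed mean + 2 * stddev
    of the same url's trailing window (current row included).

    rows: iterable of (rowtime_in_seconds, url, errors), ordered by rowtime.
    """
    windows = defaultdict(deque)  # one window per url, like PARTITION BY url
    for rowtime, url, errors in rows:
        window = windows[url]
        window.append((rowtime, errors))
        # Evict rows that fell out of the trailing window; the current row
        # always stays, so the deque is never empty here.
        while window[0][0] < rowtime - window_seconds:
            window.popleft()
        values = [e for _, e in window]
        threshold = mean(values) + 2 * pstdev(values)
        if errors > threshold:
            yield rowtime, url, errors
```

Because the current row is part of its own window, a single spike can only exceed the threshold once the window holds at least six rows, so very sparse streams would never raise an alert under these assumptions.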