IBM Streams at Hadoop User Group

Transcript

  • 1. Big Data. Jerome Chailloux, Big Data Specialist, jerome.chailloux@fr.ibm.com © 2011 IBM Corporation
  • 2. Imagine the Possibilities of Analyzing All Available Data: Faster, More Comprehensive, Less Expensive
    – Understand and act on customer sentiment in real time
    – Traffic flow optimization
    – Fraud & risk detection
    – Accurate and timely threat detection
    – Predict and act on intent to purchase
    – Low-latency network analysis
  • 3. Where is this data coming from?
    – Every day, the New York Stock Exchange captures 1 TB of trade information.
    – 12 TB of tweets are created each day.
    – 5 billion mobile phones were in use in 2010; only 12% were smartphones.
    – Every second of HD video generates > 2,000 times as many bytes as required to store a single page of text.
    – More than 30M networked sensors, growing at a rate of > 30% per year.
    What is your business doing with it? (Source: McKinsey & Company, May 2011)
  • 4. Why is Big Data important? The gap between the data AVAILABLE to an organization and the data an organization can PROCESS is a missed opportunity: organizations are able to process less and less of the available data, leaving enterprises increasingly blind to new opportunities.
  • 5. What does a Big Data platform do?
    – Analyze a variety of information: novel analytics on a broad set of mixed information that could not be analyzed before
    – Analyze information in motion: streaming data analysis; large-volume data bursts & ad-hoc analysis
    – Analyze extreme volumes of information: cost-efficiently process and analyze petabytes of information; manage & analyze high volumes of structured, relational data
    – Discover & experiment: ad-hoc analytics, data discovery & experimentation
    – Manage & plan: enforce data structure, integrity and control to ensure consistency for repeatable queries
  • 6. Complementary Approaches for Different Use Cases
    – Traditional approach (data warehouse): structured, analytical, logical; repeatable, linear; enterprise integration. Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data. Examples: monthly sales reports, profitability analysis, customer surveys.
    – New approach (Hadoop, Streams): creative, holistic thought, intuition; exploratory, iterative. New sources: web logs, social data, text data (emails), sentiment, sensor data (images), RFID. Examples: brand strategy, product strategy, maximum asset utilization.
  • 7. IBM Big Data Strategy: Move the Analytics Closer to the Data. New analytic applications (BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics) drive the requirements for a big data platform:
    – Integrate and manage the full variety, velocity and volume of data
    – Apply advanced analytics to information in its native form
    – Visualize all available data for ad-hoc analysis
    – Development environment for building new analytic applications
    – Workload optimization and scheduling
    – Security and governance
    The IBM Big Data Platform spans Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop, Stream Computing, Data Warehouse, and Information Integration & Governance.
  • 8. Most Client Use Cases Combine Multiple Technologies
    – Pre-processing: ingest and analyze unstructured data types and convert to structured data
    – Combine structured and unstructured analysis: augment the data warehouse with additional external sources, such as social media
    – Combine high-velocity and historical analysis: analyze and react to data in motion; adjust models with deep historical analysis
    – Reuse structured data for exploratory analysis: experimentation and ad-hoc analysis with structured data
  • 9. IBM is in a lead position to exploit the Big Data opportunity (“The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012”, February 2012). IBM differentiation:
    – Embracing open source
    – Data in motion (Streams) and data at rest (Hadoop/BigInsights)
    – Tight integration with other Information Management products
    – Bundled, scalable analytics technology
    – Hardened Apache Hadoop for enterprise readiness
  • 10. IBM’s unique strengths in Big Data
    – Big Data in real time: ingest, analyze and act on massive volumes of streaming data; faster AND more cost-effective for specific use cases (10x the volume of data on the same hardware)
    – Fit-for-purpose analytics: analyzes a variety of data types in their native format – text, geospatial, time series, video, audio & more
    – Enterprise class: open source enhanced for reliability, performance and security; high-performance warehouse software and appliances; ease of use with end-user, admin and development UIs
    – Integration: integration into your IM architecture; pre-integrated analytic applications
  • 11. Stream Computing: What is it good for? Analyze all your data, all the time, just in time.
    – What if you could get IMMEDIATE insight?
    – What if you could analyze MORE kinds of data?
    – What if you could do it with exceptional performance?
    More context than traditional data storage and warehousing: sensor events and signals flow in; analytic results feed alerts, threat prevention systems, logging, and active response.
  • 12. What is Stream Processing?
    – Relational databases and warehouses find information stored on disk; stream computing analyzes data before you store it
    – Databases find the needle in the haystack; Streams finds the needle as it’s blowing by
  • 13. Without Streams vs. With Streams
    – Without Streams, applications must handle by hand: intensive scripting; embedded SQL; file/storage management; record management embedded in application code; data buffering and locality; security; high availability; application management (checkpointing, performance optimization, monitoring, workload management, error and event handling); applications tied to specific hardware and infrastructure; multithreading and multiprocessing; debugging; migration from development to production; integration of best-of-breed commercial tools; code reusability; source/target interfaces; dynamic application composition
    – With Streams: Streams provides a productive and reusable development environment, and the Streams runtime provides your application infrastructure
    “TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language.” – Alex Philp, TerraEchos
  • 14. Streams © 2011 IBM Corporation
  • 15. How Streams Works
    – Continuous ingestion, continuous analysis: filter/sample, transform, annotate, correlate, classify
    – Infrastructure provides services for scheduling analytics across hardware hosts and establishing streaming connectivity
    – Achieve scale by partitioning applications into software components and distributing them across stream-connected hardware hosts
    – Where appropriate, elements can be fused together for lower communication latency
  • 16. Scalable Stream Processing
    – Streams programming model: construct a graph, a mathematical concept (not a line, bar, or pie chart!), also called a network; familiar – for example, a tree structure is a graph. It consists of operators and the streams that connect them: the vertices (or nodes) and edges of the mathematical graph. It is a directed graph: the edges have a direction (arrows).
    – Streams runtime model: distributed processes. Single or multiple operators form a Processing Element (PE). The compiler and runtime services make it easy to deploy PEs on one machine, or across multiple hosts in a cluster when scaled-up processing is required. All links and data transport are handled by runtime services, automatically, with manual placement directives where required.
  • 17. InfoSphere Streams Objects: Runtime View
    – Instance: runtime instantiation of InfoSphere Streams executing across one or more hosts; a collection of components and services
    – Processing Element (PE): the fundamental execution unit run by the Streams instance; can encapsulate a single operator or many “fused” operators
    – Job: a deployed Streams application executing in an instance; consists of one or more PEs
  • 18. InfoSphere Streams Objects: Development View
    – Operator: the fundamental building block of the Streams Processing Language; operators process data from streams and may produce new streams
    – Stream: an infinite sequence of structured tuples; can be consumed by operators on a tuple-by-tuple basis or through the definition of a window
    – Tuple: a structured list of attributes and their types; each tuple on a stream has the form dictated by its stream type
    – Stream type: specification of the name and data type of each attribute in the tuple
    – Window: a finite, sequential group of tuples, based on count, time, attribute value, or punctuation marks
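The window concept above can be made concrete with the standard Aggregate operator. This is a hedged sketch, not taken from the deck: the Employees stream, its attributes, and the parameter values are assumptions for illustration, following the windowing syntax shown later on the "Anatomy of an Operator Invocation" slide.

```spl
// Sketch (assumed stream and attributes): average salary per department
// over a sliding 60-second window, emitting on every arriving tuple
stream<rstring dept, float64 avgSalary> Stats = Aggregate(Employees) {
    window Employees : sliding, time(60.0), count(1); // evict after 60 s, trigger per tuple
    param  groupBy   : dept;
    output Stats     : avgSalary = Average(salary);
}
```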
  • 19. What is Streams Processing Language?
    – Designed for stream computing: define a streaming-data flow graph; rich set of data types to define tuple attributes
    – Declarative: operator invocations name the input and output streams; referring to streams by name is enough to connect the graph
    – Procedural support: a full-featured C++/Java-like language; custom logic in operator invocations; expressions in attribute assignments and parameter definitions
    – Extensible: user-defined data types; custom functions written in SPL or a native language (C++ or Java); custom operators written in SPL; user-defined operators written in C++ or Java
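As a small end-to-end illustration of these points, a main composite wiring together three of the standard-toolkit operators listed later in the deck (Beacon, Functor, FileSink) might look as follows; the attribute names and parameter values are invented for the sketch.

```spl
composite Main {
    graph
        // Source adapter: Beacon generates a fixed number of tuples
        stream<rstring name, uint32 age> People = Beacon() {
            param iterations : 100u;
            output People    : name = "someone", age = 42u;
        }
        // Declarative wiring: naming People as input connects the graph
        stream<rstring name, uint32 age> Adults = Functor(People) {
            param filter : age >= 21u;
        }
        // Sink adapter: zero output ports, data leaves the graph
        () as Writer = FileSink(Adults) {
            param file   : "adults.csv";
                  format : csv;
        }
}
```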
  • 20. Some SPL Terms
    – An operator represents a class of manipulations of tuples from one or more input streams to produce tuples on one or more output streams
    – A stream connects to an operator on a port; an operator defines input and output ports
    – An operator invocation is a specific use of an operator, with specific assigned input and output streams and locally specified parameters, logic, etc. (e.g., an Aggregate invocation turning an Employee Salary Info stream into a Statistics stream)
    – Many operators have one input port and one output port; others have zero input ports (source adapters, e.g., TCPSource), zero output ports (sink adapters, e.g., FileSink), multiple output ports (e.g., Split), or multiple input ports (e.g., Join)
    – A composite operator is a collection of operators: an encapsulation of a subgraph of primitive (non-composite) operators and (nested) composite operators; similar to a macro in a procedural language
  • 21. Composite Operators
    – Every graph is encoded as a composite: a composite is a graph of one or more operators, and may have input and output ports
    – A composite is a source-code construct only; it has nothing to do with operator fusion (PEs)
    – Each stream declaration in the composite invokes a primitive operator or another composite operator:
        composite Main {
            graph
                stream ... { }
                stream ... { }
                . . .
        }
    – An application (logical view) is a main composite: no input or output ports; data flows in and out, but not on streams within a graph
    – Streams may be exported to and imported from other applications running in the same instance
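A composite with its own input and output ports might be sketched as below; the names are assumptions, and the stream<In> form reuses the input stream's type, as described on the "Stream and Tuple Types" slide.

```spl
// A reusable composite operator with one input and one output port
composite Cleanse(input In; output Out) {
    graph
        // The output port Out is produced by an operator inside the subgraph
        stream<In> Out = Functor(In) {
            param filter : true; // placeholder logic for the sketch
        }
}
```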
  • 22. Anatomy of an Operator Invocation
    Operators share a common structure; <> marks the sections to fill in:
        stream<stream-type> stream-name = MyOperator(input-stream; ...) {
            logic   <logic>;
            window  <windowspec>;
            param   <parameters>;
            output  <output>;
            config  <configuration>;
        }
    Reading an operator invocation: declare a stream stream-name with attributes from stream-type that is produced by MyOperator from the input(s) input-stream. MyOperator’s behavior is defined by logic, windowspec, parameters, and configuration; output attribute assignments are specified in output.
    Example:
        stream<rstring item> Sale = Join(Bid; Ask) {
            window Bid : sliding, time(30);
                   Ask : sliding, count(50);
            param  match : Bid.item == Ask.item && Bid.price >= Ask.price;
            output Sale  : item = Bid.item;
        }
    For the example: declare the stream Sale with the attribute item, which is a raw string; join the Bid and Ask streams with sliding windows of 30 seconds on Bid and 50 tuples on Ask; when the items are equal and the Bid price is greater than or equal to the Ask price, output the item value on the Sale stream.
  • 23. Streams V2.0 Data Types
    – (any): (primitive) or (composite)
    – (primitive): boolean, enum, (numeric), timestamp, (string), blob
    – (numeric): (integral), (floatingpoint), (complex)
    – (integral): (signed) int8, int16, int32, int64; (unsigned) uint8, uint16, uint32, uint64
    – (floatingpoint): (float) float32, float64, float128; (decimal) decimal32, decimal64, decimal128
    – (complex): complex32, complex64, complex128
    – (string): rstring, ustring
    – (composite): (collection) list, set, map; tuple
  • 24. Stream and Tuple Types
    – Stream type (often called “schema”): definition of the structure of the data flowing through the stream
    – Tuple type definition: tuple<sequence of attributes>, e.g., tuple<uint16 id, rstring name>; an attribute is a type and a name, and any attribute may itself be another tuple type (nesting)
    – A stream type is a tuple type: stream<sequence of attributes>, e.g., stream<uint16 id, rstring name>
    – Indirect stream type definitions:
        Fully defined within the output stream declaration:
            stream<uint32 callerNum, … rstring endTime, list<uint32> mastIDs> Calls = Op(…)…
        Reference a tuple type:
            CallInfo = tuple<uint32 callerNum, … rstring endTime, list<uint32> mastIDs>;
            stream<CallInfo> InternationalCalls = Op(…) {…}
        Reference another stream:
            stream<Calls> RoamingCalls = Op(…) {…}
  • 25. Collection Types
    – list: an array with bounds-checking, e.g., [0, 17, age-1, 99]; random access (any element at any time); ordered, zero-based indexing: the first element is someList[0]
    – set: an unordered collection, e.g., {"cats", "yeasts", "plankton"}; no duplicate element values
    – map: key-to-value mappings, e.g., {"Mon":0, "Sat":99, "Sun":-1}; unordered
    – Use type constructors to specify element types: list<type>, set<type> (e.g., list<uint16>, set<rstring>); map<key-type,value-type> (e.g., map<rstring[3],int8>)
    – Collections can be nested to any number of levels: map<int32, list<tuple<ustring name, int64 value>>>, e.g., {1 : [{"Joe",117885}, {"Fred",923416}], 2 : [{"Max",117885}], -1 : []}
    – Bounded collections optimize performance: list<int32>[5] holds at most 5 (32-bit) integer elements; bounds also apply to strings: rstring[3] has at most 3 (8-bit) characters
  • 26. The Functor Operator
    – Transforms input tuples into output tuples: one input port, one or more output ports
    – May filter tuples: the filter parameter is a boolean expression; if true, the output tuple is emitted, if false, it is not
    – Arbitrary attribute assignments: full-blown expressions, including function calls; drop, add, and transform attributes; omitted attributes are auto-assigned
    – Custom logic supported via the logic clause, which may include state and applies to filter and assignments
    Example:
        stream<rstring name, uint32 age, uint64 salary> Person = Op(…) {}

        stream<rstring name, uint32 age, rstring login,
               tuple<boolean young, boolean rich> info> Adult = Functor(Person) {
            param filter : age >= 21u;
            output Adult :
                login = lower(name),
                info  = {young = (age < 30u), rich = (salary > 100000ul)};
        }
  • 27. The FileSink Operator
    – Writes tuples to a file; has a single input port and no output port: data goes to a file, not a Streams stream
    – Selected parameters:
        file   – mandatory; the base for relative paths is the data subdirectory; directories must already exist
        flush  – flush the output buffer after a given number of tuples
        format – csv (comma-separated values); also txt, line, binary, block
    Example:
        () as Sink = FileSink(StreamIn) {
            param file   : "/tmp/people.dat";
                  format : csv;
                  flush  : 20u;
        }
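The reading counterpart, FileSource (listed in the standard toolkit later in the deck), follows the same shape; the file name and schema here are assumptions for the sketch.

```spl
// Source adapter: no input port; reads CSV rows from the data subdirectory
stream<rstring name, uint32 age> People = FileSource() {
    param file   : "people.csv";
          format : csv;
}
```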
  • 28. Communication Between Streams Applications
    – Streams jobs exchange data with the outside world through Source- and Sink-type operators, which can also be used between Streams jobs (e.g., TCPSource/TCPSink)
    – Streams jobs can exchange data with each other within one Streams instance
    – Supports dynamic application composition: by name or based on properties (tags); one job exports a stream, another imports it
    – Implemented using two pseudo-operators, Export and Import: a stream exported by Job 1 can be imported by Job 2
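Using the publish/subscribe style detailed on a later slide, a minimal exporting/importing pair might be sketched as below; the stream names, the property, and the schema are assumptions.

```spl
// Job 1: publish a stream with a property
() as Pub = Export(Results) {
    param properties : { feed = "trades" };
}

// Job 2: subscribe by property, within the same Streams instance
stream<rstring sym, float64 price> Trades = Import() {
    param subscription : feed == "trades";
}
```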
  • 29. Application Design – Dynamic Stream Properties
    – An API is available for toolkit development
    – Can add/modify/delete exported stream properties and imported stream subscription expressions
    – Dynamic job flow control bus pattern: operators within jobs interpret control-stream tuples and rewire the flow of data from job to job; for example, an exported control stream carrying the flow-control tuple [A,B,C] routes the data stream through Jobs A, B, and C, and rewriting it to [A,C,D] reroutes the data around Job B
  • 31. Application Design – Multi-job Design
    – Streams instance stream1 runs a job imagefeeder (DirectoryScan → ImageSource → Functor, producing a stream of timestamp + file metadata from filenames) and a job imagewriter (importing that stream and writing through ImageSink and FileSink)
    – The export carries properties (name = "Feed", type = "Image", write = "ok"); the import uses the subscription type == "Image" && write == "ok"
    – Application/job decomposition: dynamic job submission + stream import/export
  • 34. Application Design – Multi-job Design (continued)
    – Additional jobs subscribe to the same exported image feed: a greyscaler job (subscription name == "Feed", re-exporting with properties name = "Grey", type = "Image", write = "ok"), a resizer job, a facial scan job, and an Alerter job
    – Application/job decomposition: dynamic job submission + stream import/export
  • 36. Two Styles of Export/Import
    – Publish and subscribe (recommended approach): the exporting application publishes a stream with certain properties; the importing application subscribes to an exported stream with properties satisfying a specified condition
    – Point to point: the importing application names a specific stream of a specific exporting application
    – Dynamic publish and subscribe: Export properties and Import expressions can be altered during the execution of a job, allowing dynamic data flows; alter the flow of data based on the data (history, trends, etc.)
    Example:
        // Exporting application
        () as ImageStream = Export(ImagesIn) {
            param properties : {
                streamName = "ImageFeed",
                dataType   = "IplImage",
                writeImage = "true" };
        }

        // Importing application
        stream<IplImage image, rstring filename, rstring directory> ImagesIn = Import() {
            param subscription :
                dataType == "IplImage" && writeImage == "true";
        }
  • 37. Parallelization Patterns – Introduction
    – Problem statement: a series of operations is to be performed on a piece of data (a tuple); how can the performance of these operations be improved?
    – Key question: reduce latency (for a single piece of data) or increase throughput (for the entire data flow)?
    – Three possible design patterns: serial path; parallel operators (task parallelization); parallel paths (data parallelization)
  • 38. Parallelization Patterns – Pipeline, Task
    – Pipeline (serial path, A → B → C → D): the base pattern, inherent in the graph paradigm; results arrive at D in time T(A) + T(B) + T(C)
    – Parallel operators (task parallelization): process the tuple in operators A, B, and C at the same time; requires a merger M (e.g., Barrier) before operator D; results arrive at D in time Max(T(A), T(B), T(C)) + T(M)
    – Use task parallelization when the tuple latency requirement is < T(A) + T(B) + T(C); the complexity of the merger depends on the behavior of operators A, B, and C
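A hedged SPL sketch of the task-parallel pattern above, using the standard Barrier operator as the merger M; OpA, OpB, OpC and the schemas are hypothetical.

```spl
// Fan out: the same In tuples feed three independent operators
stream<int32 id, float64 a> OutA = OpA(In) {}
stream<int32 id, float64 b> OutB = OpB(In) {}
stream<int32 id, float64 c> OutC = OpC(In) {}

// Barrier waits for one tuple on each input port, then emits one
// combined tuple, so D sees results after Max(T(A),T(B),T(C)) + T(M)
stream<int32 id, float64 a, float64 b, float64 c> Merged
    = Barrier(OutA; OutB; OutC) {}
```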
  • 39. Parallelization Patterns – Parallel Pipelines
    – Parallel pipelines (data parallelization): a migration step from the pipeline pattern; can improve throughput, and is especially good for variable-size data / processing time
    – Design decisions: Are there latency and/or throughput requirements? Do the operators perform filtering, feature extraction, transformation? Is there an execution-order requirement? Is there a tuple-order requirement?
    – Recommendation: move from Pipeline to Parallel Pipelines when possible
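A sketch of the parallel-pipelines pattern using ThreadedSplit and Union from the standard toolkit; the schema and the Process composite are hypothetical. Note that, per the design decisions above, this trades away tuple ordering across the two paths.

```spl
// Fan tuples out across two identical pipelines
(stream<int32 id, rstring payload> P0;
 stream<int32 id, rstring payload> P1) = ThreadedSplit(In) {
    param bufferSize : 1000u;
}

// The same (hypothetical) processing composite on each path
stream<int32 id, rstring payload> R0 = Process(P0) {}
stream<int32 id, rstring payload> R1 = Process(P1) {}

// Merge the paths again; tuple order is not preserved
stream<int32 id, rstring payload> Merged = Union(R0; R1) {}
```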
  • 40. Application Design – Multi-tier Design
    – N-tier design: the number and purpose of tiers is a result of application design; example tiers include Ingestion, Transport Adaptation, Reduction, Transformation, Processing/Analytics, and Transport Adaptation again on the way out
    – Create well-defined interfaces between the tiers
    – Supports several overarching concepts: incremental development/testing; application/job/operator reuse; modular programming practices
    – Each tier in these examples may be made up of one or more jobs (programs)
  • 41. Application Design – High Availability
    – HA application design pattern: the Source job exports a stream enriched with a tuple ID; Jobs 1 & 2 process in parallel (on separate host pools of x86 hosts) and export final streams; the Sink job imports both streams, discards duplicates, and alerts on missing tuples
  • 43. IBM InfoSphere Streams
    – Agile development environment: Eclipse IDE, Streams Live Graph, Streams Debugger; over 50 samples
    – Distributed runtime environment: clustered runtime for massive scalability; RHEL v5.x and v6.x, CentOS v6.x; x86 & Power multicore hardware; Ethernet & InfiniBand
    – Sophisticated analytics with toolkits & adapters: Database, Advanced Text, Mining, Geospatial, Financial (Front Office 3.0), Timeseries, Standard, Messaging, Internet, Big Data (HDFS, Data Explorer), user-defined, ...
  • 44. Toolkits and Operators to Speed and Simplify Development
    – Standard Toolkit (the default operators shipped with the product):
        Relational operators: Filter, Sort, Functor, Join, Punctor, Aggregate
        Adapter operators: FileSource, FileSink, DirectoryScan, TCPSource, TCPSink, UDPSource, UDPSink, Export, Import, MetricsSink
        Utility operators: Custom, Beacon, Throttle, Delay, Barrier, Pair, Split, DeDuplicate, Union, ThreadedSplit, DynamicFilter, Gate, JavaOp
    – Internet Toolkit: InetSource (HTTP, HTTPS, FTP, FTPS, RSS, file)
    – Database Toolkit: ODBCAppend, ODBCEnrich, ODBCSource, SolidDBEnrich, DB2SplitDB, DB2PartitionedAppend; supports DB2 LUW, IDS, solidDB, Netezza, Oracle, SQL Server, MySQL
    – Also: Financial Toolkit, Data Mining Toolkit, Big Data Toolkit, Text Toolkit, ...
    – User-defined toolkits: extend the language by adding user-defined operators and functions
  • 45. User-Defined Toolkits
    – Streams supports toolkits: reusable sets of operators and functions
    – What can be included in a toolkit? Primitive and composite operators; native and SPL functions; types; tools, documentation, samples, data, etc.
    – Versioning is supported: define dependencies on other versioned assets (toolkits, Streams)
    – Create cross-domain and domain-specific accelerators
  • 46. © 2011 IBM Corporation
  • 47. A quick peek inside: InfoSphere Streams Instance – single host. Management services & applications run on one node: Streams Web Service (SWS), Streams Application Manager (SAM), Streams Resource Manager (SRM), Authorization and Authentication Service (AAS), Scheduler, Recovery DB, Name Server, Host Controller, Processing Element Container, File System
  • 48. A quick peek inside: InfoSphere Streams Instance – multi-host, management services on a separate node. The management services (SWS, SAM, SRM, AAS, Scheduler, Recovery DB, Name Server) run on one node; each application host runs a Host Controller and a Processing Element Container; all hosts share a file system
  • 49. A quick peek inside: InfoSphere Streams Instance – multi-host, management services on multiple hosts. The management services (Streams Web Service, AAS, Recovery DB, Streams Application Manager, Scheduler, Streams Resource Manager, Name Server) are distributed across several management hosts; a host may run both a management service and a Host Controller with a Processing Element Container; application hosts run Host Controllers and Processing Element Containers over a shared file system