StreamCentral Technical Overview
Rapidly operationalize the use of data by designing, building and running real-time business intelligence and big data solutions with StreamCentral

StreamCentral Technical Overview – Presentation Transcript

  • A trusted partner – Business Powered By Data
  • What it takes to build Real-Time Operational Intelligence and Big Data solutions
      • Pre-work: solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events... (manual)
      • Acquire Data: streaming data consumption (APIs, Enterprise Service Bus); static data (connectors)
      • Process Data (structured, semi-structured, unstructured): event stream processing – developer focused (Tibco, Microsoft, IBM); data transformation (Ascential Software, Cognos, Microsoft Integration Services)
      • Generate Insights (correlation, KPIs, data denormalization): manual custom development
      • Store & Manage Data: relational databases (e.g. Microsoft SQL Server, IBM DB2, Oracle, Sybase); OLAP (e.g. Microsoft SSAS, Cognos PowerPlay); massively parallel processing systems (e.g. Vertica, Greenplum, Netezza, ParStream); NoSQL databases (e.g. MongoDB, Amazon DynamoDB, Cassandra); NewSQL (e.g. NuoDB); Hadoop (e.g. Hortonworks, Cloudera); plus database design and development
      • Data Access & Security: row-level data security – manual development; real-time and historical data publishing – manual development; API; data export
      • Data Access & Visualization: discovery & analysis (Tableau, QlikView, Cognos, SiSense); reporting (many); data mining (R, SAS, SPSS); custom applications
  • Innovations in Big Data technologies over the last 5 years – the same landscape diagram as above, repeated to highlight where innovation has occurred
  • Challenging bits not addressed in this innovation cycle (same landscape diagram). This causes:
      • Lots of systems integration of point solutions
      • Custom code
      • Specialist skills
      • Hard to change and evolve
  • Rapidly operationalize the use of data by designing, building and running real-time business intelligence and big data solutions with StreamCentral:
      • Workbench – easy to design: Solution Designer (data consumption, data transformations, conditions, events, correlation); Security Designer; API Designer; Systems Management; Meta Data Manager
      • Information Warehouse Manager – auto build: auto-generates database design, database and application code, and infers relationships in data; normalized schema generation for facts and dimensions; denormalized schema generation for data marts; security schema generation
      • BI Server – run with scale: data collection; data processing; business event detection; data publishing (SQL Server, Vertica, MongoDB); data export; caching; analytic applications (BI/reporting, data exploration/visualization, functional applications, event-driven, predictive analytics, industry applications, association analysis)
  • Putting it together – high-impact real-time solutions in a fraction of the time (StreamCentral + Big Data)
      • Pre-work: StreamCentral Workbench – no coding required – solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events... for a broad set of people with varying technical skills
      • Acquire Data: built-in StreamToMe API (stream any data from any application or device to StreamCentral); static data (connectors)
      • Process Data (structured, semi-structured, unstructured): event stream processing (no coding); data transformation (no coding)
      • Generate Insights (correlation, KPIs, data denormalization): no coding
      • Store & Manage Data: Microsoft SQL Server (relational), Vertica (MPP), MongoDB (NoSQL), Hadoop; database development – StreamCentral auto-generates database design and database code
      • Data Access & Security: StreamCentral auto-builds the security infrastructure; built-in API builder; API; data export
      • Data Access & Visualization: discovery & analysis (Tableau, QlikView, Cognos, SiSense); reporting (many); data mining (R, SAS, SPSS); custom applications
  • Scalability
      • Massively parallel processing architecture
      • Distributed processing
      • Scale out and distribute any component of StreamCentral independently on commodity hardware
      • Integrates with best-of-breed database technologies
      • StreamCentral BI Server services: Collector Service, Processing Service, Business Event Service, Data Publishing Service, Cache Service
  • Data available via StreamCentral
      • Processed Source Data: data validation; association to entities; evaluation for conditions; time and location standardization; custom dimension standardization. Access: real-time push, historical pull, API; database access: historical pull
      • Single Event Stream: correlated data across multiple data sources; event detection based on condition evaluation. Access: real-time push, historical pull, API; database access: historical pull
      • Event Analysis Data Marts: data marts built on highly correlated data, updated in real time; analyze multiple events and conditions; bring together relevant data. Access: real-time push, historical pull, API; database access: historical pull
      • 360° Analysis Data Marts: data marts built on loosely correlated data, updated periodically; analyze any data. Access: historical pull, API; database access: historical pull
  • Example Big Data solutions: Telco
      • Sources of real-time streaming data from networks, devices, services and other internal applications: the telco's core IMS network data; data, voice & video performance data; data from telco towers
      • External sources of data that add understanding of what's happening when events are detected: weather data, traffic incidents, population data – from feeds such as Weather Underground, Mapquest, USA Today and census data
      • Business solutions: network test; new service – investment planning; adaptive bit rate – video streaming QoE; 360° customer QoE for 1st-level customer service; video QoE for IPTV; new revenue sources from marketing operations; service disruption
  • Making changes to definitions
      • StreamCentral allows updates to data sources, entities, dimensions, rules for conditions, event detection rules and data mart definitions
      • When changes are made, the Workbench updates the schema-change information in the StreamCentral metadata database and also changes the underlying database schema
      • Configuration data for all services running within StreamCentral also lives in the distributed cache; the next step is to update this cache, which then notifies the various services of the updated schema definition (see the sketch below)
      • The correlation and publishing engines evaluate the schema changes and make the appropriate changes to their in-memory data before sending data to the database
      • Roll-back is built in to account for errors
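The propagation flow above can be pictured with a minimal publish/subscribe sketch. All names here (the classes, methods and the in-process bus) are invented for illustration and are not StreamCentral's actual API; the sketch only shows the pattern of a cache notifying registered services of a definition change.

```python
# Illustrative sketch of definition-change propagation via a shared cache.
# All names are hypothetical; StreamCentral's real services are distributed.

class ConfigCache:
    """Holds current definitions and notifies subscribers on updates."""
    def __init__(self):
        self.definitions = {}
        self.subscribers = []

    def subscribe(self, service):
        self.subscribers.append(service)

    def update_definition(self, name, new_schema):
        old = self.definitions.get(name)
        self.definitions[name] = new_schema
        for service in self.subscribers:
            service.on_schema_change(name, old, new_schema)

class CorrelationService:
    def on_schema_change(self, name, old, new):
        # Re-shape in-memory buffers before writing to the database,
        # mirroring the step where engines adapt before publishing.
        print(f"correlation: rebuilding buffers for '{name}'")

class PublishingService:
    def on_schema_change(self, name, old, new):
        print(f"publishing: remapping output columns for '{name}'")

cache = ConfigCache()
cache.subscribe(CorrelationService())
cache.subscribe(PublishingService())
cache.update_definition("VoiceQuality", {"range_count": 5})
```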
  • StreamCentral advantage: agility to change how you use data in real time (comparison chart; vertical axis: agility in meeting changing customer needs in real-time; horizontal axis: data – real-time or historical, streaming or batch, structured or unstructured; risk falls and value rises faster over time with StreamCentral)
      • Current technology and approach – years to value = high risk, high cost: many point solutions from multiple vendors; high learning curve; maximum time spent integrating; manual design and coding; many steps to solution; older technology. Steps: business analysis, detailed solution design, manual database design, database development, CEP development platform, enterprise service bus, traditional ETL tools, application development
      • StreamCentral – weeks to value = low risk, reduced cost: high automation; no coding required; multiple components that work together (ETL, CEP, data mart builder, location intelligence and more); fewer steps to solution; modern technology
          • Workbench – Business Solutions Designer: consume data; design transformations, conditions, events, analytics, security, and APIs to export and share data
          • Information Warehouse Manager: auto-generate design, auto-generate code, infer relationships, reduce manual design
          • BI Server: built-in event processing, high-speed data processing, scalable, secure, runs on modern database platforms
      • Stages covered in both approaches: traditional pre-work; data acquisition, transformation and enrichment; data correlation & event management; analytics & insight-specific data marts; data-level security; export of enriched data & real-time analytics
  • StreamCentral Concepts
  • Definitions of key concepts in StreamCentral
      • Entity: an entity represents a group of people or things that incoming data is directly connected to. Examples include departments, customers, sites, products, etc. By defining entities you tell StreamCentral how distributed data is connected to the things core to your business
      • Data Source: StreamCentral can pull data from a variety of sources using standard web interfaces, and data can also be streamed directly to the StreamCentral API for processing by devices, sensors, applications and services
      • Dimension: common attributes across a variety of data sources that can be used to categorize and analyze data
  • Definitions of key concepts in StreamCentral (continued)
      • Conditions: a condition is a rule-based measurement applied to incoming data. A condition has three parts: the condition name (e.g. Voice Quality), the condition ranges (a range of quality from hard to hear, poor, average, toll quality, to excellent) and the condition KPI (e.g. a RED KPI when the range is hard to hear or poor). Individual conditions can be grouped into a condition set, which can then be used to detect events as an aggregate (see the sketch below)
      • Events: an event occurs when patterns of multiple conditions with specific ranges are detected across different data streams and environmental data sources as the data streams in. While StreamCentral allows sophisticated rule-based event detection, it goes further: it auto-builds a data mart around the event consisting of a variety of context, such as entities, environmental data, dimensions and detailed data from data sources
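To make the three-part condition concrete, here is a minimal sketch using the slide's own Voice Quality example. The data structure and the threshold values are invented for illustration; in StreamCentral, conditions are configured in the Workbench rather than in code.

```python
# Hypothetical illustration of a StreamCentral-style condition:
# name + ordered ranges + a KPI mapping over those ranges.

VOICE_QUALITY = {
    "name": "Voice Quality",
    # (upper_bound, range_label): scores below the bound fall in that range.
    # Bounds are invented, e.g. a MOS-like score from 1.0 to 5.0.
    "ranges": [
        (2.0, "Hard to hear"),
        (3.0, "Poor"),
        (3.5, "Average"),
        (4.3, "Toll quality"),
        (5.1, "Excellent"),
    ],
    # KPI per the slide: RED when the range is Hard to hear or Poor.
    "kpi": {"Hard to hear": "RED", "Poor": "RED"},
}

def evaluate(condition, score):
    """Return (range_label, kpi_color) for an incoming measurement."""
    for upper, label in condition["ranges"]:
        if score < upper:
            return label, condition["kpi"].get(label, "GREEN")
    return None, None

print(evaluate(VOICE_QUALITY, 2.7))   # ('Poor', 'RED')
print(evaluate(VOICE_QUALITY, 4.5))   # ('Excellent', 'GREEN')
```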
  • Converting data to insights by continuously adding context. Generating insights from data requires context to be added to the data; this context is a continuous thread that connects all types of data throughout the BI solution lifecycle. Four typical examples of context: who (entities like customer, patient), when (time), where (location), what (streaming & static data correlation)
      • StreamCentral automatically builds and maintains time and location dimensions
      • Entities like customer, department and site can be created and defined in StreamCentral; entity data can be imported for an initial load and continuously kept in sync
      • All incoming data in StreamCentral is continuously and automatically connected to time, location and defined entities
      • Resultant real-time events and analytical data marts automatically inherit this context without any programming or development work
  • Types of data sources: regular
      • Data sources used to measure performance
      • Examples include data that will be measured for conditions, ranges and events
      • This data can be connected to entities directly – for example, data from a device can be connected to a customer, or sales data can be connected to a product and a customer
      • Can be used in correlation, event detection and data marts
  • Types of data sources: environmental
      • This type of data source is used to add context as well as measure performance
      • Examples typically include external data that adds context about external factors in play
      • Does not have to be connected to entities directly: StreamCentral uses implicit relations with the time and location dimensions to tie environmental data to other enterprise data. For example, consider an environmental data source called weather, which has location information associated with it, and two entities, "Customer" and "Tower", which also carry location information. StreamCentral standardizes all three to the location dimension, and because weather was created as an environmental data source it also implicitly connects Customer to weather and Tower to weather. When analyzing data, StreamCentral can then provide real-time or historical context on what the weather is where the customer is and where the tower is (see the sketch below)
      • Great for use in data marts to analyze associations with other data
      • Can be used in event detection as part of a condition set and to evaluate events
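The weather example can be pictured as a join through the shared location dimension. The row shapes and column names below are invented for illustration; the point is only that customer, tower and weather rows all carry the same standardized location key, so no explicit relationship has to be modeled.

```python
# Hypothetical rows after StreamCentral standardizes each source
# to the same location dimension key (location_id).

customers = [{"customer": "Acme", "location_id": 17}]
towers    = [{"tower": "T-42",  "location_id": 17}]
weather   = [{"location_id": 17, "conditions": "heavy rain"}]

def weather_for(rows, weather_rows):
    """Implicitly join any entity rows to environmental data
    via the shared location key."""
    by_loc = {w["location_id"]: w for w in weather_rows}
    return [dict(r, weather=by_loc.get(r["location_id"], {}).get("conditions"))
            for r in rows]

print(weather_for(customers, weather))  # Acme -> heavy rain
print(weather_for(towers, weather))     # T-42 -> heavy rain
```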
  • A note on time and location data
      • StreamCentral auto-creates time and location dimensions
      • Extended data types allow very specific association of a variety of time- and location-based attributes
      • Data types can be assigned to attributes in entities, regular data sources and environmental data sources
      • For every incoming attribute associated with one of the special time or location data types, StreamCentral checks whether a record for that value already exists in the dimension. If not, it creates a new record for that value; if it does, the key of that record is substituted into the data source (a get-or-create lookup; see the sketch below)
      • Time and location data is stored both in the database and in the distributed cache, though real-time lookups are done against the data in the cache
      • StreamCentral can dynamically feed time or location data from these dimensions to REST or SOAP based web services
      • StreamCentral supports standardizing location data at any geographic level, including standardizing to a specific radius
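The record-lookup behavior described above is the classic get-or-create surrogate-key pattern. A minimal sketch follows, assuming a plain dict stands in for the distributed cache that real-time lookups run against; names and key values are illustrative only.

```python
# Hypothetical get-or-create lookup against a cached location dimension.
# A dict stands in for the distributed cache; values map to surrogate keys.

location_dim = {}          # value -> surrogate key (the cached dimension)
next_key = [1]

def location_key(value):
    """Return the dimension key for a value, creating a record if new,
    so the key can be substituted into the incoming data source row."""
    if value not in location_dim:
        location_dim[value] = next_key[0]   # new dimension record
        next_key[0] += 1
    return location_dim[value]

row = {"tower": "T-42", "city": "Boston"}
row["city"] = location_key(row["city"])     # substitute key for value
print(row, location_dim)   # {'tower': 'T-42', 'city': 1} {'Boston': 1}
```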
  • Types of data outputs available from StreamCentral
      • Processed source data: once real-time streaming data, or static data via a scheduled pull, is received by StreamCentral, it is validated, evaluated for conditions, and associated to entities and dimensions like time and location; the data is then available to be published
      • Event data: processed data is evaluated for events. If an event is detected, the event data and its associated context are available as a real-time stream. In addition, StreamCentral builds a data mart just for this event, and access to the event's historical data is also available
      • Events data mart analysis: custom data marts that evaluate multiple events, and the conditions recorded when those events were detected, are available via the events data mart. Historical access is available
      • Aggregate 360-degree data mart analysis: bring disparate data together, standardized to common themes, and StreamCentral automatically builds a scalable data mart structure for it
  • Access methods by type of data (see the call sketch below)
      • Processed Source Data – real-time: ActiveMQ messages, JMS-based HornetQ, OracleQ, Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON. Historical: REST API (XML/JSON); method name: getFactualData; input parameters: source name, filter parameters (location, time), numOfRecords
      • Event Data with context – real-time: ActiveMQ messages, JMS-based HornetQ, OracleQ, Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON. Historical: REST API (XML/JSON); method name: getEventData; input parameters: event name or id, filter parameters (location, time), entity id array, numOfRecords
      • Events Data Mart – real-time: ActiveMQ messages, JMS-based HornetQ, OracleQ, Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON. Historical: REST API (XML/JSON); method name: getAnalysisData; input parameters: analysis collection name or id, filter parameters (location, time), entity id array, numOfRecords
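As an illustration of the historical-pull path, here is a hedged sketch calling the getEventData method from the table. The method name and parameter names come from the slide; the host, URL layout and parameter encoding are assumptions, since the deck does not specify them.

```python
import requests

# Hypothetical endpoint: the slide names the method (getEventData) and its
# parameters but not the host or URL layout.
BASE_URL = "https://streamcentral.example.com/api"

params = {
    "eventName": "ServiceDisruption",   # event name or id
    "location": "Boston",               # filter parameter
    "time": "2014-01-01T00:00:00Z",     # filter parameter
    "entityIds": "101,102",             # entity id array
    "numOfRecords": 50,
    "format": "json",                   # XML/JSON per the slide
}

resp = requests.get(f"{BASE_URL}/getEventData", params=params, timeout=30)
resp.raise_for_status()
for event in resp.json():
    print(event)
```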
  • Choosing the right technology for visualization
      • Don't select a delivery technology for these reasons – it is best to centralize business logic in one place with StreamCentral and use many tools to deliver the insight:
          • Definition of KPIs
          • Rules for events
          • Aliases for data attributes
          • Connectivity and transformation requirements of source data
          • Adding context to data
      • Select one or more delivery technologies for these reasons:
          • Performance (in-memory aggregation)
          • Cross-browser support; support for various tablets and mobile device platforms
          • Broad portfolio of charts and visualizations
          • Highly interactive
          • Ability to be integrated into portals for internal (employees) or external (partners or customers) consumption
          • Standards-based, e.g. HTML5 and CSS3
          • Can be hosted in a SaaS model
  • Data Security
  • Managing row-level data security – centralize data security with StreamCentral
      • The Workbench administrator defines roles, specifies data access rules and assigns users to roles; StreamCentral builds and manages the metadata for row-level access
      • Custom applications and analytical/reporting tools only pass a user id as part of their query to the StreamCentral database; the StreamCentral row-level security layer does the rest
      • Two types of row-level security: (1) underlying fact data, based on dimensions (like time, location) and entities (like customer, department, site); (2) denormalized aggregated data, based on and/or rules
  • Security architecture: the StreamCentral database (MS SQL / HP Vertica) holds the factual tables, processing tables and security tables (e.g. StreamCentralSecurity, with ScrtyRoleID and Role columns), alongside the StreamCentral metadata DB used by the Workbench
      • The Workbench administrator manages data security by creating data access rules for roles and assigning users to roles
      • For data accessed from the StreamCentral database via reporting/analytical tools or the API, StreamCentral determines the data access permission for that user (see the sketch below)
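A minimal sketch of how a role-scoped query might look. The ScrtyRoleID column comes from the diagram; the other table and column names are invented, and SQLite stands in for MS SQL / Vertica so the sketch runs as-is. The client supplies only a user id, and the security layer resolves it to roles and row filters.

```python
import sqlite3  # stand-in for MS SQL / Vertica to keep the sketch runnable

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales(region TEXT, amount REAL);      -- factual table
    CREATE TABLE user_roles(user_id TEXT, ScrtyRoleID INT); -- user -> role
    CREATE TABLE role_regions(ScrtyRoleID INT, region TEXT);-- role row rules
    INSERT INTO fact_sales VALUES ('East', 100), ('West', 200);
    INSERT INTO user_roles VALUES ('alice', 1);
    INSERT INTO role_regions VALUES (1, 'East');
""")

# The caller passes only a user id; rows are filtered through the
# role-based security tables, mirroring the row-level security layer.
rows = conn.execute("""
    SELECT f.region, f.amount
    FROM fact_sales f
    JOIN role_regions rr ON rr.region = f.region
    JOIN user_roles  ur ON ur.ScrtyRoleID = rr.ScrtyRoleID
    WHERE ur.user_id = ?
""", ("alice",)).fetchall()

print(rows)  # [('East', 100.0)] -- alice's role sees East rows only
```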
  • Distributed caching is used for:
      • Storing time and location dimension data for fast lookups and data standardization
      • Maintaining configuration information about the system, which aids in managing updates to definitions
      • Storing entity data required for adding context to incoming data
      • Managing correlation of real-time data
      • Managing event detection
      • Holding processed data formatted to data mart specifications
      • Managing batch data inserts into the database
  • Availability
  • StreamCentral high availability – components: web application, StreamCentral public API, Workbench application, reports/analytics; messaging (inbound message queue, publish message queue); Processing Service, Correlation Service, Publish Service; Workbench database (StreamCentral metadata); StreamCentral database (fact and aggregate data – Vertica / MS SQL Server)
      • Cache cluster: Microsoft AppFabric Cache is a distributed caching technology that keeps the cache highly available by configuring more than one server to participate in storing cache data, often called a cache cluster
      • Software network load balancing (NLB): Microsoft IIS web servers configured with the software NLB provided by Microsoft Windows Server keep all websites highly available
      • Messaging: Microsoft Message Queuing persists unread messages in the queue in the event of a sudden server shutdown
      • Services: the processing engine, correlation engine and publish engine can be made to run on multiple physical servers so that these services are always highly available
      • Hardware: the physical hardware is available for clustering to ensure failover in case of hardware failure
  • Thank you for your time
    Raheel Retiwalla, CTO – Virtus IT Ltd
    E: raheel.retiwalla@virtus-it.com
    M: +1 617 901 8370
    A trusted partner