This document provides an introduction to the WSO2 Analytics Platform. It discusses how the platform allows users to collect data from various sources using a sensor API, then perform analysis on the data through both batch and real-time means. Batch analysis uses technologies like Apache Spark and Hadoop to perform tasks like finding averages, max/min, and building KPIs. Real-time analysis uses complex event processing to run queries over streaming data and detect patterns. The platform also enables predictive analytics using machine learning algorithms and anomaly detection. Results are then communicated through dashboards and alerts.
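The contrast between batch and real-time analysis described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use any WSO2 API, and the function names, the window size, and the threshold are hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean


def batch_summary(readings):
    """Batch style: the whole data set is available, so aggregate it in one go."""
    return {"avg": mean(readings), "max": max(readings), "min": min(readings)}


def stream_alerts(events, window_size=3, threshold=30.0):
    """Streaming style: keep only a sliding window of recent events and
    yield (index, window_average) as soon as the average exceeds the threshold."""
    window = deque(maxlen=window_size)  # hypothetical window size
    for index, value in enumerate(events):
        window.append(value)
        if len(window) == window_size:
            window_average = mean(window)
            if window_average > threshold:
                yield index, window_average


# Hypothetical sensor readings used for both styles of analysis.
readings = [21.0, 22.0, 35.0, 36.0, 37.0, 20.0]
summary = batch_summary(readings)
alerts = list(stream_alerts(readings))
print(summary)
print(alerts)
```

The batch function needs the full data set before it can answer, while the streaming function emits a result per event and never holds more than one window in memory, which is the trade-off the summary above describes.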
Introduction to WSO2 Analytics Platform: 2016 Q2 Update | Srinath Perera
In this talk, we will discuss the WSO2 Data Analytics platform, which brings together all the technologies into one platform. It lets you collect data through a single sensor API, process it using batch, real-time, or predictive technologies, and communicate your results, all within a single platform and user experience.
More details: https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
This slide deck provides an overview of the WSO2 Big Data platform and discusses some of its customer case studies and applications. It covers Big Data in general, real-time analytics with WSO2 CEP, batch analytics with WSO2 BAM, and new products like predictive analytics with WSO2 Machine Learner. For more information, please reach us through architecture@wso2.org.
Introduction to Large Scale Data Analysis with WSO2 Analytics Platform | Srinath Perera
Large scale data processing analyses and makes sense of large amounts of data. Spanning many fields, it brings together technologies like Distributed Systems, Machine Learning, Statistics, and the Internet of Things. It is a multi-billion-dollar industry including use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like Smart Cities, Smart Health, and Smart Agriculture. Some use cases, like urban planning, can tolerate slow results and are handled in batch mode, while others, like stock markets, need results within milliseconds and are handled in streaming fashion. Predictive analytics lets us learn models from data, often giving us the ability to predict the outcome of our actions.
The WSO2 Data Analytics platform is a fast and scalable platform that is being used by more than 40 organizations including Banks, Financial Institutions, Smart Cities, Hospitals, Media Companies, Telecom Companies, State and Federal Governments, and High Tech companies. This talk will start with a discussion on large scale data analysis. Then we will look at the WSO2 Data Analytics platform and discuss in detail how we can use the platform to build end-to-end Big Data applications combining the power of batch processing, real-time analytics, and predictive technologies.
Stratio Streaming is the result of combining the power of Spark Streaming as a continuous computing framework and the Siddhi CEP engine as a complex event processing engine.
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge | DataWorks Summit
The business value of data decreases rapidly after it is created, particularly in use cases such as fraud prevention, cybersecurity, and real-time system monitoring. The high-volume, high-velocity datasets used to feed these use cases often contain valuable, but perishable, insights that must be acted upon immediately.
To maximize the value of their data, enterprises must fundamentally change their approach to processing real-time data and focus on reducing their decision latency on the perishable insights that exist within their real-time data streams, thereby enabling the organization to act upon them while the window of opportunity is open.
Generating timely insights in a high-volume, high-velocity data environment is challenging for a multitude of reasons. First, as the volume of data increases, so does the amount of time required to transmit it back to the datacenter and process it. Second, as the velocity of the data increases, the data and the insights derived from it lose value faster.
In this talk, we will present a solution based on Apache Pulsar Functions that significantly reduces decision latency by using probabilistic algorithms to perform analytic calculations on the edge.
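The abstract does not say which probabilistic algorithms the talk uses. As one common example of the family, the sketch below shows a Count-Min Sketch, which estimates per-item frequencies in fixed memory and is therefore a plausible fit for constrained edge devices. It is plain Python, not Apache Pulsar Functions code, and the class name, the `width` and `depth` defaults, and the sensor names are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator. Estimates never undercount,
    but hash collisions can make them overcount."""

    def __init__(self, width=256, depth=4):
        self.width = width  # counters per row (assumed default)
        self.depth = depth  # number of independent hash rows (assumed default)
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The smallest counter across rows is the least inflated by collisions.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


# Hypothetical stream of sensor identifiers arriving at an edge node.
sketch = CountMinSketch()
for sensor_id in ["sensor-a"] * 5 + ["sensor-b"] * 2:
    sketch.add(sensor_id)

estimate_a = sketch.estimate("sensor-a")
estimate_b = sketch.estimate("sensor-b")
print(estimate_a, estimate_b)
```

Memory use is `width * depth` counters regardless of how many events pass through, which is the property that makes this kind of structure attractive when data cannot all be shipped back to the datacenter.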
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World | WSO2
Abundant data is all around. The most important aspect is how you as an organization can access the data, process it, and present information to the relevant authorities on time. To gain competitive advantage, the means of accessing, processing, and presenting the data should be optimal, highly available, and scalable.
In this talk, we will discuss how you can leverage WSO2 Data Analytics Server, WSO2 IoT Server, WSO2 Enterprise Service Bus and other WSO2 products in order to analyze the data. We will also discuss different deployment patterns that can provide you with a suitable solution that lets you analyze relevant data historically, in real-time or interactively and predict future states to make better decisions for your organization’s success.
Streamlio and IoT analytics with Apache Pulsar | Streamlio
To keep up with fast-moving IoT data, you need technology that can collect, process and store data with performance and scalability. This presentation from Data Day Texas looks at the technology requirements and how Apache Pulsar can help to meet them.
Big Data Analysis: Deciphering the haystack | Srinath Perera
A primary outcome of Big Data is to derive useful and actionable insights from large or challenging data collections. The goal is to run the transformations from data, to information, to knowledge, and finally to insights. This ranges from calculating simple analytics like Mean, Max, and Median, to deriving an overall understanding of the data by building models, and finally to deriving predictions from data. In some cases we can afford to wait while the data is collected and processed, while in other cases we need to know the outputs right away. MapReduce has been the de facto standard for data processing, and we will start our discussion from there. However, that is only one side of the problem. There are other technologies like Apache Spark and Apache Drill gaining ground, and also real-time processing technologies like Stream Processing and Complex Event Processing. Finally, there is a lot of work on porting decision technologies like Machine Learning into the big data landscape. This talk discusses big data processing in general and looks at each of those technologies, comparing and contrasting them.
These slides were designed for an Apache Hadoop + Apache Apex workshop (University program).
The audience was mainly third-year engineering students from the Computer, IT, Electronics, and Telecom disciplines.
I tried to keep it simple for beginners to understand. Some of the examples use context from India, but in general this should be a good starting point for beginners.
Advanced users/experts may not find this relevant.
Our journey with druid - from initial research to full production scale | Itai Yaffe
Here at the Nielsen Marketing Cloud we use druid.io (http://druid.io/) as one of our main data stores, both for simple counts and for approximate count-distinct (DataSketches).
It’s been more than a year since we started using it, injecting billions of events each day into multiple Druid clusters for different use-cases.
In this meet-up, we will share our journey, the challenges we had, the way we overcame them (at least most of them) and the steps we took to optimize the process around Druid to keep the solution cost-effective.
Before diving into Druid, we will briefly present our data pipeline architecture, starting from the front-end serving system, deployed in a number of geo-locations, to a centralized Kafka cluster in the cloud, and give some examples of the different processes that consume from Kafka and feed our different data sources.
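To give a feel for what "approximate count-distinct" means, here is a minimal K-Minimum-Values (KMV) sketch in plain Python. It is a simplified stand-in for the general idea and is not the DataSketches or Druid implementation; the class name, the default `k`, and the sample user IDs are hypothetical.

```python
import hashlib
import heapq


class KMVSketch:
    """Keeps the k smallest distinct hash values seen so far and estimates
    the number of distinct items from how small the k-th value is."""

    HASH_SPACE = 2 ** 64

    def __init__(self, k=256):
        self.k = k
        self._heap = []       # negated hashes, so the largest kept hash is at index 0
        self._kept = set()    # hashes currently in the heap, to skip repeats

    def _hash(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        value = self._hash(item)
        if value in self._kept:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -value)
            self._kept.add(value)
        elif value < -self._heap[0]:
            # Replace the current largest kept hash with the smaller new one.
            evicted = -heapq.heappushpop(self._heap, -value)
            self._kept.discard(evicted)
            self._kept.add(value)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) * self.HASH_SPACE / kth_smallest


# Hypothetical event stream: 10,000 distinct user IDs, each seen twice.
large = KMVSketch()
for repeat in range(2):
    for user in range(10_000):
        large.add(f"user-{user}")

small = KMVSketch()
for user in range(100):
    small.add(f"user-{user}")

large_estimate = large.estimate()
small_estimate = small.estimate()
print(round(large_estimate), small_estimate)
```

The sketch stores at most `k` hash values no matter how many events arrive, and its relative error shrinks roughly with the square root of `k`, so accuracy can be traded against memory.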
Rocana Deep Dive OC Big Data Meetup #19 Sept 21st 2016 | cdmaxime
What we do:
-We build a system for the operation of modern data centers
-Triage and diagnostics, exploration, trends, advanced analytics of complex systems
-Our data: logs, metrics, human activity, anything that occurs in the data center
-Enterprise Software (i.e. we build for others.)
Today's presentation: how we built what we built on top of Apache Hadoop
First presentation for Savi's sponsorship of the Washington DC Spark Interactive. Discusses tips and lessons learned using Spark Streaming (24x7) to ingest and analyze Industrial Internet of Things (IIoT) data as part of a Lambda Architecture.
Strata London 16: sightseeing, venues, and friends | Natalino Busa
Which venues have similar visiting patterns? How can we detect when a user is on vacation? Can we predict which venues will be favorited by users by examining their friends' preferences? Natalino Busa explains how these predictive analytics tasks can be accomplished by using Spark SQL, Spark ML, and just a few lines of Scala code.
My slides from the Big Data Applications meetup on 27th of July, talking about FlinkML. Also some notes on open-source ML development and an illustration of interactive Flink machine learning with Apache Zeppelin.
In this talk, we will discuss the technical and non-technical challenges faced in designing a system used to obfuscate LinkedIn member data stored in Hadoop at scale.
To assist in this task, we built WhereHows (open source), which serves as a data discovery and metadata catalog for all the datasets at LinkedIn. We integrated WhereHows with a compliance monitoring tool which uses Machine Learning to identify datasets with PII (Personally Identifiable Information). We will cover the building blocks of the system and the lessons learned in the process.
Audience takeaways:
- What it takes to obfuscate member data at scale
- The challenges involved in designing the system
- The lessons learned throughout this journey
- What not to assume!
R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk ... | Spark Summit
The central premise of DataXu is to apply data science to better marketing. At its core is the Real Time Bidding Platform, which processes 2 petabytes of data per day and responds to ad auctions at a rate of 2.1 million requests per second across 5 continents. Built on top of this platform is DataXu’s analytics engine, which gives clients insightful analytics reports that address their marketing business questions. Some common requirements for both these platforms are the ability to do real-time processing, scalable machine learning, and ad-hoc analytics. This talk will showcase DataXu’s successful use cases of the Apache Spark framework and Databricks to address all of the above challenges while maintaining its agility and rapid prototyping strengths to take a product from the initial R&D phase to full production. The team will share their best practices and highlight the steps of large scale Spark ETL processing and model testing, all the way through to interactive analytics.
Introducing Kafka Connect and Implementing Custom Connectors | Itai Yaffe
Kobi Hikri (Independent Software Architect and Consultant):
Kobi provides a short intro to Kafka Connect, and then shows an actual code example of developing and dockerizing a custom connector.
WSO2Con EU 2016: An Introduction to the WSO2 Analytics Platform | WSO2
In today’s connected world, organizations have access to an enormous amount of data but only use a very small subset of it. This data can give you hindsight, oversight, insight and foresight about your enterprise and the world that communicates with it. It can be leveraged to gain a considerable competitive advantage in the market.
The WSO2 Data Analytics platform lets you collect data, explore it through batch, real-time, interactive and predictive processing technologies and communicate your results. In this talk, we will discuss the WSO2 Data Analytics platform and how it brings together all analytics technologies into a single platform and user experience.
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
* Demonstrate how it can analyze integration, security and IoT scenarios
Stick around till the end and you will walk away with the necessary skills to create a winning data strategy for your organization to stay ahead of its competition.
WSO2Con ASIA 2016: An Introduction to the WSO2 Analytics Platform | WSO2
In today’s connected world, organizations have access to an enormous amount of data. We often don’t know what it means or how we can use it, in terms of hindsight, oversight, insight and foresight, to gain competitive advantage in the market. Use cases ranging from simple system monitoring to complex fraud analysis demand this.
The WSO2 Data Analytics platform lets you collect data, allows you to explore it through batch, real-time, interactive and predictive processing technologies and allows you to communicate your results. In this talk, we will discuss the WSO2 Data Analytics platform and how it brings together all analytics technologies into a single platform and user experience.
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da... | WSO2
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da... | WSO2
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
Serverless Streaming Data Processing using Amazon Kinesis Analytics | Amazon Web Services
by Adrian Hornsby, Technical Evangelist, AWS
As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real-time with standard SQL—there is no need to learn new programming languages or processing frameworks. In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
[WSO2Con Asia 2018] Patterns for Building Streaming Apps | WSO2
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
Learn more: https://wso2.com/library/conference/2018/08/wso2con-asia-2018-patterns-for-building-streaming-apps/
WSO2Con EU 2015: An Introduction to the WSO2 Data Analytics Platform | WSO2
With Hadoop, we can easily process data from the disk, but this consumes a lot of time. The value of certain insights, such as traffic alerts or heart attack alerts, degrades with time, and handling this time-sensitive data needs real-time technologies that can produce output within milliseconds. Moreover, some use cases need advanced analytics like machine learning.
In this talk, we will discuss the WSO2 Data Analytics platform, which brings together all the technologies into one platform. It lets you collect data through a single sensor API, process it using batch, real-time, or predictive technologies, and communicate your results, all within a single platform and user experience.
Presenter:
Srinath Perera
Vice President – Research,
WSO2
Sumedha Rubasinghe, Director of API Architecture, presented this talk at the API Strategy & Practice Conference in Chicago, where he illustrated how organisations can analyse who uses their APIs, and how statistics help in capacity planning, deployment, maintenance schedules and trend analyses as well as assist the decision-making process. The session discussed scalable collection of statistics for API ecosystems, key design considerations, real-time and offline analysis as well as WSO2’s approach for dealing with these challenges.
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise | WSO2
The WSO2 analytics platform provides a high performance, lean, enterprise-ready, streaming solution to solve data integration and analytics challenges faced by connected businesses. This platform offers real-time, interactive, machine learning and batch processing technologies that empower enterprises to build a digital business. This session explores how to enable digital transformation by building a data analytics platform.
Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Com... | WSO2
In this webinar, Sriskandarajah Suhothayan, technical lead at WSO2, will take a closer look at the following use cases:
Natural language processing capabilities of WSO2 CEP: Introducing basic constructs of the CEP
Analyzing a soccer game in Real time: Explaining how complicated scenarios can be implemented
Geo fencing capabilities of WSO2 CEP: Focusing on the CEP’s virtualization support
Swift distributed tracing method and tools v2 | zhang hua
A proposal for a Swift session at the OpenStack Atlanta design summit.
http://junodesignsummit.sched.org/event/0f185cd5bcc2c9b58c639bba25bc0025#.U3SZRa1dXd4
http://summit.openstack.org/cfp/details/354
Serverless London 2019 FaaS composition using Kafka and CloudEvents | Neil Avery
FaaS composition using Kafka and Cloud-Events
LOCATION: Burton & Redgrave, DATE: November 7, 2019, TIME: 2:30 pm - 3:15 pm
https://serverlesscomputing.london/sessions/faas-composition-using-kafka-and-cloud-events/
Serverless functions or FaaS are all the rage. By leveraging well-established event-driven microservice design principles and applying them to serverless functions, we can build a homogeneous ecosystem to run FaaS applications.
Kafka’s natural ability to store and replay events means serverless functions can not only be replayed, but they can also be used to choreograph call chains or driven using orchestration. Kafka also means we can democratize and organize FaaS environments in a way that scales across the enterprise.
Underpinning this mantra is the use of Cloud Events by the CNCF serverless working group (of which Confluent is an active member).
Objective of the talk
You will leave the talk with an understanding of what the future of cloud holds, a methodology for embracing serverless functions and how they become part of your journey to a cloud-native, event-driven architecture.
Lumberjacking on AWS: Cutting Through Logs to Find What Matters (ARC306) | AW... | Amazon Web Services
AWS offers services that revolutionize the scale and cost for customers to extract information from large data sets, commonly called Big Data. This session analyzes Amazon CloudFront logs combined with additional structured data as a scenario for correlating log and transactional data. Successfully implementing this type of solution requires architects and developers to assemble a set of services with multiple decision points. The session provides a design and example of architecting and implementing the scenario using Amazon S3, AWS Data Pipeline, Amazon Elastic MapReduce, and Amazon Redshift. It explores loading, query performance, security, incremental updates, and design trade-off decisions.
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ... | confluent
If the combination of a powerful stream processing platform and an end-user-friendly spreadsheet interface rings a bell, you should definitely attend our "Streamsheets and Apache Kafka" webinar. While development is interactive with a web user interface, Streamsheets applications can run as mission-critical applications. They directly consume and produce event streams in Apache Kafka. One popular option is to run everything in the cloud, leveraging the fully managed Confluent Cloud service on AWS, GCP or Azure. Without any coding or scripting, end users leverage their existing spreadsheet skills to build customized streaming apps for analysis, dashboarding, condition monitoring or any kind of real-time pre- and post-processing of Kafka or ksqlDB streams and tables.
Hear Kai Waehner of Confluent and Kristian Raue of Cedalo on these topics:
• Where Apache Kafka and Streamsheets fit in the data ecosystem (Industrial IoT, Smart Energy, Clinical Applications, Finance Applications)
• Customer Story: How the Freiburg University Hospital uses Kafka and Streamsheets for dashboarding the utilization of clinical assets
• 15-Minutes Live Demonstration: Building a financial fraud detection dashboard based on Confluent Cloud, ksqlDB and Cedalo Cloud Streamsheets just using spreadsheet formulas.
Speaker:
Kai Waehner, Technology Evangelist, Confluent
Kristian Raue, Founder & Chief Technologist, cedalo
Book: Software Architecture and Decision-MakingSrinath Perera
Uncertainty is the leading cause of mistakes made by practicing software architects. The primary goal of architecture is to handle uncertainty arising from use cases as well as architectural techniques. The book discusses how to make architectural decisions and manage uncertainty. From the book, you will learn common problems encountered while designing a system, a default solution for each, more complex alternatives, and 5Q & 7P (Five Questions and Seven Principles) that help you choose.
Book, https://amzn.to/3v1MfZX
Blog: http://tinyurl.com/swdmblog
Six min video - https://youtu.be/jtnuHvPWlYU
We have critically evaluated how AI will shape integration use cases, their feasibility, and timelines. Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, is the methodology of our study.
We observe that AI can significantly impact integration use cases and identify 13 AI-based use case classes for integration. Points to note include:
Enabling AI in an enterprise involves collecting, cleaning up, and creating a single representation of data as well as enforcing decisions and exposing data outside, each of which leads to many integration use cases. Hence, AI indirectly creates demand for integration.
AI needs data, which in some cases leads to significant competitive advantages. The need to collect data would drive vendors to offer most AI products in the cloud through APIs.
Due to a lack of expertise and data, custom AI model building will be limited to large organizations. It is hard for small and medium-sized organizations to build and maintain custom models.
The Role of Blockchain in Future IntegrationsSrinath Perera
We have critically evaluated blockchain-based integration use cases, their feasibility, and timelines. Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, is the methodology of our study. Based on our analysis, we observe that blockchain can significantly impact integration use cases.
In our paper, we identify 30-plus blockchain-based use cases for integration and four architecture patterns. Notably, each use case we identified can be implemented using one of the architecture patterns. Furthermore, we also discuss challenges and risks posed by blockchains that would affect these architecture patterns.
Our webinar presents a critical analysis of serverless technology and our thoughts about its future. We use Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, as the methodology of our study. Based on our analysis, we believe that serverless can significantly impact applications and software development workflows.
We’ve also made two further observations:
Limitations, such as tail latencies and cold starts, are not deal breakers for adoption. There are significant use cases that can work with existing serverless technologies despite these limitations.
We see a significant gap in required tooling and IDE support, best practices, and architecture blueprints. With proper tooling, it is possible to train existing enterprise developers to program with serverless. If proper tools are forthcoming, we believe serverless can cross the chasm in 3-5 years.
A detailed analysis can be found here: A Survey of Serverless: Status Quo and Future Directions. Join our webinar as we discuss this study, our conclusions, and evidence in detail.
1. Blockchain's potential impact is real. If successful, blockchain technologies can transform the way we live our day-to-day lives.
2. We believe technology is ready for limited applications in Digital Currency, Lightweight financial systems, Ledgers (of identity, ownership, status, and authority), Provenance (e.g. supply chains and other B2B scenarios) and Disintermediation, which we believe will happen in next three years.
3. However, with other use cases, blockchain faces significant challenges such as performance, irrevocability, need for regulation and lack of census mechanisms. These are hard problems and
4. It is not clear whether blockchain can sustain the current level of effort for extended period of 5+ years. There are many startups and they run the risk of running out of money before markets are ready. Failure of startups can inhibit further funding and investments.
5. Value and need of decentralization compared to centralized and semi-centralized alternatives is not clear.
A Visual Canvas for Judging New TechnologiesSrinath Perera
In the fast-changing technology world, the technology landscape shifts faster and faster. The agents of these changes are new emerging technologies, which sometimes even create, destroy, or transform segments. In a shifting world, prevailing advantages are fleeting. Organizations that can master change and ride technology waves own the future.
Not all emerging technologies live up to their promise. Every year, as a part of annual planning, most organizations need to decide the relevance, impact, and probability of success of emerging technologies and pick their bets. Although it is a regular decision, there is no widely accepted framework for evaluating emerging technologies.
As a solution to this problem, we present the “Emerging Technology Analysis Canvas” (ETAC), a framework to assess an individual emerging technology. Inspired by the Business Model Canvas, it represents different aspects of a technology visually on a single page. This approach includes a set of questions that probe the technology, arranged around a logical narrative. The visual representation is concise, compact, and comprehensible at a glance.
The talk discusses how analytics can attack privacy and what we can do about it. It discusses the legal responses (e.g. GDPR) as well as technical responses (differential privacy and homomorphic encryption).
The video is in https://www.facebook.com/eduscopelive/videos/314847475765297/ from 1.18.
Blockchain is often cited as one of the most impactful technologies, along with AI. It has attracted many startups, venture investments, and academic research. If successful, blockchain technologies can transform the way we live our day-to-day lives.
However, blockchain faces significant challenges such as performance, irrevocability, need for regulation and lack of census mechanisms. They are hard problems, and likely it will take at least 5-10 years to find answers to those problems.
Given the risk involved as well as the significant potential returns, we recommend a cautiously optimistic approach for blockchain with the focus on concrete use cases.
Today's Technology and Emerging Technology LandscapeSrinath Perera
We have seen the rise and fall of many technologies, some disappearing without a trace while others redefining the world. Collectively they have shaped our world beyond recognition. In this talk, Srinath will start with past technologies exploring their behavior. Then he will explore current middleware landscape, its composition, and relationships between different segments. He will discuss significant developments and discuss their future. Further, he will discuss emerging technologies, forces that shape them, and the promise of each technology, and finally, speculate about their evolution. You will walk away with knowledge on the evolution of middleware, the status quo, and discussion about how, at WSO2, we think those technologies will evolve.
Some died, some get by, but some have woven themselves to today's middleware so much that we do not notice them. The point I want to make is that not all emerging technologies are fads. Some are, and some are too early, like AI. But some are lasting.
The Rise of Streaming SQL and Evolution of Streaming ApplicationsSrinath Perera
First-generation stream processors, such as Apache Storm, wanted us to write code. It was a great start. However, when building real-world apps, which are used for a long time and evolve, writing code gets us into trouble.
If we want to query a database or query data stored in storage with Hadoop, we use SQL. Why can't we query data streaming using SQL? We can. Almost all open source stream processors, including Storm, Flink, and Kafka, have switched to SQL.
In this webinar, Srinath will talk about the evolution of stream processing, streaming SQL, the status quo, and what this means to stream applications. He will also dissect the experience of building streaming applications by exploring common patterns and pitfalls.
Analytics and AI: The Good, the Bad and the UglySrinath Perera
Analytics lets us question the data, which in effect questions the world around us. This lets us understand, monitor, and shape the world. AI lets us discover connections, predict possible futures, and automate tasks.
These twin technologies can change the world around us. On one hand, they can make us efficient, connected, and fulfilled. On the other, the change in the status quo can replace jobs, affect lives, and build biases into our systems that can marginalize millions.
In this talk, we will discuss core ideas behind analytics and AI, their possible impact, both good and bad outcomes, and challenges.
The dawn of digital businesses is upon us, with reimagined business models that make the best use of digital technologies such as automation, analytics, integration and cloud. Digital businesses are efficient, continuously optimizing, proactive, flexible and are able to fully understand their customers. Analytics is a key technology that helps in doing so. It acts as the eyes and ears of the system and provides a holistic view on the past and present so that decision-makers can predict what will happen in the future. This webinar will explore
Why becoming a digital business is not a choice
The role of analytics in digital transformation with examples
How best to leverage state of the art analytics technology
SoC Keynote:The State of the Art in Integration TechnologySrinath Perera
This talk outlines the state of the art of enterprise software and how we got there, as I see it. The second part describes Ballerina, a new programming language WSO2 has built for enterprise computing.
It is presented as a Keynote at 11th Symposium and Summer School On Service-Oriented Computing.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
1. An Introduction to the WSO2 Analytics Platform
Srinath Perera
VP Research WSO2, Apache Member
(@srinath_perera)
srinath@wso2.com
4. Collect Data
One Sensor API to publish events
- REST, Thrift, Java, JMS, Kafka
- Java clients, JavaScript clients*
First you define streams (think of a stream as an infinite table in a SQL DB)
Then publish events via the Sensor API
6. Collecting Data: Example
Java example: create and send events
Events are sent asynchronously
See the client given in http://goo.gl/vIJzqc for more info
// Initialize the stream publisher
Agent agent = new Agent(agentConfiguration);
publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );
// Define the stream
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Send events
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
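The define-then-publish flow on this slide can be illustrated with a small, library-free Python sketch. It is not the WSO2 client API: the class `AsyncPublisher`, its method names, and the `sink` callback are all hypothetical, chosen only to show how a publisher can validate an event against a stream definition and hand it to a background thread so that `publish` returns immediately.

```python
import queue
import threading


class AsyncPublisher:
    """Hypothetical stand-in for an asynchronous event publisher."""

    def __init__(self, sink):
        self._sink = sink          # callable receiving (stream_id, payload)
        self._streams = {}         # (name, version) -> tuple of field names
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def add_stream_definition(self, name, version, fields):
        # A stream is defined once, before any event is published to it.
        self._streams[(name, version)] = tuple(fields)

    def publish(self, name, version, payload):
        fields = self._streams[(name, version)]  # KeyError if undefined
        if len(payload) != len(fields):
            raise ValueError("payload does not match stream definition")
        # Enqueue and return at once; delivery happens on the worker thread.
        self._queue.put(((name, version), tuple(payload)))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:       # sentinel: stop the worker
                break
            self._sink(*item)

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._queue.put(None)
        self._worker.join()
```

A caller would define a stream, publish a few events, and call `close()` before reading whatever the sink collected.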
7. Data Collection Examples
• Collect data from inbuilt agents in WSO2 products, Tomcat etc.
• Collecting your log data via Logstash
• Collecting JVM and JMX stats via agent
• Ingesting data from message queues such as JMS or Kafka
• Pulling data from an RSS feed, or scraping a web page
• Write a custom agent to collect data from your system and push it to DAS
Photo credit http://www.torange.us/ CC license
8. Analysis: Batch Analytics
• Batch analytics reads data from a disk (or some other storage) and processes it record by record
• “MapReduce” is the most widely used technology for batch analytics
– Apache Hadoop
– Apache Spark (30X faster and much more flexible)
• Analytics (min, max, average, correlation, histograms; might join or group data in many ways)
• Key Performance Indicators (KPIs)
– E.g. profit per square foot for retail
• Presented as a Dashboard
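As a rough illustration of the record-by-record processing described above, the following Python sketch scans a small in-memory data set, groups it by store, and computes min, max, average, and the retail KPI mentioned on the slide. The records, field layout, and function name are made up for the example; a real batch job would read from storage and run on Spark or Hadoop.

```python
from collections import defaultdict

# Hypothetical sales records: (store, profit, floor_area_sq_ft)
RECORDS = [
    ("north", 1200.0, 400.0),
    ("north", 800.0, 400.0),
    ("south", 3000.0, 1000.0),
]


def batch_summary(records):
    """Scan records one by one and aggregate profit per store."""
    profits_by_store = defaultdict(list)
    area_by_store = {}
    for store, profit, sq_ft in records:
        profits_by_store[store].append(profit)
        area_by_store[store] = sq_ft

    summary = {}
    for store, profits in profits_by_store.items():
        total = sum(profits)
        summary[store] = {
            "min": min(profits),
            "max": max(profits),
            "avg": total / len(profits),
            # KPI: total profit divided by the store's floor area
            "profit_per_sq_ft": total / area_by_store[store],
        }
    return summary


SUMMARY = batch_summary(RECORDS)
```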
9. SQL-like Queries: Spark SQL
Since many people understand SQL, Hive made large-scale data processing (Big Data) accessible to many
Expressive, short, and sweet.
Defines core operations that cover 90% of problems
Lets experts dig in when they like! (via User Defined Functions)
insert overwrite table BusSpeed
select getHour(ts) as hour, avg(v) as avgV, busID
from BusStream group by busID, getHour(ts);
10. Spark SQL Query
Count entries where username is not empty, grouped by username and ordered by the count (top 10)
SELECT username, COUNT(*) AS count FROM wikiData WHERE
username <> '' GROUP BY username ORDER BY count DESC
LIMIT 10
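To try the query above without a Spark cluster, one option is to run it against an in-memory SQLite table from Python. This is only a stand-in: SQLite is not Spark SQL, and the table contents and the `title` column are invented for the example. The query text itself is the one from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikiData (username TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO wikiData VALUES (?, ?)",
    [
        ("alice", "A"), ("alice", "B"), ("alice", "E"),
        ("bob", "C"), ("bob", "F"),
        ("carol", "G"),
        ("", "D"),  # anonymous edit, filtered out by the WHERE clause
    ],
)

QUERY = """
SELECT username, COUNT(*) AS count FROM wikiData WHERE
username <> '' GROUP BY username ORDER BY count DESC
LIMIT 10
"""

# Each row is (username, number_of_entries), most active users first.
TOP_USERS = conn.execute(QUERY).fetchall()
```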
11. Usecase: API Usage
• Looking at different API calls by country
• Designed to draw attention to which APIs are used and where
12. Value of Some Insights Degrades Fast!
For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time.
We need technology that can produce outputs fast
Static queries, but need very fast output (alerts, realtime control)
Dynamic and interactive queries (data exploration)
14. CEP Queries 1
Calculate the average temperature over a 1-minute sliding window, grouped by roomNo
define stream TempStream (roomNo string, temp double);
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into AvgRoomTempStream;
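The semantics of the sliding-window query above can be approximated in plain Python. This is a simplified sketch, not how the CEP engine is implemented: the class name and method are hypothetical, timestamps are passed in explicitly as seconds, and events are only expired when a new event for the same room arrives (a real engine also expires them on a timer).

```python
from collections import defaultdict, deque


class SlidingAverage:
    """Per-room average over the last `window_seconds` of events."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.events = defaultdict(deque)  # roomNo -> deque of (ts, temp)

    def on_event(self, ts, room_no, temp):
        q = self.events[room_no]
        q.append((ts, temp))
        # Drop events that have fallen out of the time window.
        while q[0][0] <= ts - self.window:
            q.popleft()
        avg = sum(t for _, t in q) / len(q)
        return room_no, avg  # corresponds to one AvgRoomTempStream event
```

Feeding it `(0, "r1", 20.0)`, `(30, "r1", 30.0)`, and `(70, "r1", 40.0)` shows the first reading dropping out of the window by the third event.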
15. CEP Queries 2
Using data from a football game
The kick stream shows kicks on the ball by players
A ball possession is a kick by one player, followed by any number of kicks by the same player, followed by a kick by someone else
from every k1 = KickStream,
KickStream[playerid = k1.playerid]*,
KickStream[playerid != k1.playerid]
select ..
insert into BallPosessionStream;
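The possession pattern can be sketched as a small state machine in Python. This is an illustration under simplifying assumptions: the input is just a list of player ids in kick order, the function name is hypothetical, and it reports one result per uninterrupted run of kicks, whereas the `every` keyword in the query above would also start a new match at each kick inside a run.

```python
def ball_possessions(kicks):
    """Return (player_id, hit_count) for each run ended by another player's kick."""
    results = []
    current, count = None, 0
    for player_id in kicks:
        if player_id == current:
            count += 1                      # same player keeps the ball
        else:
            if current is not None:
                results.append((current, count))  # possession just ended
            current, count = player_id, 1   # new player takes over
    return results
```

The final run is not reported, because no kick by a different player has ended it yet.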
16. People Tracking via BLE
• Track people through BLE via triangulation
• Higher level logic via Complex Event Processing
• Traffic Monitoring
• Smart retail
• Airport management
18. Scaling CEP Queries on top of Storm
▪ Accepts CEP queries with hints about how to partition streams
▪ Partitions streams, builds an Apache Storm topology running CEP nodes as Storm Spouts, and runs it. See http://goo.gl/pP3kdX for more info.
19. CEP Queries on Storm
@dist(parallel='4') asks the engine to run the query on 4 nodes
Use a partition definition to split the data so the queries can run in parallel
define partition on TempStream.region {
@dist(parallel='4')
from TempStream[temp > 33]
insert into HighTempStream;
}
from HighTempStream#window.time(1 hour)
select max(temp) as max
insert into HourlyMaxTempStream;
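The partitioning idea can be shown with a single-process Python sketch: events are routed to one of four partitions by a stable hash of their region, filtered inside each partition, and then merged for the max. The helper names are hypothetical, the one-hour window is left out for brevity (the max is taken over the whole input), and in a real deployment each partition would run on a separate Storm node.

```python
from collections import defaultdict
from zlib import crc32


def partition_index(region, parallel=4):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return crc32(region.encode("utf-8")) % parallel


def run_partitioned(events, parallel=4):
    """events: iterable of (region, temp). Returns (partitions, max_high_temp)."""
    partitions = defaultdict(list)
    for region, temp in events:
        if temp > 33:  # same filter as TempStream[temp > 33]
            partitions[partition_index(region, parallel)].append((region, temp))

    # Merge the per-partition HighTempStream outputs and take the max.
    high = [event for part in partitions.values() for event in part]
    max_temp = max(temp for _, temp in high) if high else None
    return partitions, max_temp
```

Because the partition is chosen from the region alone, all events for one region land on the same partition.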
20. Interactive Analytics
The best way to explore data is by asking ad-hoc questions
Interactive Analytics (Search) lets you query the system and receive fast results (<10s)
Shows data in context (e.g. by grouping events from the same transaction together)
Built using Lucene-based indexes.
SparkSQL> SELECT * FROM TWITTER_DATA
21. Predictive Analytics
Can you “write a program to drive a car”?
Machine learning
Takes in a lot of examples and builds a program that matches those examples
We call that program a “model”
Lots of tools
- R (statistical language)
- scikit-learn (Python)
- Apache Spark’s MLlib and Apache Mahout (Java)
22. Predictive Analytics in DAS
• Building models
– With the WSO2 Machine Learner product via a Wizard (powered by MLlib)
– Build models using R and export them as PMML
• Built models can be used with both WSO2 CEP and ESB
23. Using the Model
Within CEP
from InputStream#ml:predict('/../diabetes-model', 'double')
select *
insert into PredictionStream;
Within ESB
<predict>
<model storage-location="../downloaded-ml-model"/>
<features>
<feature name="SI2" expression="$body/features/SI2"/>
..
</features>
<predictionOutput property="result"/>
</predict>
24. WSO2 Machine Learner
• Upload or select data
• Explore the data
• Train a Machine learning
model
26. Supported Algorithms
• Deep Learning based classification (H2O’s Stacked Autoencoders Classifier).
• Classification and regression algorithms - Decision Trees, Linear Regression, Lasso Regression, SVM, Naïve Bayes
• K-Means clustering for unsupervised learning on your data
• Anomaly Detection using the K-Means algorithm to identify fraud, network penetration and other difficult scenarios
• Recommendations Engine (Collaborative Filtering Algorithm)
27. Predict wait time in the Airport
• Predicting the time to go through the airport
• Real-time updates and events to passengers
• Lets the airport manage by allocating resources
28. Predict Promising Customers
• A typical website can get millions of users
• Only a very small fraction converts
• For each user, we know what they access, where they work, their country, browser, OS, etc.
• The problem is to predict which users will convert
• Used Logistic Regression, Random Forest, Survival Modeling etc.
29. Predict Super Bowl
• Predicted 7 of the 11
games
• Done with Random
Forest Algorithm
• Even the ones we missed are instructive
See Yuda’s post: Predicting the Super Bowl with Machine Learning
30. Anomaly Detection: Markov Models
• Can model the probability of a sequence
• Given a sequence, can predict its likelihood, and use that to detect anomalies.
• Implemented with WSO2 CEP
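The Markov-model idea can be sketched in a few lines of Python, independent of WSO2 CEP. The sketch assumes a first-order model: transition probabilities are estimated from example sequences, a new sequence is scored by its average log-likelihood per transition, and it is flagged when that score falls below a threshold. The function names, the probability floor for unseen transitions, and the threshold are all illustrative choices.

```python
import math
from collections import defaultdict


def train_transitions(sequences):
    """Estimate P(next | current) from observed sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: c / total for b, c in nxt.items()}
    return probs


def log_likelihood(probs, seq, floor=1e-6):
    """Sum of log transition probabilities; unseen transitions get `floor`."""
    return sum(
        math.log(probs.get(a, {}).get(b, floor))
        for a, b in zip(seq, seq[1:])
    )


def is_anomalous(probs, seq, threshold):
    """Flag a sequence whose average log-likelihood per step is too low."""
    steps = max(len(seq) - 1, 1)
    return log_likelihood(probs, seq) / steps < threshold
```

With a model trained on typical sessions such as login, browse, buy, logout, a session that jumps straight from login to buy repeatedly scores far below a typical one.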
31. Anomaly Detection: Clustering
• Use clustering to identify normal behavior as clusters
• Consider points away from all clusters as anomalies.
• A point is considered away from a cluster if it is outside the 99th percentile line for that cluster
• Included in WSO2 ML
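The percentile rule above can be sketched in Python as follows. This is not the WSO2 ML implementation: it assumes the cluster centers have already been found (for example by K-Means), uses Euclidean distance, and computes each cluster's 99th percentile radius with the nearest-rank method. All function names are hypothetical.

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def percentile_radii(points, centers, q=0.99):
    """Per-cluster distance below which a fraction q of its training points fall."""
    dists = [[] for _ in centers]
    for p in points:
        i = nearest_center(p, centers)
        dists[i].append(math.dist(p, centers[i]))
    radii = []
    for d in dists:
        d.sort()
        rank = max(math.ceil(q * len(d)), 1)  # nearest-rank percentile
        radii.append(d[rank - 1] if d else 0.0)
    return radii


def is_anomaly(point, centers, radii):
    """A point is anomalous if it lies outside the radius of every cluster."""
    return all(math.dist(point, c) > r for c, r in zip(centers, radii))
```

`percentile_radii` is run once on normal training data; `is_anomaly` is then applied to each new point.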
32. Communicate: Dashboards
• Dashboards give an “overall idea” at a glance (e.g. car dashboard)
– Boring when everything is good!!
• Build your own dashboard.
– WSO2 DAS supports a gadget generation Wizard
– You can write your own gadgets using D3 and JavaScript.
33. Gadget Generation Wizard
• Starts with data in tabular format
• Map each column to a dimension in your plot, like X, Y, color, point size, etc.
• Create a chart with a few clicks
Powered by the VizGrammar library, which uses Vega underneath (see https://github.com/wso2/VizGrammar)
34. Communicate: Alerts
▪ Done with CEP Queries
▪ Last Mile
- Email, SMS
- Push notifications to a UI
- Pager
- Trigger physical Alarm
35. Real Life Use Cases
▪ Cisco ( OEM the platform with Cisco
solutions, Health, Smart Parking)
▪ Experian ( Digital Marketing) - see video
▪ Pacific Controls ( Smart City Platform, Vehicle
tracking, building monitoring) - see video
▪ Throttling and Anomaly Detection ( by group
of Telco companies)
▪ API Analytics (13+ customers)
No battle plan survives
contact with the enemy
--Helmuth von Moltke
36. Key Differentiators
• Open Source, under Apache 2 license
• Publish data once, analyze it any way you like, within a single user experience.
• Flexible packaging or as a scalable cluster
• Rich, extensible, SQL-like configuration language
• Compact, easy to learn syntax addressing complex
requirements, such as time windows, patterns,
sequences which would be complex to develop in a
programming language such as Java.
• Rich set of data connectors, which can be easily
extended
37. More Information
▪ Introducing WSO2 Analytics Platform: Note for Architects, https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
▪ WSO2 Data Analytics Server, http://wso2.com/products/data-analytics-server/
▪ WSO2 Complex Event Processor, http://wso2.com/products/complex-event-processor/
▪ WSO2 Machine Learner, http://wso2.com/products/machine-learner/