Large-scale data processing analyzes and makes sense of large amounts of data. Spanning many fields, it brings together technologies like distributed systems, machine learning, statistics, and the Internet of Things. It is a multi-billion-dollar industry with use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like smart cities, smart health, and smart agriculture. Some use cases, like urban planning, can tolerate slow answers and are handled in batch mode, while others, like stock markets, need results within milliseconds and are handled in a streaming fashion. Predictive analytics lets us learn models from data, often giving us the ability to predict the outcome of our actions.
The WSO2 data analytics platform is a fast, scalable platform used by more than 40 organizations, including banks, financial institutions, smart cities, hospitals, media companies, telecom companies, state and federal governments, and high-tech companies. This talk starts with a discussion of large-scale data analysis. We then look at the WSO2 data analytics platform and discuss in detail how to use it to build end-to-end big data applications that combine the power of batch processing, real-time analytics, and predictive technologies.
This slide deck provides an overview of the WSO2 big data platform and discusses some of its customer case studies and applications. It covers big data in general, real-time analytics with WSO2 CEP, batch analytics with WSO2 BAM, and new products like predictive analytics with WSO2 Machine Learner. For more information, please reach us through architecture@wso2.org.
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ... - Srinath Perera
Large-scale data processing analyzes and makes sense of large amounts of data. Although the field itself is not new, it is finding many use cases under the theme "big data", where Google itself, IBM Watson, and Google's driverless car are some of the success stories. Spanning many fields, large-scale data processing brings together technologies like distributed systems, machine learning, statistics, and the Internet of Things. It is a multi-billion-dollar industry with use cases like targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like smart cities, smart health, and smart agriculture. Some use cases, like urban planning, can tolerate slow answers and are handled in batch mode, while others, like stock markets, need results within milliseconds and are handled in a streaming fashion. There are different technologies for each case: MapReduce for batch processing, and complex event processing and stream processing for real-time use cases. Furthermore, the types of analysis range from basic statistics, like the mean, to complicated prediction models based on machine learning. In this talk, we will discuss the data processing landscape: concepts, use cases, technologies, and open questions, drawing examples from real-world scenarios.
http://icter.org/conference/invited_speeches
Introduction to WSO2 Analytics Platform: 2016 Q2 Update - Srinath Perera
In this talk, we discuss the WSO2 data analytics platform, which brings all of these technologies together into one platform. It lets you collect data through a single sensor API; process it using batch, realtime, or predictive technologies; and communicate your results, all within a single platform and user experience.
More details https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
Solving DEBS Grand Challenge with WSO2 CEP - Srinath Perera
The DEBS Grand Challenge is an annual event in which different event-based systems compete to solve a real-world problem. The 2014 challenge is to demonstrate scalable real-time analytics using high-volume sensor data collected from smart plugs over a one-and-a-half-month period. This paper aims to show how a general-purpose, commercially available event-based system, the WSO2 Complex Event Processor (WSO2 CEP), was used to solve this problem. We achieved 300K TPS with one node and nearly 1 million TPS with 4 nodes. In addition, we explore areas where we created extensions to the WSO2 CEP engine to better solve the challenge.
Big Data Analysis: Deciphering the Haystack - Srinath Perera
A primary outcome of big data is to derive useful and actionable insights from large or challenging data collections. The goal is to run the transformations from data, to information, to knowledge, and finally to insights. This includes calculating simple analytics like mean, max, and median; deriving an overall understanding of the data by building models; and finally deriving predictions from the data. In some cases we can afford to wait while data is collected and processed, while in other cases we need to know the outputs right away. MapReduce has been the de facto standard for data processing, and we will start our discussion from there. However, that is only one side of the problem. Other technologies like Apache Spark and Apache Drill are gaining ground, as are realtime processing technologies like stream processing and complex event processing. Finally, there is a lot of work on porting decision technologies like machine learning into the big data landscape. This talk discusses big data processing in general and looks at each of these technologies, comparing and contrasting them.
This tutorial will discuss and demonstrate how to implement different realtime streaming analytics patterns. We will start with counting use cases and progress to complex patterns like time windows, tracking objects, and detecting trends. We will start with Apache Storm and progress to complex event processing based technologies.
Introduction to WSO2 Data Analytics Platform - Srinath Perera
WSO2 has had several analytics products, WSO2 BAM and WSO2 CEP (or big data products, if you prefer the term), for some time. We have added WSO2 Machine Learner, a product to create, evaluate, and deploy predictive models, and renamed WSO2 BAM to WSO2 DAS (Data Analytics Server).
The platform lets you publish (collect) data once and process it through batch (Spark) and realtime (CEP) pipelines, search the data (Lucene), and build machine learning models.
This post describes how all of those fit into a single story.
For more information, see https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
With tens of thousands of Java servers running in production in enterprises, Java has become a language of choice for building production systems. If our machines are to exhibit acceptable performance, they require regular tuning. This talk takes a detailed look at techniques for tuning a Java server.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UkZRIC.
Monal Daxini presents a blueprint for streaming data architectures and a review of desirable features of a streaming engine. He also talks about streaming application patterns and anti-patterns, and use cases and concrete examples using Apache Flink. Filmed at qconsf.com.
Monal Daxini is the Tech Lead for Stream Processing platform for business insights at Netflix. He helped build the petabyte scale Keystone pipeline running on the Flink powered platform. He introduced Flink to Netflix, and also helped define the vision for this platform. He has over 17 years of experience building scalable distributed systems.
Using Apache Pulsar to Provide Real-Time IoT Analytics on the Edge - DataWorks Summit
The business value of data decreases rapidly after it is created, particularly in use cases such as fraud prevention, cybersecurity, and real-time system monitoring. The high-volume, high-velocity datasets used to feed these use cases often contain valuable, but perishable, insights that must be acted upon immediately.
In order to maximize the value of their data, enterprises must fundamentally change their approach to processing real-time data, focusing on reducing decision latency for the perishable insights that exist within their real-time data streams. This enables the organization to act upon them while the window of opportunity is open.
Generating timely insights in a high-volume, high-velocity data environment is challenging for a multitude of reasons. First, as the volume of data increases, so does the amount of time required to transmit it back to the datacenter and process it. Second, as the velocity of the data increases, the data, and the insights derived from it, lose value faster.
In this talk, we will present a solution based on Apache Pulsar Functions that significantly reduces decision latency by using probabilistic algorithms to perform analytic calculations on the edge.
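To make this concrete, here is a minimal sketch of an edge-side Pulsar Function that applies one family of probabilistic algorithms (a Bloom filter) to flag never-before-seen device IDs in fixed memory. The class name, message format, and hash choices are illustrative assumptions, not the presenters' actual solution:

import java.util.BitSet;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class FirstSeenDeviceFunction implements Function<String, String> {
    private static final int BITS = 1 << 20;   // ~1M bits of fixed memory at the edge
    private final BitSet filter = new BitSet(BITS);

    @Override
    public String process(String deviceId, Context context) {
        // Two cheap (correlated) hashes; a production Bloom filter would use
        // independent hash functions to control the false-positive rate.
        int h1 = Math.floorMod(deviceId.hashCode(), BITS);
        int h2 = Math.floorMod(deviceId.hashCode() * 31 + 17, BITS);
        boolean possiblySeen = filter.get(h1) && filter.get(h2);
        filter.set(h1);
        filter.set(h2);
        // Returning a value publishes it to the function's output topic;
        // returning null drops the event, so only first sightings leave the edge.
        return possiblySeen ? null : "first-sighting:" + deviceId;
    }
}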
AI-Powered Streaming Analytics for Real-Time Customer Experience - Databricks
Interacting with customers in the moment and in a relevant, meaningful way can be challenging to organizations faced with hundreds of various data sources at the edge, on-premises, and in multiple clouds.
To capitalize on real-time customer data, you need a data management infrastructure that allows you to do three things:
1) Sense: capture event data and stream data from a source, e.g., social media, web logs, machine logs, IoT sensors.
2) Reason: automatically combine and process this data with existing data for context.
3) Act: respond appropriately in a reliable, timely, consistent way.
In this session we'll describe and demo an AI-powered streaming solution that can tackle the entire end-to-end sense-reason-act process at any latency (real-time, streaming, and batch) using Spark Structured Streaming.
The solution uses AI (e.g., A* and NLP for data structure inference and machine learning algorithms for ETL transform recommendations) and metadata to automate data management processes (e.g., parse, ingest, integrate, and cleanse dynamic and complex structured and unstructured data) and guide user behavior for real-time streaming analytics. It's built on Spark Structured Streaming to take advantage of unified APIs, multi-latency and event-time-based processing, out-of-order data delivery, and other capabilities.
You will gain a clear understanding of how to use Spark Structured Streaming for data engineering using an intelligent data streaming solution that unifies fast-lane data streaming and batch lane data processing to deliver in-the-moment next best actions that improve customer experience.
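As a rough skeleton of what a sense-reason-act pipeline looks like on Spark Structured Streaming (the socket source, the deviceId/temp columns, and the 30-degree threshold are assumptions for illustration, not the solution demoed in the session):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.split;

public class SenseReasonAct {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("sense-reason-act").getOrCreate();

        // Sense: ingest a stream (a socket source keeps the sketch self-contained)
        Dataset<Row> lines = spark.readStream()
                .format("socket")
                .option("host", "localhost").option("port", 9999)
                .load();

        // Reason: parse "deviceId,temp" lines and aggregate per device
        Dataset<Row> readings = lines.select(
                split(col("value"), ",").getItem(0).as("deviceId"),
                split(col("value"), ",").getItem(1).cast("double").as("temp"));
        Dataset<Row> hot = readings.groupBy("deviceId")
                .agg(avg("temp").as("avgTemp"))
                .filter(col("avgTemp").gt(30.0));

        // Act: emit the result (a console sink stands in for a real action)
        hot.writeStream().outputMode("complete")
                .format("console").start().awaitTermination();
    }
}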
Streamlio and IoT analytics with Apache Pulsar - Streamlio
To keep up with fast-moving IoT data, you need technology that can collect, process and store data with performance and scalability. This presentation from Data Day Texas looks at the technology requirements and how Apache Pulsar can help to meet them.
Second presentation in Savi's sponsorship of the Washington DC Spark Interactive. It discusses the use of Spark with Drools to create expert-system-based analytics for the Internet of Things (IoT).
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain... - Big Data Spain
Session presented at Big Data Spain 2015 Conference
15th Oct 2015
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmadigital.com
Abstract: http://www.bigdataspain.org/program/thu/slot-7.html
Stratio Streaming is the result of combining the power of Spark Streaming as a continuous computing framework with the Siddhi CEP engine as a complex event processing engine.
Talk I gave at StratHadoop in Barcelona on November 21, 2014.
In this talk I discuss the experience we gained with realtime analysis of high-volume event data streams.
These slides were designed for an Apache Hadoop + Apache Apex workshop (university program).
The audience was mainly third-year engineering students from computer, IT, electronics, and telecom disciplines.
I tried to keep it simple for beginners to understand. Some of the examples use context from India, but in general this would be a good starting point for beginners.
Advanced users and experts may not find this relevant.
Margriet Groenendijk - Open data is available from an incredible number of data sources that can be linked to your own datasets. This talk will present examples of how to visualise and combine data from very different sources such as weather and climate, and statistics collected by individual countries using Python notebooks in Analytics for Apache Spark.
In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, cover the typical challenges in modern real-time big data platforms, and offer insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
Become Data Driven With Hadoop as-a-Service - Mammoth Data
This presentation gives an overview of what it means to be a data-driven company, the pros and cons of becoming data driven, and a few software tools used in data management.
A presentation pertaining to the integration of real-time data with the cloud, with significant potential in the areas of industrial IT, real-time sensor information processing, and smart grids applied to various vertical industries. This is related to my blog post at www.cloudshoring.in
There are many modern techniques for identifying anomalies in datasets. There are fewer that work as online algorithms suitable for application to real-time streaming data. What's worse? Most of these methodologies require a deep understanding of the data itself. In this talk, we tour the options for identifying anomalies in real-time data and discuss how much we really need to know beforehand to guess at the ever-useful question: is this normal?
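As one concrete example of such an online algorithm (my illustration, not necessarily what the talk covers), a running z-score detector built on Welford's algorithm answers "is this normal?" in constant memory; the 3-sigma threshold is an arbitrary choice:

public class OnlineZScoreDetector {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    // Returns true if x looks anomalous relative to the stream seen so far.
    public boolean observe(double x) {
        boolean anomaly = false;
        if (n > 1) {
            double std = Math.sqrt(m2 / (n - 1));
            anomaly = std > 0 && Math.abs(x - mean) / std > 3.0; // 3-sigma rule
        }
        // Welford's update: numerically stable running mean and variance
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return anomaly;
    }

    public static void main(String[] args) {
        OnlineZScoreDetector d = new OnlineZScoreDetector();
        double[] stream = {10, 11, 9, 10, 10, 11, 9, 10, 42}; // 42 is the outlier
        for (double x : stream) System.out.println(x + " anomalous? " + d.observe(x));
    }
}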
Distributed Trace & Log Analysis using ML - Jorge Cardoso
The field of AIOps, also known as Artificial Intelligence for IT Operations, uses advanced technologies to dramatically improve the monitoring, operation, and troubleshooting of distributed systems. Its main premise is that operations can be automated using monitoring data to reduce the workload of operators (e.g., SREs or production engineers). Our current research explores how AIOps – and many related fields such as deep learning, machine learning, distributed traces, graph analysis, time-series analysis, sequence analysis, advanced statistics, NLP and log analysis – can be explored to effectively detect, localize, predict, and remediate failures in large-scale cloud infrastructures (>50 regions and AZs) by analyzing service management data (e.g., distributed traces, logs, events, alerts, metrics). In particular, this talk will describe how a particular monitoring data structure, called distributed traces, can be analyzed using deep learning to identify anomalies in its spans. This capability empowers operators to quickly identify which components of a distributed system are faulty.
Independent of the source of data, the integration of event streams into an enterprise architecture gets more and more important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer such a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called complex event processing (CEP). In the past few years, another family of products has appeared, mostly out of the big data technology space, called stream processing or streaming analytics. These are mostly open source products/frameworks, such as Apache Storm, Spark Streaming, Flink, and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of stream processing, discuss the core properties a stream processing platform should provide, and highlight the differences you might find between the more traditional CEP and the more modern stream processing solutions.
Big Data and Fast Data combined – is it possible? An introduction to big data architectures. M. Ulises Fasoli, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum on November 24, 2015, in Lausanne.
Monitoring as an entry point for collaboration - Julien Pivotto
In recent years, we have been building complex stacks made of lots of components, all of this backed by multiple teams. This talk presents how you can use monitoring to look at the business side and have everyone looking at the same dashboards, making cooperation a reality.
In this presentation we review the basic architecture behind SQL Server StreamInsight.
Ing. Eduardo Castro Martínez, PhD – Microsoft SQL Server MVP, Costa Rica
Book: Software Architecture and Decision-Making - Srinath Perera
Uncertainty is the leading cause of mistakes made by practicing software architects. The primary goal of architecture is to handle uncertainty arising from use cases as well as architectural techniques. The book discusses how to make architectural decisions and manage uncertainty. From the book, you will learn common problems encountered while designing a system, a default solution for each, more complex alternatives, and the 5Q & 7P (five questions and seven principles) that help you choose.
Book, https://amzn.to/3v1MfZX
Blog: http://tinyurl.com/swdmblog
Six min video - https://youtu.be/jtnuHvPWlYU
We have critically evaluated how AI will shape integration use cases, their feasibility, and timelines. Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, is the methodology of our study.
We observe that AI can significantly impact integration use cases and identify 13 AI-based use case classes for integration. Points to note include:
Enabling AI in an enterprise involves collecting, cleaning up, and creating a single representation of data as well as enforcing decisions and exposing data outside, each of which leads to many integration use cases. Hence, AI indirectly creates demand for integration.
AI needs data, which in some cases leads to significant competitive advantages. The need to collect data will drive vendors to offer most AI products in the cloud through APIs.
Due to a lack of expertise and data, custom AI model building will be limited to large organizations. It is hard for small and medium-sized organizations to build and maintain custom models.
The Role of Blockchain in Future Integrations - Srinath Perera
We have critically evaluated blockchain-based integration use cases, their feasibility, and timelines. Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, is the methodology of our study. Based on our analysis, we observe that blockchain can significantly impact integration use cases.
In our paper, we identify 30-plus blockchain-based use cases for integration and four architecture patterns. Notably, each use case we identified can be implemented using one of the architecture patterns. Furthermore, we also discuss challenges and risks posed by blockchains that would affect these architecture patterns.
Our webinar presents a critical analysis of serverless technology and our thoughts about its future. We use Emerging Technology Analysis Canvas (ETAC), a framework built to analyze emerging technologies, as the methodology of our study. Based on our analysis, we believe that serverless can significantly impact applications and software development workflows.
We’ve also made two further observations:
Limitations, such as tail latencies and cold starts, are not deal breakers for adoption. There are significant use cases that can work with existing serverless technologies despite these limitations.
We see a significant gap in required tooling and IDE support, best practices, and architecture blueprints. With proper tooling, it is possible to train existing enterprise developers to program with serverless. If proper tools are forthcoming, we believe serverless can cross the chasm in 3-5 years.
A detailed analysis can be found here: A Survey of Serverless: Status Quo and Future Directions. Join our webinar as we discuss this study, our conclusions, and evidence in detail.
1. Blockchain's potential impact is real. If successful, blockchain technologies can transform the way we live our day-to-day lives.
2. We believe the technology is ready for limited applications in digital currency, lightweight financial systems, ledgers (of identity, ownership, status, and authority), provenance (e.g., supply chains and other B2B scenarios), and disintermediation, which we believe will happen in the next three years.
3. However, with other use cases, blockchain faces significant challenges, such as performance, irrevocability, the need for regulation, and the lack of consensus mechanisms. These are hard problems, and it will likely take years to find answers to them.
4. It is not clear whether blockchain can sustain the current level of effort for an extended period of 5+ years. There are many startups, and they run the risk of running out of money before markets are ready. Failure of startups can inhibit further funding and investment.
5. The value of and need for decentralization, compared to centralized and semi-centralized alternatives, is not clear.
A Visual Canvas for Judging New Technologies - Srinath Perera
In the fast-changing technology world, the technology landscape shifts faster and faster. The agents of these changes are new emerging technologies, which sometimes even create, destroy, or transform market segments. In a shifting world, prevailing advantages are fleeting. Organizations that can master change and ride technology waves own the future.
Not all emerging technologies live up to their promise. Every year, as part of annual planning, most organizations need to judge the relevance, impact, and probability of success of emerging technologies and pick their bets. Although it is a regular decision, there is no widely accepted framework for evaluating emerging technologies.
As a solution to this problem, we present the "Emerging Technology Analysis Canvas" (ETAC), a framework to assess an individual emerging technology. Inspired by the Business Model Canvas, it represents different aspects of the technology visually on a single page. The approach includes a set of questions that probe the technology, arranged around a logical narrative. The visual representation is concise, compact, and comprehensible at a glance.
The talk discusses how analytics can attack privacy and what we can do about it. It discusses the legal responses (e.g., GDPR) as well as technical responses (differential privacy and homomorphic encryption).
The video is in https://www.facebook.com/eduscopelive/videos/314847475765297/ from 1.18.
Blockchain is often cited as one of the most impactful technologies, along with AI. It has attracted many startups, venture investments, and academic research. If successful, blockchain technologies can transform the way we live our day-to-day lives.
However, blockchain faces significant challenges, such as performance, irrevocability, the need for regulation, and the lack of consensus mechanisms. These are hard problems, and it will likely take at least 5-10 years to find answers to them.
Given the risk involved as well as the significant potential returns, we recommend a cautiously optimistic approach to blockchain, with a focus on concrete use cases.
Today's Technology and Emerging Technology Landscape - Srinath Perera
We have seen the rise and fall of many technologies, some disappearing without a trace while others redefine the world. Collectively they have shaped our world beyond recognition. In this talk, Srinath starts with past technologies, exploring their behavior. He then explores the current middleware landscape, its composition, and the relationships between different segments, discussing significant developments and their future. Further, he discusses emerging technologies, the forces that shape them, and the promise of each technology, and finally speculates about their evolution. You will walk away with knowledge of the evolution of middleware, the status quo, and a discussion of how, at WSO2, we think those technologies will evolve.
Some died, some get by, but some have woven themselves into today's middleware so thoroughly that we do not notice them. The point I want to make is that not all emerging technologies are fads. Some are, and some are too early, like AI. But some are lasting.
The Rise of Streaming SQL and Evolution of Streaming Applications - Srinath Perera
First-generation stream processors, such as Apache Storm, wanted us to write code. It was a great start. However, when building real-world apps, which are used for a long time and evolve, writing code gets us into trouble.
If we want to query a database, or query data stored in storage with Hadoop, we use SQL. Why can't we query streaming data using SQL? We can. Almost all open source stream processors, including Storm, Flink, and Kafka, have switched to SQL.
In this webinar, Srinath talks about the evolution of stream processing, streaming SQL, the status quo, and what this means for streaming applications. He also dissects the experience of building streaming applications by exploring common patterns and pitfalls.
Analytics and AI: The Good, the Bad and the Ugly - Srinath Perera
Analytics lets us question the data, which in effect questions the world around us. This lets us understand, monitor, and shape the world. AI lets us discover connections, predict possible futures, and automate tasks.
These twin technologies can change the world around us. On one hand, they can make us efficient, connected, and fulfilled. At the same time, the change in the status quo can replace jobs, affect lives, and build biases into our systems that can marginalize millions.
In this talk, we discuss the core ideas behind analytics and AI, their possible impact, both good and bad outcomes, and the challenges.
The dawn of digital businesses is upon us, with reimagined business models that make the best use of digital technologies such as automation, analytics, integration, and cloud. Digital businesses are efficient, continuously optimizing, proactive, and flexible, and are able to fully understand their customers. Analytics is a key technology that helps in doing so. It acts as the eyes and ears of the system and provides a holistic view of the past and present so that decision-makers can predict what will happen in the future. This webinar will explore:
Why becoming a digital business is not a choice
The role of analytics in digital transformation with examples
How best to leverage state of the art analytics technology
SoC Keynote: The State of the Art in Integration Technology - Srinath Perera
This talk outlines the state of the art of enterprise software and how we got there, as I see it. The second part describes Ballerina, a new programming language WSO2 has built for enterprise computing.
It was presented as a keynote at the 11th Symposium and Summer School on Service-Oriented Computing.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Introduction to Large Scale Data Analysis with WSO2 Analytics Platform
1. Introduction to Large Scale Data Analysis and WSO2 Analytics Platform
Srinath Perera
Director Research, WSO2; Apache Member
(@srinath_perera) srinath@wso2.com
At Indiana University Bloomington
2. Who We Are?
We are an open source middleware company - we build systems upon which others build their systems.
Venture funded – Intel Capital, Cisco, Toba Capital
400+ people & offices in Silicon Valley, Sri Lanka, London, and Bloomington
Customers include banks, aircraft manufacturers, governments (state and federal), media companies, telcos, retail, healthcare...
4. A Day in Your Life
Think about a day in your life:
- What is the best road to take?
- Would there be any bad weather?
- How should I invest my money?
- How is my health?
There are many decisions that you could make better if only you could access the data and process it.
http://www.flickr.com/photos/kcolwell/55124616 CC licence
5.
6. Internet of Things
Currently the physical world and the software world are detached.
The Internet of Things promises to bridge this:
- It is about sensors and actuators everywhere
- In your fridge, in your blanket, in your chair, in your carpet... yes, even in your socks
- Umbrellas that light up when there is rain, and medicine cups
7. What Can We Do with Big Data?
Optimize (the world is inefficient)
- 30% of food is wasted from farm to plate
- GE's Save 1% initiative (http://goo.gl/eYC0QE): trains => 2B/year, US healthcare => 20B/year
Save lives
- Weather, disease identification, personalized treatment
Technology advancement
- Most high-tech research is done via simulations
10. (Batch) Analytics
Scientists have been doing this for 25 years with MPI (1991) on special hardware
- OpenMPI is being done at IU!
Took off with Google's MapReduce paper (2004); Apache Hadoop, Hive, and a whole ecosystem were created.
It was successful, so we are here!!
But processing takes time.
11. Usecase: Targeted Advertising
Analytics implemented with MapReduce or queries
- Min, max, average, correlation, histograms; might join or group data in many ways
- Heatmaps, temporal trends
Key Performance Indicators (KPIs)
- E.g., profit per square foot for retail (see the sketch below)
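As a toy illustration of computing such a KPI (the Store record and the numbers are made up, not from the talk):

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KpiExample {
    // Hypothetical input record; a real deployment would read this from a data store
    record Store(String name, double profit, double squareFeet) {}

    public static void main(String[] args) {
        List<Store> stores = List.of(
                new Store("downtown", 1_200_000, 15_000),
                new Store("suburb", 800_000, 20_000));
        // KPI = profit / floor area, computed per store
        Map<String, Double> kpi = stores.stream().collect(
                Collectors.toMap(Store::name, s -> s.profit() / s.squareFeet()));
        kpi.forEach((name, v) -> System.out.printf("%s: %.2f per sq ft%n", name, v));
    }
}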
12. Usecase: Big Data for Development
Done using CDR data
People density, noon vs. midnight (red => increased, blue => decreased)
Urban planning
- People distribution
- Mobility
- Waste management
- E.g., see http://goo.gl/jPujmM
From: http://lirneasia.net/2014/08/what-does-big-data-say-about-sri-lanka/
13. The Value of Some Insights Degrades Fast!
For some use cases (e.g., stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time.
- E.g., stock markets and the speed of light
We need technology that can produce outputs fast
- Static queries, but needing very fast output (alerts, realtime control)
- Dynamic and interactive queries (data exploration)
14.
15. Predictive Analytics
If we know how to solve a problem, that is, if we know a finite set of rules, then we can program it.
For some problems (e.g., driving a car, character recognition), we do not know a finite, fixed rule set.
Instead of programming, we give lots of examples and ask the computer to learn (often called machine learning).
Lots of tools:
- R (statistical language)
- scikit-learn (Python)
- Apache Spark's MLbase and Apache Mahout (Java)
16. Usecase: Predictive Maintenance
The idea is to fix the problem before it happens, avoiding expensive downtime
- Airplanes, turbines, windmills
- Construction equipment
- Cars, golf carts
How
- Build a model for normal operation and compare deviations
- Match against known error patterns
17. The Problem We Are Trying to Solve!
Build a platform using which others can build their analytics systems
- Collect, analyze, communicate
- End to end: starts from humans and ends with humans
Different audiences
- Technical (developers)
- Non-technical (CXOs, sales, analysts)
"There are two things you need to know about business: make something users love and make more than you spend."
-- Paul Graham (Lisp, Y Combinator)
18.
19. Running Example
Monitor temperature and hot airflow across multiple buildings (e.g., central AC)
- More people => hotter
Analytics
- Historical behavior of temperature by the hour
- Alerts if the temperature falls too low or rises too high
- Modeling and predicting temperature to adjust proactively
define TemperatureStream(ts long, buildingNo long, t double);
define AirflowStream(ts long, buildingNo long, aflow double, aT double);
20. Collect Data
One sensor API to publish events
- REST, Thrift, Java, JMS, Kafka
- Java clients, JavaScript clients*
First you define streams (think of a stream as an infinite table in a SQL database)
Then send events via the API
* Challenges: performance, guaranteed delivery, scale
Can send to the batch pipeline, the realtime pipeline, or both via configuration!
21. Collecting Data: Example
Java example: create and send events. Events are sent asynchronously.
See the client given in http://goo.gl/vIJzqc for more info.
// Initialize
Agent agent = new Agent(agentConfiguration);
publisher = new AsyncDataPublisher("tcp://hostname:7612", ...);
// Define stream
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
publisher.addStreamDefinition(definition);
...
// Send events
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
22. Batch Analytics: Spark
Two frameworks: Hadoop (http://hadoop.apache.org) and Spark (https://spark.apache.org)
- Hadoop is a MapReduce implementation
Spark is faster (30X and more) and much more flexible.
They set a record at Gray Sort (100TB): 3X faster with 10X fewer machines, http://goo.gl/r5LGvD
For Hadoop and MapReduce resources, Google it.
file = spark.textFile("hdfs://...")
file.flatMap(tsToHourFunction)  # emits (hour, value) pairs
    .reduceByKey(lambda a, b: a + b)
23. SQL-like Queries: Hive
Apache Hive provides a SQL-like data processing language.
Since many understand SQL, Hive made large-scale big data processing accessible to many.
Expressive, short, and sweet.
Defines core operations that cover 90% of problems.
Lets experts dig in when they like! (via user-defined functions)
24. Hourly Temperature Average
Hive compiles the SQL-like query to a set of MapReduce jobs running on Hadoop or Spark (in WSO2 BAM from the 2015 Q2 release)
insert overwrite table TemperatureHistory
select getHour(ts) as hour, avg(t) as avgT, buildingId
from TemperatureStream group by buildingId, getHour(ts);
26. Operators: Filters
Assume a temperature stream.
Here weather:convertFtoC() is a user-defined function. Such functions are used to extend the language (see the sketch after this slide).
define stream TemperatureStream(ts long, roomNo int, temp double);
from TemperatureStream[weather:convertFtoC(temp) > 30.0 and roomNo != 2043]
select roomNo, temp
insert into HotRoomsStream;
Usecases:
- Alerts, thresholds (e.g., alarm on high temperature)
- Preprocessing: filtering, transformations (e.g., data cleanup)
27. Operators: Windows and Aggregation
Supports many window types
- Batch windows, sliding windows, custom windows
Usecases
- Simple counting (e.g., failure count)
- Counting with windows (e.g., failure count every hour)
from TemperatureStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream;
28. Operators: Patterns
Models a "followed by" relation: e.g., event A followed by event B.
A very powerful tool for tracking and detecting patterns.
from every (a1 = TemperatureStream)
  -> a2 = TemperatureStream[temp > a1.temp + 5]
  within 1 day
select a2.ts as ts, a2.temp - a1.temp as diff
insert into HotDayAlertStream;
Usecases
- Detecting event sequence patterns
- Tracking
- Detecting trends
29. Operators: Joins
Join two data streams based on a condition and windows.
Usecases
- Data correlation, detecting missing events, detecting erroneous data
- Joining event streams
from TemperatureStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
  on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream;
30. Operators: Access Data from the Disk
Event tables allow users to map a database to a window and join a data stream with the window.
Usecases
- Merge with data in a database; collect and update data conditionally
define table HistTempTable(day long, avgT double);
from TemperatureStream#window.length(1) join HistTempTable
  on getDayOfYear(ts) == HistTempTable.day && temp > HistTempTable.avgT
select ts, temp
insert into HotterThanUsualStream;
31. Realtime Analytics Patterns
Simple counting (e.g., failure count)
Counting with windows (e.g., failure count every hour)
Preprocessing: filtering, transformations (e.g., data cleanup)
Alerts, thresholds (e.g., alarm on high temperature)
Data correlation, detecting missing events, detecting erroneous data (e.g., detecting failed sensors)
Joining event streams (e.g., detect a hit on a soccer ball)
Merge with data in a database; collect and update data conditionally
32. Realtime Analytics Patterns (contd.)
Detecting event sequence patterns (e.g., a small transaction followed by a large transaction)
Tracking: follow a related entity's state in space, time, etc. (e.g., location of airline baggage or a vehicle, tracking wildlife)
Detecting trends: rise, turn, fall, outliers, complex trends like triple bottom, etc. (e.g., algorithmic trading, SLAs, load balancing)
Learning a model (e.g., predictive maintenance)
Predicting the next value and corrective actions (e.g., automated car)
33. Predictive Analytics
Build models and use them with WSO2 CEP, BAM, and ESB using the upcoming WSO2 Machine Learner product (2015 Q2)
Build models using R, export them as PMML, and use them within WSO2 CEP
Call R scripts from CEP queries
Regression and anomaly detection operators in CEP
34. Predictive Analytics
WSO2 Machine Learner provides a wizard to explore data and build models
E.g., build a model to predict the temperature over the next 15 minutes
- Trivial option: (historical mean + last 15m mean) / 2 (see the sketch below)
- Better model via ARIMA from time series analysis
To learn more, take an ML class
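A minimal sketch of the slide's "trivial option", assuming we already have the historical mean for this hour and the raw readings from the last 15 minutes (names and sample values are illustrative):

public class TrivialTemperaturePredictor {
    // forecast = (historical mean for this hour + mean of the last 15 minutes) / 2
    static double predictNext15m(double historicalHourlyMean, double[] last15m) {
        double sum = 0;
        for (double t : last15m) sum += t;
        double recentMean = last15m.length > 0 ? sum / last15m.length : historicalHourlyMean;
        return (historicalHourlyMean + recentMean) / 2.0;
    }

    public static void main(String[] args) {
        double[] last15m = {21.5, 21.8, 22.1}; // made-up readings
        System.out.println(predictNext15m(20.0, last15m)); // prints 20.9
    }
}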
35. Communicate: Dashboards
The idea is to give the "overall picture" at a glance (e.g., a car dashboard)
Support for personalization: you can build your own dashboard.
Also the entry point for drill-down
How to build?
- Dashboards via Google Gadgets and content via HTML5 + JavaScript
- Use WSO2 User Engagement Server to build a dashboard (or a JSP or PHP)
- Use charting libraries like Vega or D3
37. Communicate: Alerts
Detecting conditions can be done via CEP queries
The key is the "last mile":
- Email
- SMS
- Push notifications to a UI
- Pager
- Triggering a physical alarm
How?
- Select the email sender "output adaptor" from CEP, or send from CEP to the ESB; the ESB has lots of connectors
38. Communicate: APIs
With mobile apps, most data is exposed and shared as APIs (REST/JSON) to end users.
Some challenges:
- Security and permissions
- API discovery
- Billing, throttling, quotas
- SLA enforcement
How?
- Write data to a database from CEP event tables
- Build services via WSO2 Data Services
- Expose them as APIs via API Manager
39. Smart Home
The 2015 yearly DEBS (Distributed Event Based Systems) Grand Challenge (http://goo.gl/0htxlj)
Smart home electricity data: 2,000 sensors, 40 houses, 4 billion events
We posted 400K events/sec on a single node and close to one million events/sec distributed throughput with 4 nodes.
The WSO2 CEP based solution was one of the four finalists (with Dresden University of Technology, Fraunhofer Institute, and Imperial College London)
The only generic solution to become a finalist
40. Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
43. Conclusion
Goal: build a platform using which others can build their analytics systems
- End to end: starts from humans and ends with humans
The whole platform is open source under the Apache License
What can you do with the platform?
- Solve hard problems and build great apps with the platform
- Add and contribute extensions to the platform (e.g., GSoC http://goo.gl/QNFP6Y)
- Fix problems (patches)
Find us at the architecture@wso2.org list or Stack Overflow (tag wso2)