This document discusses using Robust Random Cut Forest (RRCF) for continuous machine learning over streaming data. RRCF provides an efficient and highly scalable way to summarize streaming data and detect anomalies. It can also be used for attribution and directionality to explain anomalies, hotspot detection, classification, forecasting, missing value imputation, and anomaly detection in streaming directed graphs.
9. Batch analytics operations take too long
[Figure: business value (positive to negative) plotted against time to action. Stages along the time axis: data originated, analytics performed, insights gleaned, decision made, action taken. Annotations: outdated insights, poor decision, impotent or harmful actions.]
10. Compress the analytics lifecycle
Maximize the value of data
[Figure: business value (positive to negative) plotted against time to action, with the point of maximum business value marked.]
21. Random Sample of a Stream
Reservoir Sampling [Vitter]
Maintain random sample of 5 points in a stream?
Keep (heads) with probability 5/6
Discard (tails) with probability 1/6
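The sampling rule on this slide can be sketched in Python as follows. This is a minimal illustration of the basic reservoir sampling scheme (often called Algorithm R), not code from the talk; the function name reservoir_sample and its parameters are chosen here for illustration. With k = 5, the sixth point of the stream is kept with probability 5/6, and in general the n-th point is kept with probability k/n.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable of unknown length."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir.
            sample.append(item)
        elif rng.random() < k / n:
            # Keep the n-th item with probability k/n and
            # evict a uniformly chosen resident to make room.
            sample[rng.randrange(k)] = item
    return sample

# Hypothetical usage: sample 5 points from a stream of 1000 integers.
sample = reservoir_sample(range(1000), k=5, rng=random.Random(0))
print(sample)
```

The reservoir uses O(k) memory regardless of stream length, which is what makes it suitable as the sampling layer underneath each tree.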
22. Insert – Case I
Start with the Root
If the point falls inside the bounding box
follow the path to the appropriate child
23. Insert – Case II
Theorem: Insert generates a tree T’ ~ T( )
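The two insert cases can be sketched in Python as below. This is a simplified illustration based on my reading of the random cut tree insertion procedure, not the production implementation; the names Node, random_cut, insert, and leaves are hypothetical. A cut is drawn in the bounding box extended to include the new point: if the cut isolates the new point, it becomes a sibling of the current subtree, otherwise the point follows the existing cut down to the appropriate child. Exact duplicates are ignored in this sketch.

```python
import random

class Node:
    """A leaf holds one point; an internal node holds a cut and two children."""
    def __init__(self, point=None, dim=None, cut=None, left=None, right=None):
        self.point, self.dim, self.cut = point, dim, cut
        self.left, self.right = left, right
        if point is not None:
            self.lo, self.hi = list(point), list(point)
        else:
            # Bounding box of an internal node covers both children.
            self.lo = [min(a, b) for a, b in zip(left.lo, right.lo)]
            self.hi = [max(a, b) for a, b in zip(left.hi, right.hi)]

def random_cut(lo, hi, rng):
    """Pick a dimension with probability proportional to its side length, then a uniform cut."""
    spans = [h - l for l, h in zip(lo, hi)]
    dim = rng.choices(range(len(spans)), weights=spans)[0]
    return dim, rng.uniform(lo[dim], hi[dim])

def insert(node, p, rng):
    """Return the root of the subtree after inserting point p."""
    p = tuple(p)
    if node is None:
        return Node(point=p)
    if node.point == p:
        return node  # sketch only: exact duplicates are ignored
    # Bounding box of the subtree, extended to include p.
    lo = [min(a, b) for a, b in zip(node.lo, p)]
    hi = [max(a, b) for a, b in zip(node.hi, p)]
    while True:
        dim, cut = random_cut(lo, hi, rng)
        if cut < node.lo[dim]:
            # The cut isolates p on the low side of the old box.
            return Node(dim=dim, cut=cut, left=Node(point=p), right=node)
        if node.hi[dim] <= cut < p[dim]:
            # The cut isolates p on the high side of the old box.
            return Node(dim=dim, cut=cut, left=node, right=Node(point=p))
        if node.point is None:
            break  # p was not isolated: descend along the existing cut
        # A leaf is missed only if the cut lands exactly on the box edge; redraw.
    if p[node.dim] <= node.cut:
        return Node(dim=node.dim, cut=node.cut,
                    left=insert(node.left, p, rng), right=node.right)
    return Node(dim=node.dim, cut=node.cut,
                left=node.left, right=insert(node.right, p, rng))

def leaves(node):
    """List the points stored in a subtree."""
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)

# Hypothetical usage: build a tree from 50 Gaussian points in the plane.
rng = random.Random(7)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
root = None
for p in points:
    root = insert(root, p, rng)
print(len(leaves(root)))
```

Rebuilding the nodes along the insertion path keeps each bounding box consistent with its subtree, at the cost of some allocation that a real implementation would likely avoid.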
25. Anomaly Score: Displacement
A point is an anomaly if its insertion greatly increases the tree size (= sum of path lengths from root to leaves = description length).
[Figure: inlier example]
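A small worked example may make displacement concrete. The sketch below assumes a tree written as nested pairs with string labels for leaves, and measures how much shallower the other leaves become when one leaf is deleted, which is one common way to formalize the displacement idea; the helper names and the toy tree are hypothetical. Deleting a leaf lifts its sibling subtree by one level, so a point split off near the root displaces many points, while a point buried among its neighbours displaces few.

```python
def leaf_depths(tree, depth=0):
    """Map each leaf label to its depth; a tree is a leaf label or a (left, right) pair."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    left, right = tree
    return {**leaf_depths(left, depth + 1), **leaf_depths(right, depth + 1)}

def tree_size(tree):
    """Sum of path lengths from the root to every leaf."""
    return sum(leaf_depths(tree).values())

def remove(tree, leaf):
    """Delete a leaf (assumed present); its sibling subtree replaces their parent."""
    left, right = tree
    if left == leaf:
        return right
    if right == leaf:
        return left
    if isinstance(left, tuple) and leaf in leaf_depths(left):
        return (remove(left, leaf), right)
    return (left, remove(right, leaf))

def displacement(tree, leaf):
    """Total drop in depth of the other leaves when `leaf` is deleted."""
    before = leaf_depths(tree)
    after = leaf_depths(remove(tree, leaf))
    return sum(before[x] - after[x] for x in after)

# Toy tree: "outlier" is separated by the very first cut,
# while "a" sits deep inside a cluster of four inliers.
tree = ("outlier", (("a", "b"), ("c", "d")))
print(tree_size(tree))                # 13
print(displacement(tree, "outlier"))  # 4
print(displacement(tree, "a"))        # 1
```

In a forest, this quantity would be averaged over all trees to give a score for the point.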
28. SQL to call Random Cut Forest
-- Create a temporary stream to hold each record with its anomaly score.
CREATE OR REPLACE STREAM "TEMP_STREAM" (
    "passengers" INTEGER,
    "distance" DOUBLE,
    "ANOMALY_SCORE" DOUBLE);
-- Create another stream for application output.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "passengers" INTEGER,
    "distance" DOUBLE,
    "ANOMALY_SCORE" DOUBLE);
-- Compute an anomaly score for each record in the input stream
-- using Random Cut Forest.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "TEMP_STREAM"
SELECT STREAM "passengers", "distance", ANOMALY_SCORE
FROM TABLE (RANDOM_CUT_FOREST (
    CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM")));
-- Sort records by descending anomaly score within each second,
-- then insert them into the output stream.
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM * FROM "TEMP_STREAM"
ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;
32. Attribution and Directionality
Explainable/Transparent/Interpretable ML
“If my time-series data with 30 features yields an unusually high anomaly score. How do I explain why this particular point in the time-series is unusual? [..] Ideally I’m looking for some way to visualize “feature importance” for a specific data point.”
--- Robin Meehan, Inasight.com
33. What is Attribution?
It’s the ratio of the “distance” of the anomaly from normal.
(It’s a distance in space of repeated patterns in the data.)
38. The Moving Example
A Fan/Turbine
1000 pts in each blade
Gaussian, for simplicity
Blades designed unequal
Rotate counterclockwise
3 special “query” points
100 trees, 256 points each
39. Anomaly Score at P1
What is going on at 90 degrees?
Blade overhead = Not an anomaly
41. Transparent Attributions
p1 is far away in x-coord most of the time
But what is happening to y?
x coordinate’s contribution for p1?
42. Directionality
Sharp transition when the blade moves from above to below at p1!
Total score plummets.
Slowly rotating away
Total score remains high
47. False Alarms
Anomaly Detection with user feedback
“Alarms sounded 1 hour before the nurse discovered he was unresponsive. He eventually died. An investigation found the alarm volume had been turned off.”
Alarm fatigue: Personnel become desensitized
560 alarm-related deaths during 2005–2008 (FDA data)
49. Orders Data: Ground truth labels vs. RRCF anomalies
[Figure: three time-series panels (orders per minute, alarms, labeled examples), each spanning June through May.]
50. Robust Random Cut Forest
Summary of a dynamic data stream, efficient, number of use cases…
Anomaly Detection
Attribution and Directionality
Hotspot Detection
Classification
Forecasting
Missing Value Imputation
Anomaly Detection in Streaming Directed Graphs
51. Unsupervised Online Adaptive Real-time
Amazon Kinesis Data Analytics
The easiest way to use machine learning!
Available Now
• Anomaly Detection
• Anomaly Detection with explanations
• Hotspot Detection (releasing soon!)
Coming Soon
• Classification
• Time Series Forecasting
• Missing Value Imputation
52. Contributors To This Project
Roger Barga, Kapil Chhabra, Charles Elkan,
Dhivya Eswaran, Christos Faloutsos, Praveen Gattu,
Gaurav Ghare, Sudipto Guha,
Shiva Kasiviswanathan, Nina Mishra,
Morteza Monemizadeh, Lauren Moos,
Yonatan Naamad, Ryan Nienhuis, Gourav Roy,
Okke Schrijvers, Joshua Tokle, and
Tal Wagner