Changepoint Detection with Bayesian Inference
Frank Kelly
An overview of the application of Bayesian inference to the detection of changepoints in noisy time series data, applied to three diverse domains. The goal is to identify the point(s) in time at which the rate of event occurrences changes, i.e. where events become more or less frequent.
2. Problem Statement
• A sample of observations (counts) from a Poisson process, where events occur randomly over a given time period.
• A chart representation typically shows time on the horizontal axis and the number of events on the vertical axis.
• We want to identify the point(s) in time at which the rate of event occurrences changes, i.e. where events become more or less frequent (a simulated example follows below).
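To make the setting concrete, here is a small simulated illustration (my own addition, not from the deck): counts drawn from a Poisson process whose rate drops from 3 to 1 at a known changepoint. The names y, n, and k_true exist only in this sketch.

  # Simulate counts whose Poisson rate drops at observation k_true,
  # then plot events per time step (purely illustrative data).
  set.seed(1)
  n <- 100
  k_true <- 40
  y <- c(rpois(k_true, lambda = 3), rpois(n - k_true, lambda = 1))
  plot(1:n, y, type = "h", xlab = "time", ylab = "number of events")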
3. Poisson Distribution
/pwäˈsän ˌdistrəˌbyo͞oSH(ə)n/
noun: Poisson distribution; plural noun: Poisson distributions
A discrete frequency distribution that gives the probability of a number of independent events occurring in a fixed time.
Origin: Named after the French mathematical physicist Siméon-Denis Poisson (1781–1840).
https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=poisson%20distribution
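For reference (this formula is not on the original slide, but it is the standard definition): a Poisson-distributed count X with rate \lambda has probability mass function

  P(X = x) \;=\; \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots

In R this is available as dpois(x, lambda).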
6. Calculations
• Estimate the rate at which events occur before the shift (μ).
• Estimate the rate at which events occur after the shift (λ).
• Estimate the time when the actual shift happens (k). (A naive illustration follows below.)
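For intuition only (my addition): if the changepoint k were already known, the natural estimates of the two rates are simply the segment means; the difficulty is that k is unknown. The helper below is hypothetical, not part of the deck.

  # Naive rate estimates for a *known* split point k (hypothetical helper).
  naive_rates <- function(y, k) {
    c(mu = mean(y[1:k]), lambda = mean(y[(k + 1):length(y)]))
  }
  # e.g. naive_rates(y, 40) on the counts simulated in the earlier sketch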
7. Methodology
From the article:
“Apply a Markov Chain Monte Carlo (MCMC) sampling approach to estimate the population parameters at each possible k, from the beginning of your data set to the end of it. The values you get at each time step will be dependent only on the values you computed at the previous time step (that’s where the Markov Chain part of this problem comes in). There are lots of different ways to hop around the parameter space, and each hopping strategy has a fancy name (e.g. Metropolis-Hastings, Gibbs Sampler, “reversible jump”).”
8. Markov Chain Monte Carlo (MCMC)
From Wikipedia:
“In statistics, Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.”
http://stats.stackexchange.com/questions/165/how-would-you-explain-markov-chain-monte-carlo-mcmc-to-a-layperson
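To make the quoted definition concrete, here is a toy random-walk Metropolis sampler in R whose equilibrium distribution is the standard normal. It is unrelated to the changepoint model; the function name and step size are my own choices for this sketch.

  # Toy random-walk Metropolis: samples converge to N(0, 1).
  metropolis_normal <- function(m = 5000, step = 1) {
    x <- numeric(m)
    for (i in 2:m) {
      prop <- x[i - 1] + rnorm(1, sd = step)               # propose a nearby value
      accept <- runif(1) < dnorm(prop) / dnorm(x[i - 1])   # accept with prob min(1, ratio)
      x[i] <- if (accept) prop else x[i - 1]
    }
    x
  }
  # hist(metropolis_normal(), breaks = 50)   # roughly N(0, 1) after burn-in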
9. MCMC, continued
[Figure caption] “Convergence of the Metropolis-Hastings algorithm. MCMC attempts to approximate the blue distribution with the orange distribution.”
10. Back to the Example…
Maria L. Rizzo, author of Statistical Computing with R
https://www.crcpress.com/Statistical-Computing-with-R/Rizzo/9781584885450
From the article: “Here are the models for prior (hypothesized) distributions that Rizzo uses, based on the Gibbs Sampler approach:
• mu comes from a Gamma distribution with shape parameter of (0.5 + the sum of all your frequencies UP TO the point in time, k, you’re currently at) and a rate of (k + b1)
• lambda comes from a Gamma distribution with shape parameter of (0.5 + the sum of all your frequencies after the point in time, k, you’re currently at) and a rate of (n – k + b1), where n is the number of the year you’re currently processing
• b1 comes from a Gamma distribution with a shape parameter of 0.5 and a rate of (mu + 1)
• b2 comes from a Gamma distribution with a shape parameter of 0.5 and a rate of (lambda + 1)
• L is a likelihood function based on k, mu, lambda, and the sum of all the frequencies up until that point in time, k”
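Written compactly (my transcription of the bullets above, with counts x_1, ..., x_n and partial sums S_k = \sum_{i \le k} x_i). Note that the quote gives the rate of lambda as (n – k + b1); by symmetry with the mu/b1 pairing it is presumably n – k + b_2, which is what appears below:

  \mu \mid \cdot \sim \operatorname{Gamma}\!\big(0.5 + S_k,\; k + b_1\big)
  \lambda \mid \cdot \sim \operatorname{Gamma}\!\big(0.5 + (S_n - S_k),\; n - k + b_2\big)
  b_1 \mid \cdot \sim \operatorname{Gamma}(0.5,\; \mu + 1), \qquad b_2 \mid \cdot \sim \operatorname{Gamma}(0.5,\; \lambda + 1)
  P(k \mid \cdot) \;\propto\; \exp\!\big((\lambda - \mu)\,k\big)\,(\mu/\lambda)^{S_k}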
11. Gibbs Sampler Code
“At each iteration, you pick a value of k to represent a point in time where a change might have occurred. You slice your data into two chunks: the chunk that happened BEFORE this point in time, and the chunk that happened AFTER this point in time. Using your data, you apply a Poisson Process with a (Hypothesized) Gamma Distributed Rate as your model. This is a pretty common model for this particular type of problem. It’s like randomly cutting a deck of cards and taking the average of the values in each of the two cuts… then doing the same thing again… a thousand times.”
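As a concrete illustration of the procedure described above, here is a minimal Gibbs sampler sketch in R for this model. It follows the conditionals listed on the previous slide; the function name and code are my own, not Rizzo's verbatim listing.

  # Minimal Gibbs sampler sketch for the Poisson changepoint model.
  # y: vector of counts; m: number of iterations.
  gibbs_changepoint <- function(y, m = 1000) {
    n <- length(y)
    mu <- lambda <- k <- numeric(m)
    L <- numeric(n)
    k[1] <- sample(1:n, 1); mu[1] <- 1; lambda[1] <- 1
    b1 <- 1; b2 <- 1
    for (i in 2:m) {
      kt <- k[i - 1]
      # mu | rest ~ Gamma(0.5 + sum of counts up to kt, kt + b1)
      mu[i] <- rgamma(1, shape = 0.5 + sum(y[1:kt]), rate = kt + b1)
      # lambda | rest ~ Gamma(0.5 + sum of counts after kt, n - kt + b2)
      s_after <- if (kt < n) sum(y[(kt + 1):n]) else 0
      lambda[i] <- rgamma(1, shape = 0.5 + s_after, rate = n - kt + b2)
      # hyperparameters b1, b2 | rest
      b1 <- rgamma(1, shape = 0.5, rate = mu[i] + 1)
      b2 <- rgamma(1, shape = 0.5, rate = lambda[i] + 1)
      # k | rest: discrete, proportional to the likelihood of a split at j
      for (j in 1:n)
        L[j] <- exp((lambda[i] - mu[i]) * j) * (mu[i] / lambda[i])^sum(y[1:j])
      k[i] <- sample(1:n, size = 1, prob = L / sum(L))
    }
    list(mu = mu, lambda = lambda, k = k)
  }

For example, running gibbs_changepoint(y) on the counts simulated earlier should concentrate the draws of k near the true changepoint of 40.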
12. Simulation Results
“Knowing the distributions of mu, lambda, and k from hopping around our data will help us estimate values for the true population parameters. At the end of the simulation, we have an array of 1000 values of k, an array of 1000 values of mu, and an array of 1000 values of lambda — we use these to estimate the real values of the population parameters. Typically, algorithms that do this automatically throw out a whole bunch of them in the beginning (the “burn-in” period) — Rizzo tosses out 200 observations — even though some statisticians (e.g. Geyer) say that the burn-in period is unnecessary.”
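Continuing the sketch from the Gibbs Sampler slide (so gibbs_changepoint and y are assumed to come from those earlier, hypothetical snippets), discarding a burn-in of 200 draws and summarizing looks like this:

  # Posterior summaries after dropping the first 200 draws as burn-in.
  out  <- gibbs_changepoint(y, m = 1000)   # sampler sketch from the previous slide
  keep <- 201:1000
  mu_hat     <- mean(out$mu[keep])
  lambda_hat <- mean(out$lambda[keep])
  k_hat      <- which.max(tabulate(out$k[keep]))   # posterior mode of k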
14. Demo
• Source code from Rizzo’s book
• Run in RStudio
• “The change point happened between the 39th and 40th observations, the arrival rate before the change point was 3.14 arrivals per unit time, and the rate after the change point was 0.93 arrivals per unit time. (Cool!)”
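If the demo follows the coal-mining disasters example from Rizzo's book (an assumption on my part; the slide only cites the book, though the reported rates of roughly 3.1 and 0.93 are consistent with that example), the yearly counts can be rebuilt from the coal dataset shipped with the boot package:

  # Yearly disaster counts from boot::coal (assumed, not confirmed, to be the demo data).
  library(boot)
  data(coal)
  years <- floor(coal$date)
  y <- as.vector(table(factor(years, levels = 1851:1962)))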
16. Random Thoughts
• How soon can we detect change points? It can be quite easy to locate one in retrospect, but what about real-time change detection?
• Identifying changes in trend is difficult, so why is change point detection better than traditional time series analysis?
• My recurring thought has been that if someone showed you this chart, then you could just point to the change point and save some compute cycles. Duh!