The Impact of Cloud Computing on Predictive Analytics (7-29-09, v5), by Robert Grossman
This is a talk I gave in San Diego on July 29, 2009 explaining some of the impact and some of the opportunities of cloud computing on predictive analytics.
Apache SystemML Optimizer and Runtime Techniques, by Matthias Boehm and Arvind Surve
This deck describes general framework techniques for large-scale machine learning systems. It explains Apache SystemML-specific optimizer and runtime techniques, describing data structures, DAG compilation, operator selection (including fused operators), dynamic recompilation, inter-procedural analysis, and some ongoing research projects.
ABSTRACT: In the field of computer science known as "machine learning," a computer makes predictions about the tasks it will perform next by examining the data given to it. The computer can access data by interacting with the environment or by using digitized training sets. In contrast to static programming algorithms, which require explicit human guidance, machine learning algorithms can learn from data and generate predictions on their own. Various supervised and unsupervised strategies, including rule-based, logic-based, instance-based, and stochastic techniques, have been proposed to solve problems. Our paper's main goal is to present a comprehensive comparison of various cutting-edge supervised machine learning techniques.
Oftentimes data scientists have specific modeling problems that call for highly customized solutions, which can lead to writing new optimization routines. In this talk we will discuss writing large-scale optimization algorithms in Python.
Starting from a quick review of the math behind convex optimization, we will implement some common algorithms with custom tweaks, first in NumPy and then at scale with Dask arrays. Leveraging the distributed Dask scheduler, we will also look at asynchronous variants of these algorithms. While looking at these implementations, we will discuss the challenges of properly testing optimization routines. The focus will be on applications to large-scale generalized linear models and will include a demo of the currently-in-development dask-glm project. We will end with some benchmarks comparing dask-glm with the SciPy stack (statsmodels, scikit-learn) as well as other popular big data tools such as H2O. This talk is written from the perspective of a data scientist, not a nuts-and-bolts computer scientist, and so is focused on customizing and extending the SciPy stack for large-scale data science problems.
This talk will be co-presented by Chris White (Capital One) and Hussain Sultan (AQN Strategies).
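As a flavor of the kind of routine the talk describes, here is a minimal NumPy sketch of batch gradient descent for logistic regression. The function name and toy data are illustrative, and this is not dask-glm's implementation; at scale, the NumPy arrays would be swapped for their dask.array equivalents.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, n_iter=500):
    """Fit logistic regression by batch gradient descent.

    X : (n, p) feature matrix; y : (n,) labels in {0, 1}.
    Returns the weight vector found for the (unregularized) mean log loss.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid of the linear score
        grad = X.T @ (preds - y) / n            # gradient of the mean log loss
        w -= lr * grad
    return w

# Toy usage: a linearly separable 1-D problem with an intercept column.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])
w = fit_logistic_gd(X, y)
```

Testing such a routine is subtle, as the talk notes: rather than comparing raw coefficients across solvers, one usually checks that the achieved loss is close to a trusted reference and that the gradient at the solution is small.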
Comparative study of optimization algorithms on convolutional network for aut..., by IJECEIAES
The last 10 years have been the decade of autonomous vehicles. Advances in intelligent sensors and control schemes have shown the possibility of real applications. Deep learning, and in particular convolutional networks, have become a fundamental tool in the solution of problems related to environment identification, path planning, vehicle behavior, and motion control. In this paper, we perform a comparative study of the most used optimization strategies on the convolutional architecture residual neural network (ResNet) for an autonomous driving problem, as a previous step to the development of an intelligent sensor. This sensor, part of our research in reactive systems for autonomous vehicles, aims to become a system for direct mapping of sensory information to control actions from real-time images of the environment. The optimization techniques analyzed include stochastic gradient descent (SGD), adaptive gradient (Adagrad), adaptive learning rate (Adadelta), root mean square propagation (RMSProp), Adamax, adaptive moment estimation (Adam), Nesterov-accelerated adaptive moment estimation (Nadam), and follow the regularized leader (Ftrl). The training of the deep model is evaluated in terms of convergence, accuracy, recall, and F1-score metrics. Preliminary results show a better performance of the deep network when using the SGD function as an optimizer, while the Ftrl function presents the poorest performance.
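For concreteness, the update rules of two of the optimizers compared (SGD and Adam) can be written out directly. This is a standalone NumPy sketch on a toy quadratic, not the paper's ResNet setup; the hyperparameter values are the usual published defaults.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain stochastic gradient descent update."""
    return w - lr * grad

def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update: bias-corrected running moments of the gradient."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad         # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2    # second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias corrections
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_sgd = np.array([5.0, -3.0])
w_adam = w_sgd.copy()
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, w_sgd)
    w_adam, state = adam_step(w_adam, w_adam, state)
```

On this trivial convex problem both optimizers drive the weights toward zero; the paper's finding that SGD wins on their ResNet task is specific to that setting and not something a toy example can reproduce.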
Identifying intersections among a set of d-dimensional rectangular regions (d-rectangles) is a common problem in many simulation and modeling applications. Since algorithms for computing intersections over a large number of regions can be computationally demanding, an obvious solution is to take advantage of the multiprocessing capabilities of modern multicore processors. Unfortunately, many solutions employed for the Data Distribution Management service of the High Level Architecture are either inefficient, or can only partially be parallelized. In this paper we propose the Interval Tree Matching (ITM) algorithm for computing intersections among d-rectangles. ITM is based on a simple Interval Tree data structure, and exhibits an embarrassingly parallel structure. We implement the ITM algorithm, and compare its sequential performance with two widely used solutions (brute force and sort-based matching). We also analyze the scalability of ITM on shared-memory multicore processors. The results show that the sequential implementation of ITM is competitive with sort-based matching; moreover, the parallel implementation provides good speedup on multicore processors.
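To make the problem concrete, here is a sketch of the brute-force baseline the paper compares against (not ITM itself): two d-rectangles intersect iff their extents overlap in every dimension, and the baseline simply tests all pairs.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two d-rectangles intersect.

    Each rectangle is a sequence of (lo, hi) intervals, one per dimension;
    rectangles intersect iff their intervals overlap in every dimension.
    """
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def brute_force_matching(rects):
    """All intersecting pairs, by index: O(n^2) pairwise tests."""
    return [(i, j)
            for (i, ra), (j, rb) in combinations(enumerate(rects), 2)
            if overlaps(ra, rb)]

# Three 2-D rectangles: 0 and 1 overlap, 2 is disjoint from both.
rects = [[(0, 2), (0, 2)], [(1, 3), (1, 3)], [(5, 6), (5, 6)]]
pairs = brute_force_matching(rects)
```

The quadratic pair loop is exactly what ITM's interval-tree lookup avoids: each 1-D overlap query against the tree costs O(log n + k) for k reported matches, and queries for different rectangles are independent, which is what makes the algorithm embarrassingly parallel.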
The GREDOR project. Redesigning the decision chain for managing distribution ..., by Université de Liège (ULg)
This presentation proposes an integrated methodology for redesigning the decision chain in distribution networks for integrating renewable energy and demand side management.
The International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to scientific knowledge in engineering and technology.
Presentation from the EPRI-Sandia Symposium on Secure and Resilient Microgrids: Microgrid Design Toolkit, presented by John Eddy, Sandia National Laboratories, Baltimore, MD, August 29-31, 2016.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V..., by Wasswaderrick3
In this book, we use conservation-of-energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and then from this we derive the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium, and at the general equation of terminal velocity.
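For reference, the standard textbook forms of the key results the abstract mentions are below; the notation here is generic and may well differ from the book's own derivation.

```latex
% Bernoulli equation between stations 1 and 2, extended with a loss term
% \Delta P_{loss} for viscous (friction) effects; \Delta P_{loss} = 0
% recovers the ordinary Bernoulli equation.
P_1 + \tfrac{1}{2}\rho v_1^2 + \rho g z_1
    = P_2 + \tfrac{1}{2}\rho v_2^2 + \rho g z_2 + \Delta P_{loss}

% For fully developed laminar pipe flow, the loss term is the
% Hagen-Poiseuille pressure drop (mean velocity \bar{v}, length L,
% diameter d, dynamic viscosity \mu):
\Delta P = \frac{32\,\mu\,L\,\bar{v}}{d^2}

% Stokes terminal velocity of a sphere of radius r and density \rho_s
% falling in a fluid of density \rho_f:
v_t = \frac{2\,r^2\,(\rho_s - \rho_f)\,g}{9\,\mu}
```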
Nutraceutical market, scope and growth: Herbal drug technology, by Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition are further driving market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Phenomics assisted breeding in crop improvement, by IshaGoswami9
The global population is increasing and will reach about 9 billion by 2050; with climate change, it will be difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data linkable to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Seminar of U.V. Spectroscopy, by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a..., by Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior compounded of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk, Journées Nationales du GDR GPL 2024
4. How do commercial buildings work?
- Facility managers (FMs) oversee the day-to-day operations of a commercial building.
- This used to be the job of a whole team!
- Shrinking maintenance budgets and increasing complexity make this a challenging problem.
- Building operations are automated by Building Management Systems (BMSs).
5. What is a BMS?
- Commercial buildings contain Building Management Systems (BMSs) to improve indoor environment quality and reduce energy consumption.
- A BMS will control heating, cooling, ventilation and lighting systems.
- BMSs contain thousands of points for sensors (temperature, humidity), actuators (fans, motors, dampers) and software (schedules, trend logs, calculations).
- A BMS will monitor sensors and adjust actuators based on their readings.
- For example, if high temperatures are recorded in a room, dampers will open and air handlers will modulate to provide cooler air.
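The damper rule described above can be illustrated as a toy control step. The setpoint, deadband, and return values here are invented for illustration only; real BMS logic is vendor-specific and far richer.

```python
def control_step(zone_temp, setpoint=22.0, deadband=1.0):
    """One pass of a toy BMS rule: map a temperature reading to commands.

    Returns (damper_position, cooling_on). All thresholds are illustrative.
    """
    if zone_temp > setpoint + deadband:   # too warm: open damper, cool
        return 1.0, True
    if zone_temp < setpoint - deadband:   # too cool: close damper
        return 0.0, False
    return 0.5, False                     # within deadband: hold position

# A warm, a comfortable, and a cold reading.
commands = [control_step(t) for t in (25.1, 21.9, 19.4)]
```

A locked-open heating valve, as in the example two slides on, defeats exactly this kind of rule: the controller keeps commanding cooling to fight a fault it cannot see.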
7. How does this work in practice?
- A vendor sets up a BMS. The BMS will behave in a certain way based on predefined rules.
- BMS systems are costly to implement and to modify. Changing a BMS's behaviour can require a lot of coding.
- The bigger the BMS, the harder it is to find what matters. Locating problems is difficult and time-consuming.
- For example, a heating valve might be locked open. If this isn't detected, the BMS will cool the room to reach the required temperature.
8. So what can we do?
Buildings Alive's goal is to help facility managers identify if a BMS is operating optimally:
- Fault detection
- Diagnostics
To do this, we:
- Collect BMS data using our E2 device
- Analyse and transform data into useful information
- Help guide FMs to find out what's wrong
- Provide timely and actionable information
12. Feature generation
- We are dealing with thousands of unevenly spaced time-series.
- Uneven spacing in time-series presents difficulties.
- Rather than rounding or imputing data, we can generate features and work with them instead.
13. What features might be useful?
Feature generation for time-series clustering is discussed in Wang, Smith, and Hyndman (2006). Some useful features for our case might be:
- Mean
- Standard deviation
- Kurtosis
- Skewness
- Biggest change: $\max_i |y_{t_i} - y_{t_{i-1}}|$
- Smallest change: $\min_i |y_{t_i} - y_{t_{i-1}}|$
- Number of "mean crossings" per day

Normalise these features using their median $M$ and interquartile range $\mathrm{IQR}$: $y^* = (y - M)/\mathrm{IQR}$.
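Assuming each series arrives as a plain array of values (uneven spacing is fine, since none of these features resample), the feature list above can be computed directly in NumPy; the function names here are illustrative.

```python
import numpy as np

def series_features(y):
    """Summary features of a 1-D series, per the slide's feature list."""
    y = np.asarray(y, dtype=float)
    d = np.abs(np.diff(y))          # absolute successive changes
    centred = y - y.mean()
    std = y.std()
    return {
        "mean": y.mean(),
        "std": std,
        "skewness": (centred ** 3).mean() / std ** 3,
        "kurtosis": (centred ** 4).mean() / std ** 4,
        "biggest_change": d.max(),
        "smallest_change": d.min(),
        # sign changes of the mean-centred series
        "mean_crossings": int(np.sum(np.diff(np.sign(centred)) != 0)),
    }

def robust_scale(x):
    """Normalise one feature across series: y* = (y - M) / IQR."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return (x - med) / (q3 - q1)

feats = series_features([1.0, 3.0, 2.0, 5.0, 4.0])
scaled = robust_scale([1.0, 2.0, 3.0, 4.0])
```

The median/IQR scaling is chosen over mean/standard-deviation scaling because sensor data is full of outliers, and robust statistics keep one stuck sensor from dominating a feature's scale.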
16. Dimension reduction and clustering
- Too many sensors to visualise easily.
- Use dimensionality reduction.
- Identify clusters and singletons.
17. Which clustering algorithm?

Method | Advantages | Disadvantages
K-means | Easy to learn. | Outperformed by other algorithms.
Hierarchical clustering | Informative: produces a dendrogram. | Not suitable for large data sets: $O(n^2 \log n)$ time complexity.
Affinity propagation | Automatically determines the number of clusters. | Not suitable for large data sets: $O(n^2 t)$ time complexity.
Spectral clustering | Good performance. | See Nadler and Galun (2007). $O(n^3)$ time complexity.
19. Obligatory mathematics slide
Spectral clustering
We are given $n$ points $x_i \in \mathbb{R}^p$ and a similarity matrix $S$. Define the weight matrix $W$, degree matrix $D$ and graph Laplacian $L$ as
$$W = (w_{ij}) \in \mathbb{R}^{n \times n}, \qquad D = \mathrm{diag}(d_i), \qquad L = D - W,$$
where:
- $w_{ij}$ is the weight between nodes $i$ and $j$ based on $S$, and
- $d_i = \sum_{j=1}^{n} w_{ij}$ is the weighted degree of node $i$.
Once $L$ is determined, find the $m$ eigenvectors $Z_{n \times m}$ corresponding to the $m$ smallest eigenvalues of $L$. Finally, cluster the rows of $Z_{n \times m}$ using K-means.
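The recipe on this slide translates almost line for line into NumPy. Below is a minimal sketch, assuming a Gaussian similarity, the unnormalised Laplacian, and a tiny Lloyd's k-means with farthest-point initialisation; in practice one would reach for a library implementation such as scikit-learn's SpectralClustering.

```python
import numpy as np

def spectral_cluster(X, m, sigma=1.0, n_iter=50):
    """Cluster the rows of X into m groups via the unnormalised Laplacian."""
    n = len(X)
    # Similarity / weight matrix: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))        # degree matrix, d_i = sum_j w_ij
    L = D - W                         # graph Laplacian
    # Eigenvectors for the m smallest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    Z = vecs[:, :m]                   # the n x m spectral embedding
    # Lloyd's k-means on the rows of Z, farthest-point initialisation.
    idx = [0]
    for _ in range(1, m):
        d = ((Z[:, None, :] - Z[idx]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
    centers = Z[idx]
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        centers = np.array([Z[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(m)])
    return labels

# Two well-separated blobs should come back as two pure clusters.
X = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])
labels = spectral_cluster(X, 2)
```

The dense eigendecomposition here is the source of the $O(n^3)$ complexity noted in the table; sparse similarity graphs and iterative eigensolvers are the usual way to push past that.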
21. Dash
- Recently released by Plotly.
- Easily build web applications for data analytics.
- Open sourced under the MIT license.
- Works nicely with the existing Plotly graphing libraries.
- Python equivalent of R's Shiny.
26. References
- "Comparing Different Clustering Algorithms on Toy Datasets." 2017. http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html.
- Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. New York: Springer.
- Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
- Nadler, Boaz, and Meirav Galun. 2007. "Fundamental Limitations of Spectral Clustering." In Advances in Neural Information Processing Systems 19, edited by B. Schölkopf, J. C. Platt, and T. Hoffman, 1017–24. MIT Press.
- Von Luxburg, Ulrike. 2007. "A Tutorial on Spectral Clustering." Statistics and Computing.
- Wang, Xiaozhe, Kate Smith, and Rob Hyndman. 2006. "Characteristic-Based Clustering for Time Series Data." Data Mining and Knowledge Discovery 13 (3): 335–64.