CLIM Program: Remote Sensing Workshop, Optimization for Distributed Data Systems: An Overview and Some Theoretical Results - Zhengyuan Zhu, Feb 13, 2018
Asynchronous parallel algorithms are developed to solve massive optimization problems in distributed data systems; they can be run in parallel on multiple nodes with little or no synchronization. Recently they have been successfully implemented to solve a range of difficult problems in practice. However, the existing theories are mostly based on fairly restrictive assumptions on the delays, and cannot explain the convergence and speedup properties of such algorithms. In this talk we will give an overview of distributed optimization and discuss some new theoretical results on the convergence of the asynchronous parallel stochastic gradient algorithm with unbounded delays. Simulated and real data will be used to demonstrate the practical implications of these theoretical results.
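As a toy illustration of the delay phenomenon discussed above, the sketch below (the quadratic objective, step size, and delay model are my own illustrative choices, not from the talk) runs gradient descent where each update is computed at a randomly stale iterate:

```python
import random

# Toy sequential simulation of asynchronous SGD with random gradient delays
# on f(x) = x^2 / 2, so grad f(x) = x. With a small enough step size the
# iterates still converge even though every gradient may be stale.
random.seed(0)
x, step = 10.0, 0.05
history = []
for k in range(500):
    history.append(x)                         # iterates seen so far
    delay = random.randint(0, min(k, 10))     # a worker read a stale iterate
    stale_x = history[k - delay]
    x = x - step * stale_x                    # update with the delayed gradient
print(f"final |x| = {abs(x):.2e}")            # converges despite staleness
```

Increasing the maximum delay or the step size eventually destabilizes the recursion, which is the regime the delay assumptions in the convergence theory are meant to control.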
An improved SPFA algorithm for the single source shortest path problem - International Journal of Managing Information Technology (IJMIT)
We present an improved SPFA algorithm for the single source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex is relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that it is efficient.
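The queue-based relaxation loop that SPFA performs can be sketched as follows; this is the baseline algorithm only, and the paper's MinPoP queue-ordering improvement is not reproduced here:

```python
from collections import deque

# Baseline SPFA (queue-based Bellman-Ford). A vertex enters the queue only
# when one of its incoming edges has just been relaxed.
def spfa(graph, source):
    """graph: dict vertex -> list of (neighbor, weight); returns distances."""
    dist = {v: float("inf") for v in graph}
    in_queue = {v: False for v in graph}
    dist[source] = 0
    queue = deque([source])
    in_queue[source] = True
    while queue:
        u = queue.popleft()
        in_queue[u] = False
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:     # edge (u, v) can be relaxed
                dist[v] = dist[u] + w
                if not in_queue[v]:       # enqueue v only on relaxation
                    queue.append(v)
                    in_queue[v] = True
    return dist

g = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(spfa(g, "a"))  # {'a': 0, 'b': 2, 'c': 3}
```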
This is a presentation that I gave to my research group. It is about probabilistic extensions to Principal Components Analysis, as proposed by Tipping and Bishop.
The variational Gaussian process (VGP) is a Bayesian nonparametric model which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
The fuzzy clustering algorithm cannot obtain a good clustering effect when the sample characteristics are not obvious, and it requires the number of clusters to be determined in advance. For this reason, this paper proposes an adaptive fuzzy kernel clustering algorithm. The algorithm first uses an adaptive cluster-number function to calculate the optimal number of clusters; the samples in the input space are then mapped to a high-dimensional feature space using a Gaussian kernel and clustered in that feature space. Matlab simulation results confirm that the algorithm's performance is greatly improved over classical clustering algorithms, with faster convergence and more accurate clustering results.
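A minimal sketch of kernelized fuzzy c-means with a Gaussian kernel is given below. It assumes prototypes kept in the input space and a hand-fixed cluster number on toy data; the paper's adaptive cluster-number function is not reproduced:

```python
import numpy as np

# Kernelized fuzzy c-means sketch: feature-space distances are computed
# through the Gaussian kernel, so ||phi(x)-phi(v)||^2 = 2 * (1 - K(x, v)).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
c, m, sigma = 2, 2.0, 1.0
V = X[[0, 30]].copy()                       # one initial prototype per blob

def gauss_kernel(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

for _ in range(30):
    K = gauss_kernel(X, V)                        # n x c kernel evaluations
    d2f = np.clip(2.0 * (1.0 - K), 1e-12, None)   # feature-space distance^2
    inv = d2f ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=1, keepdims=True)      # fuzzy membership update
    W = (U ** m) * K
    V = (W.T @ X) / W.sum(axis=0)[:, None]        # prototype update

labels = U.argmax(axis=1)
print(labels[:3], labels[-3:])              # the two blobs get distinct labels
```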
In this talk I will show that standard graph features, such as the degree distribution of the transaction graph, may not be sufficient to capture network dynamics and their potential impact on fluctuations of the Bitcoin price. In contrast, new topological features of the graph, computed using the tools of persistent homology, are found to exhibit high utility for predicting Bitcoin price dynamics. Using the proposed persistent-homology-based techniques, I will present the ChainNet platform, a new, elegant, easily extendable and computationally light approach for graph representation learning on Blockchain.
This talk builds on recent empirical work addressing the extent to which the transaction graph serves as an early-warning indicator for large financial losses. By identifying certain sub-graphs ('chainlets') with a causal effect on price movements, we demonstrate the impact of extreme transaction graph activity on the intraday volatility of the Bitcoin price series. In particular, we infer the loss distributions conditional on extreme chainlet activity. Armed with this empirical representation, we propose a modeling approach to explore conditions under which the market is stabilized by transaction-graph-aware agents.
We approach the screening problem - i.e. detecting which inputs of a computer model significantly impact the output - from a formal Bayesian model selection point of view. That is, we place a Gaussian process prior on the computer model and consider the $2^p$ models that result from assuming that each of the subsets of the $p$ inputs affects the response. The goal is to obtain the posterior probabilities of each of these models. In this talk, we focus on the specification of objective priors on the model-specific parameters and on convenient ways to compute the associated marginal likelihoods. These two problems, normally seen as unrelated, have challenging connections, since the priors proposed in the literature are specifically designed to have posterior modes on the boundary of the parameter space, hence precluding the application of approximate integration techniques based on e.g. Laplace approximations. We explore several ways of circumventing this difficulty, comparing different methodologies on synthetic examples taken from the literature.
Authors: Gonzalo Garcia-Donato (Universidad de Castilla-La Mancha) and Rui Paulo (Universidade de Lisboa)
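As a hedged illustration of scoring the $2^p$ subset models, the sketch below ranks subsets by the closed-form GP log marginal likelihood, using toy data, fixed kernel hyperparameters, and a uniform model prior; none of this reflects the objective priors or the marginal-likelihood computations actually discussed in the talk:

```python
import numpy as np
from itertools import chain, combinations

# Score each subset of inputs by the GP log marginal likelihood of a fixed
# RBF kernel restricted to those inputs, then normalize to posterior model
# probabilities under a uniform prior over the 2^p models.
rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.uniform(-1.0, 1.0, (n, p))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=n)   # only input 0 matters

def log_marginal(active):
    Xa = X[:, list(active)]
    d2 = ((Xa[:, None, :] - Xa[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / 0.5) + 0.05 ** 2 * np.eye(n)       # RBF Gram + noise
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2.0 * np.pi))

subsets = list(chain.from_iterable(combinations(range(p), r)
                                   for r in range(p + 1)))
scores = np.array([log_marginal(s) for s in subsets])
post = np.exp(scores - scores.max())
post /= post.sum()                       # posterior model probabilities
best = subsets[int(post.argmax())]
print(best)                              # a subset containing input 0
```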
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
Design and Implementation of Variable Radius Sphere Decoding Algorithm - csandit
The Sphere Decoding (SD) algorithm is a decoding algorithm based on the Zero Forcing (ZF) algorithm in the real number field. The classical SD algorithm is famous for its outstanding Bit Error Rate (BER) performance and decoding strategy. The algorithm obtains its maximum likelihood solution by recursively shrinking the search radius. However, the method of gradually shrinking the search radius is too complicated to use in a ground communication system. This paper proposes a Variable Radius Sphere Decoding (VR-SD) algorithm based on the ZF algorithm in order to simplify the complex search steps. We demonstrate the advantages of the VR-SD algorithm both through the derivation of mathematical formulas and through simulation of the BER performance of the SD and VR-SD algorithms.
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS - csandit
The ability to automatically mine and extract useful information from large datasets has been a common concern for organizations over the last few decades. Data on the internet is growing steadily, and consequently the capacity to collect and store very large data is increasing significantly. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets, and the development of accurate and fast data classification algorithms for very large scale datasets remains a challenge. In this paper, various algorithms and techniques, especially an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving the minimum sum-of-squares clustering problem in very large datasets. This research also develops an accurate and real-time L2-DC algorithm, based on the incremental approach, to solve the minimum sum-of-squares clustering problem.
In this work, we propose to apply trust region optimization to deep reinforcement
learning using a recently proposed Kronecker-factored approximation to
the curvature. We extend the framework of natural policy gradient and propose
to optimize both the actor and the critic using Kronecker-factored approximate
curvature (K-FAC) with trust region; hence we call our method Actor Critic using
Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this
is the first scalable trust region natural gradient method for actor-critic methods.
It is also a method that learns non-trivial tasks in continuous control as well as
discrete control policies directly from raw pixel inputs. We tested our approach
across discrete domains in Atari games as well as continuous domains in the MuJoCo
environment. With the proposed methods, we are able to achieve higher
rewards and a 2- to 3-fold improvement in sample efficiency on average, compared
to previous state-of-the-art on-policy actor-critic methods. Code is available at
https://github.com/openai/baselines.
Asynchronous Stochastic Optimization, New Analysis and Algorithms - Fabian Pedregosa
As datasets continue to increase in size and multi-core computer architectures are developed, asynchronous parallel optimization algorithms become more and more essential to the field of Machine Learning. In this talk I will describe two of our recent contributions to this topic. First, we highlight an important technical issue present in a large fraction of the recent convergence proofs for asynchronous parallel optimization algorithms and propose a new framework that resolves it [1]. Second, we propose a novel asynchronous variant of SAGA, a stochastic method that combines the low cost per iteration of SGD with the fast convergence rates of gradient descent [2].
[1] Leblond, R., Pedregosa, F., & Lacoste-Julien, S. (2018). Improved asynchronous parallel optimization analysis for stochastic incremental methods. arXiv:1801.03749, https://arxiv.org/pdf/1801.03749.pdf
[2] Pedregosa, F., Leblond, R., & Lacoste-Julien, S. (2017). Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization. In Advances in Neural Information Processing Systems, http://papers.nips.cc/paper/6611-breaking-the-nonsmooth-barrier-a-scalable-parallel-method-for-composite-optimization.pdf
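A sequential sketch of the SAGA update referenced in [2]: a table stores the last gradient seen for each sample, and each step combines the fresh gradient with a variance-reduction correction from the table. The toy least-squares problem, sizes, and step size are illustrative; the papers' contribution is the asynchronous, lock-free execution of these updates, which is not shown here.

```python
import numpy as np

# Sequential SAGA on a least-squares problem: per-sample gradient table plus
# a running average, giving SGD-cost iterations with reduced variance.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)   # noisy linear data

x = np.zeros(d)
grads = np.array([a * (a @ x - bi) for a, bi in zip(A, b)])  # gradient table
avg = grads.mean(axis=0)
step = 0.01
for _ in range(10000):
    i = rng.integers(n)
    g_new = A[i] * (A[i] @ x - b[i])        # fresh per-sample gradient
    x -= step * (g_new - grads[i] + avg)    # SAGA update direction
    avg += (g_new - grads[i]) / n           # keep the table average in sync
    grads[i] = g_new
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # small relative residual
```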
We consider the problem of model estimation in episodic Block MDPs. In these MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are interested in estimating the latent state decoding function (the mapping from the observations to latent states) based on data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and present an algorithm approaching this fundamental limit. In turn, our algorithm also provides estimates of all the components of the MDP.
We apply our results to the problem of learning near-optimal policies in the reward-free setting. Based on our efficient model estimation algorithm, we show that we can infer a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible asymptotic rate. Our analysis provides necessary and sufficient conditions under which exploiting the block structure yields improvements in the sample complexity for identifying near-optimal policies. When these conditions are met, the sample complexity in the minimax reward-free setting is improved by a multiplicative factor $n$, where $n$ is the number of contexts.
After applying the stochastic Galerkin method to solve a stochastic PDE and solving the resulting large linear system, we obtain a stochastic solution (a random field) represented in the Karhunen-Loève and PCE bases. No sampling error is involved, only algebraic truncation error. We would now like to escape the classical MCMC path to computing the posterior, and we develop a Bayesian update formula for the KLE-PCE coefficients.
USING LEARNING AUTOMATA AND GENETIC ALGORITHMS TO IMPROVE THE QUALITY OF SERV... - IJCSEA Journal
A hybrid learning automata-genetic algorithm (HLGA) is proposed to solve the QoS routing optimization problem of next generation networks, an NP-complete problem. The algorithm combines the advantages of the Learning Automata (LA) algorithm and the Genetic Algorithm (GA). It first uses the good global search capability of LA to generate the initial population needed by GA, and then uses GA to improve the Quality of Service (QoS) and to acquire the optimized tree through new crossover and mutation operators. In the proposed algorithm, the connectivity matrix of edges is used for genotype representation. Some novel heuristics are also proposed for mutation, crossover, and the creation of random individuals. We evaluate the performance and efficiency of the proposed HLGA-based algorithm in comparison with other existing heuristic and GA-based algorithms by simulation. Simulation results demonstrate that the proposed algorithm not only has fast calculating speed and high accuracy but also improves the efficiency of QoS routing in Next Generation Networks, outperforming the previous algorithms in the literature.
Geoid height determination is one of the major problems of geodesy, because the use of satellite techniques in geodesy is increasing. Geoid heights can be determined using different methods according to the available data. Soft computing methods such as fuzzy logic and neural networks have become so popular that they are used to solve many engineering problems. Fuzzy logic theory and later developments in uncertainty assessment have enabled us to develop more precise models for our requirements. In this study, how to construct the best fuzzy model is examined. For this purpose, three different data sets were taken, and two different kinds of fuzzy models (two inputs, one output; and three inputs, one output) were formed for the calculation of geoid heights in Istanbul (Turkey). The fuzzy model results were compared with geoid heights obtained by GPS/levelling methods, and the fuzzy approximation models were tested on the test points.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes has its advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity, and which organisms it can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
Bayesian Additive Regression Trees (BART) has been shown to be an effective framework for modeling nonlinear regression functions, with strong predictive performance in a variety of contexts. The BART prior over a regression function is defined by independent prior distributions on tree structure and leaf or end-node parameters. In observational data settings, Bayesian Causal Forests (BCF) has successfully adapted BART for estimating heterogeneous treatment effects, particularly in cases where standard methods yield biased estimates due to strong confounding.
We introduce BART with Targeted Smoothing, an extension which induces smoothness over a single covariate by replacing independent Gaussian leaf priors with smooth functions. We then introduce a new version of the Bayesian Causal Forest prior, which incorporates targeted smoothing for modeling heterogeneous treatment effects which vary smoothly over a target covariate. We demonstrate the utility of this approach by applying our model to a timely women's health and policy problem: comparing two dosing regimens for an early medical abortion protocol, where the outcome of interest is the probability of a successful early medical abortion procedure at varying gestational ages, conditional on patient covariates. We discuss the benefits of this approach in other women’s health and obstetrics modeling problems where gestational age is a typical covariate.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
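The bracketing relationship can be seen in a small simulation. The data-generating process below (my own toy choice) satisfies ignorability given the lagged outcome, with treated units starting lower, so the lagged-dependent-variable (LDV) regression recovers the true effect while mistakenly assuming parallel trends overestimates it:

```python
import numpy as np

# Toy simulation of the Angrist-Pischke bracketing: under ignorability given
# the lagged outcome, DID overshoots a true positive effect while LDV is
# consistent. Here E[DID] = tau + (rho - 1) * (pre-period mean gap) = 1.5.
rng = np.random.default_rng(0)
n, tau, rho = 20000, 1.0, 0.5
d = np.repeat([1.0, 0.0], n)                      # treatment indicator
y_pre = rng.normal(-1.0 * d, 1.0)                 # treated start 1 unit lower
y_post = rho * y_pre + tau * d + rng.normal(0.0, 1.0, 2 * n)

did = ((y_post[d == 1].mean() - y_pre[d == 1].mean())
       - (y_post[d == 0].mean() - y_pre[d == 0].mean()))

Z = np.column_stack([np.ones(2 * n), d, y_pre])   # intercept, treatment, lag
beta, *_ = np.linalg.lstsq(Z, y_post, rcond=None)
ldv = beta[1]
print(round(did, 2), round(ldv, 2))   # DID near 1.5 > tau; LDV near tau = 1.0
```

Reversing the sign of the pre-period gap (or making parallel trends hold exactly) flips which estimator is biased, which is the other half of the bracketing.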
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
The world of health care is full of policy interventions: a state expands eligibility rules for its Medicaid program, a medical society changes its recommendations for screening frequency, a hospital implements a new care coordination program. After a policy change, we often want to know, “Did it work?” This is a causal question; we want to know whether the policy CAUSED outcomes to change. One popular way of estimating causal effects of policy interventions is a difference-in-differences study. In this controlled pre-post design, we measure the change in outcomes of people who are exposed to the new policy, comparing average outcomes before and after the policy is implemented. We contrast that change to the change over the same time period in people who were not exposed to the new policy. The differential change in the treated group’s outcomes, compared to the change in the comparison group’s outcomes, may be interpreted as the causal effect of the policy. To do so, we must assume that the comparison group’s outcome change is a good proxy for the treated group’s (counterfactual) outcome change in the absence of the policy. This conceptual simplicity and wide applicability in policy settings makes difference-in-differences an appealing study design. However, the apparent simplicity belies a thicket of conceptual, causal, and statistical complexity. In this talk, I will introduce the fundamentals of difference-in-differences studies and discuss recent innovations including key assumptions and ways to assess their plausibility, estimation, inference, and robustness checks.
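The controlled pre-post arithmetic described above reduces, in the simplest two-group two-period case, to the following (the outcome means are made up for illustration):

```python
# Minimal 2x2 difference-in-differences with illustrative outcome means:
# under parallel trends, the differential change is attributed to the policy.
treated_pre, treated_post = 10.0, 14.0    # exposed group, before / after
control_pre, control_post = 8.0, 9.0      # comparison group, before / after

change_treated = treated_post - treated_pre     # 4.0
change_control = control_post - control_pre     # 1.0 (proxy counterfactual)
did_effect = change_treated - change_control
print(did_effect)  # 3.0
```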
CLIM Program: Remote Sensing Workshop, Optimization for Distributed Data Systems: An Overview and Some Theoretical Results - Zhengyuan Zhu, Feb 13, 2018
1. Distributed Optimization: An Overview and Some Theoretical Results
Zhengyuan Zhu
Joint work with Xin Zhang and Jia Kevin Liu
Department of Statistics and
Center for Survey Statistics Methodology
Iowa State University
2/13/2018
2. Introduction
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
Zhang & Liu & Zhu Asyn-SGD 2 / 35
3. Introduction
Introduction
Connection to the remote sensing workshop:
My research interests: spatial statistics, spatial sampling design, survey statistics.
National Resources Inventory survey: remote sensing data help improve survey estimates of agricultural statistics and natural resources.
Massive spatial-temporal imputation (gap-filling): a computationally efficient functional approach, with applications to Landsat and MODIS data.
Massive imputation for hyperspectral satellite data (OCO-2), which are sparse in space-time.
Unmixing problems for the SMOS and OCO-2 data.
Original title: Asynchronous Stochastic Gradient Descent with Unbounded Delay on Nonconvex Problems
Actual title: Distributed Optimization: An Overview and Some Theoretical Results
4. Introduction
Distributed Computation
Problem: datasets are becoming extremely large (in both features and sample size), and they may be collected and/or stored in a distributed system.
With Moore's law coming to an end in a few years (2025?), we can no longer rely on hardware improvements alone.
Distributed computation studies how to divide a 'big' problem into several small parts, allocate these parts to many computers, and then combine the 'local' results to obtain the final result;
Bertsekas and Tsitsiklis (1999) provided a general framework for parallel and distributed computation.
Micro chip level: multi-core CPU/GPU
Macro data center level: networked cloud computing
6. Introduction
Some issues relevant to the theory of data systems
Centralized vs. local computation: local computation reduces data transfer costs and has fewer issues with data privacy and confidentiality.
Synchronous vs. asynchronous methods: synchronization can involve significant communication overhead, and server variability may lead to inefficiency; asynchronous methods may have convergence issues depending on the delay distribution and the algorithm.
Data homogeneity vs. heterogeneity:
Homogeneous: databases Ξ1, ..., Ξk are shared, i.i.d., or stationary. The objective function computed at each local machine is unbiased;
Heterogeneous: databases Ξ1, ..., Ξk are not i.i.d. or stationary, i.e., they could come from different sources or be collected with different methods. The objective function at each local machine may be biased;
Trade-offs in computation, communication, and inference precision.
7. Introduction
Distributed Optimization Algorithms
Some of the well-studied algorithms for distributed optimization:
Stochastic Gradient Descent (SGD): Bottou (1998, 2011), theory and application to large-scale machine learning; Recht et al. (2011), the asynchronous SGD algorithm HOGWILD!; Lian et al. (2015), convergence rate for nonconvex problems with bounded delay;
Alternating Direction Method of Multipliers (ADMM): Gabay and Mercier (1976), Boyd et al. (2010), Chang et al. (2015), Hong (2017);
Distributed quasi-Newton methods for faster rates of convergence: Eisen et al. (2017) use gradients to estimate the curvature; Mansoori (2017) used a matrix splitting technique to compute the Newton direction in a distributed way.
8. Introduction
Applications in ML
Distributed optimization, and in particular distributed SGD, has become a very popular way to speed up machine learning algorithms. Some successful examples:
In ?, a parallel system is used to train SVMs, which saves computational time and avoids running out of memory;
? designed a parallelizable method, called CCD++, for matrix factorization in large-scale recommender systems;
Distributed deep learning: ? proposed two distributed algorithms, Downpour SGD and Sandblaster, to train DNNs; Abadi et al. (2016) introduced TensorFlow for large-scale machine learning.
10. Asynchronous Stochastic Gradient Descent
Overview of Stochastic Gradient Descent (SGD)
Our work focuses on the feasibility of a distributed asynchronous optimization algorithm, Asynchronous Stochastic Gradient Descent, under unbounded delays.
Also referred to as stochastic approximation in the literature;
First introduced in ? and ?;
The idea: simply use a noisy unbiased gradient in place of the unknown true gradient in the gradient descent algorithm;
Stochastic gradient descent works as follows. To solve the optimization problem
min_{x ∈ R^d} f(x) = E[F(x; ξ)],  (1)
let x_{k+1} = x_k − γ_k G(x_k), where x_k denotes the parameter at the k-th iteration and G(x_k) is a noisy unbiased gradient evaluated at x_k;
11. Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Asyn-SGD is an extension of SGD. It can be implemented as follows:
For workers:
compute a gradient G at the current parameter x with a random sample ξ;
report the gradient to the server;
For the server:
collect a fixed number (M) of gradients from the workers;
update the current parameter with these gradients;
12. Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Algorithm 1 Asynchronous Stochastic Gradient Descent (Asyn-SGD)
Require: database Ξ, step size {γ_k}, initial point x_0, batch size M;
Ensure: x_k;
At parameter server:
1: for i = 1, 2, ..., k do
2:   Collect M gradients G(x_{i−τ_{i,m}}; ξ_{i,m}) from workers;
3:   Update x_{i+1} = x_i − γ_i Σ_{m=1}^{M} G(x_{i−τ_{i,m}}; ξ_{i,m});
4: end for
At workers:
5: Receive the current parameter x* from the parameter server;
6: Randomly select a sample ξ from the database;
7: Compute the stochastic gradient G(x*; ξ) and report it to the server;
Here τ_{i,m} is the delay of the m-th gradient in the i-th iteration.
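The server/worker loop of Algorithm 1 can be mimicked in a single-process toy simulation. This is a sketch under our own assumptions (a one-dimensional quadratic objective, uniformly bounded delays, and illustrative constants), not the authors' implementation:

```python
import random

# Toy simulation of Asyn-SGD (Algorithm 1) on f(x) = E[(x - xi)^2 / 2]
# with xi ~ N(1, 0.2^2), so the minimizer is x* = 1 and G(x; xi) = x - xi
# is an unbiased stochastic gradient. Worker staleness is mimicked by
# evaluating each reported gradient at a randomly chosen past iterate.

random.seed(0)

M = 4                  # gradients the server collects per update
MAX_DELAY = 5          # tau_{i,m} ~ Uniform{0, ..., MAX_DELAY}
history = [5.0]        # history[i] = x_i, starting from x_0 = 5

for i in range(1, 2001):
    gamma = 1.0 / (10 + i)      # O(1/k) step size: unsummable, square-summable
    grad_sum = 0.0
    for _ in range(M):
        tau = random.randint(0, MAX_DELAY)
        stale_x = history[max(0, len(history) - 1 - tau)]  # x_{i - tau_{i,m}}
        xi = random.gauss(1.0, 0.2)                        # fresh data sample
        grad_sum += stale_x - xi                           # G(x_{i - tau}; xi)
    history.append(history[-1] - gamma * grad_sum)         # server step 3

print(history[-1])  # settles near the minimizer x* = 1
```

Despite the stale gradients, the iterates still reach x* here because the delays are bounded; the analysis below extends this to unbounded delays satisfying a condition on their distribution.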
13. Asynchronous Stochastic Gradient Descent
Asynchronous Stochastic Gradient Descent with Incremental Batch Size (Asyn-SGDI)
A modified version of Asyn-SGD increases the batch size when determining the update direction. With a larger batch size the variance of the gradient noise decreases, which can lead to faster convergence.
Algorithm 2 Asyn-SGD with incremental batch size (Asyn-SGDI)
Require: database Ξ, step size {γ_k}, initial point x_0, increasing batch sizes {M_i = n_i M};
Ensure: x_k;
At parameter server:
1: for i = 1, 2, ..., k do
2:   Collect M_i gradients G(x_{i−τ_{i,m}}; ξ_{i,m}) from workers;
3:   Update x_{i+1} = x_i − (γ_i/n_i) Σ_{m=1}^{M_i} G(x_{i−τ_{i,m}}; ξ_{i,m});
4: end for
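Under the same toy assumptions as before, the incremental-batch variant of Algorithm 2 can be sketched as follows; taking n_i = i² makes Σ 1/n_i finite, in line with the condition used in the convergence analysis (worker delays are omitted for brevity, and all constants are illustrative):

```python
import random

# Sketch of Asyn-SGDI (Algorithm 2) on f(x) = E[(x - xi)^2 / 2],
# xi ~ N(1, 0.2^2). The batch grows as M_i = n_i * M with n_i = i^2
# (so sum_i 1/n_i < infinity) and the step size is constant; the 1/n_i
# factor in the update averages away the noise of the growing batch.

random.seed(1)

M, gamma = 2, 0.05
x = 5.0
for i in range(1, 101):
    n_i = i * i
    grad_sum = sum(x - random.gauss(1.0, 0.2) for _ in range(n_i * M))
    x -= (gamma / n_i) * grad_sum   # x_{i+1} = x_i - (gamma_i / n_i) * sum_m G

print(x)  # approaches the minimizer x* = 1
```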
14. Convergence Analysis
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
15. Convergence Analysis
General Assumptions
Assumption
(Lower-bounded objective function) For the objective function f, there exists an optimal point x*, s.t. ∀x, f(x) ≥ f(x*).
Assumption
(Lipschitz continuous gradient) The objective function f satisfies ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖, ∀x, y.
Assumption
(Unbiased gradients with bounded variance) The stochastic gradient G(x; ξ) is unbiased with bounded variance, that is:
1. E[G(x; ξ)] = ∇f(x), ∀x;
2. E[‖G(x; ξ) − ∇f(x)‖²] ≤ σ², ∀x;
16. Convergence Analysis
Restriction on the probabilities of the delay variables
Assumption
There exists a sequence {c_i} such that
c_{j+1} + (γ_k M L²/2) Σ_{i=j}^{k} i P(τ_k = i) ≤ c_j, ∀ j, k,  (2)
where τ_k denotes the maximum delay in the k-th iteration, τ_k = max_m τ_{k,m}, and γ_k is the step size.
Here, {c_i} is the weight sequence in the asynchronicity error.
17. Convergence Analysis
Convergence Analysis for Asyn-SGD
Now we can give the convergence result for Asyn-SGD:
Theorem
Assume the above assumptions hold and the step size {γ_k} satisfies
1. γ_k ≤ 1/(2Mc_1 + ML), ∀k;
2. {γ_k} is unsummable but {γ_k²} is summable;
where M is the fixed batch size, L is the Lipschitz constant in Assumption 2, and c_1 comes from the sequence in Assumption 4. Then
E[Σ_{k=1}^{∞} γ_k ‖∇f(x_k)‖²] < ∞, and E[‖∇f(x_k)‖²] → 0.
Corollary
If the step size γ_k = O(1/(k^{1/2} log k)), then the asymptotic convergence rate for Asyn-SGD is E(‖∇f(x_K)‖²) = o(1/√K).
18. Convergence Analysis
Convergence Analysis for Asyn-SGD with Incremental Batch Size
Similarly, we can obtain the convergence result for Asyn-SGD with incremental batch size:
Theorem
Assume the above assumptions hold and the size of the database is infinite. Set the batch sizes {M_k := n_k M} such that Σ_{k=1}^{∞} 1/n_k < ∞ and the step size {γ_k} such that γ_k ≤ 1/(2M_1 c_1 + M_1 L), ∀k. Then
E[Σ_{k=1}^{∞} γ_k ‖∇f(x_k)‖²] < ∞, and E[‖∇f(x_k)‖²] → 0.
Corollary
For ε > 0 and 1/n_k = o(1/k^{1+ε}), with a fixed step size satisfying the requirement in Theorem 3.2, we have E(‖∇f(x_k)‖²) = o(1/K).
19. Convergence Analysis
Bounded Delay Variables
First we consider a simple case, in which the delay variables are bounded.
Corollary
(Bounded Delay Variables) If the delay variables {τ_k} are bounded, then {c_i} exists.
This is a very common case, as discussed in ?, ?, etc. The scenario is reasonable as long as all the workers run at comparable speeds.
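To see why bounded delays suffice, {c_i} can be constructed explicitly; a sketch in our notation, assuming for simplicity a common delay distribution with τ ≤ D and a uniform step-size bound γ_k ≤ γ:

```latex
% With P(\tau = i) = 0 for i > D, the inner sum in (2) is empty for j > D,
% so we may set c_j = 0 there and work backwards with equality:
c_j = \frac{\gamma M L^2}{2} \sum_{l=j}^{D} \sum_{i=l}^{D} i\, P(\tau = i)
    \;\le\; \frac{\gamma M L^2}{2}\, D \sum_{i=1}^{D} i \;<\; \infty,
% so that c_j - c_{j+1} = \frac{\gamma M L^2}{2} \sum_{i=j}^{D} i P(\tau = i),
% which dominates the term required by (2) for every k, since the sum up to
% k never exceeds the sum up to D.
```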
20. Convergence Analysis
I.I.D. Delay Variables
The second case assumes that the delays {τ_k} are i.i.d. and that their common distribution has a finite second moment. This scenario is reasonable when the iteration number is very large and the system has reached stationarity.
Corollary
(I.I.D. Delay Variables) If the delays {τ_k} are i.i.d. copies of τ and τ has a finite second moment, then {c_i} exists.
21. Convergence Analysis
Uniform Upper Bound
Third case: the delay variables can have different distributions as long as they are uniformly bounded by a sequence with a finite second moment. This is a more general case.
Corollary
(Uniformly Upper Bounded Probability Series) Consider the probability series of the delay variables {τ_k}_{k=1}^{∞}. If there exists a series {a_i}_{i=1}^{∞} s.t.
1. Σ_{i=1}^{∞} i² a_i < ∞;
2. P(τ_k = i) ≤ a_i, ∀k;
then {c_i} exists.
22. Numerical Study
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
23. Numerical Study
Example 1: MLE for the MVN Covariance Matrix
First, we consider maximum likelihood estimation of the covariance matrix of a multivariate normal distribution. The problem can be formulated as:
min_{Σ ∈ R^{d×d}} ln|Σ| + (1/n) Σ_{i=1}^{n} (x_i − µ)^T Σ^{-1} (x_i − µ)  (3)
subject to Σ ≻ 0,
where Σ is the covariance matrix, µ is the mean vector, and the x_i are samples. The gradient for this problem has been derived in ?.
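The gradient referred to above follows from standard matrix calculus: for the objective in (3), ∇f(Σ) = Σ⁻¹ − Σ⁻¹SΣ⁻¹ with S the sample covariance about µ, so it vanishes exactly at the MLE Σ = S. A minimal sketch (our own synthetic data, with the covariance (10, 3; 3, 5) borrowed from the next slide) that also checks the formula against a finite difference:

```python
import numpy as np

# f(Sigma) = ln|Sigma| + (1/n) sum_i (x_i - mu)^T Sigma^{-1} (x_i - mu)
#          = ln|Sigma| + tr(S Sigma^{-1}),  S = (1/n) sum_i (x_i - mu)(x_i - mu)^T,
# grad f(Sigma) = Sigma^{-1} - Sigma^{-1} S Sigma^{-1}  (deterministic gradient;
# the talk's algorithms use noisy sampled versions of it).

rng = np.random.default_rng(0)
mu = np.zeros(2)
true_cov = np.array([[10.0, 3.0], [3.0, 5.0]])   # as in the numerical study
x = rng.multivariate_normal(mu, true_cov, size=2000)
S = (x - mu).T @ (x - mu) / len(x)               # sample covariance about mu

def f(Sigma):
    return np.log(np.linalg.det(Sigma)) + np.trace(S @ np.linalg.inv(Sigma))

def grad(Sigma):
    inv = np.linalg.inv(Sigma)
    return inv - inv @ S @ inv

# Sanity checks: the gradient vanishes at Sigma = S, and it matches a
# central finite difference along an arbitrary symmetric direction E.
Sigma0 = np.array([[2.0, 0.5], [0.5, 1.5]])
E = np.array([[1.0, -0.3], [-0.3, 0.7]])
h = 1e-6
fd = (f(Sigma0 + h * E) - f(Sigma0 - h * E)) / (2 * h)
print(np.linalg.norm(grad(S)), abs(fd - np.trace(grad(Sigma0) @ E)))
```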
24. Numerical Study
Example 1: MLE for the MVN Covariance Matrix
We randomly generate data from a multivariate normal distribution with mean (0, 0) and covariance matrix (10, 3; 3, 5).
(a) uses a bounded delay variable with upper bound 50; (b) uses a Poisson delay with parameter 30; in (c), we simulate a virtual system in which the working time t of each worker follows the same model, t ∼ Exp(λ) with λ ∼ Gamma(2, 1).
The green solid line shows the convergence of Asyn-SGD with O(1/k) step size, the orange dotted line Asyn-SGD with O(1/(k^{1/2} log k)) step size, and the purple dashed line Asyn-SGDI.
25. Numerical Study
Example 1: MLE for MVN Covariance Matrix
(a) bounded by 50 (b) Poi(50) (c) System delay
Figure: Convergence for Asyn-SGD and Asyn-SGDI
In all three cases, the ℓ2 norm of the gradient goes to zero, with Asyn-SGDI the fastest and Asyn-SGD with step size O(1/k) the slowest.
26. Numerical Study
Example 1: MLE for MVN Covariance Matrix
We consider an extreme case where the delay variable follows a discrete uniform distribution (equal probabilities).
Figure: A counter example when Asyn-SGD fails
27. Numerical Study
Example 1: MLE for MVN Covariance Matrix
We also compare the computation time of Syn-SGD, Asyn-SGD and Asyn-SGDI on this problem. The step size for Syn-SGD and Asyn-SGD is O(1/k), and the step size for Asyn-SGDI is constant.
Figure: Computation time for three algorithms: the red line is for Syn-SGD; blue dotted line is
for Asyn-SGD; black dotdash line is for Asyn-SGDI
28. Numerical Study
Example 2: Low-Rank Matrix Completion
This problem is to find the lowest-rank matrix X that matches the expectation of the observed symmetric matrices, E[A]. It can be mathematically formulated as follows:
min E[‖A − YY^T‖_F²]  (4)
subject to Y ∈ R^{n×p},
where X = YY^T. Using SGD to solve this problem has been discussed in many works, including ? and ?.
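Problem (4) lends itself to SGD with a simple stochastic gradient: for symmetric A, the gradient of ‖A − YYᵀ‖_F² with respect to Y is −4(A − YYᵀ)Y. A hedged sketch with synthetic data (dimensions, noise level, and step sizes are ours, not from the talk):

```python
import numpy as np

# SGD sketch for problem (4): min_Y E||A - Y Y^T||_F^2, where each step sees
# a noisy symmetric observation A_t = X + W_t with E[A_t] = X = Y* Y*^T.
# For symmetric A_t the stochastic gradient w.r.t. Y is -4 (A_t - Y Y^T) Y.

rng = np.random.default_rng(0)
n, p = 8, 2
Y_star = rng.normal(size=(n, p))
X = Y_star @ Y_star.T                 # rank-p ground truth, E[A_t] = X

Y = rng.normal(size=(n, p))           # random initialization
for k in range(1, 5001):
    W = rng.normal(scale=0.1, size=(n, n))
    A_t = X + (W + W.T) / 2           # noisy symmetric observation
    grad = -4 * (A_t - Y @ Y.T) @ Y
    Y = Y - grad / (300 + k)          # O(1/k) step size

print(np.linalg.norm(Y @ Y.T - X))    # small once Y Y^T ≈ X
```

Note that Y itself is only identified up to an orthogonal rotation, so the check is on YYᵀ rather than on Y.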
29. Numerical Study
Example 2: Low Rank Matrix Completion
(a) bounded by 50 (b) Poi(30) (c) System delay
Figure: Convergence for Asyn-SGD and Asyn-SGDI
30. Conclusion
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
31. Conclusion
Conclusion
In our work, we analyze the convergence of Asyn-SGD on nonconvex optimization problems with unbounded delays;
We propose a new Lyapunov function, which consists of the classical error and an asynchronicity error;
A sufficient condition on the delay variables is given to guarantee the convergence of Asyn-SGD;
With a proper step size, the asymptotic convergence rate is o(1/√k) for Asyn-SGD and o(1/k) for Asyn-SGDI.
The algorithm requires the local gradients to be unbiased. For the heterogeneous case, we are working on an ADMM-based asynchronous solution.
32. Preliminary Work
1 Introduction
2 Asynchronous Stochastic Gradient Descent
3 Convergence Analysis
4 Numerical Study
5 Conclusion
6 Preliminary Work
33. Preliminary Work
Distributed Computing and ADMM
Consider the following problem:
Data are distributed across several machines, say Ξ1, Ξ2, ..., Ξk;
The objective function is
min_x L(x; Ξ1, Ξ2, ..., Ξk) = Σ_{i=1}^{k} L_i(x; Ξi);  (5)
Communication is too expensive, so each machine can only "see" its local objective function L_i(x; Ξi);
The data are biased, which means x_i = arg min L_i(x; Ξi) is not consistent.
34. Preliminary Work
Problem Formulation
Reformulating the problem:
min_x L(x; Ξ1, Ξ2, ..., Ξk) = Σ_{i=1}^{k} L_i(x; Ξi)  (6)
⇒ min_x Σ_{i=1}^{k} L_i(x_i; Ξi), s.t. x_i = x, ∀ i  (7)
The corresponding augmented Lagrangian function:
L({x_i}, x; y) = Σ_{i=1}^{k} L_i(x_i; Ξi) + Σ_{i=1}^{k} ⟨y_i, x_i − x⟩ + Σ_{i=1}^{k} (ρ_i/2)‖x_i − x‖²;  (8)
Thus, x and y_i can be updated on the central server, and x_i can be updated on local machine i. Only x, x_i and y_i are transferred between the central server and the local machines.
35. Preliminary Work
ADMM based parallel computing framework
Algorithm 3 ADMM based parallel computing framework
Require: databases {Ξi}, {ρi}, initial point;
Ensure: x_T;
At parameter server:
1: for t = 1, 2, ..., T do
2:   Collect x_i^t from the local machines;
3:   Update x^{t+1} = arg min_x Σ_{i=1}^{K} ⟨y_i^t, x_i^t − x⟩ + Σ_{i=1}^{K} (ρ_i/2)‖x_i − x‖²;
4:   Update y_i^{t+1} = y_i^t + ρ_i(x_i^{t+1} − x^{t+1});
5: end for
At local machine i:
6: Receive the current y_i^{t+1} and x^{t+1} from the parameter server;
7: Update x_i^{t+1} = arg min_{x_i} L_i(x_i; Ξi) + ⟨y_i^{t+1}, x_i − x^{t+1}⟩ + (ρ_i/2)‖x_i − x^{t+1}‖²;
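For a concrete feel of Algorithm 3, here is a minimal consensus-ADMM sketch with quadratic local objectives L_i(x_i) = ½‖x_i − b_i‖², for which every update in the algorithm has a closed form. It uses the standard consensus-ADMM ordering (local, then server, then dual) and illustrative constants; the biased local targets b_i stand in for heterogeneous data:

```python
import numpy as np

# Consensus ADMM sketch for problem (7) with K quadratic local objectives
# L_i(x_i) = 0.5 * ||x_i - b_i||^2; their sum is minimized at mean(b_i),
# which consensus recovers even though each local minimizer x_i = b_i is biased.

K, rho = 4, 1.0
b = [np.array([1.0, 2.0]), np.array([3.0, 0.0]),
     np.array([-1.0, 4.0]), np.array([5.0, 2.0])]

x = np.zeros(2)                          # global variable at the server
xs = [np.zeros(2) for _ in range(K)]     # local variables x_i
ys = [np.zeros(2) for _ in range(K)]     # dual variables y_i

for t in range(100):
    # Local update (step 7): argmin_{x_i} L_i + <y_i, x_i - x> + rho/2 ||x_i - x||^2
    # has the closed form below for quadratic L_i.
    xs = [(b[i] + rho * x - ys[i]) / (1 + rho) for i in range(K)]
    # Server update (step 3): the minimizer of the coupling terms is an average.
    x = sum(xs[i] + ys[i] / rho for i in range(K)) / K
    # Dual update (step 4).
    ys = [ys[i] + rho * (xs[i] - x) for i in range(K)]

print(x)  # converges to the mean of the b_i, here [2.0, 2.0]
```

Only x, the x_i, and the y_i would cross the network here, matching the communication pattern described on the previous slide.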