This document discusses linear combinations of radioactive decay models for analyzing the performance of generational garbage collection. It summarizes previous work showing that younger-first and older-first generational collectors can outperform each other on different programs, depending on the distribution of object lifetimes. The document aims to better characterize the theoretical dividing line between programs suited for younger-first vs older-first collection by using linear combinations of radioactive decay models to model object lifetime distributions. Preliminary results show the dividing line is more complex than previously thought, and that non-generational collection is generally suboptimal compared to generational approaches.
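As a concrete illustration (not code from the paper), a linear combination of radioactive-decay models gives an object-survival function that is a weighted sum of exponentials; the weights and decay rates below are illustrative:

```python
import numpy as np

def survival(t, weights, rates):
    """Fraction of objects still live at age t under a linear combination
    of radioactive-decay (exponential) lifetime models."""
    weights = np.asarray(weights, dtype=float)
    rates = np.asarray(rates, dtype=float)
    return float(np.sum(weights * np.exp(-rates * t)))

# Example: 90% short-lived objects, 10% long-lived ones.
s0 = survival(0.0, [0.9, 0.1], [5.0, 0.1])   # all objects live at age 0
s1 = survival(1.0, [0.9, 0.1], [5.0, 0.1])   # mostly the long-lived tail remains
```

Younger-first collection wins when mass is concentrated in the fast-decaying component; a heavy slow-decaying tail is what makes older-first competitive.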
Ramos, Almeida: Artificial ant colonies in digital image habitats – a mass be... (ArchiLab 7)
This document discusses using artificial ant colony behavior models to perform image segmentation. It begins by summarizing previous work using ant colony models for optimization problems. It then describes the Chialvo and Millonas model of ant swarm behavior, which uses pheromone deposition and evaporation on a grid to simulate ant trails. The document proposes extending this model to digital image habitats, treating pixel intensities as the landscape. It argues that global image perception could emerge from the collective behavior of individual "ants" reacting locally to the pheromone field and pixel intensities. The goal is for the ant colony to implicitly learn and represent the image through their adaptive pheromone deposition.
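A minimal sketch of the deposition/evaporation mechanism described above; the parameter names and values are illustrative, not taken from the Chialvo and Millonas paper:

```python
import numpy as np

def pheromone_step(field, ant_positions, eta=0.07, kappa=0.015):
    """One update of a discrete pheromone field: each ant deposits a fixed
    amount eta at its cell, then the whole field evaporates at rate kappa."""
    field = field.copy()
    for (i, j) in ant_positions:
        field[i, j] += eta
    return (1.0 - kappa) * field

field = np.zeros((5, 5))
field = pheromone_step(field, [(2, 2), (2, 3)])
```

In the image-habitat extension, deposition would additionally be modulated by the local pixel intensity, so trails accumulate along image structure.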
The Algorithms of Life - Scientific Computing for Systems Biology (inside-BigData.com)
In this deck from ISC 2019, Ivo Sbalzarini from TU Dresden presents: The Algorithms of Life - Scientific Computing for Systems Biology. In his talk, Sbalzarini discussed the rapidly growing importance and influence of scientific high-performance computing in the life sciences.
"Scientific high-performance computing is of rapidly growing importance and influence in the life sciences. Thanks to the increasing knowledge about the molecular foundations of life, recent advances in biomedical data science, and the availability of predictive biophysical theories that can be numerically simulated, mechanistic understanding of the emergence of life comes within reach. Computing is playing a pivotal and catalytic role in this scientific revolution, both as a tool of investigation and hypothesis testing, but also as a school of thought and systems model. This is because a developing tissue, embryo, or organ can itself be seen as a massively parallel distributed computing system that collectively self-organizes to bring about behavior we call life. In any multicellular organism, every cell constantly takes decisions about growth, division, and migration based on local information, with cells communicating with each other via chemical, mechanical, and electrical signals across length scales from nanometers to meters. Each cell can therefore be understood as a mechano-chemical processing element in a complexly interconnected million- or billion-core computing system. Mechanistically understanding and reprogramming this system is a grand challenge. While the “hardware” (proteins, lipids, etc.) and the “source code” (genetic code) are increasingly known, we know virtually nothing about the algorithms that this code implements on this hardware. Our vision is to contribute to this challenge by developing computational methods and software systems for high-performance data analysis, inference, and numerical simulation of computer models of biological tissues, incorporating the known biochemistry and biophysics in 3D-space and time, in order to understand biological processes on an algorithmic basis. 
This ranges from real-time approaches to biomedical image analysis, to novel simulation languages for parallel high-performance computing, to virtual reality and machine learning for 3D microscopy and numerical simulations of coupled biochemical-biomechanical models. The cooperative, interdisciplinary effort to develop and advance our understanding of life using computational approaches not only places high-performance computing center stage, but also provides stimulating impulses for the future development of this field."
Watch the video: https://wp.me/p3RLHQ-kBB
Learn more: https://www.isc-hpc.com/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Coates P 1999: Exploring 3D design worlds using Lindenmayer systems and gen... (ArchiLab 7)
This document describes research using Lindenmayer systems and genetic programming to explore 3D design worlds. Lindenmayer systems are string rewriting systems that can be used to recursively generate 3D objects. The researchers used an isospatial grid to represent 3D space and represented objects as spheres inserted at the grid points. Genetic programming operations like crossover and mutation were performed on the Lindenmayer system production rules to evolve new designs. Initial experiments tested the crossover operation and explored evolving objects in simple virtual environments with objectives like avoiding or seeking certain conditions.
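The string-rewriting core of a Lindenmayer system fits in a few lines; this is a generic illustration, not the researchers' implementation:

```python
def lsystem(axiom, rules, steps):
    """Iteratively rewrite every symbol of the string in parallel.
    Symbols without a rule are copied unchanged."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Lindenmayer's original algae system: A -> AB, B -> A.
out = lsystem("A", {"A": "AB", "B": "A"}, 4)  # -> "ABAABABA"
```

Crossover and mutation in the evolutionary setting then operate on the `rules` dictionary (the production rules) rather than on the generated strings.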
A Technique for Partially Solving a Family of Diffusion Problems (ijtsrd)
Our aim in this paper is to expose the interesting role played by differintegrals, specifically semi-derivatives and semi-integrals, in solving certain diffusion problems. Along with the wave equation and Laplace's equation, the diffusion equation is one of the three fundamental partial differential equations of mathematical physics. I will not discuss conventional solutions of the diffusion equation at all. These range from closed-form solutions for very simple model problems to computer methods for approximating the concentration of the diffusing substance on a network of points. Such solutions are described extensively in the literature. My purpose, rather, is to expose a technique for partially solving a family of diffusion problems, a technique that leads to a compact equation which is first order spatially and half order temporally. I shall show that, for semi-infinite systems initially at equilibrium, our semi-differential equation leads to a relationship between the intensive variable and the flux at the boundary. Use of this relationship then obviates the need to solve the original diffusion equation in those problems for which this behavior at the boundary is of primary importance. I shall, in fact, freely make use of the general properties established for differintegral operators as if all my functions were differintegrable. Dr. Ayaz Ahmad, "A Technique for Partially Solving a Family of Diffusion Problems", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-6, October 2018. URL: http://www.ijtsrd.com/papers/ijtsrd18576.pdf
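The boundary relationship described above can be written compactly. This reconstruction assumes the standard semi-infinite diffusion setting and the usual notation (c concentration, D diffusivity, J flux at the boundary x = 0); sign conventions vary between treatments:

```latex
% Half-order boundary relation for a semi-infinite medium initially
% at the uniform equilibrium value c_\infty:
J(0,t) \;=\; \sqrt{D}\,\frac{d^{1/2}}{dt^{1/2}}\bigl[\,c_\infty - c(0,t)\,\bigr]
```

Given either the boundary concentration history or the boundary flux history, the half-order differintegral yields the other directly, without solving the full diffusion equation in the interior.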
A guide to molecular mechanics and quantum chemical calculations (Sapna Jha)
This document provides an introduction and guide to molecular mechanics and quantum chemical calculations. It is divided into four main sections. The first section defines various theoretical models used for quantum chemical and molecular mechanics calculations. The second section evaluates the performance of different theoretical models for predicting properties such as geometries, reaction energies, vibrational frequencies, and more. The third section discusses practical strategies for carrying out calculations. The fourth section presents case studies that illustrate how calculations can provide insight into chemistry. The guide aims to help chemists select appropriate computational methods for different applications.
This document provides an overview of using quantum computers for quantum simulation. It discusses how quantum computers can efficiently store quantum states using superpositions, while classical computers require exponential resources. The document reviews Lloyd's method for implementing time evolution under a Hamiltonian via Trotterization. It also discusses other techniques like pseudo-spectral methods and quantum lattice gases. The goal of quantum simulation is to study properties of quantum systems that cannot be efficiently simulated classically.
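A small numerical check of the first-order Trotter idea (a generic sketch, not Lloyd's construction): split exp(-i(A+B)t) into n alternating steps and watch the error shrink roughly like 1/n:

```python
import numpy as np

def u(h, t):
    """Unitary exp(-i h t) for a Hermitian matrix h, via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * w * t)) @ v.conj().T

rng = np.random.default_rng(0)

def hermitian(n):
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

A, B, t = hermitian(2), hermitian(2), 1.0
exact = u(A + B, t)

def trotter(n):
    """First-order Trotterization: (e^{-iAt/n} e^{-iBt/n})^n."""
    return np.linalg.matrix_power(u(A, t / n) @ u(B, t / n), n)

err = lambda n: np.linalg.norm(trotter(n) - exact)
```

The error is governed by the commutator [A, B]; when A and B commute the split is exact for any n.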
A comprehensive review of the firefly algorithms (Xin-She Yang)
This document provides a comprehensive review of firefly algorithms. It begins with background on swarm intelligence and how firefly algorithms were inspired by the flashing lights of fireflies. It then describes the basic structure of firefly algorithms, including initializing a population of fireflies, evaluating their fitness, sorting by fitness, selecting the best solution, and moving fireflies toward more attractive solutions over generations. The document reviews applications of firefly algorithms in areas like continuous, combinatorial, and multi-objective optimization as well as engineering problems. It concludes by discussing exploration vs exploitation in firefly algorithms and directions for further development.
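The loop structure described above (evaluate fitness, compare brightness, move toward more attractive fireflies) can be sketched as follows; parameter names and default values are illustrative, not Yang's reference settings:

```python
import numpy as np

def firefly_minimize(f, dim=2, n=15, iters=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=1):
    """Minimal firefly algorithm sketch: each firefly moves toward every
    brighter (lower-f) one with attractiveness beta0*exp(-gamma*r^2),
    plus a decaying random walk for exploration."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4, 4, size=(n, dim))
    for it in range(iters):
        fit = np.array([f(xi) for xi in x])
        for i in range(n):
            for j in range(n):
                if fit[j] < fit[i]:  # j is brighter: move i toward j
                    r2 = np.sum((x[i] - x[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)
                    x[i] += beta * (x[j] - x[i]) \
                          + alpha * (0.97 ** it) * rng.normal(size=dim)
                    fit[i] = f(x[i])
        best = x[np.argmin(fit)]
    return best, f(best)

sphere = lambda v: float(np.sum(v ** 2))
best, val = firefly_minimize(sphere)
```

The gamma parameter controls the exploration/exploitation trade-off discussed in the review: large gamma makes attraction short-ranged (more local search), small gamma pulls the whole swarm together.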
STATISTICAL DIVERSIONS
Peter Petocz and Eric Sowey
Macquarie University, Sydney and
The University of New South Wales
Sydney, Australia.
“The world is a complicated place.” This is often heard in social conversation, and the speaker generally lets it go at that. But consider what it means for someone who is trying to understand how things actually work in this complicated world – how the brain detects patterns, how consumers respond to rises in credit card interest rates, how aeroplane wings deflect during a supersonic flight, and so on. Understanding will not get very far without some initially simplified representation of whatever situation is being examined. We call such a simplified representation a model of reality. A neat definition of a model is “a concise abstraction of reality.” A model is an abstraction in the sense that it does not include every detail of reality, but only those details that are centrally relevant to the matter under investigation. And a model is concise in the sense that it is relatively easy to comprehend and to work with.

A simple example of a model is a page of a street directory. The page shows the directions and names of streets in a certain locality and represents, by a colour coding, the relative importance of the streets as traffic arteries. It’s an abstraction of reality in that it supplies the main information that a motorist needs, but little else. For example, it’s two-dimensional and so does not show the steepness of hills; neither does it show all the buildings that line those streets nor the boundaries of the land that each building occupies. And the page is concise in that it’s drawn on a small scale (typically, 1 cm = 100 m).

Because there are many different kinds of things in the world that we seek to understand, there are many different kinds of models. However, there is a basic distinction between physical models and algebraic (also called computational) models. A physical model is, as the name suggests, some kind of object (whether in two or in three dimensions). Each page in a street directory is evidently a physical model. So is an architect’s three-dimensional representation of the finished appearance of a building, and so also is a child’s balsa wood aeroplane. An algebraic model, by contrast, uses equations to describe the main features of interest in a real world situation and their interrelations. If these equations describe relations that are certain, or relations where chance influences are ignored, then the model is called a mathematical model. Newton’s three ‘laws’ of motion and Einstein’s famous equation, E = mc^2, are examples of mathematical models. If, however, the equations explicitly include the influence of chance, then the model is called a statistical model. Although introductory textbooks of statistics may not highlight the fact, all the standard probability distributions (binomial, Poisson, Normal, etc.) are indeed statistical models. To see this in the case.
The document discusses methods for studying quantum dynamics, localization, and quantum machine learning. It is divided into four parts. Part I develops numerical and analytical methods for studying complex quantum systems and their dynamics. Part II explores scenarios where dynamics is slow and information is localized, focusing on disorder-induced and kinetically constrained localization. Part III designs thermodynamic protocols to speed up thermalization while keeping dissipated work constant. Part IV examines intersections between tensor network methods and machine learning, applying techniques like neural network quantum states and tensor networks to problems in machine learning and approximating probability distributions.
This document provides a summary of recent publications related to research conducted at the WPI-ICReDD. It lists five publications from 2018-2019 related to catalysis and materials science. It then discusses the research projects and personnel involved in the JST CREST program that is funding this work. The document outlines the goals of using data-driven approaches and machine learning to optimize materials discovery and design. It proposes a multilevel framework that combines in-house and public data along with quality control and annotations to advance the field.
Design, Analysis and Manufacturing of Garbage Compactor - A Review (ijiert bestjournal)
The collection of waste is vital work that ensures our communities remain pleasant environments in which to live. But a major problem we face today is transportation of the waste, which can be reduced by compacting, i.e. reducing the size of the waste. A compactor can be used to reduce the volume of waste streams. The waste weight will remain the same, so there will be no savings in the total amount of waste produced. However, savings will occur because waste volume will be reduced by approximately 80%, which will decrease the number of times the dumpster will need to be emptied, therefore resulting in lower pick-up fees.
Simplicial closure and higher-order link prediction (Austin Benson)
This document summarizes research on simplicial closure and higher-order link prediction in network science. It finds that groups of nodes often interact through complex trajectories before reaching "simplicial closure" where all nodes are jointly present in a simplex. Predicting these closed simplices is framed as a higher-order link prediction problem. Various score functions are proposed based on edge weights, node neighborhoods, and similarity measures. Scores combining local edge weight information consistently perform well, outperforming classical link prediction approaches. The results provide insights into higher-order structure and a framework for evaluating models of complex relational data.
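One of the simple local scores mentioned above, averaging the three edge weights of an open triangle, can be sketched as follows (a generic illustration over a toy weighted graph, not the authors' code):

```python
from itertools import combinations

def score_triangles(edge_weight):
    """For every triangle in the weighted projected graph, score the
    candidate 3-node simplex by the arithmetic mean of its edge weights,
    one of the simple local scores that performs well in this setting."""
    nodes = sorted({u for e in edge_weight for u in e})
    w = lambda a, b: edge_weight.get((min(a, b), max(a, b)))
    scores = {}
    for tri in combinations(nodes, 3):
        ws = [w(a, b) for a, b in combinations(tri, 2)]
        if all(v is not None for v in ws):
            scores[tri] = sum(ws) / 3.0
    return scores

edges = {(1, 2): 3.0, (1, 3): 1.0, (2, 3): 2.0, (3, 4): 5.0}
s = score_triangles(edges)   # only (1, 2, 3) forms a triangle here
```

Triangles with the highest scores are predicted to undergo simplicial closure first.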
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach (IJECE, IAES)
The document presents a new approach called Bat-Cluster (BC) for automated graph clustering. BC combines the Fast Fourier Domain Positioning (FFDP) algorithm and the Bat Algorithm. FFDP positions graph nodes, then Bat Algorithm optimizes clustering by finding configurations that minimize the Davies-Bouldin Index. BC is tested on four benchmark graphs and outperforms Particle Swarm Optimization, Ant Colony Optimization, and Differential Evolution in providing higher clustering precision.
AN OPTIMIZATION ALGORITHM BASED ON BACTERIA BEHAVIOR (ijaia)
Paradigms based on competition have been shown to be useful for solving difficult problems. In this paper we present a new approach for solving hard problems using a collaborative philosophy. A collaborative philosophy can produce paradigms as interesting as those found in algorithms based on a competitive philosophy. Furthermore, we show that the performance - on problems subject to combinatorial explosion - is comparable to the performance obtained using a classic evolutionary approach.
COLOCATION MINING IN UNCERTAIN DATA SETS: A PROBABILISTIC APPROACH (IJCI Journal)
In this paper we investigate the colocation mining problem in the context of uncertain data. Uncertain data is partially complete data. Much real-world data is uncertain, for example, demographic data, sensor network data, GIS data, etc. Handling such data is a challenge for knowledge discovery, particularly in colocation mining. One straightforward method is to find the Probabilistic Prevalent Colocations (PPCs). This method tries to find all colocations that are likely to be generated from a random world. For this we first apply an approximation error to find all the PPCs, which reduces the computations. Next we find all the possible worlds, split them into two different worlds, and compute the prevalence probability. These worlds are compared with a minimum probability threshold to decide whether a colocation is a Probabilistic Prevalent Colocation (PPC) or not. The experimental results on the selected data set show a significant improvement in computational time in comparison to some of the existing methods used in colocation mining.
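The possible-worlds computation described above can be sketched by brute force. This illustrates the definition of prevalence probability only, not the paper's pruned algorithm, and the prevalence rule here is a toy stand-in:

```python
from itertools import product

def prevalence_probability(instance_probs, is_prevalent):
    """Enumerate all possible worlds of uncertain instances (each exists
    independently with its probability) and sum the probability of the
    worlds in which the colocation is prevalent. Exponential in the
    number of instances, hence only a definitional sketch."""
    total = 0.0
    for world in product([0, 1], repeat=len(instance_probs)):
        p = 1.0
        for present, q in zip(world, instance_probs):
            p *= q if present else (1.0 - q)
        if is_prevalent(world):
            total += p
    return total

# Toy rule: the colocation is prevalent if at least 2 of 3 instances exist.
p = prevalence_probability([0.9, 0.5, 0.5], lambda w: sum(w) >= 2)
```

A colocation is then reported as a PPC when this probability meets the minimum probability threshold.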
Quantum communication and quantum computing (IOSR Journals)
Abstract: The subject of quantum computing brings together ideas from classical information theory, computer science, and quantum physics. This review aims to summarize not just quantum computing, but the whole subject of quantum information theory. Information can be identified as the most general thing which must propagate from a cause to an effect. It therefore has a fundamentally important role in the science of physics. However, the mathematical treatment of information, especially information processing, is quite recent, dating from the mid-20th century. This has meant that the full significance of information as a basic concept in physics is only now being discovered. This is especially true in quantum mechanics. The theory of quantum information and computing puts this significance on a firm footing, and has led to some profound and exciting new insights into the natural world. Among these are the use of quantum states to permit the secure transmission of classical information (quantum cryptography), the use of quantum entanglement to permit reliable transmission of quantum states (teleportation), the possibility of preserving quantum coherence in the presence of irreversible noise processes (quantum error correction), and the use of controlled quantum evolution for efficient computation (quantum computation). The common theme of all these insights is the use of quantum entanglement as a computational resource.
Keywords: quantum bits, quantum registers, quantum gates and quantum networks
This document discusses a new machine learning method for differentiating between quark and gluon jets using data from the ALICE experiment at CERN. Key points:
- The method uses features of jet substructure to construct discriminant variables to classify jets as initiated by quarks or gluons. Hundreds of features are explored.
- Data preprocessing steps are described, including removing unusable features, addressing class noise in jet labeling, and ranking features by information gain.
- Feature ranking identified both previously proposed discriminating features as well as new intriguing variables for better quark/gluon jet discrimination.
This document summarizes research using matrix product states (MPS) to simulate the dynamics of atoms in an optical lattice. MPS allows modeling of larger systems than conventional exact calculations by only keeping the most relevant quantum mechanical combinations. The researcher investigated MPS accuracy by comparing hopping predictions to exact calculations, finding convergence up to a certain precision. Future work will apply MPS to more complex lattice systems and geometries to replicate experiments.
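The compression step behind MPS, keeping only the most relevant quantum mechanical combinations, is a truncated Schmidt (singular value) decomposition; a generic sketch, not the researcher's code:

```python
import numpy as np

def truncate(psi_matrix, chi):
    """Keep only the chi largest Schmidt components of a bipartite state,
    the elementary compression step behind matrix product states."""
    u, s, vh = np.linalg.svd(psi_matrix, full_matrices=False)
    approx = (u[:, :chi] * s[:chi]) @ vh[:chi]
    discarded = float(np.sum(s[chi:] ** 2))  # squared norm lost to truncation
    return approx, discarded

rng = np.random.default_rng(0)
psi = rng.normal(size=(8, 8))    # coefficients of a toy bipartite state
approx, lost = truncate(psi, chi=4)
```

The bond dimension chi controls the accuracy/cost trade-off: convergence up to a given precision is checked by increasing chi until observables stop changing.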
This document provides an overview and table of contents for the Physical Chemistry textbook by McQuarrie and Simon. It describes the LibreTexts project which openly licenses free online textbooks. The document outlines 13 interconnected open education libraries covering a range of fields from basic to advanced levels. It notes that the LibreTexts libraries are supported by various educational organizations and that the content is licensed for free use and adaptation with attribution.
This document discusses fractal geometry and its applications in materials science. It begins by providing background on fractals and how they were discovered to describe natural patterns. Fractals have fractional dimensions and self-similar patterns across different scales. Non-linear dynamics and chaos theory are then introduced to study irregular patterns in nature. Specific fractal objects like the Cantor set and Koch curve are described. The document outlines how fractal analysis can be used to characterize microstructures, surfaces, cracks and particles in materials using techniques like box counting to determine fractal dimension. Finally, the role of image processing in materials science images for quantitative microstructure analysis is briefly discussed.
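The box-counting procedure mentioned above can be sketched directly: count occupied grid boxes at several scales and fit the log-log slope. A generic illustration on a straight line, whose dimension should come out near 1:

```python
import numpy as np

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32, 64)):
    """Estimate the fractal dimension of a point set in [0,1]^2 by
    counting occupied boxes N(s) at grid resolutions s and fitting
    log N(s) against log s."""
    counts = []
    for s in scales:
        boxes = {(int(x * s), int(y * s)) for x, y in points}
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

line = [(t, t) for t in np.linspace(0, 0.999, 5000)]
d = box_counting_dimension(line)   # close to 1.0 for a line
```

Applied to a binarized micrograph of a crack or particle boundary, the same slope estimates the fractal dimension of the microstructure.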
Quantum Cloning on Macroorganisms Containing Quantum Information (QUESTJOURNAL)
ABSTRACT: This article indicates that macroorganisms may be cloned. A model for teleportation of the internal quantum state and stationary motion of a macroorganism is proposed. This is an important step towards potential teleportation of an organism in the future. In particular, strict no-signaling limits are derived for probabilistic cloning and super-replication, which coincide with the corresponding known optimally achievable accuracies and rates. In the context of quantum metrology, reference-frame alignment, and maximum-likelihood state estimation, interactions with the world may reveal the current state of the quantum system. This thesis is built around the hypothetical copying of a macroorganism, given that the macroorganism contains quantum information. Additional details are given on the equivalence of asymptotic phase-covariant cloning and phase estimation for different measures of quality.
Simplicial closure and higher-order link prediction (SIAMNS18) (Austin Benson)
This document summarizes research on modeling and predicting the formation of higher-order relationships or interactions between nodes in network datasets. It introduces the concept of "simplicial closure" to describe how groups of nodes interact over time until forming a simplex or higher-order relationship. The researchers propose "higher-order link prediction" as a framework to evaluate models for predicting the formation of new simplices. They test various methods for scoring open triangles based on edge weights and other structural properties to predict which will become closed triangles. The results show these approaches can significantly outperform random prediction, with simply averaging edge weights often performing well.
Spreading processes on temporal networks (Petter Holme)
This document discusses temporal networks and how temporal structures can impact dynamical processes on networks. It begins by describing different types of temporal networks including person-to-person communication, information dissemination, physical proximity, and cellular biology networks. It then discusses methods for analyzing temporal network structures like inter-event times and how bursty or heavy-tailed distributions can slow spreading compared to memory-less processes. The document also presents examples of how neutralizing temporal structures like inter-event times or beginning/end times can impact spreading simulations. Finally, it discusses how different temporal network datasets exhibit diverse temporal structures.
This document provides an introduction to computational cubical homology. It begins by summarizing simplicial homology, including definitions of simplicial complexes, chains, and the boundary operator. It then introduces cubical homology, defining k-cubes, chains, and the cubical boundary operator. The document describes how computational homology uses linear algebra and the Smith normal form algorithm to compute homology groups. It concludes by discussing computational tools for homology and applications to image analysis and data science.
Evaluating the Use of Clustering for Automatically Organising Digital Library... – pathsproject
Paper by Mark M. Hall, Mark Stevenson and Paul D. Clough from the Information School /Department of Computer Science, University of Sheffield, UK
24-27 September 2012
TPDL 2012, Cyprus
Developing effective meta heuristics for a probabilistic... – Hari Rajagopalan
This document summarizes an article that evaluates four meta-heuristics (evolutionary algorithm, tabu search, simulated annealing, and hybridized hill-climbing) for solving a probabilistic location model called the maximum expected coverage location problem (MEXCLP). The MEXCLP aims to locate a limited number of ambulances to maximize expected coverage of demand points within a response time threshold. The article uses statistical experimental design to objectively analyze the performance of the four meta-heuristics on test problems of varying sizes. The results show that on average tabu search and simulated annealing find high-quality solutions in the least amount of time, especially for large problems requiring dynamic redeployment, though all four methods produced good results.
STATISTICAL DIVERSIONS, Peter Petocz and Eric Sowey, Macquarie.docx – dessiechisomjj4
STATISTICAL DIVERSIONS
Peter Petocz and Eric Sowey
Macquarie University, Sydney and
The University of New South Wales
Sydney, Australia.
“The world is a complicated place.” This is often heard in social conversation, and the speaker generally lets it go at that. But consider what it means for someone who is trying to understand how things actually work in this complicated world – how the brain detects patterns, how consumers respond to rises in credit card interest rates, how aeroplane wings deflect during a supersonic flight and so on. Understanding will not get very far without some initially simplified representation of whatever situation is being examined. We call such a simplified representation a model of reality. A neat definition of a model is “a concise abstraction of reality.” A model is an abstraction in the sense that it does not include every detail of reality, but only those details that are centrally relevant to the matter under investigation. And a model is concise in the sense that it is relatively easy to comprehend and to work with.

A simple example of a model is a page of a street directory. The page shows the directions and names of streets in a certain locality and represents, by a colour coding, the relative importance of the streets as traffic arteries. It’s an abstraction of reality in that it supplies the main information that a motorist needs, but little else. For example, it’s two-dimensional and so does not show the steepness of hills; neither does it show all the buildings that line those streets nor the boundaries of the land that each building occupies. And the page is concise in that it’s drawn on a small scale (typically, 1 cm = 100 m).

Because there are many different kinds of things in the world that we seek to understand, there are many different kinds of models. However, there is a basic distinction between physical models and algebraic (also called computational) models. A physical model is, as the name suggests, some kind of object (whether in two or in three dimensions). Each page in a street directory is evidently a physical model. So is an architect’s three-dimensional representation of the finished appearance of a building, and so also is a child’s balsa wood aeroplane. An algebraic model, by contrast, uses equations to describe the main features of interest in a real world situation and their interrelations. If these equations describe relations that are certain, or relations where chance influences are ignored, then the model is called a mathematical model. Newton’s three ‘laws’ of motion and Einstein’s famous equation, E = mc², are examples of mathematical models. If, however, the equations explicitly include the influence of chance, then the model is called a statistical model. Although introductory textbooks of statistics may not highlight the fact, all the standard probability distributions (binomial, Poisson, Normal, etc.) are indeed statistical models. To see this in the case…
The document discusses methods for studying quantum dynamics, localization, and quantum machine learning. It is divided into four parts. Part I develops numerical and analytical methods for studying complex quantum systems and their dynamics. Part II explores scenarios where dynamics is slow and information is localized, focusing on disorder-induced and kinetically constrained localization. Part III designs thermodynamic protocols to speed up thermalization while keeping dissipated work constant. Part IV examines intersections between tensor network methods and machine learning, applying techniques like neural network quantum states and tensor networks to problems in machine learning and approximating probability distributions.
This document provides a summary of recent publications related to research conducted at the WPI-ICReDD. It lists five publications from 2018-2019 related to catalysis and materials science. It then discusses the research projects and personnel involved in the JST CREST program that is funding this work. The document outlines the goals of using data-driven approaches and machine learning to optimize materials discovery and design. It proposes a multilevel framework that combines in-house and public data along with quality control and annotations to advance the field.
Design, Analysis And Manufacturing of Garbage Compactor - a Review – ijiert bestjournal
The collection of waste is vital work that ensures our communities remain pleasant environments in which to live. But a major problem we face today is transportation of the waste, which can be reduced by compacting, i.e. reducing the size of, particular waste. A compactor can be used to reduce the volume of waste streams. The waste weight will remain the same, so there will be no savings from the total amount of waste produced. However, savings will occur because waste volume will be reduced by approximately 80%, which will decrease the number of times the dumpster will need to be emptied, therefore resulting in lower pick-up fees.
Simplicial closure and higher-order link prediction – Austin Benson
This document summarizes research on simplicial closure and higher-order link prediction in network science. It finds that groups of nodes often interact through complex trajectories before reaching "simplicial closure" where all nodes are jointly present in a simplex. Predicting these closed simplices is framed as a higher-order link prediction problem. Various score functions are proposed based on edge weights, node neighborhoods, and similarity measures. Scores combining local edge weight information consistently perform well, outperforming classical link prediction approaches. The results provide insights into higher-order structure and a framework for evaluating models of complex relational data.
Bat-Cluster: A Bat Algorithm-based Automated Graph Clustering Approach – IJECEIAES
The document presents a new approach called Bat-Cluster (BC) for automated graph clustering. BC combines the Fast Fourier Domain Positioning (FFDP) algorithm and the Bat Algorithm. FFDP positions graph nodes, then Bat Algorithm optimizes clustering by finding configurations that minimize the Davies-Bouldin Index. BC is tested on four benchmark graphs and outperforms Particle Swarm Optimization, Ant Colony Optimization, and Differential Evolution in providing higher clustering precision.
AN OPTIMIZATION ALGORITHM BASED ON BACTERIA BEHAVIOR – ijaia
Paradigms based on competition have been shown to be useful for solving difficult problems. In this paper we present a new approach for solving hard problems using a collaborative philosophy. A collaborative philosophy can produce paradigms as interesting as those found in algorithms based on a competitive philosophy. Furthermore, we show that on problems involving combinatorial explosion, the performance is comparable to that obtained using a classic evolutionary approach.
COLOCATION MINING IN UNCERTAIN DATA SETS: A PROBABILISTIC APPROACH – IJCI JOURNAL
In this paper we investigate the colocation mining problem in the context of uncertain data. Uncertain data is partially complete data. Much real-world data is uncertain, for example demographic data, sensor network data, and GIS data. Handling such data is a challenge for knowledge discovery, particularly in colocation mining. One straightforward method is to find the Probabilistic Prevalent Colocations (PPCs). This method tries to find all colocations that are to be generated from a random world. For this we first apply an approximation error to find all the PPCs, which reduces the computation. Next we find all the possible worlds, split them into two different worlds, and compute the prevalence probability. These worlds are compared with a minimum probability threshold to decide whether a colocation is a Probabilistic Prevalent Colocation (PPC) or not. The experimental results on the selected data set show a significant improvement in computational time in comparison to some existing methods used in colocation mining.
Quantum communication and quantum computing – IOSR Journals
Abstract: The subject of quantum computing brings together ideas from classical information theory, computer
science, and quantum physics. This review aims to summarize not just quantum computing, but the whole
subject of quantum information theory. Information can be identified as the most general thing which must
propagate from a cause to an effect. It therefore has a fundamentally important role in the science of physics.
However, the mathematical treatment of information, especially information processing, is quite recent, dating
from the mid-20th century. This has meant that the full significance of information as a basic concept in physics
is only now being discovered. This is especially true in quantum mechanics. The theory of quantum information
and computing puts this significance on a firm footing, and has led to some profound and exciting new insights
into the natural world. Among these are the use of quantum states to permit the secure transmission of classical
information (quantum cryptography), the use of quantum entanglement to permit reliable transmission of
quantum states (teleportation), the possibility of preserving quantum coherence in the presence of irreversible
noise processes (quantum error correction), and the use of controlled quantum evolution for efficient
computation (quantum computation). The common theme of all these insights is the use of quantum
entanglement as a computational resource.
Keywords: quantum bits, quantum registers, quantum gates and quantum networks
This document discusses a new machine learning method for differentiating between quark and gluon jets using data from the ALICE experiment at CERN. Key points:
- The method uses features of jet substructure to construct discriminant variables to classify jets as initiated by quarks or gluons. Hundreds of features are explored.
- Data preprocessing steps are described, including removing unusable features, addressing class noise in jet labeling, and ranking features by information gain.
- Feature ranking identified both previously proposed discriminating features as well as new intriguing variables for better quark/gluon jet discrimination.
This document summarizes research using matrix product states (MPS) to simulate the dynamics of atoms in an optical lattice. MPS allows modeling of larger systems than conventional exact calculations by only keeping the most relevant quantum mechanical combinations. The researcher investigated MPS accuracy by comparing hopping predictions to exact calculations, finding convergence up to a certain precision. Future work will apply MPS to more complex lattice systems and geometries to replicate experiments.
This document provides an overview and table of contents for the Physical Chemistry textbook by McQuarrie and Simon. It describes the LibreTexts project which openly licenses free online textbooks. The document outlines 13 interconnected open education libraries covering a range of fields from basic to advanced levels. It notes that the LibreTexts libraries are supported by various educational organizations and that the content is licensed for free use and adaptation with attribution.
This document discusses fractal geometry and its applications in materials science. It begins by providing background on fractals and how they were discovered to describe natural patterns. Fractals have fractional dimensions and self-similar patterns across different scales. Non-linear dynamics and chaos theory are then introduced to study irregular patterns in nature. Specific fractal objects like the Cantor set and Koch curve are described. The document outlines how fractal analysis can be used to characterize microstructures, surfaces, cracks and particles in materials using techniques like box counting to determine fractal dimension. Finally, the role of image processing in materials science images for quantitative microstructure analysis is briefly discussed.
Science of Computer Programming 62 (2006) 184–203
www.elsevier.com/locate/scico
Linear combinations of radioactive decay models for generational
garbage collection
William D. Clinger∗, Fabio V. Rojas
Northeastern University, 360 Huntington Ave, Boston, MA 02115, USA
Received 29 April 2005; received in revised form 16 January 2006; accepted 24 February 2006
Available online 16 June 2006
Abstract
A program’s distribution of object lifetimes is one of the factors that determines whether and how much it will benefit from
generational garbage collection, and from what kind of generational collector. Linear combinations of radioactive decay models
appear adequate for modelling object lifetimes in many programs, especially when the goal is to analyze the relative or theoretical
performance of simple generational collectors.
The boundary between models that favor younger-first generational collectors and models that favor older-first generational
collectors is mathematically complex, even for highly idealized collectors. For linear combinations of radioactive decay models,
non-generational collection is rarely competitive with idealized generational collection, even at that boundary.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Garbage collection; Radioactive decay model
1. Introduction and related work
Garbage collection is a technology that automatically reclaims unreachable heap storage [14,31]. Generational
garbage collectors divide the heap into two or more regions, known as generations because they often group objects of
similar age, and collect these generations at different times [14,16,30]. Most generational garbage collectors attempt
to collect younger generations more frequently than older generations, so we call them younger-first collectors.
In an effort to characterize the theoretical relationship between object lifetimes and the performance of generational
garbage collection, Baker reasoned that an idealized exponential (radioactive) decay model of object lifetimes might
represent a dividing line between programs that benefit from generational collection and programs that do not. In
particular, Baker conjectured that generational collection would perform no better and no worse than non-generational
collection for the radioactive decay model [2].
Clinger calculated that this is not so: younger-first generational collectors actually perform worse than non-
generational collectors for the radioactive decay model, but a novel renewal-older-first generational collector of his
invention performs better [6].
This discovery, and the realization that the Mature Object Space collector (the train algorithm) behaves much like an
older-first collector, led Stefanović to invent and to simulate an entire family of older-first algorithms for generational
∗ Corresponding author. Tel.: +1 617 373 8687.
E-mail addresses: will@ccs.neu.edu (W.D. Clinger), frojas@ccs.neu.edu (F.V. Rojas).
0167-6423/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.scico.2006.02.005
garbage collection [13,20,25,26]. Hansen and (independently) Moss’s research group implemented variations of the
renewal-older-first and Stefanović’s deferred-older-first algorithms in research-quality implementations of Scheme
and Java, demonstrating that these older-first algorithms are quite viable [4,9,10,28].
In fact, these older-first collectors outperform conventional younger-first collectors on several GC-intensive
benchmarks. On the other hand, the experimental results also show that younger-first collectors outperform older-first
collectors on some other benchmarks [9,10]. Evidently some programs perform better with younger-first collection,
some perform better with older-first collection, and many perform about the same with either.
Why? Baker’s original question concerning the theoretical dividing line between younger-first generational and
non-generational collection has not been answered, and we must now ask that question with regard to younger-first
versus older-first generational collection [27]. The radioactive decay model is not the dividing line, but what is? On
the boundary, where the theoretical efficiencies of younger-first and older-first collection coincide, is non-generational
collection optimal?
In this paper we use linear combinations of radioactive decay models to calculate mark/cons ratios (the most
important theoretical predictor of amortized efficiency, defined in Section 3) for several idealized garbage collectors.
These calculations show there is no simple line dividing programs that are better suited for younger-first collection
from programs that are better suited for older-first collection. In this paper, the dividing line appears as a complex
five-dimensional surface, but this is undoubtedly a crude oversimplification. Our calculations also show that non-
generational collection is usually far from optimal on that surface, which suggests that non-generational garbage
collection is seldom a good compromise between younger-first and older-first generational collection.
The good news is that detailed empirical studies of object lifetimes, as reported independently by Stefanović
and Hansen [9,25], suggest that linear combinations of radioactive decay models often do a reasonably good job
of modelling the lifetimes of objects in real programs, such as the javac and SPECjbb benchmarks discussed in
Sections 2.7, 2.8 and 6.1.
Several systems now provide several alternative garbage collectors, and some allow the garbage collector to be
selected dynamically, using heuristics based on heap occupancy or on the results of offline profiling [18,21]. Our
models provide the kind of theory that is needed to develop better heuristics.
At the very least, our models and calculations help to explain some of the repeatable but puzzling patterns that are
often observed in experimental studies of garbage collection [4,9,10]. For example, the calculations presented in this
paper help to explain why the generational garbage-first algorithm tends to perform better than a pure garbage-first
algorithm [7].
2. Models of object lifetimes
2.1. Generational hypotheses
In most programs, the mortality rate for young objects is much higher than for old objects. This weak generational
hypothesis [11,12,31] is not true of all programs, but is especially likely to describe fast-allocating programs for which
the performance of garbage collection is most critical, and appears to be true even of fast-allocating programs written
in C [3].
There is much less evidence for the strong generational hypothesis, which postulates a negative correlation between
age and mortality rate even for long-lived objects [12,19,24,31].
As will be seen, radioactive decay models imply that the mortality rate is independent of age, so they satisfy
neither the weak nor the strong generational hypothesis. A linear combination of two or more radioactive decay
models will satisfy the weak generational hypothesis unless it degenerates into a pure radioactive decay model. A
linear combination of only two radioactive decay models will not satisfy the strong generational hypothesis. Linear
combinations of three or more radioactive decay models can satisfy the strong generational hypothesis at least as well
as actual programs.
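The age-dependence claims above are easy to check numerically. The sketch below is an illustration only, not from the paper; the half-lives and mixture weight are arbitrary hypothetical choices. It computes the instantaneous mortality rate m(t) = −s′(t)/s(t) of a two-component mixture and confirms that it decreases with age, satisfying the weak generational hypothesis, while approaching the constant rate of the long-lived component.

```python
import math

def survivor(t, h1, h2, w):
    # RDM2 survivor function: a w/(1-w) mixture of two radioactive decays
    return w * 2 ** (-t / h1) + (1 - w) * 2 ** (-t / h2)

def mortality(t, h1, h2, w):
    # Instantaneous mortality rate m(t) = -s'(t)/s(t); each component of
    # s(t) differentiates to -(log 2 / h) * 2**(-t/h).
    ds = -math.log(2) * (w * 2 ** (-t / h1) / h1 +
                         (1 - w) * 2 ** (-t / h2) / h2)
    return -ds / survivor(t, h1, h2, w)

# Hypothetical parameters: 90% short-lived objects with half-life 10,
# 10% long-lived objects with half-life 1000 (measured in allocations).
h1, h2, w = 10.0, 1000.0, 0.9
rates = [mortality(t, h1, h2, w) for t in (0, 10, 50, 200)]

# Mortality falls with age (weak generational hypothesis) and approaches
# the long-lived component's constant rate log(2)/h2.
assert all(a > b for a, b in zip(rates, rates[1:]))
print(rates, math.log(2) / h2)
```

A pure radioactive decay model, by contrast, would give a flat sequence of rates here, since its mortality is independent of age.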
2.2. Equilibrium models
An equilibrium model of object lifetimes is a model that satisfies the following three assumptions:
Assumption 1. Two objects can be compared to see whether they are the same object, and the live objects of
one or more generations can be distinguished from the dead objects of those generations by performing a garbage
collection within those generations, but objects have no other distinguishing characteristics that might be exploited by
a generational garbage collector.
Assumption 2. There exists a probability density function P for mortality such that, for each newly allocated object o, the probability that o will die between ages t0 and t1 is given by

∫_{t0}^{t1} P(t) dt.
Assumption 3. The amount of live storage reaches an equilibrium.
Assumption 1 says that equilibrium models ignore the distinctions between live and reachable objects, objects that
contain pointers and objects that don’t, objects that refer to older objects and objects that refer to younger objects, and
so forth. Assumption 2 says the distribution of object lifetimes is independent of the time at which an object is created;
different equilibrium models may of course have different distributions. Assumption 3 says that heap storage reaches
a steady state. None of these assumptions are completely realistic, but they are a reasonable compromise between
reality and tractability.
2.3. Radioactive decay models (RDM)
A radioactive decay model is fully specified by its one parameter, the half-life h. For every object that is live at any time t0, the probability that the object will still be alive at time t0 + t is S_t(t0) = 2^(−t/h). The probability that the object will be dead at that time is 1 − 2^(−t/h). Taking t0 to be the time of allocation, the survivor function is s(t) = 2^(−t/h). The instantaneous mortality rate is a constant:

m(t) = −s′(t)/s(t) = (log 2)/h.

For every object, the probability density function for mortality is

P_h(t) = m(t) s(t) = ((log 2)/h) 2^(−t/h).
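A quick numerical check (illustrative only; the half-life and age window below are arbitrary choices, not from the paper) confirms that integrating this density over an age interval, as in Assumption 2, reproduces the difference of survivor-function values 2^(−t0/h) − 2^(−t1/h):

```python
import math

def P(t, h):
    # Mortality density of a radioactive decay model: (log 2 / h) * 2**(-t/h)
    return (math.log(2) / h) * 2 ** (-t / h)

def death_probability(t0, t1, h, steps=100_000):
    # Assumption 2: the probability of dying between ages t0 and t1 is the
    # integral of P over [t0, t1], approximated here by the midpoint rule.
    dt = (t1 - t0) / steps
    return sum(P(t0 + (i + 0.5) * dt, h) for i in range(steps)) * dt

h, t0, t1 = 20.0, 5.0, 30.0          # hypothetical half-life and age window
numeric = death_probability(t0, t1, h)
closed_form = 2 ** (-t0 / h) - 2 ** (-t1 / h)   # difference of survivor values
print(numeric, closed_form)
```

The two values agree to high precision, since the antiderivative of P_h is exactly −2^(−t/h).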
We will now calculate the expected number n of live objects at equilibrium. If the time t is measured by the
number of objects that have been allocated, then the radioactive decay model implies that an equilibrium will indeed
be approached after several half-lives of time have passed. At equilibrium one object can be expected to die per unit
time. The expected number n of live objects at equilibrium is therefore related to the half-life h by 1 = n(1 − 2^(−1/h)).
Let r = 2^(−1/h) = 1 − 1/n. If f(h) = 1 − r and g(h) = m(t) = (log 2)/h, then
lim_{h→∞} f(h)/g(h) = lim_{h→∞} f′(h)/g′(h) = lim_{h→∞} 2^(−1/h) = 1
by L'Hospital's Rule [1,29]. Hence r = 1 − f(h) ≈ 1 − g(h) ≈ 1 − (log 2)/h for large h. Small values of h imply a small
number of live objects, which makes garbage collection too easy to be interesting, so this approximation can safely
be used to calculate that the live storage at equilibrium is
n = 1/(1 − r) ≈ h/(log 2) ≐ 1.4427h. (1)
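Eq. (1) is easy to check numerically. The sketch below (ours, not from the paper) iterates the expected-value recurrence implied by the model, one allocation per time step, and compares the resulting equilibrium against both the exact value n = 1/(1 − r) and the approximation of Eq. (1):

```python
import math

def equilibrium_live(h, steps):
    """Expected live-object count under a radioactive decay model.
    Each step: every live object survives with probability r = 2**(-1/h),
    then one new object is allocated."""
    r = 2.0 ** (-1.0 / h)
    live = 0.0
    for _ in range(steps):
        live = live * r + 1.0
    return live

h = 1000                                    # half-life, in allocations
n_exact = 1.0 / (1.0 - 2.0 ** (-1.0 / h))   # from 1 = n(1 - 2**(-1/h))
n_approx = 1.4427 * h                       # Eq. (1)
n_sim = equilibrium_live(h, steps=30 * h)   # many half-lives of allocation
print(n_exact, n_approx, n_sim)
```

After several half-lives the iterated value agrees with n = 1/(1 − r), and Eq. (1) matches it to within a fraction of a percent for a half-life this large.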
2.4. Linear combinations of two radioactive decay models (RDM2)
A linear combination of two radioactive decay models (RDM2) has three parameters: the half-life h1 of the short-lived
objects, the half-life h2 of the long-lived objects, and the fraction w of the allocated objects that are of the
short-lived kind. We require 0 < h1 < h2 < ∞ and 0 ≤ w < 1, so the linear combination model degenerates into a
radioactive decay model if and only if w = 0, that is, if and only if there are no short-lived objects.
W.D. Clinger, F.V. Rojas / Science of Computer Programming 62 (2006) 184–203
The probability density function for mortality in the RDM2 model is
P_{h1,h2,w}(t) = w·P_{h1}(t) + (1 − w)·P_{h2}(t).
One of our goals is to calculate the parameters of this model from survival rates seen in actual programs. To do that,
we must calculate the consequences of the model in some detail, starting with the expected number of live objects at
equilibrium. Let
n1 = h1/(log 2) (2)
n2 = h2/(log 2) (3)
r1 = 2^(−1/h1) = 1 − 1/n1 ≈ 1 − (log 2)/h1 (4)
r2 = 2^(−1/h2) = 1 − 1/n2 ≈ 1 − (log 2)/h2. (5)
At equilibrium the live storage will be about
n ≈ w·n1 + (1 − w)·n2 = (w·h1 + (1 − w)·h2)/(log 2) ≐ 1.4427(w·h1 + (1 − w)·h2).
Now we calculate some expected survival and mortality rates. By Assumption 2 of Section 2.2, the probability that a
newly allocated object will die between ages t0 and t1 is
∫_{t0}^{t1} P_{h1,h2,w}(t) dt = w(r1^t0 − r1^t1) + (1 − w)(r2^t0 − r2^t1).
The probability that a newly allocated object will survive to age t is
s(t) = w·r1^t + (1 − w)·r2^t.
The conditional probability that an object that has survived to age t will survive to age t + Δt is the probability of the
latter divided by the probability of the former:
S_Δt(t) = s(t + Δt)/s(t).
The probability that an object that is live at age t will die before age t + Δt is 1 − S_Δt(t). The mortality rate is
m(t) = −s′(t)/s(t) = (log 2)·((w/h1)·r1^t + ((1 − w)/h2)·r2^t)/(w·r1^t + (1 − w)·r2^t).
For any time t0, define live(t) as the amount of storage that is allocated between t0 and t0 + t and is expected to survive
to time t0 + t. live(t) can be calculated as the probability that the object allocated at t0 will survive the next t − 1
allocations, plus the probability that the object allocated at t0 + 1 will survive the next t − 2 allocations, and so on:
live(t) = Σ_{i=0}^{t−1} s(i) (6)
 = w·Σ_{i=0}^{t−1} r1^i + (1 − w)·Σ_{i=0}^{t−1} r2^i (7)
 = w·(1 − r1^t)/(1 − r1) + (1 − w)·(1 − r2^t)/(1 − r2) (8)
 = w·n1·(1 − r1^t) + (1 − w)·n2·(1 − r2^t) (by (4), (5)) (9)
 ≈ w·(h1/(log 2))·(1 − r1^t) + (1 − w)·(h2/(log 2))·(1 − r2^t). (10)
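As a sanity check (ours, not from the paper), the closed form (8) can be compared against the direct sum (6) for an arbitrary choice of the RDM2 parameters:

```python
def live_closed(t, h1, h2, w):
    """Closed form (8) for live(t) under an RDM2 model."""
    r1 = 2.0 ** (-1.0 / h1)
    r2 = 2.0 ** (-1.0 / h2)
    return (w * (1 - r1 ** t) / (1 - r1)
            + (1 - w) * (1 - r2 ** t) / (1 - r2))

def live_sum(t, h1, h2, w):
    """Direct sum (6): live(t) is the sum of the survivor probabilities s(i)."""
    r1 = 2.0 ** (-1.0 / h1)
    r2 = 2.0 ** (-1.0 / h2)
    return sum(w * r1 ** i + (1 - w) * r2 ** i for i in range(t))

# hypothetical parameters: half-lives in bytes, two-thirds short-lived
h1, h2, w = 45_000, 85_000_000, 2 / 3
a = live_closed(500_000, h1, h2, w)
b = live_sum(500_000, h1, h2, w)
print(a, b)
```

The two computations agree to within floating-point rounding, and live(t) is monotonically increasing in t, as the geometric-series derivation requires.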
Now suppose h1 ≪ Δt ≪ h2. Then r1^Δt ≈ 0, and r2^Δt ≈ 1 − ((log 2)/h2)·Δt. Almost all long-lived objects are expected
to survive to age Δt, while the number of short-lived objects that are expected to survive to time Δt is essentially
independent of Δt (because objects that were allocated many half-lives ago have negligible chance of surviving).
Thus, from Eq. (10) and the above,
live(Δt) ≈ w·(h1/(log 2)) + (1 − w)·Δt. (11)
Eq. (11) can be used to estimate the parameters h1, h2, w of the model from survival rates that are observed in actual
programs. Let e0 be the fraction of storage that is promoted out of a nursery of size Δt, where that nursery is collected
after every Δt bytes of allocation and is empty following the collection. Let e1 be an estimate of the conditional
probability that storage of age less than Δt will survive the next Δt bytes of allocation. Let e2 be an estimate of the
conditional probability that storage of age greater than 2Δt will survive the next Δt bytes of allocation. Then
live(Δt) ≈ e0·Δt
live(2Δt) ≈ e0·Δt + e0·e1·Δt
live(2Δt) − live(Δt) ≈ e0·e1·Δt.
Combining this with Eq. (11), we can estimate that
h1 ≐ (e0(1 − e1) log 2)/(1 − e0·e1) · Δt
h2 ≐ −((log 2)/(log e2)) · Δt
w ≐ 1 − e0·e1.
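These estimators can be checked for self-consistency by generating the survival rates from a known RDM2 model and then recovering its parameters. The sketch below is ours; the parameter values are hypothetical:

```python
import math

def estimate_rdm2(e0, e1, e2, dt):
    """Estimators from Section 2.4: recover (h1, h2, w) from survival rates.
    e0: fraction of a nursery of size dt promoted at each collection
    e1: survival over the next dt for storage of age < dt
    e2: survival over the next dt for storage of age > 2*dt"""
    w = 1 - e0 * e1
    h1 = e0 * (1 - e1) * math.log(2) / (1 - e0 * e1) * dt
    h2 = -math.log(2) / math.log(e2) * dt
    return h1, h2, w

def live(t, h1, h2, w):
    """Eq. (9): live(t) = w*n1*(1 - r1**t) + (1-w)*n2*(1 - r2**t)."""
    r1, r2 = 2.0 ** (-1.0 / h1), 2.0 ** (-1.0 / h2)
    return (w * (1 - r1 ** t) / (1 - r1)
            + (1 - w) * (1 - r2 ** t) / (1 - r2))

# Round trip: survival rates implied by a known model, then re-estimated.
h1, h2, w, dt = 45_000, 85_000_000, 2 / 3, 500_000   # sizes in bytes
e0 = live(dt, h1, h2, w) / dt
e1 = (live(2 * dt, h1, h2, w) - live(dt, h1, h2, w)) / (e0 * dt)
e2 = (2.0 ** (-1.0 / h2)) ** dt     # old storage decays at the h2 rate
h1_est, h2_est, w_est = estimate_rdm2(e0, e1, e2, dt)
print(h1_est, h2_est, w_est)
```

Because h1 ≪ Δt ≪ h2 holds here, the recovered parameters agree with the true ones to within a few percent.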
Example 1. Fig. 1 shows the storage profile for the nboyer:2 benchmark, which is a modernized and larger version
of the boyer theorem-proving benchmark [8–10,17]. The topmost line shows the total volume of live storage as a
function of time (where time is measured by the number of bytes that have been allocated). Each interior line shows
how much storage survives from the time at which the interior line separated from the topmost line. With a nursery
of Δt = 500 kB, which is the resolution of Fig. 1, that storage profile implies e0 ≐ 42%, e1 ≐ 79%, and e2 ≐ 99.6%.
Using the equations above, we estimate that two-thirds of the allocated storage is short-lived, with a half-life of a little
over 45 kB, and the remaining third is long-lived with a half-life on the order of 85 MB. Fig. 2 shows the storage
profile predicted by this RDM2 model.
Comparing Figs. 1 and 2, the nboyer:2 benchmark is farther from equilibrium, and has more long-lived storage,
than can be accounted for by the RDM2 model. The main reason for this model’s poor fit is that nboyer:2 satisfies the
strong generational hypothesis well enough to require a linear combination of at least three radioactive decay models,
as developed below and shown in Fig. 3. The nursery size of t = 500 kB is large enough to observe substantial
mortality among objects that survive the nursery, but is not large enough to observe the decrease in mortality rate
among truly long-lived objects. Hence the RDM2 model overestimates the mortality rate of long-lived objects, which
results in underestimates for the volume of long-lived storage and time to equilibrium.
2.5. Linear combinations of three radioactive decay models (RDM3)
A linear combination of three radioactive decay models (RDM3) has five parameters: the half-life h1 of the short-
lived objects, the half-life h2 of the intermediate-lived objects, the half-life h3 of the long-lived objects, the fraction
w1 of the allocated objects that are of the short-lived kind, and the fraction w2 of the allocated objects that are of the
intermediate-lived kind.
Assuming h1 ≪ Δt ≪ h2 ≪ Δs ≪ h3, we can estimate these parameters using a generalization of the calculations
shown above. Let f0, f1, and f2 be probabilities analogous to e0, e1, and e2 except that they are estimated using a
Fig. 1. Storage profile for the nboyer:2 benchmark.
Fig. 2. Simulated profile for an RDM2 model of the nboyer:2 benchmark.
Fig. 3. Simulated profile for an RDM3 model of the nboyer:2 benchmark.
larger “nursery” size Δs. Then
w1 ≐ 1 − e0·e1
w2 ≐ 1 − w1 − f0·f1
h1 ≐ (e0(1 − e1) log 2)/(1 − e0·e1) · Δt
h2 ≐ ((log 2)/w2)·(f0 − (1 − w1 − w2))·Δs − (h1·w1)/w2
h3 ≐ −((log 2)/(log f2)) · Δs.
This analysis can be further generalized to estimate the parameters for linear combinations of arbitrarily many
radioactive decay models.
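These estimators translate directly into code. The sketch below (ours) applies them to the nboyer:2 measurements given in this section and in Example 1 (e0 = 42%, e1 = 79%, f0 = 22%, f1 = 61%, f2 = 97%, with Δt = 500 kB and Δs = 5 MB):

```python
import math

def estimate_rdm3(e0, e1, f0, f1, f2, dt, ds):
    """Estimators from Section 2.5, valid when h1 << dt << h2 << ds << h3."""
    w1 = 1 - e0 * e1
    w2 = 1 - w1 - f0 * f1
    h1 = e0 * (1 - e1) * math.log(2) / (1 - e0 * e1) * dt
    h2 = math.log(2) / w2 * (f0 - (1 - w1 - w2)) * ds - h1 * w1 / w2
    h3 = -math.log(2) / math.log(f2) * ds
    return h1, h2, h3, w1, w2

# nboyer:2 measurements (sizes in kB): dt = 500 kB, ds = 5 MB
h1, h2, h3, w1, w2 = estimate_rdm3(0.42, 0.79, 0.22, 0.61, 0.97,
                                   dt=500, ds=5000)
print(h1, h2, h3, w1, w2)
```

The computed values reproduce the figures quoted in the "Better example" below: about two-thirds short-lived with h1 a little over 45 kB, about 20% with h2 near 1.4 MB, and the rest essentially permanent with h3 on the order of 100 MB.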
Fig. 4. Storage profile for the first of javac’s four iterations.
At equilibrium the live storage for the linear combination of three radioactive decay models will be
n ≐ 1.4427(w1·h1 + w2·h2 + (1 − w1 − w2)·h3).
Better example. For the nboyer:2 benchmark [9,10], with Δs = 5 MB, the storage profile shown in Fig. 1 yields
f0 ≐ 22%, f1 ≐ 61%, and f2 ≐ 97%. Combining this with our measurements for Δt, our equations estimate that
two-thirds of the allocated storage is short-lived, with a half-life of a little over 45 kB; 20% of the allocated storage has
a half-life of about 1.4 MB; and the remaining 14% of the allocated storage is essentially permanent, with a half-life
on the order of 100 MB.
Fig. 3 shows the storage profile predicted by this more accurate model. Comparing the actual with the simulated
profile emphasizes the fact that linear combinations of radioactive decay models can express only the smoothed
average behavior of a program, and cannot express the fractal or phase structures often seen in real programs. This
matters only if the fractal or phase behavior is visible at the “sampling rate” determined by the frequency of garbage
collection.
2.6. Radioactive decay models with permanent storage
Instead of working with linear combinations of three radioactive decay models, Sections 5 and 6 use a slightly
simpler model that regards the longest-lived objects as truly permanent.
A linear combination of two radioactive decay models with some permanent storage is an equilibrium model with
four parameters: the three parameters h1, h2, and w of a linear combination of two radioactive decay models, together
with the volume of permanent storage n3.
2.7. Analysis of the javac benchmark
The javac benchmark uses the standard Java compiler to compile a certain program four times [22]. While a
linear combination of radioactive decay models cannot model the iterative behavior of this benchmark, it is possible
to perform a phase-wise analysis of javac using the models described in Sections 2.4 and 2.5.
Fig. 4 shows the storage profile for the first of the benchmark’s four iterations. This phase of the benchmark can
be divided into a ramp-up phase in which abstract syntax trees and other data structures are allocated, and a plateau
phase during which the input is actually compiled. The ramp-up phase ends, and the plateau phase begins, after 20
MB have been allocated. (In fact, a third phase begins after 50 MB have been allocated. This phase is too short to
justify separate analysis, so we count it as part of the plateau phase.)
The ramp-up phase can be modelled by a linear combination of two radioactive decay models. Applying the
formulas in Section 2.4 to the ramp-up phase alone, with a Δt of 500 kB, results in a model with a short half-life
of about 2 kB, and a long half-life of about 1.2 GB, with 55% of the objects being short-lived.
Fig. 5. Simulated storage profile for the first of javac’s iterations.
Fig. 6. Storage profile for part of the SPECjbb benchmark.
During the plateau phase, the objects that were allocated during the ramp-up phase and survive into the plateau
phase can be modelled by a pure radioactive decay model with a half-life of 151 MB.
The objects that are allocated during the plateau phase can be modelled by a linear combination of three radioactive
decay models as in Section 2.5. Keeping Δt at 500 kB and using a Δs of 5000 kB produces a model in which the
short half-life is around 20 kB, the intermediate half-life is approximately 1.25 MB, and the long half-life is 58 MB.
In this model 87% of the bytes allocated are short-lived, 3% are intermediate-lived, and 10% are long lived. Fig. 5
shows the model’s simulated storage profile, which should be compared with the actual profile in Fig. 4.
2.8. Analysis of the SPECjbb benchmark
The SPECjbb benchmark simulates a set of warehouses that respond to customer requests [23]. Each warehouse
has about 25 MB of data, and runs in a separate thread.
Fig. 6 shows part of the storage profile for SPECjbb with just one warehouse. Construction of that warehouse began
after 48 MB had been allocated, and was completed after 90 MB had been allocated. Then a warmup phase allocated
another 20 MB. The timed portion of the benchmark began at the end of the warmup phase, after 110 MB had been
allocated. These specific numbers depend upon the benchmark’s parameters, but the timed phase of the benchmark
always runs at some storage equilibrium. To improve the horizontal resolution of the timed phase, Fig. 6 ends after
200 MB of allocation.
Fig. 7. Simulated profile for phase-wise RDM2+perm model of SPECjbb.
This particular run of the SPECjbb benchmark can be simulated using a phase-wise combination of two RDM2
models with additional permanent storage. Applying the formulas in Section 2.4 to the construction phase, with
Δt = 500 kB, results in a model with a short half-life of 576 bytes, a long half-life of 65 GB, with half of the
objects being short-lived. The amount of initial permanent storage during the construction phase is 6.4 MB. This
initial permanent storage corresponds to objects allocated before the construction phase begins. Applying the same
analysis to the warmup and timed phases yields a model with a short half-life of 7 kB and a long half-life of 1.5
MB, again with half the objects being short-lived. All objects allocated before the warmup phase are considered to be
permanent during the warmup and timing phases. Fig. 7 shows the storage profile implied by this model.
3. Mark/cons ratios
For any given benchmark, the number of words that are marked depends upon the garbage collector, but the average
cost of marking a word is fairly constant, at least for garbage collectors that use the same basic algorithm. The number
of words that are marked is therefore a good first-order predictor of garbage collection time [2]. A more accurate
predictor might also consider the average object size and pointer density.
The amortized cost of garbage collection is defined as the total cost of collection divided by the number of words
of storage that are allocated by the mutator. If we define µ as the number of words marked divided by the number of
words allocated, then the amortized cost of garbage collection is roughly proportional to µ times the cost of marking
a word. µ is referred to as the mark/cons ratio.
The mark/cons ratio is easy to compute for non-generational garbage collectors and a heap at equilibrium, in which
storage is allocated just as fast as it becomes garbage. Let n be the total amount of live storage at equilibrium, let N
be the total size of the heap, and let L = N/n be the inverse load factor. The garbage collector marks n words on each
collection, and N − n words are allocated between collections, so the mark/cons ratio for non-generational collection
is
µ0 = n/(N − n) = 1/(L − 1).
The next few sections compare the theoretical performance of different garbage collection algorithms by computing
their mark/cons ratios for various models of object lifetimes.
4. Generational collectors
Generational garbage collectors work by dividing heap storage into generations. They have policies that determine
the generation in which a new object will be allocated, when and how to move objects from one generation to another,
and when and how to collect garbage. A generational collector attempts to lower the mark/cons ratio by collecting
generations that contain a higher percentage of garbage than the average for all generations. Such generations require
less marking and reclaim more storage than the average.
This section describes five algorithms for generational garbage collection. Although these algorithms are somewhat
idealized to make them easier to analyze, the 2YF, 3YF, and 3ROF algorithms correspond quite closely to the three
best collectors in the Larceny implementation of Scheme [5,6,9,10,15].
4.1. 2-generational younger-first (2YF)
The youngest generation is a nursery of fixed size N0, whose surviving objects are promoted into the oldest
generation on every garbage collection. All objects are allocated within the nursery, and the garbage collector is
called whenever the nursery becomes full, so the garbage collector is called after every N0 bytes have been allocated.
Then N0 is the smallest interval of mutator time that matters to the garbage collector.
If N is the total size of the heap, then the size of the oldest generation is N1 = N − N0. If the garbage collector
is called when there is enough free space within the oldest generation to copy all of the survivors out of the nursery
into the oldest generation, then the garbage collector will perform a minor collection in which only the nursery is
collected. Otherwise the collector will perform a full collection of the heap.
4.2. 3-generational younger-first (3YF)
When there is a substantial volume of permanent heap storage, a 2-generational younger-first collector may spend
too much time marking permanent storage in full collections. This problem can be solved by isolating the permanent
storage within a third generation.
For the analyses of this paper we ignore the problem of identifying the permanent storage, by assuming an
equilibrium in which the third generation contains all and only permanent storage. We also assume that the third
generation is collected so rarely that the cost of these rare collections is negligible. Under these overly sanguine
assumptions, the 3YF algorithm will always perform at least as well as the 2YF algorithm.
4.3. 2-generational renewal-older-first (2ROF)
This section describes a 2-generational renewal-older-first generational collector (also known as the non-predictive
[6] or older-first mix [4] algorithm). The younger generation has a fixed size N0, and the older generation is of size
N1 = N − N0. Objects are allocated within the older generation if it has enough free space; otherwise objects are
allocated within the younger generation. The garbage collector is called whenever both generations are full.
The renewal-older-first garbage collector never performs a full collection. Only the older generation is collected.
If this collection frees enough space within the older generation to accommodate all of the objects that are within
the younger generation, then all of those objects are promoted into the older generation, and the younger generation
becomes empty.
The older generation may not be large enough to accommodate all of the survivors of the garbage collection
together with all of the younger generation; in particular, the younger generation may be larger than the older
generation. If so, then the uncollected objects that are promoted out of the younger generation will displace objects
that survived collection in the older generation, and those displaced objects will be unpromoted into the younger
generation.
All of this promoting and unpromoting of objects can be done in near-constant time [6,9].
This collector may split a cycle between the two generations, and might continue to split it after the cycle becomes
garbage, which would prevent the cycle from being collected. That can happen only if every collection unpromotes part
of the cycle, because any cycles that survive an unpromoting collection will be reclaimed by the following collection.
If too many consecutive collections involve unpromotion, then the next collection can temporarily increase the size of
the older generation by enough to avoid unpromotion.
This collector is called renewal-older-first because garbage collection is considered to renew the youth of any old
objects that survive collection in the oldest generation and are unpromoted into the youngest generation.
4.4. 3-generational renewal-older-first (3ROF)
This section describes a 3-generational hybrid collector that attempts to combine the advantages of younger-first
and older-first generational collection by using younger-first collection for minor collections but using renewal-older-
first collection for major collections. This collector never performs a full garbage collection.
The youngest generation is a nursery of fixed size N0, whose surviving objects are promoted into an older
generation on every garbage collection. All objects are allocated within the nursery, and the garbage collector is
called whenever the nursery becomes full, so the garbage collector is called after every N0 bytes have been allocated.
The intermediate generation is of fixed size N1, and the size of the oldest generation is N2 = N − N0 − N1.
If the garbage collector is called when there is enough free space within the intermediate and oldest generations to
copy all of the survivors out of the nursery into the oldest generation, then the garbage collector will perform a minor
collection in which only the nursery is collected, and the survivors are promoted into the oldest generation until it
becomes full, and then into the intermediate generation. Otherwise the collector will perform a major collection of the
heap.
A major collection combines the nursery with the oldest generation, and then collects that combined generation as
if it were the oldest generation of a 2-generational renewal-older-first collector, and the intermediate generation were
the youngest.
4.5. 4-generational renewal-older-first (4ROF)
We obtain a 4-generational ROF collector by adding a fourth generation, for permanent storage, to a 3-generational
ROF collector. As with the 3-generational younger-first collector, we ignore the problem of identifying the permanent
storage, assume that the oldest generation contains all and only permanent storage, and assume that the permanent
generation is collected so rarely that the cost of collecting it is negligible. Under these assumptions, the 4ROF
algorithm will always perform at least as well as the 3ROF algorithm, but may not perform as well as the 2ROF
algorithm.
5. Theoretical analysis
This section analyzes the performance of each idealized generational collector for at least one of the following
equilibrium models:
(1) arbitrary equilibrium models
(2) a linear combination of two radioactive decay models
(3) a linear combination of two radioactive decay models with some permanent storage.
Each of these models includes the pure radioactive decay model as a special case.
5.1. 2YF at equilibrium
Let P be the probability density function for mortality in some fixed equilibrium model, and consider the
2-generational younger-first generational garbage collector described in Section 4.1. The fraction of storage within
the nursery that can be expected to survive a minor collection is
live(N0)/N0.
At equilibrium there are about n live objects, and the number of minor collections that can be expected to occur
between two major collections is
k = (N − N0 − n)/live(N0)
so the average interval between major collections is q = kN0. The average volume of live storage that is marked by
one major collection plus these k minor collections is p = n + k·live(N0). The mark/cons ratio at equilibrium is
µ = p/q
 = (n + k·live(N0))/(kN0)
 = n/(kN0) + live(N0)/N0
 = (n/N0)·(live(N0)/(N − N0 − n)) + live(N0)/N0
 = (live(N0)/N0)·(1 + n/(N − N0 − n)).
This calculation shows that, for any equilibrium model, the mark/cons ratio of 2-generational younger-first
generational garbage collection depends upon the total volume of live storage, the size of the heap, the size of the
nursery, and the fraction of storage within the nursery that survives a minor collection, but is independent of the
distribution of mortality among objects that have survived a minor collection.
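The formula above is easy to evaluate. The following sketch (our code, with arbitrary illustrative RDM2 parameters, not taken from the paper) compares the 2YF mark/cons ratio against the non-generational ratio µ0 = 1/(L − 1) of Section 3 at an inverse load factor of 2:

```python
import math

def live_rdm2(t, h1, h2, w):
    """Eq. (9): expected storage surviving from the last t units of allocation."""
    r1, r2 = 2.0 ** (-1.0 / h1), 2.0 ** (-1.0 / h2)
    return (w * (1 - r1 ** t) / (1 - r1)
            + (1 - w) * (1 - r2 ** t) / (1 - r2))

def mark_cons_nongen(N, n):
    """Section 3: mu0 = n/(N - n) = 1/(L - 1)."""
    return n / (N - n)

def mark_cons_2yf(N, N0, n, live_N0):
    """Section 5.1: mu = (live(N0)/N0) * (1 + n/(N - N0 - n))."""
    return live_N0 / N0 * (1 + n / (N - N0 - n))

# illustrative RDM2 model, sizes in kB
h1, h2, w = 100, 100_000, 0.7
n = (w * h1 + (1 - w) * h2) / math.log(2)   # live storage at equilibrium
N = 2 * n                                   # heap size: inverse load factor L = 2
N0 = 4_000                                  # 4 MB nursery
mu0 = mark_cons_nongen(N, n)
mu_2yf = mark_cons_2yf(N, N0, n, live_rdm2(N0, h1, h2, w))
print(mu0, mu_2yf)
```

For these parameters the 2YF collector marks noticeably fewer words per allocated word than the non-generational collector, consistent with the weak generational hypothesis that most objects die in the nursery.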
5.2. 3YF at equilibrium
To analyze a 3YF collector at equilibrium, we must make some assumption about the distribution of mortality
among older objects. Perhaps the simplest assumption we can make is that the algorithm works so well that the third
generation is full of permanent heap storage. Under this assumption, a 3YF collector performs like a 2YF collector,
except the major collections do not mark the permanent objects in the third (oldest) generation. Let n3 be the volume
of permanent heap storage, and also the size of the third generation. The average volume of live storage that is marked
by one major collection plus k minor collections is p = n − n3 + k live(N0), and the mark/cons ratio is
µ = p/q =
live(N0)
N0
1 +
n − n3
N − N0 − n
.
This mark/cons ratio can be regarded as a lower bound on the mark/cons ratio (or upper bound on performance) at
equilibrium of a 3-generational younger-first collector whose oldest generation is of size n3.
5.3. 2ROF at equilibrium
This section extends Clinger’s earlier analysis of this collector [6].
5.3.1. 2ROF and general equilibrium models
Let P be the probability density function for mortality in some fixed equilibrium model, and consider the 2-
generational renewal-older-first generational garbage collector described in Section 4.3.
Let p = n −live(N0). If p ≤ N −2N0, then the younger generation is expected to be empty following a collection
at equilibrium, and p is the volume of storage within the older generation that is live when that generation is collected.
The average interval between collections is q = N − N0 − p. The mark/cons ratio is µ = p/q. This result holds for
any equilibrium model provided n − live(N0) ≤ N − 2N0.
5.3.2. 2ROF and RDM
We can do without the assumption that the younger generation is empty following each collection by assuming a
more specific model, such as the radioactive decay model. To ensure a stable equilibrium, however, we must assume
that N1 = N/m for some integer m > 1. In words, we assume the heap consists of m equally-sized regions, and each
region is collected once every m collections. At equilibrium, this symmetry implies that the expected volume of live
storage within the older generation is the same for every collection.
Let y be the volume of live storage within the older generation when it is collected. The interval between collections
is
q = N1 − y
and the mark/cons ratio is
µ = y/q = y/(N1 − y).
The value of y will be the same when the survivors of this collection are collected again m collections later, after
mq objects have been allocated. By then, the live storage in the older generation consists of the y survivors plus the q
objects that were allocated into it immediately after it was last collected, both reduced by attrition over time ranging
from mq to (m − 1)q:
y = y·r^(mq) + (Σ_{i=1}^{q} r^i)·r^((m−1)q)
 = y·r^(m(N1−y)) + (r·(1 − r^(N1−y))/(1 − r))·r^((m−1)(N1−y)).
The value of y can be computed numerically by using this equation to generate successive approximations.
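The successive-approximation computation can be sketched as follows (our code; the parameter values are illustrative, not from the paper). Each pass substitutes the current estimate of y into the right-hand side of the equation above until the value stabilizes:

```python
def solve_2rof_rdm(h, N, m, iters=200):
    """Successive approximation for y: the live storage in a region of a
    2ROF collector at equilibrium under a pure RDM with half-life h.
    The heap consists of m regions of size N1 = N/m; q = N1 - y."""
    r = 2.0 ** (-1.0 / h)
    N1 = N / m
    y = 0.0
    for _ in range(iters):
        q = N1 - y
        y = (y * r ** (m * q)
             + r * (1 - r ** q) / (1 - r) * r ** ((m - 1) * q))
    return y

# illustrative parameters (sizes in bytes): half-life 50 kB, 300 kB heap, m = 3
h, N, m = 50_000, 300_000, 3
y = solve_2rof_rdm(h, N, m)
N1 = N / m
mu = y / (N1 - y)        # mark/cons ratio of Section 5.3.2
print(y, mu)
```

The iteration is a contraction for these parameters, so a couple of hundred passes is far more than enough for the value to settle.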
5.3.3. 2ROF and RDM2+permanent storage
We can extend the calculation above to a linear combination of two radioactive decay models with some permanent
storage. Let the parameters of that model be h1, h2, w, and n3. Assume h1 ≪ N0 ≪ h2 and h1 ≪ N − n. Assume
also that N1 = N/m for some integer m > 1, and that at equilibrium the permanent storage is divided evenly between
the m heap regions of size N1.
Let y be the volume of live storage within the older generation when it is collected. As above, the interval between
collections is q = N1 − y and the mark/cons ratio is
µ = y/q = y/(N1 − y).
Since h1 ≪ N − n, the older generation contains a negligible volume of short-lived objects. The approximate value
of y is obtained by adding together the expected volumes of five classes of storage:
(1) permanent storage
(2) long-lived storage that survived the last collection in this heap region and survived again to the next,
(3) short-lived storage that survived the last collection in this heap region and survived again to the next,
(4) short-lived objects that were allocated in this heap region during the interval q immediately following the last
collection in this region and then survived until it was collected,
(5) long-lived objects that were allocated in this heap region during the interval q immediately following the last
collection in this region and then survived until it was collected.
The interval between collections of the same heap region is mq. The third volume above is negligible, and the fourth
volume is small, because h1 ≪ N − n < mq. Using Eq. (10) to calculate the last two volumes,
y ≈ n3/m + (y − n3/m)·r2^(mq) + 0 + w·(h1/(log 2))·r1^((m−1)q) + (1 − w)·(h2/(log 2))·(1 − r2^q)·r2^((m−1)q).
The value of y can be computed numerically by using this equation to generate successive approximations.
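The same fixed-point technique applies to this equation as well; a sketch (ours, with illustrative parameters):

```python
import math

def solve_2rof_rdm2_perm(h1, h2, w, n3, N, m, iters=200):
    """Successive approximation for y under an RDM2 model with n3 units of
    permanent storage spread evenly over m heap regions of size N1 = N/m."""
    r1 = 2.0 ** (-1.0 / h1)
    r2 = 2.0 ** (-1.0 / h2)
    log2 = math.log(2)
    N1 = N / m
    y = n3 / m                     # start from the permanent floor
    for _ in range(iters):
        q = N1 - y
        y = (n3 / m
             + (y - n3 / m) * r2 ** (m * q)
             + w * h1 / log2 * r1 ** ((m - 1) * q)
             + (1 - w) * h2 / log2 * (1 - r2 ** q) * r2 ** ((m - 1) * q))
    return y

# illustrative parameters (sizes in kB)
h1, h2, w, n3, N, m = 100, 50_000, 0.7, 30_000, 300_000, 3
y = solve_2rof_rdm2_perm(h1, h2, w, n3, N, m)
print(y, y / (N / m - y))          # y and the mark/cons ratio
```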
5.4. 3ROF at equilibrium
The 3-generational ROF collector is more complex, so we will calculate its mark/cons ratio only for a linear
combination of two radioactive decay models. Let the parameters of that model be h1, h2, and w, and assume
h1 ≪ N0 ≪ h2. As in the previous calculations, we need enough symmetry to make our calculations tractable,
so we assume the size of the intermediate generation is some integral multiple of the size of the oldest generation;
hence we assume N2 = (N1 + N2)/m for some integer m > 1. The volume of objects in the nursery that survive a
minor collection is
live(N0) ≈ w·(h1/(log 2)) + (1 − w)·(h2/(log 2))·(1 − r2^N0)
and the intermediate and oldest generations are filled in blocks of size live(N0).
Let y be the volume of live storage within the oldest generation when it is collected. The ratio of minor to major
collections is
k = (N2 − y − live(N0))/live(N0).
The interval between collections is q = kN0 and the mark/cons ratio is
µ = (k·live(N0) + y)/(kN0).
Furthermore
y = (y + live(N0))·r2^(mq) + (Σ_{i=1}^{k} r2^(iN0)·live(N0))·r2^((m−1)q)
 = (y + live(N0))·r2^(mkN0) + (r2^N0·(1 − r2^(kN0))/(1 − r2^N0))·r2^((m−1)q)·live(N0).
The value of y can be computed numerically by using this equation to generate successive approximations.
5.5. 4ROF at equilibrium
As with the 3YF collector, we assume that the 4ROF collector works so well that, at equilibrium, the oldest
generation contains all and only permanent storage. With this assumption, we can ignore the permanent storage,
and compute the mark/cons ratio as in Section 5.4 above.
Note that Section 5.4 does not compute the mark/cons ratio for the 3ROF collector in a model with permanent
storage. If the volume of permanent storage were nonzero, then the idealized 4ROF collector would outperform the
3ROF collector for every inverse load factor L.
6. Visualization of results
As an example of these calculations, Fig. 8 shows how the mark/cons ratio varies with inverse load factor for a
non-generational and for four generational collectors, on a model where short-lived objects have a half-life of 100 kB,
long-lived objects have a half-life of 1000 MB, 70% of the allocated objects are short-lived, and there are also 100 MB
of permanent objects. The live storage implied by these parameters is a little over 500 MB.
For these particular parameters, and for the idealized collectors considered in this paper, Fig. 8 shows that, in
theory,
• when the total heap size is less than 700 MB (inverse load factor less than 1.4), a conventional 3-generational
younger-first collector is likely to perform best;
• when the total heap size is between 700 and 1500 MB (inverse load factor of 1.4–3.0), a 4-generational hybrid
renewal-older-first collector is likely to perform best;
• when the total heap size is greater than 1500 MB (inverse load factor greater than 3.0), a 2-generational renewal-
older-first collector is likely to perform best.
In theory, the best collector for a linear combination of two radioactive decay models with some permanent storage
depends upon five parameters: the four parameters of the model, and the inverse load factor. Figs. 9 through 11
summarize part of this five-dimensional space by displaying the collector whose theoretical mark/cons ratio is lowest,
at every combination of the following parameters:
• 0, 10, or 100 MB of permanent (static) storage;
• h1 = 10, 100, or 1000 kB for the half-life of short-lived objects;
• h2 = 10, 32, 100, 316, or 1000 MB for the half-life of long-lived objects;
• w ranging from 0 to 1 by increments of 0.05 (on the y axis) for the fraction of short-lived objects;
• L ranging from 1.1 to 4.0 by increments of 0.1 (on the x axis) for the inverse load factor (ratio of heap size to live
storage).
W.D. Clinger, F.V. Rojas / Science of Computer Programming 62 (2006) 184–203
Fig. 8. Example of calculated mark/cons ratios. Smaller mark/cons ratios are better. This graph shows one two-dimensional slice through a six-
dimensional space. Other slices are obtained by varying the four model parameters shown at the top of the graph.
The collectors shown in Figs. 9 through 11 are
• black: a non-generational collector;
• white: a conventional 3YF generational collector with a 4 MB nursery;
• dark gray: a 4ROF generational collector with a 4 MB nursery and an older (third) generation that is 1/2, 1/3,
1/5, 1/10, or 1/20 the size of the heap excluding the nursery and permanent generations, depending on which
fraction performs the best;
• light gray: a 2ROF generational collector whose older generation is 1/2, 1/3, 1/5, 1/10, or 1/20 the size of the
heap, depending on which fraction performs the best.
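The phase diagrams can be reproduced mechanically: at each grid point, evaluate every collector's theoretical mark/cons ratio and keep the minimizer. The following Python sketch shows that structure; only the non-generational ratio has a simple closed form (a full collection marks all live storage and permits L − 1 units of allocation per unit of live storage, giving a ratio of 1/(L − 1)), while the generational formulas are those derived in Section 5 and are not reproduced here.

```python
# Parameter grid of Figs. 9 through 11 (units as stated in the text).
PERM_MB = [0, 10, 100]                              # permanent storage
H1_KB   = [10, 100, 1000]                           # half-life of short-lived objects
H2_MB   = [10, 32, 100, 316, 1000]                  # half-life of long-lived objects
W       = [round(0.05 * i, 2) for i in range(21)]   # fraction short-lived (y axis)
L       = [round(1.1 + 0.1 * i, 1) for i in range(30)]  # inverse load factor (x axis)

COLLECTORS = ["non-generational", "3YF", "4ROF", "2ROF"]

def mark_cons(collector, perm, h1, h2, w, load):
    """Theoretical mark/cons ratio for one collector at one grid point.
    Only the non-generational case is filled in: it marks all live
    storage once per heapful of allocation, so the ratio is 1/(L - 1).
    The generational formulas of Section 5 would go in the other cases."""
    if collector == "non-generational":
        return 1.0 / (load - 1.0)
    raise NotImplementedError("see the formulas derived in Section 5")

def best_collector(perm, h1, h2, w, load):
    # One pixel of a phase diagram: the collector with the lowest
    # theoretical mark/cons ratio determines the pixel's color.
    return min(COLLECTORS, key=lambda c: mark_cons(c, perm, h1, h2, w, load))
```

With all four formulas supplied, iterating `best_collector` over the grid yields one 21 × 30 phase diagram per (permanent storage, h1, h2) triple — the 3 × 3 × 5 = 45 panels of Figs. 9 through 11.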
If the total volume of live storage is small, and the inverse load factor L is also small, then some or all of the
generational collectors may not have enough free storage to operate. This is the only circumstance in which the non-
generational collector performs better than the 2ROF collector. It also explains why the 2ROF collector sometimes
bests the 3YF collector at both small and large inverse load factors, but not at in-between values.
Although most of this five-dimensional phase space consists of regions in which the 2ROF or 4ROF collectors are
dominant, the conventional 3YF collector does best in regions corresponding to model parameters that are typical of
many real programs. That explains why conventional generational collectors perform so well on many programs.
6.1. Example: the javac benchmark
Consider, for example, the plateau phase of javac, which was analyzed in Section 2.7. Its estimated model
parameters are roughly similar to the model in Fig. 10 with h1 = 10 kB, h2 = 32 MB, 10 MB of static storage,
with 87% of the allocated storage being short-lived (0.87 on the vertical axis). For that model, the conventional 3YF
collector has the best theoretical mark/cons ratio for inverse load factors ranging from 1.25 to 1.5 (on the horizontal
axis), which are often used in real systems.
For larger (more relaxed) inverse load factors, the 4ROF collector has better theoretical mark/cons ratios for that
model. At inverse load factors near 2, the 3YF collector’s theoretical mark/cons ratio for this model is more than 20%
higher than the 4ROF’s, and the 2YF collector’s mark/cons ratio is about twice the 4ROF’s.
Fig. 9. The most efficient collectors: no permanent storage. Half-lives are shown atop each phase diagram. The inverse load factor varies along the
x-axes (2 is typical), and the fraction of allocated objects that are short-lived varies along the y-axes (typically greater than 0.9, but often lower for
gc-intensive programs). In the black regions, a non-generational collector has the lowest theoretical mark/cons ratio among the idealized collectors
that are analyzed in this paper. In dark gray regions, the 4ROF collector has the lowest mark/cons ratio. In light gray regions, the 2ROF collector
has the lowest theoretical mark/cons ratio. In white regions, a conventional 3YF collector is the most efficient.
We have not implemented a 4ROF collector for Java, so we cannot say whether a 4ROF collector would outperform
a conventional generational collector on javac at inverse load factors near 2. What we can say is that Larceny’s 3ROF
collector routinely outperformed Larceny’s 3YF collector on a set of GC-intensive benchmarks written in Scheme
[10]. The 3ROF collector’s measured cost for marking a word was 2%–40% greater than the cost measured for
Larceny’s 2YF collector, and was less than 20% greater on 6 of the 13 benchmarks for which that cost was reported.
Furthermore, the increase in marking that comes from promoting intermediate-lived objects prematurely, which our
theoretical calculations have ignored but which was significant for most of the Scheme benchmarks, would be less for a
4ROF collector than for a 3YF collector [10]. In short, it is entirely plausible that an actual 4ROF collector could
outperform a conventional 2YF or 3YF collector for the javac benchmark at inverse load factors near 2.
Fig. 10. The most efficient collectors: 10 MB permanent storage. In the black regions, a non-generational collector has the lowest theoretical
mark/cons ratio among the idealized collectors that are analyzed in this paper. In dark gray regions, the 4ROF collector has the lowest mark/cons
ratio. In light gray regions, the 2ROF collector has the lowest theoretical mark/cons ratio. In white regions, a conventional 3YF collector is the most
efficient.
6.2. Garbage-first collection
Garbage-first garbage collection can be regarded as a very general algorithm that subsumes all generational
algorithms [7]. The pure garbage-first algorithm divides the heap into arbitrary regions, each containing objects of
arbitrary age. Any subset of these regions can be collected without collecting the other regions. In the garbage-
first algorithm described by Detlefs et al., a concurrent marking thread estimates the volume of live objects within
each region, and this information guides the selection of regions to be collected. The information provided by the
concurrent thread is always a little out of date, and the delay is large compared to the half-lives of short-lived
objects.
Fig. 11. The most efficient collectors, with 100 MB of permanent storage. In the black regions, a non-generational collector has the lowest theoretical
mark/cons ratio among the idealized collectors that are analyzed in this paper. In dark gray regions, the 4ROF collector has the lowest mark/cons
ratio. In light gray regions, the 2ROF collector has the lowest theoretical mark/cons ratio. In white regions, a conventional 3YF collector is the most
efficient.
The marking thread’s delayed information about the reachability of old objects should allow the pure garbage-first
algorithm to do at least as well as the 2ROF algorithm, because the pure garbage-first algorithm will act much like
the 2ROF algorithm if that is an optimal strategy. The pure garbage-first algorithm cannot be expected to do as well
as the 3YF and 4ROF collectors, however, because those collectors exploit a phenomenon that happens faster than
the concurrent marking thread can mark. The typically large difference in half-life between short-lived and long-lived
objects, together with the typically high percentage of short-lived objects in the nursery, implies that collecting the
nursery as part of every collection is likely to be profitable. Doing so yields a version of the so-called generational
garbage-first algorithm [7].
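The distinction amounts to a region-selection policy: pure garbage-first greedily takes the regions the marker estimates to be emptiest, while the generational variant collects the nursery on every collection regardless of the marker's estimate. A schematic Python sketch, purely illustrative (the region structure, budget model, and numbers below are hypothetical, not taken from [7]):

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    size: int          # bytes of heap space in this region
    est_live: int      # live bytes, as estimated by the concurrent marker
    is_nursery: bool = False

def select_regions(regions, budget, generational):
    """Choose regions to collect, spending at most `budget` bytes of
    region space. Pure garbage-first greedily takes the regions the
    marker estimates to be emptiest; the generational variant always
    collects the nursery first, because short-lived objects die faster
    than the concurrent marker can track."""
    chosen = [r for r in regions if generational and r.is_nursery]
    budget -= sum(r.size for r in chosen)
    for r in sorted((r for r in regions if r not in chosen),
                    key=lambda r: r.est_live / r.size):
        if r.size <= budget:
            chosen.append(r)
            budget -= r.size
    return chosen

# The marker's nursery estimate is stale: it reflects objects that are
# mostly dead by collection time, so pure garbage-first may pass it over.
regions = [Region("nursery", size=4, est_live=4, is_nursery=True),
           Region("old-1", size=10, est_live=2),
           Region("old-2", size=10, est_live=9)]
pure = select_regions(regions, budget=10, generational=False)
gen  = select_regions(regions, budget=10, generational=True)
```

In this toy example the pure policy skips the nursery in favor of an apparently emptier old region, even though most of the nursery is garbage by collection time; the generational policy collects the nursery unconditionally, which is exactly the behavior the paper's models predict to be profitable.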
Figs. 9 through 11 show that the 3YF and 4ROF algorithms tend to outperform the 2ROF and non-generational
algorithms in the portions of phase space that correspond to the distributions of object lifetimes most often seen in
actual programs. This suggests that the generational garbage-first algorithm is likely to outperform the pure garbage-
first algorithm on most programs, which is consistent with the implementors’ experience [7].
7. Conclusions
The theoretical mark/cons ratio is not a perfect predictor of the amortized cost of garbage collection, nor
is amortized cost the only measure of performance, nor can the objects of all programs be modelled by linear
combinations of radioactive decay models.
On the other hand, these models are adequate to model the objects of many programs, and they can explain many
experimental results that would otherwise be quite puzzling. For example, they show how the relative efficiency of
younger-first versus older-first collection can be affected by increasing or decreasing the heap size (which changes
the inverse load factor). They explain why conventional younger-first collection works well for many programs, while
hybrid older-first collectors work better for many others. The models can also explain why generational garbage-first
collection tends to perform better than the pure garbage-first algorithm.
Our models also show that the relationship between relative efficiency and the numerical parameters of even a
simple model for object lifetimes can be quite complex.
One important result is that, for this class of models, there is almost always some generational collector that
performs considerably better than non-generational collection. In particular, non-generational collection is seldom
competitive at the boundary where the efficiencies of younger-first and older-first generational collection coincide.
Acknowledgements
This research was supported by NSF grants CCR-9629801 and CCR-0208722, and by a Collaborative Research
grant from Sun Microsystems. We are grateful to the editor and anonymous reviewers for comments that improved
this paper.
References
[1] Milton Abramowitz, Irene A. Stegun, Handbook of Mathematical Functions, in: National Bureau of Standards Applied Mathematics Series,
vol. 55, June 1964.
[2] Henry G. Baker, Infant mortality and generational garbage collection, ACM SIGPLAN Notices 28 (4) (1993) 55–57. ACM Press.
[3] David A. Barrett, Benjamin G. Zorn, Using lifetime predictors to improve memory allocation performance, in: ACM SIGPLAN Conference
on Programming Language Design and Implementation, ACM Press, 1993, pp. 187–196.
[4] Steve M. Blackburn, Richard Jones, Kathryn S. McKinley, J. Eliot B. Moss, Beltway: Getting around garbage collection gridlock, in: ACM
SIGPLAN Conference on Programming Language Design and Implementation, PLDI, ACM Press, 17–19 June 2002, pp. 153–164.
[5] William D. Clinger, Lars T. Hansen, Lambda, the ultimate label, or a simple optimizing compiler for Scheme, in: Proceedings of the 1994
ACM Conference on Lisp and Functional Programming, in: ACM LISP Pointers VIII(3), ACM Press, July–September 1994, pp. 128–139.
[6] William D. Clinger, Lars T. Hansen, Generational garbage collection and the radioactive decay model, in: Proceedings of the 1997 ACM
SIGPLAN Conference on Programming Language Design and Implementation, PLDI, ACM SIGPLAN Notices 32 (5) (1997) 97–108. ACM
Press.
[7] David Detlefs, Christine Flood, Steve Heller, Tony Printezis, Garbage-first garbage collection, in: Proceedings of the Fourth International
Symposium on Memory Management, 2004, pp. 37–48.
[8] Richard P. Gabriel, Performance and Evaluation of Lisp Systems, The MIT Press, 1985.
[9] Lars T. Hansen, Older-first garbage collection in practice, Ph.D. Thesis, Northeastern University, November 2000. Available at
http://www.ccs.neu.edu/home/will/GC/lth-thesis/index.html.
[10] Lars T. Hansen, William D. Clinger, An experimental study of renewal-older-first garbage collection, in: International Conference on
Functional Programming, ICFP, ACM Press, 2002, pp. 247–258.
[11] David R. Hanson, Storage Management for an Implementation of SNOBOL4, Software—Practice and Experience 7 (1977) 179–192. John Wiley & Sons.
[12] Barry Hayes, Using key object opportunism to collect old objects, in: ACM SIGPLAN Conference on Object Oriented Programming Systems,
Languages, and Applications, OOPSLA’91, ACM Press, October 1991, pp. 33–46.
[13] Richard L. Hudson, J. Eliot B. Moss, Incremental garbage collection for mature objects, in: Yves Bekkers, Jacques Cohen (Eds.), Proceedings
of International Workshop on Memory Management, in: Lecture Notes in Computer Science, vol. 637, Springer-Verlag, September 1992,
pp. 388–403.
[14] Richard Jones, Rafael Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, John Wiley & Sons, 1996.
[15] The Larceny home page is at http://www.larceny.org/.
[16] Henry Lieberman, Carl Hewitt, A real-time garbage collector based on the lifetimes of objects, Communications of the ACM 26 (6) (1983)
419–429. ACM Press.
[17] David A. Moon, Garbage collection in a large lisp system, in: ACM Conference on Lisp and Functional Programming, ACM Press, 1984,
pp. 235–246.
[18] Tony Printezis, Hot-swapping between a mark and sweep and a mark and compact garbage collector in a generational environment, in: Java
Virtual Machine Research and Technology Symposium, April 2001, pp. 171–184.
[19] Patrick M. Sansom, Simon L. Peyton Jones, Generational garbage collection for Haskell, in: Conference on Functional Programming
Languages and Computer Architecture, ACM Press, 1993, pp. 106–116.
[20] Jacob Seligmann, Steffen Grarup, Incremental mature garbage collection using the train algorithm, in: Proceedings of 1995 European
Conference on Object-Oriented Programming, in: Lecture Notes in Computer Science, Springer-Verlag, August 1995, pp. 235–252.
[21] Sunil Soman, Chandra Krintz, David Bacon, Dynamic selection of application-specific garbage collectors, in: Proceedings of the Fourth
International Symposium on Memory Management, October 2004, pp. 49–60.
[22] Standard Performance and Evaluation Corporation (SPECjvm98 benchmark). http://www.spec.org.
[23] Standard Performance and Evaluation Corporation (SPECjbb benchmark). http://www.spec.org/jbb2000.
[24] Darko Stefanović, J. Eliot B. Moss, Characterisation of object behaviour in Standard ML of New Jersey, in: ACM Conference on Lisp and
Functional Programming, ACM Press, 1994, pp. 43–54.
[25] Darko Stefanović, Properties of age-based automatic memory reclamation algorithms, Ph.D. Thesis, University of Massachusetts, Amherst,
MA, February 1999.
[26] Darko Stefanović, Kathryn S. McKinley, J. Eliot B. Moss, Age-based garbage collection, in: ACM SIGPLAN Conference on Object Oriented
Programming Systems, Languages, and Applications, OOPSLA ’99, ACM SIGPLAN Notices 34 (10) (1999) 370–381.
[27] Darko Stefanović, Kathryn S. McKinley, J. Eliot B. Moss, On models for object lifetime distributions, in: International Symposium on Memory
Management, October 2000, pp. 137–142.
[28] Darko Stefanović, Matthew Hertz, Steve M. Blackburn, Kathryn S. McKinley, J. Eliot B. Moss, Older-first garbage collection in practice:
Evaluation in a Java virtual machine, in: Workshop on Memory System Performance, Berlin, Germany, June 2002, pp. 25–36.
[29] George B. Thomas Jr., Calculus and Analytic Geometry, Addison-Wesley, 1968.
[30] David Ungar, Generation scavenging: A non-disruptive high performance storage reclamation algorithm, in: ACM SIGSOFT-SIGPLAN
Practical Programming Environments Conference, ACM Press, Pittsburgh, PA, April 1984, pp. 157–167.
[31] Paul R. Wilson, Uniprocessor garbage collection techniques, in: Yves Bekkers, Jacques Cohen (Eds.), Proceedings of International Workshop
on Memory Management, in: Lecture Notes in Computer Science, vol. 637, Springer-Verlag, 1992. Available via anonymous ftp from
cs.utexas.edu, in pub/garbage.