Volunteer Computing with BOINC
Client-Server side
Diamantino Cruz, Ricardo M. Madeira, and Rui Lopes
Abstract— Around 300 million personal computers are connected to the Internet. The majority are idle or under-used most of the time, their processing and storage potential going to waste. That wasted potential is, however, starting to be taken advantage of by projects using volunteer computing (around 1% at the moment [12, 13]), which uses computational resources that would otherwise sit unused to solve computationally intensive problems [10]. This paper analyzes the BOINC (Berkeley Open Infrastructure for Network Computing) service from a client-server perspective.
Index Terms— Distributed Systems, Client/server, BOINC, Volunteer Computing
1 INTRODUCTION
1.1 Death, Taxes, and BOINC
It is said that life holds but two certainties: death and taxes. Nevertheless, with the technological expansion of the last few years it is reasonably safe to assume that the many paths leading to the future converge at some points, many of them in parallel and distributed systems. According to Tanenbaum [1], "A distributed system is a collection of independent computers that appears to its users as a single coherent system". In this work we give a quick introduction to some of those systems, address client-server systems in general, and take a focused approach to volunteer computing. And what is volunteer computing? And why can't we escape from it?

Volunteer computing uses Internet-connected computers, volunteered by their owners, as a source of computing power and storage. This paper discusses the client-server side of BOINC (Berkeley Open Infrastructure for Network Computing), a middleware system for volunteer computing. It was originally developed to support the SETI@home project before becoming useful as a platform for other distributed applications in areas as diverse as mathematics, medicine, molecular biology, climatology, and astrophysics. In essence, BOINC is software that can use the unused CPU and GPU cycles on a computer to do scientific computing: what one individual doesn't use of his or her computer, BOINC uses. It consists of a server system and client software that communicate with each other to distribute, process, and return work units. To glimpse the sheer potential of this project: using a single computer costing about $4,000, a BOINC project can dispatch about 8.8 million tasks per day. If each client is issued one task per day and each task uses 12 CPU hours on a 1 GFLOPS computer, the project can support 8.8 million clients and obtain 4.4 PetaFLOPS of computing power. With two additional server computers, a project can dispatch about 23.6 million tasks per day [2]. Now think of this power redirected to projects ranging from LHC@home ("a volunteer computing program which enables you to contribute idle time on your computer to help physicists develop and exploit particle accelerators, such as CERN's Large Hadron Collider") to Spinhenge@home ("where you will actively support the research of nano-magnetic molecules. In the future these molecules will be used in localized tumor chemotherapy and to develop tiny memory-modules."), and it's easy to deduce why we can't escape BOINC (as a platform of investigation for the near future).

1.2 Client-Server
Client-server describes the relationship between two computer programs in which one program, the client program, makes a service request to another, the server program. Standard networked functions such as email exchange, web access and database access are based on the client-server model. For example, a web browser is a client program at the user's computer that may access information at any web server in the world. To check your bank account from your computer, a web browser client program on your computer forwards your request to a web server program at the bank. That program may in turn forward the request to its own database client program, which sends a request to a database server at another bank computer to retrieve your account balance. The balance is returned to the bank database client, which in turn serves it back to the web browser client in your personal computer, which displays the information for you.

The client-server model has become one of the central ideas of network computing. Most business applications being written today use the client-server model. So do the Internet's main application protocols, such as HTTP, SMTP, Telnet, and DNS. In marketing, the term has been used to distinguish distributed computing by smaller dispersed computers from the "monolithic" centralized computing of mainframe computers. But this distinction has largely disappeared as mainframes and their applications have also turned to the client-server model and
become part of network computing.

Each instance of the client software can send data requests to one or more connected servers. In turn, the servers can accept these requests, process them, and return the requested information to the client. Although this concept can be applied for a variety of reasons to many different kinds of applications, the architecture remains fundamentally the same.

The most basic type of client-server architecture employs only two types of hosts: clients and servers. This type of architecture is sometimes referred to as two-tier. It allows devices to share files and resources. The two-tier architecture means that the client acts as one tier and the application in combination with the server acts as another tier.

These days, clients are most often web browsers, although that has not always been the case. Servers typically include web servers, database servers and mail servers. Online gaming is usually client-server too. In the specific case of MMORPGs, the servers are typically operated by the company selling the game; for other games one of the players will act as the host by setting his game in server mode.

The interaction between client and server is often described using sequence diagrams, which are standardized in the Unified Modeling Language.

When both the client and server software are running on the same computer, this is called a single-seat setup.

Specific types of clients include web browsers, email clients and online chat clients.

Specific types of servers include web servers, FTP servers, application servers, database servers, name servers, mail servers, file servers, print servers, and terminal servers. Most web services are also types of servers. [3]

1.3 Grid
The last decade has seen a considerable increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems, in the fields of science, engineering and business, which cannot be dealt with effectively by the current generation of supercomputers. In fact, due to their size and complexity, these problems are often numerically and/or data intensive and require a variety of heterogeneous resources that are not available from a single machine. A number of teams have conducted studies on the cooperative use of geographically distributed resources conceived as a single powerful virtual computer. This new approach is known by several names, such as metacomputing, seamless scalable computing, global computing, and more recently Grid computing.

The early efforts in Grid computing started as projects to link US supercomputing sites, but it has now grown far beyond its original intent. In fact, there are many applications that can benefit from the Grid infrastructure, including collaborative engineering, data exploration, high-throughput computing, and of course distributed supercomputing.

The term 'Grid' is chosen to suggest the idea of a 'power grid': namely, that application scientists can plug into the computing infrastructure like plugging into an electrical power grid. It is important to note, however, that the term 'Grid' is sometimes used synonymously with a networked, high-performance computing infrastructure. Obviously this aspect is an important enabling technology for future applications, but in reality it is only part of a much larger scenario that also includes information handling and support for knowledge within the scientific process. It is this broader view of the infrastructure that is now being referred to as the Semantic Grid. The Semantic Grid is characterized by an open system, with a high degree of automation, which supports flexible collaboration and computation on a global scale. [4]

1.4 Peer-to-Peer
In the literature, the term Peer-to-Peer (P2P) is used to describe a wide variety of software applications. The applications that have been classified as P2P come from a diverse range of domains such as file-sharing, distributed computing, instant messaging and content distribution. In the literature, there is a lack of agreement on the set of criteria that can be used to call an application P2P. The computers that are connected to the Internet but have variable connectivity and temporary network addresses are often called computers on the edge of the Internet. The existing definitions for the term P2P can be broadly divided into two groups, depending on whether they emphasize the ability of P2P to utilize computers at the edge of the Internet as the key defining characteristic of P2P. The definitions in the first group (e.g., [5]) do emphasize this ability as the key defining characteristic, whereas those in the second category (e.g., [6]) do not.

The potential of P2P applications lies in their ability to utilize the computers at the edge of the Internet. However, making this the defining characteristic of P2P applications excludes P2P applications used in intranets, where P2P applications are extremely useful for tasks such as collaboration. P2P applications deployed in an intranet do not require the capability to utilize the computers at the edge of the Internet. While the ability to utilize computers at the edge of the Internet is a benefit of P2P applications, we do not consider it a defining characteristic in this paper.

A useful and accurate definition for P2P is given by Schollmeier et al. ([6]) in 2001, at the IEEE P2P conference.
The definition states that:

A distributed network architecture may be called a Peer-to-Peer (P-to-P, P2P, ...) network if the participants share a part of their own hardware resources (processing power, storage capacity, network link capacity, printers, ...). These shared resources are necessary to provide the Service and content offered by the network (e.g., file sharing or shared workspaces for collaboration). They are accessible by other peers directly, without passing intermediary entities. The participants of such a network are thus resource (Service and content) providers as well as resource (Service and content) requesters (Servent-concept).

1.5 Cloud Computing
Cloud computing is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The name cloud computing was inspired by the cloud symbol that's often used to represent the Internet in flow charts and diagrams.

A cloud service has three distinct characteristics that differentiate it from traditional hosting. It is sold on demand, typically by the minute or the hour; it is elastic (a user can have as much or as little of a service as they want at any given time); and the service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access). Significant innovations in virtualization and distributed computing, as well as improved access to high-speed Internet and a weak economy, have accelerated interest in cloud computing.

A cloud can be private or public. A public cloud sells services to anyone on the Internet. (Currently, Amazon Web Services is the largest public cloud provider.) A private cloud is a proprietary network or a data center that supplies hosted services to a limited number of people. When a service provider uses public cloud resources to create their private cloud, the result is called a virtual private cloud. Private or public, the goal of cloud computing is to provide easy, scalable access to computing resources and IT services.

Infrastructure-as-a-Service providers like Amazon Web Services provide virtual server instances with unique IP addresses and blocks of storage on demand. Customers use the provider's application program interface (API) to start, stop, access and configure their virtual servers and storage. In the enterprise, cloud computing allows a company to pay for only as much capacity as is needed, and bring more online as soon as required. Because this pay-for-what-you-use model resembles the way electricity, fuel and water are consumed, it's sometimes referred to as utility computing.

Platform-as-a-Service in the cloud is defined as a set of software and product development tools hosted on the provider's infrastructure. Developers create applications on the provider's platform over the Internet. PaaS providers may use APIs, website portals or gateway software installed on the customer's computer. Force.com (an outgrowth of Salesforce.com) and GoogleApps are examples of PaaS. Developers need to know that currently there are no standards for interoperability or data portability in the cloud. Some providers will not allow software created by their customers to be moved off the provider's platform.

In the Software-as-a-Service cloud model, the vendor supplies the hardware infrastructure and the software product, and interacts with the user through a front-end portal. SaaS is a very broad market. Services can be anything from Web-based email to inventory control and database processing. Because the service provider hosts both the application and the data, the end user is free to use the service from anywhere. [7]

Or, as Brian Hayes [8] simply puts it: "Data and programs are being swept up from desktop PCs and corporate server rooms and installed in "the compute cloud." Whether it's called cloud computing or on-demand computing, software as a service, or the Internet as platform, the common element is a shift in the geography of computation. When you create a spreadsheet with the Google Docs service, major components of the software reside on unseen computers, whereabouts unknown, possibly scattered across continents."

2 Volunteer Computing
2.1 Don't ask what the Internet can do for you; ask what you can do for the world.
Volunteer computing is an arrangement in which people (volunteers) provide computing resources to projects, which use the resources to do distributed computing and/or storage. Volunteers are typically members of the general public who own Internet-connected PCs. Organizations such as schools and businesses may also volunteer the use of their computers. Projects are typically academic (university-based) and do scientific research. But there are exceptions; for example, GIMPS and distributed.net (two major projects) are not academic. Several aspects of the project/volunteer relationship are worth noting:

Volunteers are effectively anonymous; although they may be required to register and supply an email address or other information, they are not linked to a real-world identity. Because of their anonymity, volunteers are not accountable to projects. If a volunteer misbehaves in some way (for example, by intentionally returning incorrect computational results), the project cannot prosecute or discipline the volunteer.

Volunteers must trust projects in several ways:
The volunteer trusts the project to provide applications that don't damage their computer or invade their privacy.

The volunteer trusts that the project is truthful about what work is being done by its applications, and how the resulting intellectual property will be used.

The volunteer trusts the project to follow proper security practices, so that hackers cannot use the project as a vehicle for malicious activities.

The first volunteer computing project was GIMPS (Great Internet Mersenne Prime Search), which started in 1995. Other early projects include distributed.net, SETI@home, and Folding@home. Today there are over 50 active projects.

2.2 Why is it important?
Because of the huge number (> 1 billion) of PCs in the world, volunteer computing supplies more computing power to science than does any other type of computing. This computing power enables scientific research that could not be done otherwise. This advantage will increase over time, because the laws of economics dictate that consumer products such as PCs and game consoles will advance faster than more specialized products, and that there will be more of them.

Volunteer computing power can't be bought; it must be earned. A research project that has limited funding but large public appeal can get huge computing power. In contrast, traditional supercomputers are extremely expensive, and are available only for applications that can afford them (for example, nuclear weapon design and espionage).

Volunteer computing encourages public interest in science, and provides the public with a voice in determining the directions of scientific research.

2.3 How does it compare to grid computing?
It depends on how you define 'Grid computing'. The term generally refers to the sharing of computing resources within and between organizations, with the following properties:

Each organization can act as either producer or consumer of resources (hence the analogy with the electrical power grid, in which electric companies can buy and sell power to/from other companies, according to fluctuating demand).

The organizations are mutually accountable. If one organization misbehaves, the others can respond by suing them or refusing to share resources with them.

This is different from volunteer computing. 'Desktop grid' computing - which uses desktop PCs within an organization - is superficially similar to volunteer computing, but because it has accountability and lacks anonymity, it is significantly different.

If your definition of 'Grid computing' encompasses all distributed computing (which is silly - there's already a perfectly good term for that), then volunteer computing is a type of Grid computing.

2.4 Is it the same as "peer-to-peer" computing?
No. 'Peer-to-peer computing' describes systems such as Napster, Gnutella, and Freenet, in which files and other data are exchanged between 'peers' (i.e. PCs) without the involvement of a central server. This differs in several ways from volunteer computing:

Volunteer computing uses central servers. There is typically no peer-to-peer communication.

Peer-to-peer computing benefits the participants (i.e. the people sharing files). There's no notion of a 'project' to which resources are donated.

Peer-to-peer computing usually involves storage and retrieval, not computing.

3 HOW THE MAGIC IS DONE SERVER-SIDE

3.2 SERVER DESCRIPTION
BOINC-based projects are autonomous. Each project operates a server consisting of several components:

Web interfaces for account and team management, message boards, and other features;

A task server that creates tasks, dispatches them to clients, and processes returned tasks;

A data server from which clients download input files and executables, and to which they upload output files.

These components share various data stored on disk, including relational databases and upload/download files (see Figure 1).
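The job cycle these components implement (a work generator creating instances, a scheduler dispatching them, clients reporting results) can be sketched with a toy in-memory database. This is only an illustration: the table layout, state names, and function names below are our own simplifications, not BOINC's actual schema or API.

```python
import sqlite3

# Toy version of the shared database behind a task server (illustrative
# schema, not BOINC's): each job is split into instances, each instance
# is dispatched to at most one host and later reported back.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE instance (
    id     INTEGER PRIMARY KEY,
    job_id INTEGER,
    state  TEXT DEFAULT 'unsent',   -- unsent -> in_progress -> reported
    host   TEXT)""")

def create_job(job_id, n_instances):
    """Work generator: create a job's instances, marked as unsent."""
    for _ in range(n_instances):
        db.execute("INSERT INTO instance (job_id) VALUES (?)", (job_id,))

def dispatch(host):
    """Scheduler: hand one unsent instance to a requesting host."""
    row = db.execute(
        "SELECT id FROM instance WHERE state = 'unsent' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None                  # no work available
    db.execute("UPDATE instance SET state = 'in_progress', host = ? "
               "WHERE id = ?", (host, row[0]))
    return row[0]

def report(instance_id):
    """Record a completed instance returned by a client."""
    db.execute("UPDATE instance SET state = 'reported' WHERE id = ?",
               (instance_id,))

create_job(1, 2)        # two redundant instances of job 1
i = dispatch("host-a")  # host A receives one instance
report(i)               # ...and later reports it completed
```

In the real system these state transitions are driven by several separate daemon programs scanning flagged records, as described in Section 3.3.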
Figure 1: A BOINC server consists of several components, sharing several forms of storage.

Each client periodically communicates with the task server to report completed work and to get new work. In addition, the server performs a number of background functions, such as retrying and garbage-collecting tasks. The load on a task server depends on the number of volunteer hosts and their rates of communication. The number of volunteer hosts in current projects ranges from tens to hundreds of thousands, and in the future may reach tens or hundreds of millions. If servers become overloaded, requests fail and hosts become idle. Thus, server performance can limit the computing capacity available to a volunteer computing project.

3.3 BOINC TASK SERVER ARCHITECTURE

3.3.1 TASK SERVER COMPONENTS
BOINC implements a task server using a number of separate programs, which share a common MySQL database (see Figure 2).

Figure 2: The components of a BOINC task server

• The work generator creates new jobs and their input files. For example, the SETI@home work generator reads digital tapes containing data from a radio telescope, divides this data into files, and creates jobs in the BOINC database. The work generator sleeps if the number of unsent instances exceeds a threshold, limiting the amount of disk storage needed for input files.

• The scheduler handles requests from BOINC clients. Each request includes a description of the host, a list of completed instances, and a request for additional work, expressed in terms of the time the work should take to complete. The reply includes a list of instances and their corresponding jobs. Handling a request involves a number of database operations: reading and updating records for the user account and team, the host, and the various jobs and instances. The scheduler is implemented as a FastCGI program run from an Apache web server [3], and many instances can run concurrently.

• The feeder streamlines the scheduler's database access. It maintains a shared-memory segment containing 1) static database tables such as applications, platforms, and application versions, and 2) a fixed-size cache of unsent instance/job pairs. The scheduler finds instances that can be sent to a particular client by scanning this memory segment. A semaphore synchronizes access to the shared-memory segment. To minimize contention for this semaphore, the scheduler marks a cache entry as "busy" (and releases the semaphore) while it reads the instance from the database to verify that it is still unsent.

• The transitioner examines jobs for which a state change has occurred (e.g., a completed instance has been reported). Depending on the situation, it may generate new instances, flag the job as having a permanent error, or trigger validation or assimilation of the job.

• The validator compares the instances of a job and selects a canonical instance representing the correct output. It determines the credit granted to users and hosts that return the correct output, and updates those database records.

• The assimilator handles jobs that are "completed": i.e., that have a canonical instance or for which a permanent error has occurred. Handling a successfully completed job might involve writing outputs to an application database or archiving the output files.

• The file deleter deletes input and output files that are no longer needed.

• The database purger removes job and instance database entries that are no longer needed, first writing them to XML log files. This bounds the size of these tables, so that they act as a working set rather than an archive. This allows database management operations (such as backups and schema changes) to be done quickly.

The programs communicate through the BOINC database. For example, when the work generator creates a job, it sets a flag in the job's database record indicating that the transitioner should examine it. Most of the programs repeatedly scan the database, enumerating records that have the relevant flag set, handling these records, and clearing the flags in the database. Database indices on the
flag fields make these enumerations efficient. When an enumeration returns nothing, the program sleeps for a short period.

Thus, a BOINC task server consists of many processes, mostly asynchronous with respect to client requests, that communicate through a database. This approach has the disadvantage of imposing a high load on the database server. One can imagine an alternative design in which almost all functions are done by the scheduler, synchronously with client requests. This would have lower database overhead. However, the current design has several important advantages:

• It is resilient with respect to failures. For example, only the assimilator uses the application database, and if it is unavailable only the assimilator is blocked. The other components continue to execute, and the BOINC database (i.e., the job records tagged as ready to assimilate) acts as a queue for the assimilator when it runs again.

• It is resilient with respect to performance. If back-end components (e.g. the validator or assimilator) perform poorly and fall behind, the client-visible components (the feeder and scheduler) are unaffected.

• The various components can easily be distributed and/or replicated (see below).

3.4 SCALABILITY

3.4.1 COMPONENT DISTRIBUTION
The programs making up a BOINC task server may run on different computers. In particular, the BOINC database may run on a separate computer (MySQL allows remote access). Many of the programs require access to shared files (configuration files, log files, upload/download data files), so generally the server computers are on the same LAN and use a network file system such as NFS.

The server programs may also be replicated, either on a multiprocessor host or on different hosts. Interference between replicas is avoided by having each replica work on a different subset of database items. The space of database identifiers is partitioned: if there are n replicas, replica i handles only items (e.g., jobs) for which (ID mod n) = i.

3.5 FAILURE PROTECTION

3.5.1 THE BOINC COMPUTING MODEL
Grid computing involves resource sharing between organizations that are mutually accountable. In contrast, participants in a volunteer computing project are not accountable to the project (indeed, their identity is unknown), and the volunteered hosts are unreliable and insecure.

Thus, when a task is sent to a host, several types of errors are possible. Incorrect output may result from a hardware malfunction (especially in hosts that are "overclocked"), an incorrect modification to the application, or an intentional malicious attack by the volunteer. The application may crash. There may be no response to the project, e.g. because the host dies or stops running BOINC. An unrecoverable error may occur while downloading or uploading files. The result may be correct but reported too late to be of use.

3.5.1.1 Persistent redundant computing
Because the above problems occur with non-negligible frequency, volunteer computing requires mechanisms for validation (to ensure that outputs are correct) and retry (to ensure that tasks eventually get done). BOINC provides a mechanism called persistent redundant computing that accomplishes both goals.

This mechanism involves performing each task independently on two or more computers, comparing the outputs, looking for a "quorum" of equivalent outputs, and generating new instances as needed to reach a quorum.

In BOINC terminology, a job is a computational task, specified by a set of input files and an application program. Each job J has several scheduling-related parameters:

• DelayBound(J): a time interval that determines the deadline for instances of J.

• NInstances(J): the number of instances of J to be created initially.

• MinQuorum(J): the minimum size of a quorum.

• Estimates of the amount of computing, disk space, and memory required by J.

• Upper bounds on the number of erroneous, correct, and total instances. These are used to detect jobs that consistently crash the application, that return inconsistent results, or that cause their results to not be reported.

A job instance (or just "instance") refers to a job and specifies a set of output files. An instance is dispatched to at most one host. An instance is reported when it is listed in a scheduler request message. If enough instances of a job have been reported and are equivalent, they are marked as valid and one of them is selected as the job's canonical instance.

BOINC implements persistent redundant computing as follows:

1. When a job J is created, NInstances(J) instances of J are created and marked as unsent.
2. When a client requests work, the task server selects one or more unsent instances and dispatches them to the host. Two instances of the same job are never sent to the same participant, making it unlikely that a maliciously incorrect result will be accepted as valid. The instance's deadline is set to the current time plus DelayBound(J).

3. If an instance's deadline passes before it is reported, the server marks it as "timed out" and creates a new instance of J. It also checks whether the limit on the number of error or total instances of J has been reached, and if so marks J as having a permanent error.

4. When an instance I is reported, and its job already has a canonical instance I*, the server invokes an application-specific function that compares I and I*, and marks I as valid if they are equivalent. If there is no canonical instance yet, and the number of successful instances is at least MinQuorum(J), the server invokes an application-specific function which, if it finds a quorum of equivalent instances, selects one of them as the canonical instance I*, and marks the instances as valid if they are equivalent to I*. Volunteers are granted credit for valid instances. [9]

4 HOW THE MAGIC IS DONE CLIENT-SIDE

Figure 3: the BOINC client software includes a 'core client' that executes applications and interacts with them through a runtime system.

• Applications are typically long-running scientific programs. They may consist of a single process or a dynamic set of multiple processes.

• The BOINC core client program communicates with schedulers, uploads and downloads files, and executes and coordinates applications.

• The BOINC Manager provides a graphical interface allowing users to view and control computation status (see Figure 3). For each task, it shows the fraction done and the estimated time to completion, and lets the user open a window showing the application's graphics. It communicates with the core client using remote procedure calls over TCP.

• A BOINC screensaver (if enabled by the volunteer) runs when the computer is idle. It doesn't generate screensaver graphics itself, but rather communicates with the core client, requesting that one of the running applications display full-screen graphics.

4.1 OVERALL ARCHITECTURE

4.1.1 SHARED-MEMORY MESSAGE-PASSING
The runtime system requires bidirectional communication between the core client and applications. How should this work? Operating systems offer a variety of mechanisms for inter-process communication, process control, and synchronization. For example, POSIX-compliant systems have signals, semaphores, and pipes. Windows has mutexes, messages, and various system calls for process and thread control. We avoided platform-specific mechanisms because of the resulting code complexity.

Instead, the BOINC runtime system is based on shared-memory message passing. For each application it executes, the core client creates a shared-memory segment containing a data structure with a number of unidirectional message channels. Each channel consists of a fixed-size buffer and a 'present' flag. Message queuing, if needed, is provided at a higher software level. All messages are XML, minimizing versioning problems.

The BOINC runtime system uses eight message channels, four in each direction. For example, one channel carries task control messages (telling the application to suspend, resume, quit or abort) while another conveys graphics-related messages (telling the application to create or destroy graphics windows).

Figure 4: The core client communicates with applications by shared-memory message passing.
8. 8
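The channel layout just described can be sketched in C. This is an illustrative model only, not BOINC's actual data structures: the type names, the buffer size, and the four channels shown (of the eight) are assumptions of the sketch.

```c
#include <stdbool.h>
#include <string.h>

/* One unidirectional message channel: a fixed-size buffer plus a
 * 'present' flag, as described in the text. */
#define MSG_CHANNEL_SIZE 1024

typedef struct {
    bool present;               /* a message is waiting to be read */
    char buf[MSG_CHANNEL_SIZE]; /* one XML message */
} MSG_CHANNEL;

/* The shared-memory segment holds channels in both directions,
 * e.g. task control and graphics control (4 of 8 shown). */
typedef struct {
    MSG_CHANNEL process_control_request; /* core client -> app */
    MSG_CHANNEL process_control_reply;   /* app -> core client */
    MSG_CHANNEL graphics_request;        /* core client -> app */
    MSG_CHANNEL graphics_reply;          /* app -> core client */
} SHARED_MEM;

/* Send fails while the previous message is still unread; queuing,
 * if needed, is done at a higher software level. */
bool channel_send(MSG_CHANNEL *c, const char *msg) {
    if (c->present || strlen(msg) >= MSG_CHANNEL_SIZE) return false;
    strcpy(c->buf, msg);
    c->present = true;
    return true;
}

/* Receive copies the message out and clears the 'present' flag. */
bool channel_recv(MSG_CHANNEL *c, char *out) {
    if (!c->present) return false;
    strcpy(out, c->buf);
    c->present = false;
    return true;
}
```

Because each channel has a single writer and a single reader, the 'present' flag by itself serializes access in this sketch, which is in the spirit of the portability argument above: no platform-specific locking primitives are required.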
4.1.2 SIMPLE AND COMPOUND APPLICATIONS

BOINC supports both simple and compound applications. Simple applications consist of a single program, and their scientific code, graphics code, and the BOINC runtime library reside and execute in a single address space. Compound applications consist of several programs – typically a coordinator that executes one or more worker programs. The coordinator, for example, might run pre-processing, main, and post-processing programs in sequence, or it might launch one or more programs (e.g. coupled climate models) that run concurrently and communicate via shared memory. It might run a graphics program concurrently with a scientific program.

The BOINC runtime library is linked with each program of a compound application. The BOINC API lets each program specify which message channels it will handle, and whether the message handling should be done by the runtime system or by the application. In the example shown in Figure 5, the coordinator handles process control messages, while the graphics program handles graphics messages.

Figure 5: A compound application consists of several processes, each of which handles particular message channels.

4.2 FAILURE PROTECTION

4.2.1 ORPHANED AND DUPLICATE PROCESSES

Sometimes the core client exits unexpectedly (for example, because it crashes). In these situations, a mechanism is needed that will cause applications to eventually exit. BOINC uses heartbeat messages, which are sent once per second from the core client to each application. If an application doesn't get a heartbeat message for 30 seconds, it exits.

Each application executes in a directory containing its input and output files. To prevent duplicate copies of an application from executing in the same directory, the runtime system uses a lock file. The API initialization routine tries to acquire the lock file; if it can't, it waits for 30 seconds (allowing the heartbeat mechanism to take effect) and tries again.

4.2.2 RELIABLE TERMINATION

The core client uses standard functions (such as waitpid() on Unix) to find when applications have finished and whether they exited normally. On some versions of Windows, when a program is killed externally by the user, it is indistinguishable (from the core client's viewpoint) from a call to exit(0). To solve this problem, the BOINC API finalization routine writes a 'finished file'. If the core client detects that a program has exited unexpectedly but no 'finished file' is found, it restarts the application.

5 Checkpointing

BOINC expects applications to do checkpoint/restart, so that they can quit and restart repeatedly and still finish their intended computation. BOINC user preferences include a minimum interval between periods of disk activity. This is useful for laptops whose disks spin down to conserve power. The BOINC runtime system must allow applications to checkpoint frequently (to minimize wasted CPU time) but must respect the minimum disk interval.

BOINC applications typically have particular points in their execution where the state of the computation can be represented compactly (e.g. by the values of outer loop indices). These "checkpointable states" may be separated by milliseconds or by minutes. The BOINC API provides a function

bool boinc_time_to_checkpoint();

that should be called whenever the application is in a checkpointable state. It can be called frequently (hundreds or thousands of times a second). It returns true if the minimum disk interval has elapsed since the last checkpoint. If so, the application should write a checkpoint file and call

boinc_checkpoint_completed();

These functions automatically make checkpointing a critical section with respect to quit messages. They also inform the core client when the application has checkpointed, so that it can correctly account total CPU time, and so that it can avoid doing preempt-by-quit for applications that haven't checkpointed recently.

5.1 Output file integrity

Many BOINC applications write incrementally to output files. If an application is preempted by quitting at a time when it has extended an output file since the last checkpoint, the same output will be written when the task runs again, producing an erroneous output file. There are several ways of dealing with this. The application can copy output files during checkpoint; this is potentially inefficient.
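Stepping back to the API of Section 5, the test-and-commit checkpoint pattern can be sketched as a self-contained simulation. The two helpers below only mirror the roles of boinc_time_to_checkpoint() and boinc_checkpoint_completed(); their timing logic is a simplified stand-in (the real calls also honor user preferences and report to the core client), and do_work()/write_checkpoint() are hypothetical, left as comments.

```c
#include <stdbool.h>
#include <time.h>

/* Simplified stand-ins for the BOINC calls: only the
 * minimum-disk-interval test from the text is modeled. */
static time_t last_checkpoint = 0;       /* 0 = never checkpointed */
static const int min_disk_interval = 60; /* seconds; illustrative  */

bool time_to_checkpoint(void) {
    return time(NULL) - last_checkpoint >= min_disk_interval;
}

void checkpoint_completed(void) {
    last_checkpoint = time(NULL);
}

/* Typical application structure: test at every checkpointable state
 * (the call is cheap), write a checkpoint only when it returns true.
 * do_work() and write_checkpoint() are hypothetical. */
long run(long start, long n) {
    for (long i = start; i < n; i++) {
        /* do_work(i); */
        if (time_to_checkpoint()) {
            /* write_checkpoint(i);  record i so a restart resumes here */
            checkpoint_completed();
        }
    }
    return n; /* iterations completed */
}
```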
Alternatively, the application can store the size of its output files in the checkpoint file, and seek to these offsets on restart. Or it can use a set of printf()-replacement functions (supplied by BOINC) that buffer output in memory, and flush these buffers during checkpoint.

6 Remote diagnostics and debugging

Applications can fail by crashing or going into infinite loops. Some failures occur only in specific contexts – CPU type, OS version, library version, even CPU speed. Such failures may be common on volunteer hosts, yet never occur on the project's development machines. The BOINC runtime system has several features that collect failure information:

• An application's standard error output is directed to a file and returned to the project's server for all tasks, failed and not.
• If an application crashes, a stack trace is written to standard error. If the application includes a symbol table, the stack trace is symbolic.
• If an application is aborted (because the task exceeds time, disk, or memory limits, or is aborted by the user), a stack trace is written to standard error.

All information about a task (exit code, signal number, standard error output, volunteer host platform) is stored in a relational database on the server, making it easy to isolate the contexts in which failures occur. Many BOINC-based projects have small "alpha testing" projects, with enough volunteers to cover the main platforms, so that context-specific problems can be fixed before applications are released to the public.

7 Conclusion

Nowadays the opinion of the average internet user about the internet itself isn't a generous one; Carl Sagan observed that the general public's attitude toward science is increasingly one of alienation and even hostility [13]. Of course, one can say this has been a trend ever since Prometheus stole the divine fire and gave it to humans; volunteer computing is a step in the right direction. Not only does it recognize the potential of the everyday internet user as a vessel of knowledge, it also lessens the grip of governments and commercial companies on scientific research. Because computer owners can contribute to whatever project they choose, control over resource allocation for science will shift away from government funding agencies (with the myriad factors that control their policies) and towards the public. This has its risks: the public may be easier to deceive than a peer-review panel. But it offers a very direct and democratic mechanism for deciding research policy. If a scientist has an idea for a computation, but finds that it will take a million years of computer time, the normal reaction is to toss the idea in a wastebasket. But public computing makes such ideas feasible: SETI@home has used 1.5 million years of CPU time. Scientists can now resurrect and reconsider these discarded ideas. [14]

REFERENCES

[1] Andrew S. Tanenbaum, Computer Networks, p. 2, 2002.
[2] David P. Anderson, Eric Korpela, Rom Walton, "High-Performance Task Distribution for Volunteer Computing", Space Sciences Laboratory, University of California, Berkeley.
[3] http://Wikipedia.com
[4] http://dsonline.computer.org
[5] Andy Oram (editor), Peer-to-Peer: Harnessing the Power of Disruptive Technologies, p. 22, O'Reilly, 2001.
[6] Rüdiger Schollmeier, "A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications", IEEE International Conference on Peer-to-Peer Computing, 2001, pp. 101-102.
[7] searchcloudcomputing.techtarget.com
[8] Brian Hayes, "Cloud Computing", Communications of the ACM, 51(7):9-11, 2008. ISSN 0001-0782. doi:10.1145/1364782.164786.
[9] David P. Anderson, Carl Christensen, Bruce Allen, "Designing a Runtime System for Volunteer Computing", UC Berkeley Space Sciences Laboratory; Dept. of Physics, University of Oxford; Physics Dept., University of Wisconsin – Milwaukee.
[10] D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky, and D. Werthimer, "SETI@home: An Experiment in Public-Resource Computing", Communications of the ACM, Vol. 45, No. 11, 2002, pp. 56-61.
[11] D. Toth & D. Finkel, "A Comparison of Techniques for Distributing File-Based Tasks for Public-Resource Computing", Proc. 17th IASTED International Conference on Parallel and Distributed Computing and Systems, Phoenix, Arizona, USA, 2005, pp. 398-403.
[12] J. Bohannon, "Grassroots Supercomputing", Science 308, 2005, pp. 810-813.
[13] C. Sagan, The Demon-Haunted World: Science As a Candle in the Dark, Random House, 1996.
[14] David P. Anderson, "Public Computing: Reconnecting People to Science", March 21, 2004.