CoMon is an evolving, mostly-scalable monitoring system
for PlanetLab that has the goal of presenting environment-tailored
information for both the administrators and users of
the PlanetLab global testbed. In addition to passively reporting
metrics provided by the operating system, CoMon also
actively gathers a number of metrics useful for developers
of networked systems. Using CoMon, PlanetLab administrators
and users can easily spot problematic machines, where
the problem may arise from the machine itself, local configuration/environment
problems, or the workload running on
the machine. Furthermore, users can easily observe many
properties of all of the experiments running across multiple
PlanetLab nodes, not only facilitating their own experiment
monitoring and debugging, but also helping scale the task of
finding PlanetLab problems.
In this paper we describe CoMon’s design and operation,
including what kinds of data are gathered, the scale of the processing
involved, and the approaches we have taken to keep
CoMon running. Our goal is not only to illustrate the kinds of
problems faced in this environment, but also to invite others
to participate, either by experimenting with the data generated
by CoMon, or by building on the CoMon system itself.