SCMSWeb
SCMSWeb is developed for monitoring large-scale distributed systems, including the newly emerging Computational and Data Grids. A substantial amount of monitoring data is collected for a variety of tasks such as problem detection, performance determination, performance tuning, and support for advanced services. The goal of this system is to develop technology and software tools that allow system administrators to monitor the state of large distributed systems and grid computing systems. The design focuses on a well-defined and repetitive structure, ease of use, the capability to form hierarchical monitoring, and historical archives. This paper presents the monitoring architecture design of SCMSWeb, software for large-scale grid systems.
Introduction
A Grid system is a distributed computing system that focuses on large-scale resource sharing, innovative applications, and high-performance orientation. It applies the resources of many computers in a network to a single problem at the same time. Grid computing now appears to be a promising trend because of its ability to make more cost-effective use of a given amount of computer resources.
As a Grid grows larger, monitoring and managing its wide range of heterogeneous resources becomes too complex. Monitoring a very large-scale distributed system is therefore usually a complex and challenging task. SCMSWeb is a web-based monitoring system for large-scale Grids and clusters that simplifies system administration, allows system administrators to spot potential problems earlier, and captures important information. This information can be used to determine the source of performance problems, tune application and system software, detect and recover from failures, and ultimately support advanced services.
Features
- Scalability. This is the crucial issue in defining the monitoring architecture, because the monitor must scale to thousands of resources and services to monitor, and to the potential entities that would like to receive this information.
- Extensibility. The system is designed to have unlimited extensibility through the use of monitoring-level plug-ins. Currently, there are two standard plug-ins: Job Monitoring and Probe.
- Simplification. The model uses layers to simplify the monitoring function. Layers divide the monitoring operation into less complex elements and let design and development efforts specialize on modular functions.
- Modifiability. Layers also prevent changes in one area from affecting other areas, so each area can evolve more quickly.
- Comprehensiveness. The monitoring system provides a very detailed summary view of system information. The data is kept in a database that is accessible to users. The system supports hierarchical Grids and multi-cluster setups, and has a failure notification service.
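As a rough illustration of the plug-in idea in the Extensibility item, the Python sketch below keeps monitoring plug-ins in a registry keyed by name. The `PLUGINS` registry, the `register` decorator, the function names and the returned values are all hypothetical; they do not reflect the actual SCMSWeb plug-in interface, only one way such a layer could be organized.

```python
from typing import Callable, Dict

# Hypothetical registry: plug-in name -> function returning a dict of readings.
PLUGINS: Dict[str, Callable[[], dict]] = {}


def register(name: str):
    """Decorator that adds a collector function to the registry."""
    def wrap(func: Callable[[], dict]) -> Callable[[], dict]:
        PLUGINS[name] = func
        return func
    return wrap


@register("probe")
def probe() -> dict:
    # Placeholder value; a real probe would test a service.
    return {"reachable": True}


@register("job_monitoring")
def job_monitoring() -> dict:
    # Placeholder value; a real plug-in would query a job scheduler.
    return {"running_jobs": 0}


def collect_all() -> dict:
    """Run every registered plug-in and gather the results by name."""
    return {name: func() for name, func in PLUGINS.items()}
```

In this arrangement, adding a new kind of monitoring means registering one more function; the collection loop itself does not change.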
Result
The system is currently installed and used at many sites, including ApGrid (Asia Pacific Grid: www.apgrid.org), PRAGMA (Pacific Rim Applications & Grid Middleware Assembly: http://pragma.sdsc.edu) and ThaiGrid (www.thaigrid.net). A live demonstration is available at http://observer.cpe.ku.ac.th/scmsweb
Grid-viewer
Grid-viewer provides a summary of the status of Grids, clusters and hosts. Status information in Grid-viewer is shown host by host.
Static Summary Monitoring Grid-level
Static Summary Monitoring is a graphical monitoring tool that provides comparative views: by Grid and cluster at the Grid level, by host at the cluster level, and by hardware metric at the host level.
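The three comparison levels can be pictured with a small Python sketch. The nested `GRID` dictionary, the metric names (`cpu_load`, `mem_used_mb`) and the choice of the mean as the cluster summary are assumptions made for illustration, not the data model or aggregation that SCMSWeb actually uses.

```python
from statistics import mean

# Hypothetical sample data: cluster -> host -> metric readings.
GRID = {
    "cluster-a": {
        "host1": {"cpu_load": 0.50, "mem_used_mb": 1024},
        "host2": {"cpu_load": 1.50, "mem_used_mb": 2048},
    },
    "cluster-b": {
        "host3": {"cpu_load": 0.25, "mem_used_mb": 512},
    },
}


def host_view(grid, cluster, host):
    """Host level: compare by hardware metric."""
    return dict(grid[cluster][host])


def cluster_view(grid, cluster, metric):
    """Cluster level: compare one metric host by host."""
    return {host: readings[metric] for host, readings in grid[cluster].items()}


def grid_view(grid, metric):
    """Grid level: compare clusters by the mean of one metric over their hosts."""
    return {
        cluster: mean(readings[metric] for readings in hosts.values())
        for cluster, hosts in grid.items()
    }
```

For example, `grid_view(GRID, "cpu_load")` compares clusters, `cluster_view(GRID, "cluster-a", "cpu_load")` compares the hosts inside one cluster, and `host_view(GRID, "cluster-a", "host1")` lists the metrics of a single host.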
Monitoring Management
Monitoring Management provides a user-friendly way to add a new entity to your Grid site. The owner of an entity in your Grid network can modify its site information, such as entity name, address and cgi-path. If your entity disappears, the checker daemon sends an e-mail to the owner at the e-mail address specified for that entity. This error reporting can be set in the subscribe method.
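The failure notification described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `owners` mapping, the sender address and the message wording are hypothetical, and the real checker daemon may detect missing entities and format its mail differently. The sketch builds the messages but does not send them.

```python
from email.message import EmailMessage


def find_missing(expected, seen):
    """Return the registered entities that did not report, in sorted order."""
    return sorted(set(expected) - set(seen))


def build_alert(entity, owner_email, sender="checker@example.org"):
    """Build (but do not send) a notification for one missing entity."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = owner_email
    msg["Subject"] = f"SCMSWeb: entity '{entity}' is not responding"
    msg.set_content(f"The entity '{entity}' did not report in the last check.")
    return msg


def check(owners, seen):
    """owners maps entity name -> owner e-mail; seen lists entities that reported."""
    return [build_alert(entity, owners[entity]) for entity in find_missing(owners, seen)]
```

Delivering the messages would be a separate step, for example with `smtplib.SMTP.send_message` from the Python standard library against whatever mail server the site uses.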
System Information
System Information provides the necessary hardware information.
Information Service
Information Service offers a feature to retrieve information about the Grid.
Publications
1. Napat Chalakornkosol and Putchong Uthayopas, "Monitoring the Dynamics of Grid Environment using Grid Observer", poster presentation at IEEE CCGRID2003, Toshi Center, Tokyo, May 12-15, 2003. (pdf)
Download
SCMS Packages - http, ftp
SCMSWeb Packages - http, ftp
Required Packages - NOTE: It is possible to use RPMs built by others, such as freshrpms.net or DAG
PyRRDtool - http, ftp
RRDtool - http, ftp



