Introduction

The GPIR providers are responsible for gathering data about resources, formatting it as XML, and ingesting it into GPIR. There are two categories of providers: central and remote. Central providers can be run on any server, including the GridPort server. Remote providers must be installed and deployed on every end resource that GPIR will monitor. Sample providers that gather different types of data (e.g. jobs, load, motd) for four different resource managers (LSF, PBS, LoadLeveler, Condor) are included in the GridPort distribution.

Remote Providers

Configuration and Installation

In order to install the remote GPIR providers, please carry out the following steps:

  1. Set up the remote resource as a client in GPIR.

    GPIR authenticates a provider that tries to ingest data by that provider's IP address. Thus, the resource on which a remote provider runs must be added as a client in GPIR. You can add clients in GPIR by using the GPIR Admin Client. For more details, please see the documentation on the GPIR Admin Client.
  2. Distribute and unpack the providers.

    Bundle the provider code, transfer the provider bundle to each remote resource for which you wish to gather information, and decompress the provider bundle.
    
    	> cd gpir-<version>/src
    	> tar cvfz providers.tar.gz providers/
    	> scp providers.tar.gz provideruser@myremotehost.edu:~/.
    	> ssh provideruser@myremotehost.edu
    	> tar xvfz providers.tar.gz
    	
  3. Set up the provider user's environment

    If the remote provider will gather data from the resource's resource manager, the provider user's environment must be aware of the resource manager's commands. If the provider user does not have this environment configured by default, please add the configuration to the provider user's shell environment by editing this user's $HOME/.profile file (the provider runs as an sh shell command via cron).
  4. Install SOAP::Lite

    If your remote resource does not have the SOAP::Lite perl module, you will need to install it on the remote resource as the provider user. The SOAP::Lite bundle can be found in providers/perl/soap. In the same directory, there is also a simple shell script that will install the bundle. Make sure to execute this script from the providers/perl/soap directory. (Note: you may need to modify the script to account for various resources' tar commands.)
    
    	# as provideruser on the remote resource
    	> cd ~/providers/perl/soap
    	> ./install.soap.sh
    	
  5. (LSF resources) Install the lsf_showq command

    If your remote resource uses LSF as its resource manager, the provider scripts will by default interact with the lsf_showq command, a non-standard LSF command developed at TACC. This command is included in the GPIR provider distribution, and an installation script is provided.
    
    	> cd ~/providers/perl/lsf_showq
    	> ./install.sh
    	
  6. Configure the provider

    Next, you will need to set a few configuration values for the remote provider by editing the parameters in providers/perl/conf/providers.conf. These parameters include the hostname, GPIR servers (and ports), administrator email addresses, and resource manager specification. The following configuration uses the sample data gathering modules that interact with a PBS resource manager. If your resource uses a different resource manager, please point the module paths to the appropriate data gathering module for that resource manager (see the providers/perl/src/modules directory for a full list of the example modules provided). For more details on the configuration parameters, see the documentation in the configuration file.
    
    	# as provideruser on the remote resource
    	> vi ~/providers/perl/conf/providers.conf
    
    	...
    
    	### RESOURCE INFO
    	hostname=myremotehost.edu
    
    	### MODULE PATHS
    	motd.module=../modules/motd.pl
    	load.module=../modules/load.pbs.pl
    	jobs.module=../modules/jobs.pbs.pl
    	jobs.condor.module=<path-to-load-module>
    	pcgrid.condor.module=<path-to-load-module>
    
    	### CONDOR INFO
    	central.manager=centralmanager.edu
    	pool.name=<pool-name>
    	pool.description=<pool-description>
    	pool.state=<enabled || disabled>
    
    	
    	### GPIR INFO
    	gpir.contact=gpirserver.edu:8080
    
    	### ADMIN INFO
    	admin.email=portaladmin@myorg.edu
    
    	
  7. Test the providers from the command line.

    Before automating the execution of the provider scripts, please run the providers from the command line to ensure they are functioning correctly. To run the provider, as the provider user on the remote resource, from within the providers/perl/src/core directory, run the provider's main.pl script with the appropriate arguments. It is a good idea to first test the providers without ingesting the data to GPIR (by passing the -n command-line parameter to main.pl).

    Note: If you installed the Perl SOAP::Lite module that is bundled with the providers, ingesting data by calling the main.pl script directly will not work (the example below uses the run.sh script for the ingestion tests). However, main.pl is still useful for testing proper data gathering and XML formatting without ingesting to GPIR.
    
    	
    	# as provideruser on the remote resource
    	> cd ~/providers/perl/src/core
    	> ./main.pl
    	Usage: ./main.pl -f <function> [-d] [-n]
    	<function> = motd | jobs | load | nodes
    	[-d] = print debug info
    	[-n] = do not ingest data in GPIR
    
    	### first test without ingesting
    	> ./main.pl -f motd -d -n
    	> ./main.pl -f load -d -n
    	> ./main.pl -f jobs -d -n
    	> ./main.pl -f jobs.condor -d -n
    	> ./main.pl -f pcgrid.condor -d -n
    
    	### then test with ingesting
    	> cd ~/providers/perl
    	> ./run.sh motd
    	> ./run.sh load
    	> ./run.sh jobs
    	> ./run.sh jobs.condor
    	> ./run.sh pcgrid.condor
    	
    	
  8. Install the provider in the crontab.

    To automate the execution of the provider, we recommend running it from cron on the remote resource under the provider user's account. Sample cron entries for various resource types are included in the providers/perl/cron directory. The sample entries gather motd information every half hour and gather load and jobs information every fifteen minutes. The sample entries also direct the output and errors of each provider run to files in the providers/perl/logs directory.
    
    	# as provideruser on the remote resource
    	> crontab ~/providers/perl/cron/compute.crontab
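
Before relying on the cron entries, the dry-run checks from step 7 can be scripted so they are easy to repeat after a configuration change. The following is a minimal sketch, not part of the GPIR distribution: the dry_run_all helper is hypothetical, and it assumes main.pl exits with a non-zero status when a function fails, which you should confirm for your installation.

```shell
#!/bin/sh
# Sketch: run each provider function in dry-run mode (-n) and report failures.
# Assumption: main.pl exits non-zero when a function fails.

# dry_run_all MAIN_SCRIPT FUNCTION...
# Prints one "ok"/"FAILED" line per function; returns 1 if any function failed.
# Provider output is discarded here; rerun a failing function by hand to see it.
dry_run_all() {
    main=$1
    shift
    status=0
    for func in "$@"; do
        if "$main" -f "$func" -d -n >/dev/null 2>&1; then
            echo "ok: $func"
        else
            echo "FAILED: $func"
            status=1
        fi
    done
    return "$status"
}

# Example invocation, from within ~/providers/perl/src/core:
#   dry_run_all ./main.pl motd load jobs
```

Because the helper only passes the -f, -d, and -n flags documented in the usage message, it never ingests data into GPIR.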
    	

Customizing The Providers (Creating Module Scripts)

The remote GPIR providers are architected to separate the various responsibilities they carry out into modular scripts. The tasks carried out can be listed as follows:

  1. Gather data of interest from the resource manager (or other sources).
  2. Format the data into GPIR-schema-based XML.
  3. Ingest the data into GPIR via SOAP.

Due to the great variety of resource managers, and slight differences in the formats of their commands' outputs, Step 1 may have to be customized somewhat to fit the environment of your particular resource. However, as long as Step 1 produces consistently formatted output, you should not have to modify any provider code that deals with Steps 2-3.

Step 1, the gathering of resource data, is encapsulated in the Perl scripts in the providers/perl/src/modules directory. The provider's infrastructure code executes these scripts and passes their output to the XML formatting code in providers/perl/src/xml_formatters. Thus, if you wish to include a customized data gathering mechanism for your resource, simply create an executable script (in the programming language of your choice) that gathers the data and writes it to STDOUT in the format expected for the particular function (e.g. motd, load, jobs).
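
As an illustration, a replacement motd module written in sh might look like the sketch below. It only shows the general shape of a module (gather data, write it to STDOUT). The print_motd function and the MOTD_FILE variable are hypothetical names introduced for this example, and how the provider infrastructure reacts to STDERR output or a non-zero exit status is an assumption to check against the core scripts.

```shell
#!/bin/sh
# Sketch of a custom motd module. MOTD_FILE is a hypothetical override
# introduced for this example; it is not a GPIR setting.

# print_motd [FILE]
# Writes the MOTD to STDOUT unchanged; returns 1 if the file is unreadable.
print_motd() {
    motd_file=${1:-${MOTD_FILE:-/etc/motd}}
    if [ ! -r "$motd_file" ]; then
        echo "cannot read $motd_file" >&2
        return 1
    fi
    cat "$motd_file"
}

# In a real module script, the last line would simply call: print_motd
```

To use a script like this, make it executable and point the motd.module entry in providers.conf at it.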

The existing xml formatters expect a well-defined format to be written to STDOUT. A description of the format for each supported function follows:

  • motd - The motd xml formatter simply expects the MOTD contents to be printed to STDOUT line-by-line. Thus, the given provider simply makes a call to 'cat motd'. It is unlikely this provider will need to be tweaked, unless perhaps your resource's motd file is not in the standard location (/etc/motd).
  • load - The load xml formatter simply expects the ratio of CPUs used to total CPUs available, expressed as a whole-number percentage (e.g. 65). This number should be printed to STDOUT on a single line.
  • jobs - The jobs xml formatter expects the appropriate module to print one job per line. Each line contains a comma-delimited list of the job's attributes, with no spaces between commas and attributes. For a detailed list of the order of the attributes, please refer to the providers/perl/src/xml_formatters/jobs.pl script.
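
The load format above can be produced with integer arithmetic alone. The sketch below assumes you have already obtained the used and total CPU counts from your resource manager (how to do so is specific to each resource manager and is not shown). The load_percent function is a hypothetical helper, and rounding to the nearest whole number is a choice made for this example.

```shell
#!/bin/sh
# Sketch of the arithmetic for a load module: print CPU usage as a
# whole-number percentage on a single line.

# load_percent USED_CPUS TOTAL_CPUS
# Prints used/total as an integer percentage, rounded to the nearest whole number.
load_percent() {
    used=$1
    total=$2
    if [ "$total" -le 0 ]; then
        echo "total CPUs must be positive" >&2
        return 1
    fi
    # Adding total/2 before dividing rounds to nearest instead of truncating.
    echo $(( ($used * 100 + $total / 2) / $total ))
}

# Example: 13 of 20 CPUs in use
#   load_percent 13 20
```

For 13 of 20 CPUs in use, the function prints 65, which matches the example value given for the load formatter.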

Extending Provider Functionality

The remote provider is separated into three groups of functionality: core, modules, and xml formatters. The core scripts are responsible for the provider's configuration, main logic, and GPIR web service interaction. The modules are responsible for gathering a specific type of data about the resource. The xml formatters convert data obtained by the modules into a format that the GPIR web service expects. In order to use the remote provider to gather another type of information from the resource, please do the following:

  1. Add a module to gather data

    You will need to add a module to the remote providers to gather the particular data of interest. This module must be an executable script (in the programming language of your choice) that acquires the data of interest and prints it to STDOUT in the format that its xml formatter expects. Different modules print data in different formats, depending on the type of data being gathered. Modules are located in the providers/perl/src/modules directory and use the following naming convention:
    
    	
    	<function>.<resource_manager>.pl
    	
    	
  2. Add an xml formatter to format the module's data for GPIR

    The xml formatter converts the data printed by the module into a format that the GPIR web service expects. Specifically, this format is an XML string that conforms to one of the accepted GPIR schemas. Please refer to the documentation on GPIR schemas for more information. Xml formatters are located in the providers/perl/src/xml_formatters directory and use the following naming convention:
    
    	
    	<function>.pl
    	
    	
  3. Add the new functionality to the provider's core configuration

    Finally, you will need to modify the remote provider's core configuration in order for your new functionality to be executed by the provider. To do so, you will need to edit providers/perl/src/core/config.pl. Simply add the function to the VALID_FUNCTIONS list (line 3).
    
    	# as provideruser on the remote resource
    	vi ~/providers/perl/src/core/config.pl
    	...
    	
    	# line 3
    	@VALID_FUNCTIONS = ('motd', 'jobs', 'jobs.condor', 'load', 'pcgrid.condor', 'nodes');
    	
    The new function will be accepted as a parameter of the -f flag when executing the providers/perl/src/core/main.pl script.
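
The three steps above can be checked with a small script before running the new function. This is a minimal sketch under stated assumptions: the check_function helper and the example function name "disk" are hypothetical, and the check assumes function names appear single-quoted on the VALID_FUNCTIONS line, as in the excerpt above.

```shell
#!/bin/sh
# Sketch: check that a new provider function has all three pieces in place.
# Paths follow the directory layout described above.

# check_function PROVIDERS_ROOT FUNCTION RESOURCE_MANAGER
# Prints what is missing; returns 1 if any piece is missing.
check_function() {
    root=$1
    func=$2
    rm_name=$3
    module="$root/perl/src/modules/$func.$rm_name.pl"
    formatter="$root/perl/src/xml_formatters/$func.pl"
    config="$root/perl/src/core/config.pl"
    rc=0

    [ -x "$module" ] || { echo "missing or not executable: $module"; rc=1; }
    [ -f "$formatter" ] || { echo "missing: $formatter"; rc=1; }
    # Fixed-string match for the quoted name on the VALID_FUNCTIONS line.
    grep 'VALID_FUNCTIONS' "$config" 2>/dev/null | grep -qF "'$func'" \
        || { echo "not in VALID_FUNCTIONS: $func"; rc=1; }

    if [ "$rc" -eq 0 ]; then
        echo "ok: $func"
    fi
    return "$rc"
}

# Example invocation for a hypothetical "disk" function on a PBS resource:
#   check_function ~/providers disk pbs
```

The script only inspects files; it does not run the module or contact GPIR, so it is safe to use before the dry-run test with main.pl -n.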