
Copy Tools

Paul Nilsson edited this page Apr 25, 2022 · 1 revision

The Pilot can use externally defined copy tools or an available backend API for file transfers.

The logic that implements file transfers for stage-in and stage-out, using the corresponding library or external tool, is isolated in a dedicated Pilot copytool module.

Each copytool module relies on the following settings to configure and customize the top-level staging workflow (implemented in the Data API) for file transfer operations:

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| allowed_schemas | list | any enabled for PandaQueue | a prioritized list of supported schemas for transfers by given copytool |
| require_replicas | boolean | False | indicates if given copytool requires input replicas to be resolved first from Rucio before stage-in |
| require_input_protocols | boolean | False | indicates if given copytool requires input protocols and manual generation of input replicas for stage-in |
| require_protocols | boolean | True | indicates if given copytool requires protocols to be resolved first for stage-out |
| check_availablespace | boolean | True | indicates whether space check should be applied before stage-in transfers using given copytool |
| resolve_surl | handler | StagingClient.resolve_surl | Get final destination SURL for file to be transferred. Can be customized at the level of specific copytool |
| resolve_replica | handler | StageInClient.resolve_replica | Resolve input replica (matched by domain) first according to primary_schemas, if not found then look up within allowed_schemas. Can be customized at the level of specific copytool |
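
The lookup order described for resolve_replica (primary_schemas first, then allowed_schemas) can be illustrated with a small sketch. This is not the Pilot's StageInClient.resolve_replica; the function below is a hypothetical stand-in that takes a plain list of replica URLs, assumes they have already been filtered by domain, and treats a missing allowed_schemas as "any".

```python
from urllib.parse import urlparse


def resolve_replica(replicas, primary_schemas=None, allowed_schemas=None):
    """Pick one replica URL by schema priority (illustrative sketch only).

    replicas: list of replica URLs, assumed to be pre-filtered by domain.
    primary_schemas: schemas tried first, in priority order.
    allowed_schemas: fallback schemas, in priority order; None means "any".
    """
    def first_match(schemas):
        # Walk schemas in priority order and return the first replica that matches.
        for schema in schemas or []:
            for url in replicas:
                if urlparse(url).scheme == schema:
                    return url
        return None

    match = first_match(primary_schemas)
    if match is not None:
        return match
    if allowed_schemas is None:
        # Assumption: no restriction configured, so any replica is acceptable.
        return replicas[0] if replicas else None
    return first_match(allowed_schemas)


replicas = [
    "srm://se.example.org/data/file1",
    "root://xrd.example.org//data/file1",
    "davs://dav.example.org/data/file1",
]
print(resolve_replica(replicas, primary_schemas=["root"], allowed_schemas=["davs", "srm"]))
```

Because both lists are walked in order, the position of a schema in allowed_schemas acts as its priority, which matches the "prioritized list" wording in the table.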

In addition to these settings, each copytool module must implement the following interface functions:

| Function signature | Arguments | Description |
| --- | --- | --- |
| `is_valid_for_copy_in(files)` | files: list of input FileSpec entries | Check if passed files list is valid (allowed) for stage-in operation. Typically returns True |
| `is_valid_for_copy_out(files)` | files: list of output FileSpec entries | Check if passed files list is valid (allowed) for stage-out operation. Typically returns True |
| `copy_in(files, **kwargs)` | files: list of FileSpec entries; kwargs: extra arguments passed by top workflow | Download (stage-in) given files using copytool related implementation. Copytool should update corresponding state fields of FileSpec object (status, status_code) |
| `copy_out(files, **kwargs)` | files: list of FileSpec entries; kwargs: extra arguments passed by top workflow | Upload (stage-out) given files using copytool related implementation. Copytool should update corresponding state fields of FileSpec object (status, status_code) |
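
As a rough illustration of how the settings and the four interface functions fit together, the sketch below shows a toy copytool module that copies local files with shutil. It is not taken from the Pilot source. The FileSpec class is a minimal stand-in, and the fields lfn, turl and workdir, the 'transferred'/'failed' status strings, the 0/1 status codes and the 'file' schema are all assumptions made for this example.

```python
import os
import shutil
from dataclasses import dataclass

# Module-level settings (names from the settings table; values chosen for this toy tool).
allowed_schemas = ['file']       # hypothetical schema for local paths
require_replicas = False
require_input_protocols = False
require_protocols = False
check_availablespace = True


@dataclass
class FileSpec:
    """Minimal stand-in for the Pilot FileSpec (hypothetical fields)."""
    lfn: str                 # logical file name
    turl: str = ''           # source path (stage-in) or destination path (stage-out)
    workdir: str = '.'       # local working directory
    status: str = ''
    status_code: int = 0


def is_valid_for_copy_in(files):
    return True  # this toy tool accepts any input list


def is_valid_for_copy_out(files):
    return True  # this toy tool accepts any output list


def _transfer(src, dst, fspec):
    """Copy src to dst and record the outcome on the FileSpec."""
    try:
        shutil.copy(src, dst)
    except OSError:
        fspec.status, fspec.status_code = 'failed', 1        # assumed values
    else:
        fspec.status, fspec.status_code = 'transferred', 0   # assumed values


def copy_in(files, **kwargs):
    """Stage-in: copy each file from its turl into the working directory."""
    for fspec in files:
        _transfer(fspec.turl, os.path.join(fspec.workdir, fspec.lfn), fspec)
    return files


def copy_out(files, **kwargs):
    """Stage-out: copy each file from the working directory to its turl."""
    for fspec in files:
        _transfer(os.path.join(fspec.workdir, fspec.lfn), fspec.turl, fspec)
    return files
```

The point of the sketch is the contract: each transfer function iterates over the FileSpec entries and writes the result back into status and status_code, so the calling workflow can inspect per-file outcomes.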

The current range of supported copy tools is described below.

Copy tool | Require replicas (stage-in) | Require input protocols (stage-in) | Require protocols (stage-out) | Check space (stage-in) | Allowed schemas | Description
gfal or gfal-copy ✔️ ✔️ ✔️ ['srm', 'gsiftp', 'https', 'davs', 'root'] GFAL2 tool (gfal-copy command)
gs ✔️ ✔️ ✔️ ['gs', 'srm', 'gsiftp', 'https', 'davs', 'root'] Google Cloud Storage (google.cloud API)
lsm ✔️ ✔️ ['srm', 'gsiftp', 'root'] Local site mover (lsm-get, lsm-put commands)
mv ✔️ any Move file using filesystem commands (ln -s for stage-in, mv for stage-out)
objectstore ✔️ ✔️ ✔️ ['srm', 'gsiftp', 'https', 'davs', 'root', 's3', 's3+rucio'] Transfer files to OS storage using Rucio CLI (rucio download for stage-in, rucio upload for stage-out)
rucio ✔️ ✔️ any Transfer files to RSE using Rucio python API (rucio.client.downloadclient, rucio.client.uploadclient)
s3 ✔️ ✔️ ✔️ ['srm', 'gsiftp', 'https', 'davs', 'root', 's3', 's3+rucio'] Transfer files to Amazon Cloud Object Storage (S3 bucket) using boto3 python AWS API
xrdcp ✔️ ✔️ ✔️ ['root'] Transfer files using xrdcp command
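
The allowed-schemas column above can be turned into a simple lookup, for example to see which copy tools could handle a given schema. The sketch below only transcribes that column into a dictionary; the helper copytools_for is hypothetical and not part of the Pilot, and "any" is represented as None.

```python
# Allowed schemas per copy tool, transcribed from the table above.
# None stands for "any" (no restriction).
ALLOWED_SCHEMAS = {
    'gfal': ['srm', 'gsiftp', 'https', 'davs', 'root'],
    'gs': ['gs', 'srm', 'gsiftp', 'https', 'davs', 'root'],
    'lsm': ['srm', 'gsiftp', 'root'],
    'mv': None,
    'objectstore': ['srm', 'gsiftp', 'https', 'davs', 'root', 's3', 's3+rucio'],
    'rucio': None,
    's3': ['srm', 'gsiftp', 'https', 'davs', 'root', 's3', 's3+rucio'],
    'xrdcp': ['root'],
}


def copytools_for(schema):
    """Return the copy tools whose allowed schemas include the given schema."""
    return [
        name
        for name, schemas in ALLOWED_SCHEMAS.items()
        if schemas is None or schema in schemas
    ]


print(copytools_for('davs'))
print(copytools_for('gs'))
```

Note that this considers the schema only; the replica and protocol requirements in the other columns would still apply when a copy tool is actually selected.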