A command line interface and Python module for accessing the CKAN Action API
tested under Python 2.7, 3.6 and pypy
Installation with pip:
pip install ckanapi
Installation with conda:
conda install -c conda-forge ckanapi
The ckanapi command line interface lets you access local and remote CKAN instances for bulk operations and simple API actions.
Simple actions with string parameters may be called directly. The response is pretty-printed to STDOUT.
$ ckanapi action group_list -r https://demo.ckan.org --insecure
[
"data-explorer",
"example-group",
"geo-examples",
...
]
Use -r to specify the remote CKAN instance, and -a to provide an API KEY. Remote actions connect as an anonymous user by default. For this example, we use --insecure as the CKAN demo uses a self-signed certificate.
Local CKAN actions may be run by specifying the config file with -c. If no remote server or config file is specified the CLI will look for a development.ini file in the current directory, much like paster commands.
Local CKAN actions are performed by the site user (default system administrator) when -u is not specified.
To perform local actions with a less privileged user use the -u option with a user name or a name that doesn't exist. This is useful if you don't want things like deleted datasets or private information to be returned.
Note that all actions in the CKAN Action API and actions added by CKAN plugins are supported.
Simple action arguments may be passed in KEY=STRING form for string values or in KEY:JSON form for JSON values.
$ ckanapi action package_show id=my-dataset-name
{
"name": "my-dataset-name",
...
}
$ ckanapi action datastore_info id=my-resource-id-or-alias
{
"meta": {
"aliases": [
"test_alias"
],
"count": 1000,
...
}
$ ckanapi action package_search facet.field:'["organization"]' rows:0
{
"facets": {
"organization": {
"org1": 42,
"org2": 21,
...
}
},
...
}
Files may be passed for upload using the KEY@FILE form.
$ ckanapi action resource_create package_id=my-dataset-with-files \
upload@/path/to/file/to/upload.csv
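The three argument forms can be illustrated with a small stand-alone parser. This is a minimal sketch, not ckanapi's own parsing code: the helper name `parse_action_args` and the rule of splitting at the first `=`, `:` or `@` are assumptions made for illustration.

```python
import json
import re

# Hypothetical helper, not ckanapi's own parser: split each argument at the
# first "=", ":" or "@" and interpret the value according to the separator.
_ARG = re.compile(r'^([^=:@]+)([=:@])(.*)$', re.DOTALL)


def parse_action_args(args):
    """Return (params, files) built from KEY=STRING, KEY:JSON and KEY@FILE."""
    params, files = {}, {}
    for arg in args:
        match = _ARG.match(arg)
        if match is None:
            raise ValueError('expected KEY=STRING, KEY:JSON or KEY@FILE: %r' % arg)
        key, sep, value = match.groups()
        if sep == '=':
            params[key] = value              # plain string value
        elif sep == ':':
            params[key] = json.loads(value)  # any JSON value
        else:
            files[key] = value               # path of a file to upload
    return params, files


params, files = parse_action_args([
    'package_id=my-dataset-with-files',
    'facet.field:["organization"]',
    'rows:0',
    'upload@/path/to/file/to/upload.csv',
])
print(params)
print(files)
```

Note that `rows:0` yields the integer 0, whereas `rows=0` would yield the string "0".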
$ ckanapi action package_show id=my-dataset-id > my-dataset.json
$ nano my-dataset.json
$ ckanapi action package_update -I my-dataset.json
$ rm my-dataset.json
$ ckanapi action resource_patch id=my-resource-id size:42000000
Datasets, groups, organizations, users and related items may be dumped to JSON lines text files and created or updated from JSON lines text files.
dump and load jobs can be run in parallel with multiple worker processes using the -p parameter. The jobs in progress, the rate of job completion and any individual errors are shown on STDERR while the jobs run.
There are no parallel limits when running against a CKAN on localhost.
When running against a remote site, there's a default limit of 3 worker processes.
The environment variables CKANAPI_MY_SITES and CKANAPI_PARALLEL_LIMIT can be used to adjust these limits. Sites listed in CKANAPI_MY_SITES (a comma-delimited list of CKAN URLs) will not have the PARALLEL_LIMIT applied.
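The rules above could be expressed roughly as follows. This is an illustrative sketch only, not ckanapi's implementation: the function name `effective_workers`, the exact-match comparison of site URLs and the way localhost is detected are all assumptions.

```python
import os
from urllib.parse import urlparse

DEFAULT_REMOTE_LIMIT = 3  # default remote limit stated in the text above


def effective_workers(requested, url, environ=os.environ):
    """Illustrative only: cap `requested` worker processes for `url`."""
    my_sites = [site.strip().rstrip('/')
                for site in environ.get('CKANAPI_MY_SITES', '').split(',')
                if site.strip()]
    host = urlparse(url).hostname
    # Assumption: localhost and sites listed in CKANAPI_MY_SITES are unlimited.
    if host in ('localhost', '127.0.0.1') or url.rstrip('/') in my_sites:
        return requested
    limit = int(environ.get('CKANAPI_PARALLEL_LIMIT', DEFAULT_REMOTE_LIMIT))
    return min(requested, limit)


print(effective_workers(8, 'http://localhost', {}))
print(effective_workers(8, 'https://demo.ckan.org', {}))
```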
dump and load jobs may be resumed from the last completed record or split across multiple servers by specifying record start and max values.
$ ckanapi dump datasets --all -O datasets.jsonl.gz -z -p 4 -r http://localhost
$ ckanapi search datasets include_private=true -O datasets.jsonl.gz -z \
-c /etc/ckan/production.ini
search is faster than dump because it calls package_search to retrieve many records per call, paginating automatically.
You may add parameters supported by package_search to filter the records returned.
$ ckanapi load datasets -I datasets.jsonl.gz -z -p 3 -c /etc/ckan/production.ini
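A dump written with -z can also be inspected from Python using only the standard library. The sketch below assumes one JSON object per line, as the JSON lines format implies; the helper name `read_jsonl_gz` and the sample records are made up for illustration.

```python
import gzip
import json
import os
import tempfile


def read_jsonl_gz(path):
    """Yield one decoded record per non-empty line of a gzipped JSON lines file."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Build a tiny sample file so the sketch runs without a CKAN instance.
sample = [{'name': 'my-dataset-name', 'title': 'First'},
          {'name': 'another-dataset', 'title': 'Second'}]
path = os.path.join(tempfile.mkdtemp(), 'datasets.jsonl.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    for record in sample:
        f.write(json.dumps(record) + '\n')

names = [record['name'] for record in read_jsonl_gz(path)]
print(names)
```

Reading line by line keeps memory use flat even for large dumps.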
Datasets, groups, organizations, users and related items may be deleted in bulk with the delete command. This command accepts ids or names on the command line or a number of different formats piped on standard input.
$ ckanapi action package_list -j | ckanapi delete datasets
$ ckanapi action package_search q=ponies | ckanapi delete datasets
$ ckanapi dump groups --all > groups.jsonl
$ grep ponies groups.jsonl | ckanapi delete groups
$ cat users_to_remove.txt
fred
bill
larry
$ ckanapi delete users < users_to_remove.txt
Datasets may be exported to a simplified datapackage.json format (which includes the actual resources, where available).
If the resource url is not available, the resource will be included in the datapackage.json file but the actual resource data will not be downloaded.
$ ckanapi dump datasets --all --datapackages=./output_directory/ -r http://sourceckan.example.com
Simple shell pipelines are possible with the CLI.
$ ckanapi action package_show id=my-dataset \
| jq '.+{"title":.name}' \
| ckanapi action package_update -i
$ ckanapi dump datasets --all -q -r http://sourceckan.example.com \
| ckanapi load datasets
The ckanapi Python module may be used from within a CKAN extension or in a Python 2 or Python 3 application separate from CKAN.
Making a request:
from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
demo = RemoteCKAN('https://demo.ckan.org', user_agent=ua)
groups = demo.action.group_list(id='data-explorer')
print(groups)
result:
[u'data-explorer', u'example-group', u'geo-examples', u'skeenawild']
The example above is using an "action shortcut". The .action object detects the method name used ("group_list" above) and converts it to a normal call_action call. This is equivalent code without using an action shortcut:
groups = demo.call_action('group_list', {'id': 'data-explorer'})
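One way such a shortcut object could work is by intercepting attribute access with `__getattr__`. The following is a sketch of that general technique, not ckanapi's actual source; `ActionShortcut` and `FakeCKAN` are hypothetical names, and `FakeCKAN` only records calls instead of contacting a server.

```python
class ActionShortcut(object):
    """Turns attribute access into call_action calls (illustrative only)."""

    def __init__(self, ckan):
        self._ckan = ckan

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, e.g. "group_list".
        def action(**kwargs):
            return self._ckan.call_action(name, kwargs)
        return action


class FakeCKAN(object):
    """Hypothetical stand-in that records calls instead of making requests."""

    def __init__(self):
        self.calls = []
        self.action = ActionShortcut(self)

    def call_action(self, action, data_dict=None):
        self.calls.append((action, data_dict))
        return ['data-explorer']


demo = FakeCKAN()
groups = demo.action.group_list(id='data-explorer')
print(demo.calls)
```

Because the method name is resolved at call time, actions added by plugins need no extra code on the client side.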
Once again, all actions in the CKAN Action API and actions added by CKAN plugins are supported by action shortcuts and call_action calls.
For example, if the Showcase extension is installed:
from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
demo = RemoteCKAN('https://demo.ckan.org', user_agent=ua)
showcases = demo.action.ckanext_showcase_list()
print(showcases)
Query clauses may be combined, as in the following package_search action. This query combines three clauses that are all satisfied by the single example dataset on the Demo CKAN site.
More detailed examples of complex query syntax can be found in the SOLR documentation.
from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
demo = RemoteCKAN('https://demo.ckan.org', user_agent=ua)
packages = demo.action.package_search(q='+organization:sample-organization +res_format:GeoJSON +tags:geojson')
print(packages)
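When the clauses come from variables, the query string can be assembled programmatically. This is a minimal sketch with a hypothetical helper, `required_clauses`; it does no escaping, so values containing spaces or SOLR special characters would need quoting first.

```python
def required_clauses(**fields):
    """Join field:value pairs into a query where every clause is required."""
    return ' '.join('+%s:%s' % (field, value) for field, value in fields.items())


q = required_clauses(organization='sample-organization',
                     res_format='GeoJSON',
                     tags='geojson')
print(q)
```

The resulting string can be passed as the q parameter of package_search.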
Many CKAN API functions can only be used by authenticated users. Use the apikey parameter to supply your CKAN API key to RemoteCKAN:
demo = RemoteCKAN('https://demo.ckan.org', apikey='MY-SECRET-API-KEY')
An example of updating a single field in an existing dataset can be seen in the Examples directory.
- NotAuthorized - user unauthorized or accessing a deleted item
- NotFound - name/id not found
- ValidationError - field errors listed in .error_dict
- SearchQueryError - error reported from SOLR index
- SearchError
- CKANAPIError - incorrect use of ckanapi or unable to parse response
- ServerIncompatibleError - the remote API is not a CKAN API
When using an action shortcut or the call_action method, failures are raised as exceptions just like when calling get_action from a CKAN plugin:
from ckanapi import RemoteCKAN, NotAuthorized
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
demo = RemoteCKAN('https://demo.ckan.org', apikey='phony-key', user_agent=ua)
try:
    pkg = demo.action.package_create(name='my-dataset', title='not going to work')
except NotAuthorized:
    print('denied')
When it is possible to import ckan, all the ckanapi exception classes are replaced with the CKAN exceptions with the same names.
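The field errors carried by a ValidationError can be turned into readable messages. The sketch below assumes that .error_dict maps field names to lists of messages; the helper name `summarize_errors` and the example dict are made up, so treat the exact shape as an assumption to check against your CKAN version.

```python
def summarize_errors(error_dict):
    """Return one 'field: message; message' line per field, sorted by field."""
    lines = []
    for field in sorted(error_dict):
        messages = error_dict[field]
        if not isinstance(messages, (list, tuple)):
            messages = [messages]  # tolerate a single bare message
        lines.append('%s: %s' % (field, '; '.join(str(m) for m in messages)))
    return lines


# Made-up example of the kind of dict a ValidationError might carry.
example = {'title': ['Missing value'], 'name': ['That URL is already in use.']}
for line in summarize_errors(example):
    print(line)
```

In an `except ValidationError as e:` handler, `e.error_dict` would take the place of `example`.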
File uploads for CKAN 2.2+ are supported by passing file-like objects to action shortcut methods:
from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
mysite = RemoteCKAN('http://myckan.example.com', apikey='real-key', user_agent=ua)
mysite.action.resource_create(
    package_id='my-dataset-with-files',
    url='dummy-value',  # ignored but required by CKAN<2.6
    upload=open('/path/to/file/to/upload.csv', 'rb'))
When using call_action you must pass file objects separately:
mysite.call_action('resource_create',
    {'package_id': 'my-dataset-with-files'},
    files={'upload': open('/path/to/file/to/upload.csv', 'rb')})
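The examples above leave the uploaded file open. In a long-running script it may be preferable to open the file in a with block so it is closed once the request returns. This is a sketch under that assumption: `upload_resource` is a hypothetical helper, and `fake_ckan` is a stand-in object used only so the example runs without a server.

```python
import os
import tempfile
from types import SimpleNamespace


def upload_resource(ckan, package_id, path):
    """Create a resource from a local file and close the file afterwards."""
    with open(path, 'rb') as f:
        return ckan.action.resource_create(package_id=package_id, upload=f)


# Hypothetical stand-in for a RemoteCKAN instance so the sketch runs offline.
seen = {}


def fake_resource_create(**kwargs):
    seen['package_id'] = kwargs['package_id']
    seen['content'] = kwargs['upload'].read()
    return {'id': 'hypothetical-resource-id'}


fake_ckan = SimpleNamespace(
    action=SimpleNamespace(resource_create=fake_resource_create))

path = os.path.join(tempfile.mkdtemp(), 'upload.csv')
with open(path, 'wb') as f:
    f.write(b'a,b\n1,2\n')

result = upload_resource(fake_ckan, 'my-dataset-with-files', path)
print(result, seen)
```

With a real RemoteCKAN instance in place of `fake_ckan`, remember that CKAN versions before 2.6 also require a url value, as noted above.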
As of ckanapi 4.0 RemoteCKAN will keep your HTTP connection open using a requests session.
For long-running scripts make sure to close your connections by using RemoteCKAN as a context manager:
from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
with RemoteCKAN('https://demo.ckan.org', user_agent=ua) as demo:
    groups = demo.action.group_list(id='data-explorer')
print(groups)
Or by explicitly calling RemoteCKAN.close().
A similar class is provided for accessing local CKAN instances from a plugin in the same way as remote CKAN instances. Unlike CKAN's get_action, LocalCKAN prevents data from one action call leaking into the next, which can cause issues that are very hard to debug.
This class defaults to using the site user with full access.
from ckanapi import LocalCKAN, ValidationError
registry = LocalCKAN()
try:
    registry.action.package_create(name='my-dataset', title='this will work fine')
except ValidationError:
    print('unless my-dataset already exists')
For extra caution, pass a blank username to LocalCKAN and only actions allowed for anonymous users will be permitted.
from ckanapi import LocalCKAN
anon = LocalCKAN(username='')
print(anon.action.status_show())
A class is provided for making action requests to a webtest.TestApp instance for use in CKAN tests:
from ckanapi import TestAppCKAN
from webtest import TestApp
test_app = TestApp(...)
demo = TestAppCKAN(test_app, apikey='my-test-key')
groups = demo.action.group_list(id='data-explorer')
To run the tests:
python setup.py test
🇨🇦 Government of Canada / Gouvernement du Canada
The project files are covered under Crown Copyright, Government of Canada, and are distributed under the MIT license. Please see COPYING / COPYING.fr for full details.