Lenticular Lens is a tool that allows users to construct linksets between entities from different Timbuctoo datasets (so-called data alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment and can also report on manual corrections and the amount of manual validation done.
- Make sure Docker and Docker Compose are installed
  - For Windows and Mac users: install Docker Desktop
- Use the provided `docker-compose.yml` as a baseline
- Run `docker-compose up`
- Visit http://localhost:8000 in your browser

Note: This will create a folder `pgdata` with the database data. To clean up the database and start from scratch, simply remove this folder.
Misc. configuration:
- `APP_DOMAIN`: The application domain; defaults to `http://localhost`
- `SECRET_KEY`: The secret key used for session signing
- `ADMIN_ACCESS_TOKEN`: The access token used for running admin tasks
- `LOG_LEVEL`: The logging level; defaults to `INFO`
- `PUBLISHER`: The publisher to be registered in the RDF export; defaults to `Lenticular Lens`
- `AUTO_DELETE_JOB_DAYS`: The minimum number of days after creation of a job making the job eligible for deletion
- `WORKER_TYPE`: For a worker instance, the type of the worker to run:
  - `TIMBUCTOO`
  - `LINKSET`
  - `LENS`
  - `CLUSTERING`
  - `RECONCILIATION`
Database configuration:
- `DATABASE_HOST`: The database host; defaults to `localhost`
- `DATABASE_PORT`: The database port; defaults to `5432`
- `DATABASE_DB`: The database name; defaults to `postgres`
- `DATABASE_USER`: The database user; defaults to `postgres`
- `DATABASE_PASSWORD`: The database password; defaults to `postgres`
- `DATABASE_MAX_CONNECTIONS`: The maximum number of database connections in the connection pool; defaults to `5`
OpenID Connect authentication configuration:
- `OIDC_SERVER`: The OpenID Connect provider server; leave empty to disable authentication
- `OIDC_CLIENT_ID`: The OpenID Connect client id
- `OIDC_CLIENT_SECRET`: The OpenID Connect client secret
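
The documented defaults can be summarized in a small sketch. The helper names `load_settings` and `database_dsn` are hypothetical and not part of Lenticular Lens; the connection URI format assumes a PostgreSQL database, which the `pgdata` folder and the default port `5432` suggest but do not confirm.

```python
from typing import Mapping

# Defaults as documented in the configuration lists above.
DEFAULTS = {
    "APP_DOMAIN": "http://localhost",
    "LOG_LEVEL": "INFO",
    "PUBLISHER": "Lenticular Lens",
    "DATABASE_HOST": "localhost",
    "DATABASE_PORT": "5432",
    "DATABASE_DB": "postgres",
    "DATABASE_USER": "postgres",
    "DATABASE_PASSWORD": "postgres",
    "DATABASE_MAX_CONNECTIONS": "5",
}


def load_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: overlay the environment on the documented defaults."""
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # An empty OIDC_SERVER disables authentication, as documented.
    settings["AUTH_ENABLED"] = bool(env.get("OIDC_SERVER", ""))
    return settings


def database_dsn(settings: Mapping[str, str]) -> str:
    """Build a PostgreSQL connection URI (format assumed, not taken from the tool)."""
    return (
        "postgresql://{DATABASE_USER}:{DATABASE_PASSWORD}"
        "@{DATABASE_HOST}:{DATABASE_PORT}/{DATABASE_DB}"
    ).format(**settings)
```

Calling `load_settings(os.environ)` (after `import os`) would apply the real environment; passing an empty mapping yields the defaults only.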
- Job: A job encloses a research question, which sets the scope/context in which linksets and lenses are created, analysed, validated and exported.
- Entity-type selection: An entity-type selection is a selection of entities of a certain type (stemming from a dataset) based on zero or more filters. The set of entity-type selections in a job comprises the entities of interest for a research question.
- Linkset specification: A linkset specification determines how entities from one or more entity-type selections should be matched using one or more entity matching algorithms. Running a linkset specification results in a linkset.
- Linkset: A linkset is a set of paired resources (URIs) that matched according to a linkset specification.
- Lens specification: A lens specification specifies one or more modifications (union, intersection, ...) over a number of linksets or lenses. The lens inherits the specifications of all linksets and lenses it originates from.
- Lens: A lens is a set of paired resources (URIs) resulting from one or more modifications according to a lens specification.
- Clustering: A clustering is the partitioning of the resources (URIs) in a linkset or lens into clusters based on transitivity of the links in the linkset or lens.
- Cluster: A cluster is a set of potentially similar resources (URIs). As a cluster originates from the clustering of a linkset or a lens, the cluster holds only with respect to the specifications of that linkset or lens.
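
To illustrate what "clusters based on transitivity of the links" means, here is a minimal sketch that groups URIs with a union-find structure. It is not the clustering implementation used by Lenticular Lens; the function name `cluster_links` and the representation of links as `(source, target)` tuples are assumptions made for the example.

```python
from collections import defaultdict


def cluster_links(links):
    """Group URIs into clusters: two URIs share a cluster if a chain of links connects them."""
    parent = {}

    def find(uri):
        # Return the representative of the cluster that contains `uri`.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps the trees shallow
            uri = parent[uri]
        return uri

    for source, target in links:
        parent[find(source)] = find(target)  # merge the two clusters

    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(uri)].add(uri)
    return sorted(sorted(members) for members in clusters.values())


links = [("a", "b"), ("b", "c"), ("d", "e")]
print(cluster_links(links))  # [['a', 'b', 'c'], ['d', 'e']]
```

Here `a` and `c` end up in the same cluster even though no link connects them directly, because both are linked to `b`.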
URL: /
Method: GET
Root page. Will return the GUI for the tool.
URL: /datasets
Method: GET
Parameters: endpoint
Returns all available datasets for a specific Timbuctoo GraphQL `endpoint`.
Example: /datasets?endpoint=https://repository.goldenagents.org/v5/graphql
URL: /downloads
Method: GET
Returns all currently running data downloads and finished data downloads from Timbuctoo.
URL: /download
Method: GET
Parameters: `endpoint`, `dataset_id`, `collection_id`
Starts a data download from Timbuctoo from the given Timbuctoo GraphQL `endpoint`. Use `dataset_id` to specify from which dataset to download and `collection_id` to specify the collection from the dataset to download.
Example: /download?endpoint=https://repository.goldenagents.org/v5/graphql&dataset_id=ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805&collection_id=schema_Person
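
When calling these endpoints from a script, the query parameters should be URL-encoded, since the `endpoint` value is itself a URL. A minimal sketch follows; the helper `api_url` is hypothetical and the base URL assumes the local Docker setup described earlier.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumption: local instance from the Docker setup


def api_url(path: str, **params: str) -> str:
    """Join the base URL, an endpoint path and URL-encoded query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


download_url = api_url(
    "/download",
    endpoint="https://repository.goldenagents.org/v5/graphql",
    dataset_id="ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805",
    collection_id="schema_Person",
)
print(download_url)
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlopen`.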
URL: /stopwords/<dictionary>
Method: GET
Returns the stopwords for the given `dictionary`.
URL: /methods
Method: GET
Returns the various available filter functions, matching methods and transformers.
URL: /login
Method: GET
Parameters: `redirect-uri`
Allows the user to log in and then redirects back to the given `redirect-uri`.
Example: /login?redirect-uri=https://lenticularlens.org
URL: /user_info
Method: GET
Returns the user information of the logged-in user.
URL: /job/create
Method: POST
Form data: `job_title`, `job_description`
Creates a new job with the given `job_title` and `job_description`. Returns the identifier of this new job.
URL: /job/update
Method: POST
JSON: `job_id`, `job_title`, `job_description`, `job_link`, `entity_type_selections`, `linkset_specs`, `lens_specs`, `views`
Updates a job with the given `job_id`. Updates the `job_title`, `job_description`, `job_link`, `entity_type_selections`, `linkset_specs`, `lens_specs` and `views`.
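
As an illustration of the JSON body for `/job/update`, the sketch below prepares (but does not send) such a request with the standard library. The helper `job_update_request` is hypothetical, and whether the server accepts a body containing only a subset of the fields is an assumption that should be checked against a running instance.

```python
import json
from urllib.request import Request

# Field names as listed for /job/update.
ALLOWED_FIELDS = {
    "job_title", "job_description", "job_link",
    "entity_type_selections", "linkset_specs", "lens_specs", "views",
}


def job_update_request(base_url: str, job_id: str, **fields) -> Request:
    """Prepare a POST request for /job/update with a JSON body (nothing is sent)."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    body = json.dumps({"job_id": job_id, **fields}).encode("utf-8")
    return Request(
        f"{base_url}/job/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = job_update_request(
    "http://localhost:8000",
    "d697ea3869422ce3c7cc1889264d03c7",
    job_title="My job",
)
```

Sending the prepared request would be a matter of passing it to `urllib.request.urlopen`.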
URL: /job/<job_id>
Method: GET
Returns the details of a job with the given `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7
URL: /job/<job_id>/linksets
Method: GET
Returns the details of all linksets with the given `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/linksets
URL: /job/<job_id>/lenses
Method: GET
Returns the details of all lenses with the given `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/lenses
URL: /job/<job_id>/clusterings
Method: GET
Returns the details of all clustering jobs with the given `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/clusterings
URL: /job/<job_id>/run/<type>/<linkset>
Method: POST
Form data: `restart`
Starts a process for the given spec of `type` (`linkset` or `lens`) of a specific `job_id`. Specify `restart` to restart the process.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/run/linkset/0
URL: /job/<job_id>/run_clustering/<type>/<id>
Method: POST
Starts a clustering process of `type` (`linkset` or `lens`) for the linkset/lens with the given `id` of a specific `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/run_clustering/linkset/0
URL: /job/<job_id>/kill/<type>/<linkset>
Method: POST
Stops a process for the given spec of `type` (`linkset` or `lens`) of a specific `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/kill/linkset/0
URL: /job/<job_id>/kill_clustering/<type>/<id>
Method: POST
Stops a clustering process of `type` (`linkset` or `lens`) for the linkset/lens with the given `id` of a specific `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/kill_clustering/lens/0
URL: /job/<job_id>
Method: DELETE
Deletes the job with the given `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7
URL: /job/<job_id>/<type>/<id>
Method: DELETE
Deletes the linkset/lens of `type` (`linkset` or `lens`) with the given `id` of a specific `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/lens/0
URL: /job/list
Method: GET
Returns all jobs of the logged-in user.
URL: /job/<job_id>/entity_type_selection_total/<id>
Method: GET
Returns the total number of entities for an entity-type selection with the given `id` of the given `job_id`.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/entity_type_selection_total/0
URL: /job/<job_id>/links_totals/<type>/<id>
Method: GET, POST
Parameters: `apply_filters`, `uri`, `cluster_id`, `min`, `max`
Returns the total number of links of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
Specify `apply_filters` to apply the filters specified by the user. Specify `uri` to only return links with the specified URIs. Specify `cluster_id` to only return the links of specific clusters. Specify `min` and/or `max` to only return links with a similarity score within the specified minimum and maximum score.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/links_totals/linkset/0
URL: /job/<job_id>/clusters_totals/<type>/<id>
Method: GET, POST
Parameters: `apply_filters`, `uri`, `cluster_id`, `min`, `max`, `min_size`, `max_size`, `min_count`, `max_count`
Returns the total number of clusters of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
Specify `apply_filters` to apply the filters specified by the user. Specify `uri` to only return links with the specified URIs. Specify `cluster_id` to only return the links of specific clusters. Specify `min` and/or `max` to only take into account links with a similarity score within the specified minimum and maximum score. Specify `min_size` and/or `max_size` to only return clusters with a size that is within the specified minimum and maximum size. Specify `min_count` and/or `max_count` to only return clusters with a links count that is within the specified minimum and maximum count.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/clusters_totals/linkset/0
URL: /job/<job_id>/entity_type_selection/<id>
Method: GET
Parameters: `limit`, `offset`
Returns all data for an entity-type selection with the given `id` of the given `job_id`. Use `limit` and `offset` for paging.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/entity_type_selection/0
URL: /job/<job_id>/links/<type>/<id>
Method: GET, POST
Parameters: `with_properties`, `apply_filters`, `valid`, `uri`, `cluster_id`, `min`, `max`, `sort`, `limit`, `offset`
Returns the links of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`. Use `limit` and `offset` for paging.
Specify `with_properties` with 'none' to return no property values, 'single' to only return a single property value or 'multiple' to return multiple property values. Specify `apply_filters` to apply the filters specified by the user. Specify `valid` with `accepted`, `rejected`, `uncertain` and/or `unchecked` to only return links of the specified validity types. Specify `uri` to only return links with the specified URIs. Specify `cluster_id` to only return the links of specific clusters. Specify `min` and/or `max` to only return links with a similarity score within the specified minimum and maximum score. Specify `sort` with `asc` or `desc` to sort on similarity score.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/links/linkset/0
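
Paging with `limit` and `offset` can be sketched as follows. The generator `links_page_urls` is a hypothetical helper; it assumes the parameters are passed as a query string on a GET request and that the total number of links is already known, for instance from the `links_totals` endpoint.

```python
from urllib.parse import urlencode


def links_page_urls(base_url, job_id, spec_type, spec_id, total, page_size=100, **filters):
    """Yield one /links URL per page, stepping `offset` by `limit` until `total` is covered."""
    if spec_type not in ("linkset", "lens"):
        raise ValueError("spec_type must be 'linkset' or 'lens'")
    for offset in range(0, total, page_size):
        query = urlencode({**filters, "limit": page_size, "offset": offset})
        yield f"{base_url}/job/{job_id}/links/{spec_type}/{spec_id}?{query}"


for url in links_page_urls(
    "http://localhost:8000", "d697ea3869422ce3c7cc1889264d03c7",
    "linkset", 0, total=250, page_size=100, sort="desc",
):
    print(url)
```

With 250 links and a page size of 100 this yields three URLs, with offsets 0, 100 and 200.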
URL: /job/<job_id>/clusters/<type>/<id>
Method: GET, POST
Parameters: `with_properties`, `apply_filters`, `include_nodes`, `uri`, `cluster_id`, `min`, `max`, `min_size`, `max_size`, `min_count`, `max_count`, `limit`, `offset`
Returns the clusters of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`. Use `limit` and `offset` for paging.
Specify `with_properties` with 'none' to return no property values, 'single' to only return a single property value or 'multiple' to return multiple property values. Specify `apply_filters` to apply the filters specified by the user. Specify `include_nodes` to include all nodes that are part of the cluster in the response. Specify `uri` to only return links with the specified URIs. Specify `cluster_id` to only return the links of specific clusters. Specify `min` and/or `max` to only return links with a similarity score within the specified minimum and maximum score. Specify `min_size` and/or `max_size` to only return clusters with a size that is within the specified minimum and maximum size. Specify `min_count` and/or `max_count` to only return clusters with a links count that is within the specified minimum and maximum count.
Example: /job/d697ea3869422ce3c7cc1889264d03c7/clusters/linkset/0
URL: /job/<job_id>/validate/<type>/<id>
Method: POST
Form data: `source`, `target`, `apply_filters`, `valid`, `uri`, `cluster_id`, `min`, `max`, `validation`
Validates a link of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
Specify the URIs of the `source` and `target` to identify the link to be validated. Alternatively, filter the links: specify `apply_filters` to apply the filters specified by the user. Specify `valid` with `accepted`, `rejected`, `uncertain` and/or `unchecked` to only select links of the specified validity types. Specify `uri` to only select links with the specified URIs. Specify `cluster_id` to only select the links of specific clusters. Specify `min` and/or `max` to only select links with a similarity score within the specified minimum and maximum score.
Provide `validation` with either `accepted`, `rejected` or `uncertain` to validate the link, or use `unchecked` to reset.
URL: /job/<job_id>/motivate/<type>/<id>
Method: POST
Form data: `source`, `target`, `apply_filters`, `valid`, `uri`, `cluster_id`, `min`, `max`, `motivation`
Motivates a link using `motivation` of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
Specify the URIs of the `source` and `target` to identify the link to be motivated. Alternatively, filter the links: specify `apply_filters` to apply the filters specified by the user. Specify `valid` with `accepted`, `rejected`, `uncertain` and/or `unchecked` to only select links of the specified validity types. Specify `uri` to only select links with the specified URIs. Specify `cluster_id` to only select the links of specific clusters. Specify `min` and/or `max` to only select links with a similarity score within the specified minimum and maximum score.
URL: /job/<job_id>/cluster/<type>/<id>/<cluster_id>/graph
Method: GET
Get the visualization information for a cluster with `cluster_id` of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
URL: /job/<job_id>/csv/<type>/<id>
Method: GET
Parameters: `valid`
Get a CSV export of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
Specify `valid` with `accepted`, `rejected`, `uncertain` and/or `unchecked` to only export links of the specified validity types.
URL: /job/<job_id>/rdf/<type>/<id>
Method: GET
Parameters: `valid`, `link_pred_namespace`, `link_pred_shortname`, `export_metadata`, `export_linkset`, `reification`, `use_graphs`, `creator`, `publisher`
Get an RDF export of `type` (`linkset` or `lens`) for the linkset/lens with `id` of the given `job_id`.
Specify `valid` with `accepted`, `rejected`, `uncertain` and/or `unchecked` to only export links of the specified validity types.
Specify `link_pred_namespace` and `link_pred_shortname` to configure the predicate to use for the links.
Specify `export_metadata` and `export_linkset` with boolean values to indicate what to include in the RDF export.
Specify `reification` with either `none`, `standard`, `singleton` or `rdf_star` to indicate how the link metadata has to be included in the RDF export.
Specify `use_graphs` to determine the RDF format to use.
Optionally specify `creator` to include extra metadata. If authentication is enabled, the `creator` is obtained from the authentication provider.
URL: /admin/cleanup_jobs
Method: POST
Parameters: `access_token`
Cleans up all the jobs.
Specify `access_token` to authorize this admin task.
URL: /admin/cleanup_downloaded
Method: POST
Parameters: `access_token`
Cleans up all the downloaded collections.
Specify `access_token` to authorize this admin task.
Lenticular Lens pushes events over WebSockets using the Socket.IO library.
There is a default namespace on `/` and a namespace for messages on a specific job on `/<job_id>`.
Event: timbuctoo_update
Emits download progress on Timbuctoo datasets.
{
// The GraphQL interface of the Timbuctoo instance
"graphql_endpoint": "https://repository.goldenagents.org/v5/graphql",
// The identifier of the dataset
"dataset_id": "ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805",
// The identifier of the collection from this dataset
"collection_id": "foaf_Person",
// The total number of entities to be downloaded
"total": 1000,
// The total number of entities currently downloaded
"rows_count": 400,
}
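
A client receiving `timbuctoo_update` events can derive a progress figure from `total` and `rows_count`. The sketch below only processes a payload of the documented shape; subscribing to the event with a Socket.IO client is left out, and the function name `download_progress` is hypothetical.

```python
def download_progress(event: dict) -> float:
    """Return the downloaded fraction (0.0 to 1.0) for a `timbuctoo_update` payload."""
    total = event["total"]
    if total <= 0:
        return 0.0  # nothing to download, or total not known yet
    return min(event["rows_count"] / total, 1.0)


payload = {
    "graphql_endpoint": "https://repository.goldenagents.org/v5/graphql",
    "dataset_id": "ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805",
    "collection_id": "foaf_Person",
    "total": 1000,
    "rows_count": 400,
}
print(f"{download_progress(payload):.0%}")  # prints "40%"
```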
Event: timbuctoo_delete
Emits removal of a Timbuctoo dataset collection from the database.
{
// The GraphQL interface of the Timbuctoo instance
"graphql_endpoint": "https://repository.goldenagents.org/v5/graphql",
// The identifier of the dataset
"dataset_id": "ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805",
// The identifier of the collection from this dataset
"collection_id": "foaf_Person",
}
Event: job_update
Emits when the job has been updated.
{
// The job identifier
"job_id": "d697ea3869422ce3c7cc1889264d03c7",
// The timestamp of the update
"updated_at": "2021-01-01T12:00:00.01234",
// Was the title updated?
"is_title_update": true,
// Was the description updated?
"is_description_update": true,
// Was the link updated?
"is_link_update": true,
// Were any entity-type selections updated?
"is_entity_type_selections_update": false,
// Were any linkset specifications updated?
"is_linkset_specs_update": false,
// Were any lens specifications updated?
"is_lens_specs_update": false,
// Were any views updated?
"is_views_update": false,
}
Event: alignment_update
Emits linkset or lens matching progress.
{
// The job identifier
"job_id": "d697ea3869422ce3c7cc1889264d03c7",
// The specification type: a linkset or a lens
"spec_type": "linkset",
// The specification identifier
"spec_id": 1,
// The matching status
"status": "running",
// A human-readable status message
"status_message": "Matching",
// If links progressing is enabled, the number of links found so far
"links_progress": 23,
}
Event: alignment_delete
Emits removal of a linkset or lens.
{
// The job identifier
"job_id": "d697ea3869422ce3c7cc1889264d03c7",
// The specification type: a linkset or a lens
"spec_type": "linkset",
// The specification identifier
"spec_id": 1,
}
Event: clustering_update
Emits clustering progress.
{
// The job identifier
"job_id": "d697ea3869422ce3c7cc1889264d03c7",
// The specification type: a linkset or a lens
"spec_type": "linkset",
// The specification identifier
"spec_id": 1,
// The type of clustering performed
"clustering_type": "default",
// The clustering status
"status": "running",
// A human-readable status message
"status_message": "Clustering",
// The number of links clustered so far
"links_count": 452,
// The number of clusters found so far
"clusters_count": 5,
}
Event: clustering_delete
Emits removal of a clustering.
{
// The job identifier
"job_id": "d697ea3869422ce3c7cc1889264d03c7",
// The specification type: a linkset or a lens
"spec_type": "linkset",
// The specification identifier
"spec_id": 1,
// The type of clustering performed
"clustering_type": "default",
}
Entity-type selections is a list of JSON objects that contain the configuration of the specific entity-type selections to use for a particular job.
{
// An integer as identifier
"id": 1,
// The label of the entity-type selection
"label": "My dataset",
// A description of this entity-type selection by the user; optional field
"description": "",
// The data to use from Timbuctoo
"dataset": {
// The identifier of the dataset to use
"dataset_id": "ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805",
// The identifier of the collection from this dataset to use
"collection_id": "foaf_Person",
// The GraphQL interface of the Timbuctoo instance
"timbuctoo_graphql": "https://repository.goldenagents.org/v5/graphql",
},
// The filter configuration to obtain only a subset of the data from Timbuctoo; optional field
"filter": {
// Whether ALL conditions in this group should match ('and') or AT LEAST ONE condition in this group has to match ('or')
"type": "and",
// The filter is composed of a logic box
"conditions": [
{
// The property path to which this condition applies
"property": [
"foaf_name"
],
// The type of filtering to apply; see table below for allowed values
"type": "minimal_date",
// Depends on type of filtering selected; value to use for filtering
"value": "1600",
// Both the types `minimal_date` and `maximum_date` require a date format for parsing
"format": "YYYY-MM-DD"
}
]
},
// Apply a limit on the number of entities to obtain or -1 for no limit; optional field, defaults to '-1'
"limit": -1,
// Randomize the entities to obtain or not; optional field, defaults to 'false'
"random": false,
// A list of property paths to use for obtaining sample data; optional field
"properties": [
[
"foaf_name"
]
]
}
Filtering | Key | Value required |
---|---|---|
Equal to | equals | Yes |
Not equal to | not_equals | Yes |
Has no value | empty | No |
Has a value | not_empty | No |
Contains | contains | Yes (Use % as a wildcard) |
Does not contain | not_contains | Yes (Use % as a wildcard) |
Minimal | minimal | Yes (An integer) |
Maximum | maximum | Yes (An integer) |
Minimal date | minimal_date | Yes (Use YYYY-MM-DD) |
Maximum date | maximum_date | Yes (Use YYYY-MM-DD) |
Minimal appearances | minimal_appearances | Yes (An integer) |
Maximum appearances | maximum_appearances | Yes (An integer) |
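
The rules in this table can be checked before a job is submitted. The following sketch validates a single filter condition against the table; the function `check_condition` is hypothetical and only mirrors what the table and the example above state (which types take a value, which expect an integer, and that the date types need a `format`).

```python
# Filter keys from the table above, mapped to whether they take a value.
FILTER_TAKES_VALUE = {
    "equals": True, "not_equals": True, "empty": False, "not_empty": False,
    "contains": True, "not_contains": True, "minimal": True, "maximum": True,
    "minimal_date": True, "maximum_date": True,
    "minimal_appearances": True, "maximum_appearances": True,
}
INTEGER_FILTERS = {"minimal", "maximum", "minimal_appearances", "maximum_appearances"}
DATE_FILTERS = {"minimal_date", "maximum_date"}


def check_condition(condition: dict) -> list:
    """Return a list of problems found in one filter condition (empty if none)."""
    kind = condition.get("type")
    if kind not in FILTER_TAKES_VALUE:
        return [f"unknown filter type: {kind!r}"]
    problems = []
    if not condition.get("property"):
        problems.append("missing property path")
    value = condition.get("value")
    if FILTER_TAKES_VALUE[kind] and value in (None, ""):
        problems.append(f"'{kind}' requires a value")
    elif kind in INTEGER_FILTERS:
        try:
            int(value)
        except (TypeError, ValueError):
            problems.append(f"'{kind}' requires an integer value")
    if kind in DATE_FILTERS and not condition.get("format"):
        problems.append(f"'{kind}' requires a date format")
    return problems
```

For example, `check_condition({"property": ["foaf_name"], "type": "empty"})` returns an empty list, while a `minimal` condition with the value `"abc"` is reported as requiring an integer.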
Linkset specs is a list of JSON objects that contain the configuration of the linksets to generate for a particular job.
{
// An integer as identifier
"id": 1,
// The label of the linkset
"label": "My linkset",
// A description of this linkset by the user; optional field
"description": "",
// Whether we would like to track progress in the GUI at the cost that matching might run longer; optional field, defaults to 'true'
"use_counter": true,
// The identifiers of entity-type selections to use as sources
"sources": [
1
],
// The identifiers of entity-type selections to use as targets
"targets": [
1
],
// The matching configuration for finding links; requires at least one condition
"methods": {
// Whether ALL conditions in this group should match ('and') or AT LEAST ONE condition in this group has to match ('or'); T-norms and s-norms are also allowed: see table below for allowed values
"type": "and",
// The threshold to apply on the similarity score; optional field, defaults to '0' which means it does not apply
"threshold": 0.8,
// The matching configuration is composed of a logic box
"conditions": [
{
// The main matching method to apply
"method": {
// The type of matching to apply; see table below for allowed values
"name": "soundex",
// Some types of matching methods require extra configuration
"config": {}
},
// The similarity matching to apply; see table below for allowed values; optional field
"sim_method": {
// The type of similarity matching to apply; see table below for allowed values
"name": "soundex",
// Some types of similarity matching methods require extra configuration
"config": {},
// Whether to apply the similarity matching method on the normalized value; optional field, defaults to 'false'
"normalized": false,
},
// Fuzzy matching configuration; optional field
"fuzzy": {
// The s-norm to apply on the values of this condition; see table below for allowed values; optional field, defaults to 'maximum_s_norm'
"s_norm": "maximum_s_norm",
// The threshold to apply on the similarity score; optional field, defaults to '0' which means it does not apply
"threshold": 0
},
// Perform list matching; optional field
"list_matching": {
// The minimum number of intersections; optional field, defaults to '0' which means it does not apply
"threshold": 8,
// Whether the threshold number should be interpreted as a percentage; optional field, defaults to 'false'
"is_percentage": false
},
// Sources configuration
"sources": {
// The source properties to use during matching per entity-type selection
"properties": {
"1": [
{
// The property path to which this condition applies
"property": [
"schema_birthDate"
],
// Whether the transformers of this property should be applied before the source transformers; optional field, defaults to 'false'
"property_transformer_first": false,
// The transformers to apply to transform the value before matching; see table below for allowed values
"transformers": [
{
"name": "parse_date",
"parameters": {
"format": "YYYY-MM-DD"
}
}
]
}
],
},
// The transformers to apply to transform the source value before matching; see table below for allowed values
"transformers": []
},
// Targets configuration
"targets": {
// The target properties to use during matching per entity-type selection
"properties": {
"1": [
{
"property": [
"schema_birthDate"
],
"property_transformer_first": false,
"transformers": []
}
],
},
// The transformers to apply to transform the target value before matching; see table below for allowed values
"transformers": []
}
}
]
}
}
Matching method | Key | Accepts a similarity method | Is a similarity method | Values |
---|---|---|---|---|
Exact match | exact | No | No | |
Intermediate dataset | intermediate | No | No | entity_type_selection, intermediate_source, intermediate_target (Property paths) |
Levenshtein distance | levenshtein_distance | No | Yes | max_distance |
Levenshtein normalized | levenshtein_normalized | No | Yes | threshold |
Soundex | soundex | Yes | No | size |
Gerrit Bloothooft | bloothooft | Yes | No | name_type (First or last name: first_name, family_name) |
Word Intersection | word_intersection | No | Yes | ordered, approximate, stop_symbols, threshold |
Metaphone | metaphone | Yes | No | max |
Double Metaphone | dmetaphone | Yes | No | |
Trigram | trigram | No | Yes | threshold |
Numbers Delta | numbers_delta | No | No | type (Irrelevant, Source < Target, Target < Source: <>, <, >), start, end |
Time Delta | time_delta | No | No | type (Irrelevant, Source < Target, Target < Source: <>, <, >), years, months, days, format |
Same Year/Month | same_year_month | No | No | date_part (Year, month, or both: year, month, year_month) |
Jaro | jaro | No | Yes | threshold |
Jaro-Winkler | jaro_winkler | No | Yes | threshold, prefix_weight |
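
To give an impression of how a similarity method with a `threshold` behaves, here is a sketch of a normalized Levenshtein similarity. It is not the implementation used by Lenticular Lens: the normalization (one minus the distance divided by the longer string length) and the reading of `threshold` as a minimum similarity are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_normalized(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus the distance divided by the longer length (assumed)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def matches(a: str, b: str, threshold: float) -> bool:
    """Accept a pair when its similarity reaches the threshold (assumed semantics)."""
    return levenshtein_normalized(a, b) >= threshold
```

Under these assumptions "Rembrandt" and "Rembrant" differ by one edit over nine characters, so they would match at a threshold of 0.8, whereas "kitten" and "sitting" (three edits over seven characters) would not.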
Transformer | Key | Values |
---|---|---|
Transform 'last name first' format | transform_last_name_format | infix |
Prefix | prefix | prefix |
Suffix | suffix | suffix |
Replace | replace | from, to |
Unaccent | unaccent | |
Regular expression replace | regexp_replace | pattern, replacement, flags |
Lens specs is a list of JSON objects that contain the configuration of the lenses to apply on a combination of linksets.
{
// An integer as identifier
"id": 1,
// The label of the lens
"label": "My lens",
// A description of this lens by the user; optional field
"description": "",
// The lens configuration; requires groups consisting of two elements
"specs": {
// Lens type to apply; see table below for allowed values
"type": "union",
// The s-norm to apply on the values of this element; see table below for allowed values; optional field, defaults to 'maximum_s_norm'
"s_norm": "",
// The threshold to apply on the similarity score; optional field, defaults to '0' which means it does not apply
"threshold": 0.8,
// The lens configuration is composed of a logic box
"elements": [
{
// The identifier of the linkset/lens to use
"id": 0,
// The type (linkset or lens)
"type": "linkset"
}
]
}
}
Lens type | Description |
---|---|
union | Union (A ∪ B) |
intersection | Intersection (A ∩ B) |
difference | Difference (A - B) |
sym_difference | Symmetric difference (A ∆ B) |
in_set_and | Source and target resources match |
in_set_or | Source or target resources match |
in_set_source | Source resources match |
in_set_target | Target resources match |
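
The four set-based lens types correspond directly to set operations. The sketch below applies them to links represented as `(source, target)` tuples; this representation and the helper `apply_lens` are assumptions for illustration, and the `in_set_*` types as well as similarity scores are deliberately left out because their exact semantics are not specified here.

```python
# Set-based lens types from the table above; links are (source, target) pairs.
LENS_OPERATIONS = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "difference": lambda a, b: a - b,
    "sym_difference": lambda a, b: a ^ b,
}


def apply_lens(lens_type: str, left: set, right: set) -> set:
    """Combine two sets of links according to a set-based lens type."""
    if lens_type not in LENS_OPERATIONS:
        raise ValueError(f"Unsupported lens type in this sketch: {lens_type}")
    return LENS_OPERATIONS[lens_type](left, right)


linkset_a = {("x1", "y1"), ("x2", "y2")}
linkset_b = {("x2", "y2"), ("x3", "y3")}
print(apply_lens("intersection", linkset_a, linkset_b))  # {('x2', 'y2')}
```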
Views is a list of JSON objects that contain the properties and filters to examine a linkset or lens for a particular job.
{
// The id of the specification (linkset or lens) to which the view applies
"id": 1,
// The type of the specification (linkset or lens) to which the view applies
"type": "linkset",
// The property paths to use for obtaining data; optional field
"properties": [
{
// The identifier of the dataset of the properties
"dataset_id": "ufab7d657a250e3461361c982ce9b38f3816e0c4b__ecartico_20190805",
// The identifier of the collection of the properties for this dataset
"collection_id": "foaf_Person",
// The GraphQL interface of the Timbuctoo instance
"timbuctoo_graphql": "https://repository.goldenagents.org/v5/graphql",
// A list of property paths to use for this dataset
"properties": [
[
"foaf_name"
]
]
}
]
}
The entity-type selections (using the filter), the linkset specs (using the matching methods) and the lens specs (using the elements) all apply a logic box to allow the user to express complex conditions.
{
// The type that combines these elements (usually and/or, but can be of any type)
"type": "and",
// The list of elements; may contain other logic boxes (can have any JSON key)
"elements": []
}
As logic boxes may contain other logic boxes, complex conditions can be expressed.
{
"type": "and",
"conditions": [
{
"type": "or",
"conditions": [
{},
{},
{}
]
},
{
"type": "or",
"conditions": [
{
"type": "and",
"conditions": [
{}
]
},
{}
]
}
]
}
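
A nested logic box lends itself to recursive evaluation. The sketch below handles only the `and` and `or` types and takes a caller-supplied function to decide the outcome of each leaf; the function name `evaluate`, the `key` parameter and the use of plain booleans as leaves are assumptions for illustration.

```python
def evaluate(box, leaf, key="conditions"):
    """Recursively evaluate a logic box; `leaf` decides the truth of each non-box element."""
    if not (isinstance(box, dict) and key in box):
        return leaf(box)  # not a logic box: delegate to the caller
    results = (evaluate(child, leaf, key) for child in box[key])
    if box["type"] == "and":
        return all(results)
    if box["type"] == "or":
        return any(results)
    raise ValueError(f"Unsupported type in this sketch: {box['type']}")


box = {
    "type": "and",
    "conditions": [
        {"type": "or", "conditions": [False, True]},
        True,
    ],
}
print(evaluate(box, bool))  # True
```

Passing `key="elements"` would evaluate a lens-style logic box in the same way.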
A property path is a list of values that expresses the path in the linked data from the entity to a specific property. The list has at least one value: the property to select on the entity. If the property is a reference to another entity, you have to specify another value in the list with the id of the entity it points to. Then you can select the specific property on the referenced entity. If this is again a reference to another entity, the cycle repeats itself until you reach the required property.
[
"property",
"entity",
"property",
"entity",
"property"
]
If you want the reference as a value, rather than selecting a property on the referenced entity, there is a special value `__value__` that you can use.
[
// Get the name of a person: select the property 'foaf_name'
[
"foaf_name"
],
// Get the name of a parent of a person: follow the property 'schema_parent' to the parent entity and select the property 'foaf_name'
[
"schema_parent",
"foaf_Person",
"foaf_name"
],
// Get the name of a grandparent of a person: follow the property 'schema_parent' to the parent entity, then follow that property again and then select the property 'foaf_name'
[
"schema_parent",
"foaf_Person",
"schema_parent",
"foaf_Person",
"foaf_name"
],
// Get the reference of the parent of a person (the uri of this parent): follow the property 'schema_parent' and use the special value '__value__'
[
"schema_parent",
"__value__"
]
]
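
The way a property path is followed can be sketched with nested dictionaries standing in for linked entities. The data model (dictionaries with `uri` and `type` keys, references stored as nested dictionaries) and the function `resolve` are hypothetical; the actual data comes from Timbuctoo.

```python
def resolve(entity: dict, path: list) -> list:
    """Follow a property path through nested entities and return the values found."""
    values = entity.get(path[0], [])
    if not isinstance(values, list):
        values = [values]
    if len(path) == 1:
        return values
    if path[1] == "__value__":
        # Return the reference itself (the URI) instead of a property of the referenced entity.
        return [ref["uri"] for ref in values]
    entity_type, rest = path[1], path[2:]
    if not rest:
        raise ValueError("a path must end with a property or '__value__'")
    found = []
    for ref in values:
        if ref.get("type") == entity_type:
            found.extend(resolve(ref, rest))
    return found


grandparent = {"uri": "http://example.org/p3", "type": "foaf_Person", "foaf_name": "Grietje"}
parent = {"uri": "http://example.org/p2", "type": "foaf_Person", "foaf_name": "Jan",
          "schema_parent": grandparent}
person = {"uri": "http://example.org/p1", "type": "foaf_Person", "foaf_name": "Pieter",
          "schema_parent": parent}

print(resolve(person, ["schema_parent", "foaf_Person", "foaf_name"]))  # ['Jan']
print(resolve(person, ["schema_parent", "__value__"]))  # ['http://example.org/p2']
```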
The configuration mentions both t-norms (conjunction / and) and s-norms (disjunction / or) that can be used to configure how the similarity score is computed:
T-norm | Key |
---|---|
Minimum t-norm (⊤min) | minimum_t_norm |
Product t-norm (⊤prod) | product_t_norm |
Łukasiewicz t-norm (⊤Luk) | lukasiewicz_t_norm |
Drastic t-norm (⊤D) | drastic_t_norm |
Nilpotent minimum (⊤nM) | nilpotent_minimum |
Hamacher product (⊤H0) | hamacher_product |
S-norm | Key |
---|---|
Maximum s-norm (⊥max) | maximum_s_norm |
Probabilistic sum (⊥sum) | probabilistic_sum |
Bounded sum (⊥Luk) | bounded_sum |
Drastic s-norm (⊥D) | drastic_s_norm |
Nilpotent maximum (⊥nM) | nilpotent_maximum |
Einstein sum (⊥H2) | einstein_sum |
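
For reference, the textbook definitions of these t-norms and s-norms on scores in [0, 1] can be written down as follows. This is a sketch of the standard formulas keyed by the names in the tables; the implementation in Lenticular Lens may differ in details, and the helper `combine` is hypothetical.

```python
from functools import reduce

T_NORMS = {
    "minimum_t_norm": lambda a, b: min(a, b),
    "product_t_norm": lambda a, b: a * b,
    "lukasiewicz_t_norm": lambda a, b: max(0.0, a + b - 1.0),
    "drastic_t_norm": lambda a, b: b if a == 1 else a if b == 1 else 0.0,
    "nilpotent_minimum": lambda a, b: min(a, b) if a + b > 1 else 0.0,
    "hamacher_product": lambda a, b: 0.0 if a == b == 0 else a * b / (a + b - a * b),
}

S_NORMS = {
    "maximum_s_norm": lambda a, b: max(a, b),
    "probabilistic_sum": lambda a, b: a + b - a * b,
    "bounded_sum": lambda a, b: min(1.0, a + b),
    "drastic_s_norm": lambda a, b: b if a == 0 else a if b == 0 else 1.0,
    "nilpotent_maximum": lambda a, b: max(a, b) if a + b < 1 else 1.0,
    "einstein_sum": lambda a, b: (a + b) / (1.0 + a * b),
}


def combine(scores, norm):
    """Fold a non-empty list of similarity scores with a binary t-norm or s-norm."""
    return reduce(norm, scores)


print(combine([0.9, 0.8], T_NORMS["minimum_t_norm"]))  # 0.8
print(combine([0.5, 0.5], S_NORMS["probabilistic_sum"]))  # 0.75
```

A t-norm never exceeds the smallest score it combines, and an s-norm never falls below the largest, which is why they model "and" and "or" respectively.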