Code for the TACL 2024 paper: Staniek et al., "Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap"
Live Demo: https://overpassnl.schumann.pub/
The main dataset is found in the following files:
dataset.{train,dev,test}.nl
dataset.{train,dev,test}.query
dataset.{train,dev,test}.bbox
where the .nl files contain the natural language inputs, the .query files the corresponding Overpass queries, and the .bbox files the bounding boxes used during evaluation for queries that contain the {{bbox}} variable/shortcut. The bounding box ensures that such queries are evaluated in an area where the gold query returns results.
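As a quick illustration, a split can be loaded as parallel (nl, query) pairs. This is a minimal Python sketch that assumes the files are aligned line-by-line, one instance per line:

# Minimal sketch: load one split of the dataset as aligned (nl, query) pairs.
# Assumes the .nl and .query files are aligned line-by-line.
def load_split(data_dir, split):
    with open(f"{data_dir}/dataset.{split}.nl", encoding="utf-8") as f:
        nl = [line.rstrip("\n") for line in f]
    with open(f"{data_dir}/dataset.{split}.query", encoding="utf-8") as f:
        queries = [line.rstrip("\n") for line in f]
    assert len(nl) == len(queries)
    return list(zip(nl, queries))

pairs = load_split("dataset", "train")
print(pairs[0])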
The following files are used to determine the difficulty of evaluation instances (Figure 5 in the paper):
dataset.{dev,test}.difficulty_{len_nl,len_query,num_results,train_sim_nl,train_sim_oqo,xml_components}
where train_sim_oqo is used to determine the 333 hard instances in dataset.{dev,test}.hard.{nl,query}.
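For example, a hard subset could be selected along these lines. This is a hypothetical sketch that assumes one numeric score per line, aligned with the instance files, and that lower similarity to the training set indicates a harder instance:

# Hypothetical sketch: select the instances with the lowest train_sim_oqo scores.
# Assumes one numeric score per line, aligned with dataset.test.{nl,query},
# and that lower similarity to the training set means a harder instance.
with open("dataset/dataset.test.difficulty_train_sim_oqo") as f:
    scores = [float(line) for line in f]
hard_indices = sorted(range(len(scores)), key=lambda i: scores[i])[:333]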
Download the exact OpenStreetMap database we used for the evaluation in the paper here [10 parts, 306 GB in total].
Unzip the files (381 GB unzipped) such that the folder structure is: evaluation/overpass_clone_db/db
Install Docker and docker-compose, then start the container running the Overpass API:
cd evaluation/docker
docker-compose up
This starts the Overpass API as a Docker service and exposes it at http://localhost:12346/api.
If you see permission or file-not-found errors for db/osm3s_v0.7.57_areas or db/osm3s_v0.7.57_osm_base, make sure those files have execute permissions set, as shown below.
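For example (path relative to the repository root):
chmod +x evaluation/overpass_clone_db/db/osm3s_v0.7.57_areas evaluation/overpass_clone_db/db/osm3s_v0.7.57_osm_base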
You can test the Overpass API with:
curl -g 'http://localhost:12346/api/interpreter?data=[out:json];area[name="London"];out;'
If this returns an appropriate JSON output, you are set for the evaluation.
cd evaluation
pip install -r requirements.txt
python run_evaluation.py --ref_file ../dataset/dataset.dev --model_output_file ../models/outputs/byt5-base_com08/evaluation/preds_dev_beams4_com08_byt5_base.txt
This takes around 5 hours, depending on how many query results were cached in previous runs. If you run the Overpass API on a different port, be sure to change the default arguments in the evaluation script.
The evaluation results are written to results_execution...txt and results_oqs...txt in the same directory as the --model_output_file.
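Conceptually, the execution-based part of the evaluation runs each predicted query against the local Overpass instance and compares the returned elements to those of the gold query. A minimal sketch of executing a single query with the requests library (the endpoint matches the Docker setup above; the actual comparison logic lives in run_evaluation.py):

import requests

# Minimal sketch: execute one Overpass query against the local API instance.
# Error handling and result comparison are omitted.
def execute_query(query, endpoint="http://localhost:12346/api/interpreter"):
    response = requests.post(endpoint, data={"data": query})
    response.raise_for_status()
    return response.json()  # requires [out:json]; in the query

result = execute_query('[out:json];area[name="London"];out;')
print(len(result["elements"]), "elements returned")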
Download the model config and weights here and place the files into models/outputs/byt5-base_com08/.
Then run the following commands to generate the output queries:
cd evaluation
pip install -r requirements.txt
python inference_t5.py --exp_name com08 --model_name byt5-base --data_dir ../dataset --num_beams 4 --splits dev test
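Internally this is standard sequence-to-sequence generation with a finetuned ByT5 checkpoint. A rough Hugging Face transformers sketch (the checkpoint path, example input, and generation length are illustrative; inference_t5.py is the authoritative implementation):

from transformers import AutoTokenizer, T5ForConditionalGeneration

# Rough sketch: generate an Overpass query for one natural language input.
# The checkpoint directory is illustrative and assumed to hold the finetuned weights.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-base")
model = T5ForConditionalGeneration.from_pretrained("models/outputs/byt5-base_com08")

inputs = tokenizer("all drinking water fountains in Paris", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))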
To finetune your own model, use the train_t5.py script:
cd evaluation
pip install -r requirements.txt
python train_t5.py --exp_name default --data_dir ../dataset --model_name google/byt5-base --gradient_accumulation_steps 4
python inference_t5.py --exp_name default --model_name byt5-base --data_dir ../dataset --num_beams 4
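For orientation, training is ordinary seq2seq finetuning with the natural language input as source and the Overpass query as target. A condensed sketch of a single training step (example data and learning rate are illustrative; train_t5.py handles batching, checkpointing, and gradient accumulation):

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Condensed sketch of one finetuning step; hyperparameters are illustrative.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-base")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

batch = tokenizer(["all hospitals in Berlin"], return_tensors="pt")
labels = tokenizer(['[out:json];area[name="Berlin"];nwr[amenity=hospital](area);out;'],
                   return_tensors="pt").input_ids
loss = model(**batch, labels=labels).loss  # cross-entropy over target byte tokens
loss.backward()
optimizer.step()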
Recommended Python version is 3.10 for all scripts.
The demo front-end is a fork of https://github.com/rowheat02/osm-gpt
We thank the https://overpass-turbo.eu/ community and Martin Raifer.
Please cite the following paper:
@article{staniek-2023-overpassnl,
  title = "Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap",
  author = "Michael Staniek and Raphael Schumann and Maike Züfle and Stefan Riezler",
  year = "2023",
  publisher = "arXiv",
  eprint = "2308.16060"
}