Paper | Video | Project Page
If you find our work useful, please consider citing:
```bibtex
@InProceedings{Taioli_2023_ICCV,
    author    = {Taioli, Francesco and Cunico, Federico and Girella, Federico and Bologna, Riccardo and Farinelli, Alessandro and Cristani, Marco},
    title     = {{Language-Enhanced RNR-Map: Querying Renderable Neural Radiance Field Maps with Natural Language}},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {4669-4674}
}
```
- Python version: 3.8
- Habitat-lab: 0.2.1
- Habitat-sim: 0.2.1
- Torch:

  ```
  pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
  ```

- CLIP:

  ```
  pip install git+https://github.com/openai/CLIP.git
  ```
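As a quick sanity check after installation, the snippet below (illustrative only, not part of the repo) verifies that the CUDA build of torch is active and that CLIP loads and encodes a text query; the `ViT-B/32` variant is an assumption:

```python
import torch
import clip

# Confirm the cu117 wheel installed above actually sees the GPU.
print(torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Load a CLIP model (variant assumed here) and encode a sample language query.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize(["a toilet"]).to(device))
print(text_features.shape)  # ViT-B/32 yields 512-dimensional text embeddings
```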
For the Habitat setup, see https://github.com/facebookresearch/habitat-lab/tree/v0.2.1#Gibson
- For the Gibson scene dataset, containing the .glb 3D models, refer to https://github.com/facebookresearch/habitat-lab/tree/v0.2.1#data
- For the task dataset, containing the navigation episodes, refer to https://github.com/devendrachaplot/Object-Goal-Navigation
```
├── data
│   └── gibson
│       └── scene_dataset
│           └── gibson_habitat
├── pretrained
├── src
│   └── model
│       └── autoencoder
│           └── GSN
├── heat_maps
│   ├── multi_search
│   └── single_search
├── images
│   ├── map_generation
│   └── query_map
├── lernr_maps
├── lernr
├── requirements.txt
├── evaluation.py
├── map_creation.py
├── query_map.py
└── README.md
```
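A small helper script (not part of the repo) can confirm the expected layout before running anything; the paths follow the tree above:

```python
from pathlib import Path

# Folders the scripts expect, per the project structure above.
expected = [
    "data/gibson/scene_dataset/gibson_habitat",
    "pretrained",
    "lernr_maps",
    "heat_maps/single_search",
    "heat_maps/multi_search",
    "images",
]
for rel in expected:
    status = "OK     " if Path(rel).is_dir() else "MISSING"
    print(f"{status}  {rel}")
```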
- `data`: contains the dataset
- `pretrained`: contains the pretrained autoencoder models
- `src`: contains the encoding and decoding processes
- `heat_maps`: contains the output heat maps for the single/multi search
- `lernr_maps`: contains the lernr maps, stored as dictionaries
- `images`: contains the videos generated by the navigation/querying
- `lernr`: contains utility files
- `evaluation.py`: runs an evaluation on the maps contained in the `lernr_maps` folder
- `map_creation.py`: generates lernr maps
- `query_map.py`: queries the maps in single or multi search mode
Download the lernr maps from here and put them in the `lernr_maps` folder. To use the pretrained checkpoints, download them from here and put them inside the `pretrained` folder.
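Since the lernr maps are stored as dictionaries (see the project structure above), a quick way to inspect a downloaded map is to load it and list its keys; the file name and the `torch.load` format below are assumptions:

```python
import torch

# Hypothetical file name: actual names depend on the scene the map was built from.
map_path = "lernr_maps/Cantwell.pt"
lernr_map = torch.load(map_path, map_location="cpu")

# Each map is described as a dictionary, so its keys reveal the stored fields.
print(type(lernr_map))
if isinstance(lernr_map, dict):
    print("keys:", list(lernr_map.keys()))
```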
- Example:

  ```
  python map_creation.py --scene="Wiconisco" --n_goals=25 --img_res=128 --map_size=128
  ```

- `scene`: Name of the scene (default: "Cantwell")
- `n_goals`: Number of goal points to reach (default: 25)
- `img_res`: Resolution of the input image entering the encoder (default: 128)
- `map_size`: Resolution of the lernr map (default: 128)
- `make_video`: If present, a video of the navigation is generated (default: false)
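To build maps for several scenes in one go, a simple driver (illustrative, not part of the repo) can invoke `map_creation.py` with the flags documented above:

```python
import subprocess

# Gibson scenes to map; any scene names available in your dataset work here.
scenes = ["Cantwell", "Wiconisco"]

for scene in scenes:
    # Invoke map_creation.py with the documented flags for each scene.
    subprocess.run(
        [
            "python", "map_creation.py",
            f"--scene={scene}",
            "--n_goals=25",
            "--img_res=128",
            "--map_size=128",
        ],
        check=True,  # stop on the first failing scene
    )
```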
- Outputs:
  - Single object search: a heat map is saved for each query word.
  - Multi object search: a heat map with the different locations of the given query word is saved.
- Example:
  - Single object search:

    ```
    python query_map.py --scene="Wiconisco" --query_words="toilet, couch" --negative_prompt="objects, textures" --make_video --smooth
    ```

  - Multi object search:

    ```
    python query_map.py --scene="Cantwell" --query_words="Window" --make_video --smooth --erased_area=500 --th=0.6 --multi_search
    ```
- `scene`: Name of the map (default: "Cantwell")
- `query_words`: Objects to search (default: "Window")
- `negative_prompt`: List of negative prompts (default: "stuff, things, objects, textures")
- `make_video`: If present, a video of the navigation is generated (default: false)
- `smooth`: If present, the generated video is smoothed (default: false)
- `multi_search`: If present, a multi object search is performed; otherwise a single object search is made
- `erased_area`: Area to erase, in order to perform the multi object search (default: 500)
- `th`: Threshold used in the multi object search to stop the navigation (default: 0.6)
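For intuition on how `erased_area` and `th` interact during multi search, here is a simplified sketch (not the repo's actual implementation): greedily take the hottest cell of a heat map, stop once its score falls below the threshold, and erase an area around each detection so the next peak lands elsewhere:

```python
import numpy as np

def multi_search_peaks(heat_map: np.ndarray, th: float = 0.6, erased_area: int = 500):
    """Greedy peak extraction mirroring the documented flags (simplified sketch)."""
    hm = heat_map.copy()
    half = max(1, int(np.sqrt(erased_area)) // 2)  # treat erased_area as pixels
    peaks = []
    while True:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        if hm[y, x] < th:
            break  # threshold reached: stop searching
        peaks.append((int(y), int(x), float(hm[y, x])))
        # Erase a square around the detection so later peaks fall elsewhere.
        hm[max(0, y - half):y + half + 1, max(0, x - half):x + half + 1] = 0.0
    return peaks

# Example: three blobs, two above the 0.6 threshold.
hm = np.zeros((128, 128))
hm[20, 30], hm[90, 100], hm[60, 60] = 0.9, 0.8, 0.4
print(multi_search_peaks(hm))  # -> [(20, 30, 0.9), (90, 100, 0.8)]
```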
Our work builds on RNR-Map:

```bibtex
@InProceedings{Kwon_2023_CVPR,
    author    = {Kwon, Obin and Park, Jeongho and Oh, Songhwai},
    title     = {Renderable Neural Radiance Map for Visual Navigation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {9099-9108}
}
```