-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
263 lines (234 loc) · 12.3 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
<!DOCTYPE HTML>
<html><meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Raleway">
<script src="assets/js/fontawesone.6.5.1.js" type="text/javascript"></script>
<link rel="stylesheet" type="text/css" href="assets/css/w3.css">
<link rel="stylesheet" type="text/css" href="assets/css/main.css">
<script src="assets/js/jquery.min.3.7.1.js" type="text/javascript"></script>
<script src="js/main.js" type="text/javascript"></script>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-178921838-1', 'auto');
ga('send', 'pageview');
}
</script>
<body class="w3-content" style="max-width:1000px">
<div class="w3-container hj-padding">
<div>
<h1 class="w3-center w3-margin-top" id="photo-slam"><b>Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras</b></h1>
<h4 class="w3-center w3-margin-top" >CVPR 2024</h4>
<div class="w3-center">
<p><em><a href="https://huajianup.github.io" style="border-bottom: 1px dotted black;">Huajian Huang<sup>1</sup></a></em>, Longwei Li<sup>2</sup>, Hui Cheng<sup>2</sup>
and
<em><a href="http://www.saikit.org" style="border-bottom: 1px dotted black;">Sai-Kit Yeung<sup>1</sup></a></em>
</p>
<p>The Hong Kong University of Science and Technology<sup>1</sup>, Sun Yat-Sen University<sup>2</sup></p>
</div>
<div class="link w3-center pub">
<a class="btn btn-primary outline big-outline" href="https://arxiv.org/abs/2311.16728" target="_blank"><i class="fa fa-file-pdf-o" style="margin-right: 5px"></i>Paper</a>
<a class="btn btn-primary outline big-outline" href="https://github.com/HuajianUP/Photo-SLAM" target="_blank"><i class="fa fa-github" style="margin-right: 5px"></i>Code</a>
<a class="btn btn-primary outline big-outline" href="https://youtu.be/hLal4fhvMrk" target="_blank"><i class="fa fa-video" style="margin-right: 5px"></i>Video</a>
<a class="btn btn-primary outline big-outline" href="img/poster_8906.png" target="_blank"><i class="fa fa-signs-post" style="margin-right: 5px"></i>Poster</a>
<a class="btn btn-primary outline big-outline" href="img/poster_ICRA.jpg" target="_blank"><i class="fa fa-signs-post" style="margin-right: 5px"></i>Poster2</a>
</div>
</div>
<div class="w3-container hj-padding">
<img style="width:100% ;margin-top: auto"; src="Photo-SLAM_v2.gif" ></img>
</div>
<div class="w3-container hj-padding" >
<div class="w3-container hj-padding plate" id="abstract">
<h3>Abstract</h3>
</div>
<h6 class="abstract-text">
The integration of neural rendering and the SLAM system recently showed promising results in joint
localization and photorealistic view reconstruction. However, existing methods, fully relying on implicit representations, are so resource-hungry
that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM
framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit
photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based
on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing
photorealistic mapping performance. The extensive experiments with monocular, stereo, and RGB-D datasets prove that our proposed system
Photo-SLAM significantly outperforms current state-of-the-art SLAM systems for online photorealistic mapping, e.g., PSNR is 30\% higher and
rendering speed is hundreds of times faster in the Replica dataset. Moreover, the Photo-SLAM can run at real-time speed using an embedded platform
such as Jetson AGX Orin, showing the potential of robotics applications.
</h6>
<img src="img/taxonomy.jpg" ></img>
</div>
<div class="w3-container hj-padding" >
<div class="w3-container hj-padding plate" id="how-it-works">
<h3>How it works</h3>
</div>
<!--which capitalizes on explicit geometric feature points for accurate and efficient localization
while leveraging implicit representations to capture and model the texture information.-->
<div class="how-it-work-style">
<details>
<summary>1. Introduce the concept of hyper primitives map.</summary>
<div class="detail-content">
Hyper primitives are defined as a set of point clouds associated with ORB features, rotation, scaling, ensity,
and spherical harmonic (SH) coefficients. The hyper primitives map allows the system to efficiently optimize
tracking using a factor graph solver and learn the photorealistic mapping by neural solver.
<br>
<img src="img/overview.jpg" ></img>
</div>
</details>
<details>
<summary>2. Geometry-based densification.</summary>
<div class="detail-content">
We argue that 2D geometric feature points spatially distributed in the frames essentially represent the region
with a complex texture that requires more hyper primitives. However, less than 30% of 2D geometric feature points
of frames are active and have corresponding 3D points, especially for non-RGB-D scenarios. Therefore, we actively
create additional temporary hyper primitives based on the inactive 2D feature points.
<br>
<img src="img/Geo.jpg" ></img>
</div>
</details>
<!-- <li>Gaussian-pyramid-based learning, a new progressive training method. </li>-->
<details>
<summary>3. Gaussian-pyramid-based learning, a new progressive training method.</summary>
<div class="detail-content">
Progressive training is a widely used technology in neural rendering to accelerate the optimization process.
Some methods have been proposed to reduce training time while achieving better rendering quality.
To enhance performance with efficient multi-level features learning online, we propose Gaussian-pyramid-based learning.
At the beginning training step, the hyper primitives are supervised by the highest level of the pyramid, i.e. level n.
As training iteration increases, we not only densify hyper primitives but also reduce the pyramid level and obtain a
new ground truth until reaching the bottom of the Gaussian pyramid.
<br>
<img src="img/GP_learning.jpg" ></img>
</div>
</details>
</div>
</div>
<div class="w3-container hj-padding" >
<div class="w3-container hj-padding plate" id="result">
<h3>Results</h3>
</div>
<div class="w3-padding-16">
<h6 class="w3-center result-text">Comparisons</h6>
<div class="results-carousel">
<ul class="slides">
<input type="radio" id="res1" checked name="group1" onclick="_change_active_dot(0)" />
<li class="slide-container">
<div class="slide-image">
<video poster="" autoplay controls muted loop playsinline height="100%">
<source src="video/Office4_Ours_Vs_Nice.m4v" type="video/mp4">
</video>
</div>
<div class="carousel-controls">
<label for="res3" class="prev-slide">
<span><i class="fa-solid fa-circle-chevron-left"></i></span>
</label>
<label for="res2" class="next-slide">
<span><i class="fa-solid fa-circle-chevron-right"></i></span>
</label>
</div>
</li>
<input type="radio" id="res2" name="group1" onclick="_change_active_dot(1)"/>
<li class="slide-container">
<div class="slide-image">
<video poster="" autoplay controls muted loop playsinline height="100%">
<source src="video/Room0_Ours_Vs_Co.m4v" type="video/mp4">
</video>
</div>
<div class="carousel-controls">
<label for="res1" class="prev-slide">
<span><i class="fa-solid fa-circle-chevron-left"></i></span>
</label>
<label for="res3" class="next-slide">
<span><i class="fa-solid fa-circle-chevron-right"></i></span>
</label>
</div>
</li>
<input type="radio" id="res3" name="group1" onclick="_change_active_dot(2)"/>
<li class="slide-container">
<div class="slide-image">
<video poster="" autoplay controls muted loop playsinline height="100%">
<source src="video/Room1_Ours_Vs_Eslam.mp4" type="video/mp4">
</video>
</div>
<div class="carousel-controls">
<label for="res2" class="prev-slide">
<span><i class="fa-solid fa-circle-chevron-left"></i></span>
</label>
<label for="res1" class="next-slide">
<span><i class="fa-solid fa-circle-chevron-right"></i></span>
</label>
</div>
</li>
<div class="carousel-dots">
<label for="res1" class="carousel-dot active" id="img-dot-0"></label>
<label for="res2" class="carousel-dot" id="img-dot-1"></label>
<label for="res3" class="carousel-dot" id="img-dot-2"></label>
</div>
</ul>
</div>
<h6 class="w3-center result-text">Live demos (No Speedup)</h6>
<div class="results-carousel">
<ul class="slides">
<input type="radio" id="res4" checked name="group2" onclick="_change_active_demo_dot(0)"/>
<li class="slide-container">
<div class="slide-image">
<video poster="" autoplay controls muted loop playsinline height="100%">
<source src="video/live.m4v" type="video/mp4">
</video>
</div>
<div class="carousel-controls">
<label for="res5" class="prev-slide">
<span><i class="fa-solid fa-circle-chevron-left"></i></span>
</label>
<label for="res5" class="next-slide">
<span><i class="fa-solid fa-circle-chevron-right"></i></span>
</label>
</div>
</li>
<input type="radio" id="res5" name="group2" onclick="_change_active_demo_dot(1)"/>
<li class="slide-container">
<div class="slide-image">
<video poster="" autoplay controls muted loop playsinline height="100%">
<source src="video/live_Office0_ours.mp4" type="video/mp4">
</video>
</div>
<div class="carousel-controls">
<label for="res4" class="prev-slide">
<span><i class="fa-solid fa-circle-chevron-left"></i></span>
</label>
<label for="res4" class="next-slide">
<span><i class="fa-solid fa-circle-chevron-right"></i></span>
</label>
</div>
</li>
<div class="carousel-dots">
<label for="res4" class="carousel-dot active" id="img-dot-3"></label>
<label for="res5" class="carousel-dot" id="img-dot-4"></label>
</div>
</ul>
</div>
</div>
</div>
<div class="w3-container hj-padding" id="citation">
<h3>Citation</h3>
<pre class="citation-code"><code><span>@inproceedings</span>{hhuang2024photoslam,
title = {Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras},
author = {Huang, Huajian and Li, Longwei and Cheng Hui and Yeung, Sai-Kit},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2024}
}</code></pre>
</div>
<div class="w3-container hj-padding" id="concurrent">
<h3>Concurrent Works using 3D Gaussian Splatting</h3>
<!-- <h6>Concurrent works using 3D Gaussian Splatting:</h6>-->
<ul>
<li><a href="https://github.com/muskie82/MonoGS">Gaussian Splatting SLAM (CVPR 2024)</a></li>
<li><a href="https://arxiv.org/abs/2311.11700"> GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting (CVPR 2024)</a></li>
<li><a href="https://github.com/spla-tam/SplaTAM"> SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM (CVPR 2024)</a></li>
</ul>
</div>
</div>
</body>
</html>