March 2024 Progress #26

AmitMY · 2024-03-14T13:30:33Z

We now use the improved pose-to-video model, which is based on diffusion models.

We start with a paragraph in German and translate it to German Sign Language:

Das Alte Museum wurde 1830 als erstes öffentliches Museum in Berlin eröffnet.
Im Obergeschoss können Sie bei einem großartigen Ausblick über den Lustgarten später mehr über die Geschichte des Museums und seinen Architekten Karl Friedrich Schinkel erfahren.

The simple glossing gives:

[("das", "der"), ("alte", "alter"), ("museum", "museum"), ("wurde", "werden"), ("1830", "1830"), ("als", "als"), ("erstes", "erster"), ("öffentliches", "öffentlich"), ("museum", "museum"), ("in", "in"), ("berlin", "berlin"), ("eröffnet", "eröffnet"), (".", "."), ("im", "im"), ("obergeschoss", "obergeschoss"), ("können", "können"), ("sie", "sie|sie"), ("bei", "bei"), ("einem", "ein"), ("großartigen", "großartig"), ("ausblick", "ausblick"), ("über", "über"), ("den", "der"), ("lustgarten", "lustgarten"), ("später", "spät"), ("mehr", "mehr"), ("über", "über"), ("die", "der"), ("geschichte", "geschichte"), ("des", "der"), ("museums", "museum"), ("und", "und"), ("seinen", "sein"), ("architekten", "architekt"), ("karl", "karl"), ("friedrich", "friedrich"), ("schinkel", "schinkel"), ("erfahren", "erfahren"), (".", ".")]
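For readers unfamiliar with this output format, here is a minimal sketch of how such (token, lemma) pairs could be produced. It is not the project's actual glossing code: `TOY_LEMMAS` and `simple_gloss` are hypothetical names, and the tiny hand-written lemma table stands in for a real lemmatizer.

```python
import re

# Hypothetical toy lemma table covering a few words from the example;
# a real system would use a full lemmatizer instead.
TOY_LEMMAS = {
    "das": "der",
    "alte": "alter",
    "wurde": "werden",
    "erstes": "erster",
    "öffentliches": "öffentlich",
}


def simple_gloss(text, lemmas=TOY_LEMMAS):
    """Return (token, lemma) pairs for lowercased words, numbers, and punctuation."""
    # \w+ matches words and numbers (Unicode-aware, so umlauts are kept);
    # [^\w\s] matches each punctuation mark as its own token.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    # Tokens missing from the table are used as their own lemma.
    return [(token, lemmas.get(token, token)) for token in tokens]


pairs = simple_gloss(
    "Das Alte Museum wurde 1830 als erstes öffentliches Museum in Berlin eröffnet."
)
print(pairs)
```

With this toy table, the first sentence yields the same 13 pairs as the start of the list above. Words outside the table simply map to themselves.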

The current system gives:

TODO

We choose to focus on one issue: visual inconsistency between signs.
After adding pose anonymization in 0072c52, the output is:

dgs-example.mp4

We note that the database lookup time was 9 seconds; this was optimized to 1-2 seconds and could be improved further.

We recognize that sentences should be split. This will affect both the database search (search only up to one sentence at a time) and the video (lower and raise the hands at sentence boundaries, without cropping). Possibly, generate every sentence independently, then join them.
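The sentence-splitting idea could be sketched roughly as follows. This is an illustration under assumptions, not the implemented behavior: `split_sentences` and `render_document` are hypothetical names, `render_sentence` is a stand-in for whatever turns one sentence of glosses into frames, and the `"<rest>"` marker is a placeholder for lowering the hands between sentences.

```python
def split_sentences(glosses, boundaries=(".", "!", "?")):
    """Split a flat list of (token, lemma) pairs into per-sentence lists."""
    sentences, current = [], []
    for token, lemma in glosses:
        if token in boundaries:
            # A boundary token closes the current sentence and is dropped.
            if current:
                sentences.append(current)
            current = []
        else:
            current.append((token, lemma))
    if current:  # trailing words without final punctuation
        sentences.append(current)
    return sentences


def render_document(glosses, render_sentence, rest_frames=5):
    """Render each sentence independently, then join them with rest frames."""
    video = []
    for sentence in split_sentences(glosses):
        video.extend(render_sentence(sentence))
        # Hypothetical placeholder for hands lowered at the sentence boundary.
        video.extend(["<rest>"] * rest_frames)
    return video


glosses = [
    ("das", "der"), ("museum", "museum"), (".", "."),
    ("im", "im"), ("obergeschoss", "obergeschoss"), (".", "."),
]
# Stand-in renderer: one "frame" per lemma, purely for demonstration.
frames = render_document(glosses, lambda s: [lemma for _, lemma in s], rest_frames=1)
print(frames)
```

Splitting first also bounds the database search, since each lookup only ever sees the glosses of a single sentence.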


AmitMY · 2024-09-11T06:41:42Z

Some notes:

The system mainly fails here on numbers (1830) and named entities (Karl Friedrich Schinkel), which it could spell out with some modifications.
It also did not make the sentence boundary clear and basically ignored the punctuation (also fixable).
In my opinion, perhaps the biggest problem to address here is that the signing follows the spoken-language word order. It is comprehensible, but not really sign language.
The smoothing between signs is too simplistic (can be easily seen in the skeleton video), and can be fixed.
The video quality is not the best. The generated interpreter has some artifacts even if the pose sequence was perfect. Not easily fixable.
The video is quite slow. Further work can be done to make the signing faster and tighter, decreasing the number of frames.
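As an illustration of the first note, a spell-out fallback for lemmas missing from the sign lexicon might look like the sketch below. Everything here is assumed for demonstration: `to_sign_units`, the `("sign", ...)` / `("spell", ...)` unit format, and the toy lexicon are hypothetical, and spelling a year digit by digit is a simplification rather than how numbers are actually signed.

```python
def to_sign_units(lemmas, lexicon):
    """Map each lemma to a lexicon sign, or spell it out character by character."""
    units = []
    for lemma in lemmas:
        if lemma in lexicon:
            units.append(("sign", lemma))
        else:
            # Fallback: one spelling unit per letter or digit.
            units.extend(("spell", ch) for ch in lemma if ch.isalnum())
    return units


# Hypothetical toy lexicon; the real database is far larger.
lexicon = {"museum", "architekt"}
units = to_sign_units(["architekt", "karl", "1830"], lexicon)
print(units)
```

A fallback like this keeps names and numbers from being silently dropped, at the cost of slower output for long out-of-lexicon words.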

AmitMY mentioned this issue Mar 15, 2024

Question about sign-language-processing/spoken-to-signed-translation/issues/26 sign-language-processing/pose-to-video#2

Closed
