FIX: Allow multiple inlined image data links in html clean

Add a lazy quantifier to the `_find_image_dataurls` regex so that it matches as few characters as possible and stops at the first occurrence of `;base64,`.

With the greedy quantifier, a single match swallows both data URLs:

```py
>>> import re
>>> _find_image_dataurls = re.compile(r'data:image/(.+);base64,', re.I).findall
>>> _find_image_dataurls('<div style="background: url(data:image/jpeg;base64,foo); background-image: url(data:image/jpeg;base64,foo);"></div>')
['jpeg;base64,foo); background-image: url(data:image/jpeg']
```

With the lazy quantifier, each data URL is matched separately:

```py
>>> import re
>>> _find_image_dataurls = re.compile(r'data:image/(.+?);base64,', re.I).findall
>>> _find_image_dataurls('<div style="background: url(data:image/jpeg;base64,foo); background-image: url(data:image/jpeg;base64,foo);"></div>')
['jpeg', 'jpeg']
```

This makes it possible to have multiple image data links on the same line, which happens, for instance, in inline styles. Without this change, `_has_javascript_scheme` returns `True` because the count of safe image URLs is lower than the number of possibly malicious schemes. The whole style is then dropped as malicious.

Co-authored-by: Christophe Simonis <chs@odoo.com>