Metadata not in head but in the body #37

ThePavolC · 2021-01-07T17:57:44Z

Hi,

I am having an issue with getting the metadata using opengraph_py3, urllib and bs4.

In the `parser` method you only check the <head>, but it looks like the <meta> tags are sometimes in the <body>. Any ideas on how I can fix this? Is it due to the User-Agent?
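One way to test the User-Agent theory would be to fetch the page with an explicit browser-like header and compare the parsed result against the default one. The following is only a minimal sketch using the standard library; the helper names `build_request` and `fetch_html` and the User-Agent string are illustrative assumptions, not part of opengraph_py3.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_UA):
    """Download the page body as bytes using the custom User-Agent."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()
```

If the tags still end up in the body with a browser-like header, the User-Agent is probably not the cause.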

urllib3 1.23
opengraph-py3 0.71
beautifulsoup4 4.6.0

import re
import urllib.request  # a bare `import urllib` does not reliably expose urllib.request

import opengraph_py3 as opengraph
from bs4 import BeautifulSoup

# Fetch the page (FancyURLopener is deprecated since Python 3.3)
raw = urllib.request.FancyURLopener().open("https://youtu.be/DQwU_kU4pUg")
html = raw.read()
soup = BeautifulSoup(html, 'html.parser')

# This is the same code as in `parser`
soup.html.head.find_all(property=re.compile(r'^og'))
# []

# Searching the body instead finds the tags
soup.html.body.find_all(property=re.compile(r'^og'))
# [<meta content="YouTube" property="og:site_na....]
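A possible workaround, independent of where the parser places the tags, is to search the whole document rather than only the <head>. The sketch below is an assumption about how that could look, not the library's actual code: `extract_og` is a hypothetical helper, and its `parser` argument is only there so that different BeautifulSoup parsers can be tried.

```python
import re

from bs4 import BeautifulSoup

# Matches Open Graph property names such as "og:title".
OG_PROPERTY = re.compile(r"^og:")


def extract_og(html, parser="html.parser"):
    """Collect og:* meta tags from anywhere in the document."""
    soup = BeautifulSoup(html, parser)
    result = {}
    # Search the whole tree, so tags in <head> and <body> are both found.
    for tag in soup.find_all("meta", property=OG_PROPERTY):
        content = tag.get("content")
        if content is not None:
            # Drop the "og:" prefix from the key.
            result[tag["property"][3:]] = content
    return result
```

Calling `extract_og(html)` on the downloaded page should then return the tags no matter which element they ended up under.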


fumiya5863 mentioned this issue Feb 3, 2022

Make it possible to specify the parser for BeautifulSoup4 #39

Open
