No URLs found #57

Open
exportio opened this issue Mar 21, 2019 · 7 comments
Comments

@exportio

exportio commented Mar 21, 2019

Number of found URL : 1
Number of links crawled : 1

python main.py --domain https://www.domain.com --output sitemap.xml --report

<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

</urlset>

@exportio changed the title from "ERROR:root:Output file not available" to "No URLs found" on Mar 21, 2019
@c4software
Owner

Hi,

Interesting… I can't reproduce the problem here. Which Python version are you using?

Screenshot 2019-03-21 at 09 25 18

@c4software
Owner

@samboustani Is the problem still present?

@GovetaXV

Same problem here.

@GovetaXV

try this url: https://paperarchive.space/

@c4software
Owner

@GovetaXV Hi,

Thanks for the link. Unfortunately, the current version of python-sitemap doesn't support "full JavaScript" websites, which is why paperarchive.space doesn't work.

Sorry
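
For readers wondering why a "full JavaScript" site yields no URLs: a crawler that only reads the HTML returned by the server sees no links when the page builds them in the browser. The sketch below illustrates that with the standard library's `html.parser`. It is not python-sitemap's actual link extraction; `LinkCollector`, `static_links`, and the two sample pages are hypothetical names and data made up for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors that exist in the served markup are seen here;
        # links created later by JavaScript never reach this parser.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html_text):
    """Return the hrefs found in html_text without executing any script."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample pages (not taken from any real site).
SERVER_RENDERED = '<html><body><a href="/about">About</a></body></html>'
JS_RENDERED = (
    '<html><body><div id="app"></div>'
    '<script src="/bundle.js"></script></body></html>'
)

print(static_links(SERVER_RENDERED))  # ['/about']
print(static_links(JS_RENDERED))      # []
```

If the second case matches what your site serves, the "Number of found URL : 1" result is expected, and the site would need a crawler that renders JavaScript.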

@ishannaktode

+1
Same issue
No error log

@mgifford

This looked pretty hopeful, but didn't work for me either. This isn't a full headless site by any means.

$ python3 main.py --domain https://canada.ca --output sitemap.xml --report
Number of found URL : 1
Number of links crawled : 1
$ cat sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

</urlset>

But maybe this helps.

$ python3 main.py --domain https://canada.ca --output sitemap.xml --debug
INFO:root:Start the crawling process
INFO:root:Crawling #0: https://canada.ca
DEBUG:root:https://canada.ca ==> <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>
INFO:root:Crawling has reached end of all found links
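
The debug log above shows the first request failing certificate verification, so the crawl stops at the start URL. One common cause is a Python install that cannot locate a CA bundle, but that is an assumption about this setup, not something the log confirms. The sketch below uses only the standard library to show how a verifying TLS context can be passed to `urlopen` and how this specific failure can be recognised. `build_context`, `is_cert_failure`, and `fetch` are hypothetical helpers, not part of python-sitemap.

```python
import ssl
import urllib.error
import urllib.request


def build_context(cafile=None):
    """Return a TLS context that verifies certificates.

    cafile may point at a CA bundle file; None uses the system defaults.
    """
    return ssl.create_default_context(cafile=cafile)


def is_cert_failure(error):
    """True if a URLError wraps a certificate verification failure."""
    reason = getattr(error, "reason", None)
    return (
        isinstance(reason, ssl.SSLError)
        and "CERTIFICATE_VERIFY_FAILED" in str(reason)
    )


def fetch(url, context):
    """Return the HTTP status for url, with a clearer error on cert failure."""
    try:
        with urllib.request.urlopen(url, context=context, timeout=10) as resp:
            return resp.status
    except urllib.error.URLError as error:
        if is_cert_failure(error):
            raise RuntimeError(
                "Certificate verification failed for %s; "
                "check the CA bundle used by this Python install" % url
            ) from error
        raise


# Example call (needs network access, so it is left commented out):
# print(fetch("https://canada.ca", build_context()))
```

Running `fetch` with the same interpreter that runs `main.py` should show whether the failure comes from the Python environment rather than from the crawler. Disabling verification would also make the error disappear, but it removes the protection TLS provides, so fixing the CA bundle is the safer route.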


5 participants