Grab site is not actually compatible with python 3.8 #229

Open
cenodis opened this issue May 28, 2023 · 2 comments

@cenodis

cenodis commented May 28, 2023

Update: These problems occur only with Python 3.8. Running grab-site under Python 3.7.16 on the same system fixes them.

I recently upgraded my system to Ubuntu 22.04.2 LTS.
Grab-site now prints a few messages on the console relating to manhole, as well as a warning about an HTTP session. I do not remember either of those appearing before the upgrade. After these messages it outputs nothing. Similarly, no active scrape is shown on the gs-serv dashboard.

Looking at the filesystem, it seems that wpull is still running and writing to the WARC file, but no progress is visible on the console or in gs-serv.
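One way to confirm that kind of background activity without any console output is to sample the file's size a few times. This is a minimal sketch and not part of grab-site or wpull; the function name `is_file_growing`, its parameters, and the path in the usage comment are hypothetical.

```python
import os
import time


def is_file_growing(path, interval=5.0, checks=3):
    """Return True if the file at `path` grows between two consecutive size samples.

    Takes one initial sample, then up to `checks` more, `interval` seconds apart.
    """
    last_size = os.path.getsize(path)
    for _ in range(checks):
        time.sleep(interval)
        size = os.path.getsize(path)
        if size > last_size:
            return True
        last_size = size
    return False


# Hypothetical usage; substitute the path of the WARC file in your crawl directory:
# print(is_file_growing("/path/to/crawl-dir/example.warc.gz"))
```

A `False` result only means no growth was seen during the sampling window, so a slow crawl may need a longer `interval`.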

I have already tried resetting the python venv and reinstalling grab-site and its dependencies, following the exact instructions in the README. This did not fix the problem.

grab-site output

Manhole[202440:1685302622.5931]: Patched <built-in function fork> and <built-in function forkpty>.
Manhole[202440:1685302622.5941]: Manhole UDS path: /tmp/manhole-202440
Manhole[202440:1685302622.5941]: Waiting for new connection (in pid:202440) ...
/home/ubuntu/gs-venv/lib/python3.8/site-packages/wpull/protocol/http/client.py:185: UserWarning: HTTP session did not complete.
  warnings.warn(_('HTTP session did not complete.'))

gs-serv output

grab-site server listening on 0.0.0.0:29000
dropping connection to peer tcp4:127.0.0.1:32986 with abort=False: None
tcp4:127.0.0.1:32986 disconnected
tcp4:127.0.0.1:33026 connected
tcp4:127.0.0.1:33026 is dashboarding with Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36
dropping connection to peer tcp4:127.0.0.1:32996 with abort=False: None
tcp4:127.0.0.1:32996 disconnected
dropping connection to peer tcp4:127.0.0.1:33012 with abort=True: WebSocket opening handshake timeout (peer did not finish the opening handshake in time)
tcp4:127.0.0.1:33012 disconnected

@TheTechRobo
Contributor

I think I ran into this issue after I upgraded to Debian Bullseye. I fixed it by using a Docker container: https://github.com/Nold360/docker-grab-site

@cenodis
Author

cenodis commented Jun 2, 2023

After a bit of experimentation I found out that the problem is Python 3.8. Grab-site works perfectly fine with Python 3.7.16. This contradicts the README, which claims compatibility with both 3.7 and 3.8. A simple "fix" would be to update the installation instructions to fall back to 3.7. They already use pyenv anyway.
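Until the README is corrected, a small guard run inside the venv can flag the mismatch before a crawl is started. This is only a sketch: the `KNOWN_GOOD` value encodes just what was observed in this thread (3.7.16 works, 3.8 does not), and the names `is_known_good` and `describe` are made up for illustration rather than taken from grab-site.

```python
import sys

# Assumption from this thread only: grab-site worked under 3.7 and not under 3.8.
KNOWN_GOOD = (3, 7)


def is_known_good(version_info=sys.version_info):
    """Return True if the major.minor version matches the one reported to work."""
    return tuple(version_info[:2]) == KNOWN_GOOD


def describe(version_info=sys.version_info):
    """Return a one-line, human-readable verdict for the given interpreter version."""
    major, minor = version_info[0], version_info[1]
    if is_known_good(version_info):
        return f"Python {major}.{minor}: matches the version reported to work"
    return (
        f"Python {major}.{minor}: not the version reported to work "
        f"({KNOWN_GOOD[0]}.{KNOWN_GOOD[1]}); grab-site may show no progress output"
    )


print(describe())
```

Running this with the interpreter from the venv (for example the `python` inside `gs-venv`) shows which version grab-site will actually use.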

cenodis changed the title from "Grab site shows no progress output" to "Grab site is not actually compatible with python 3.8" on Jun 2, 2023
HeliosLHC self-assigned this on Jan 19, 2024