GitHub - stalkerg/python-readability: Get from the page the essence! by Python3

This code is under the Apache License 2.0. http://www.apache.org/licenses/LICENSE-2.0

This is a Python 3 fork of https://github.com/buriy/python-readability and https://github.com/ftzeng/python-readability (which added some Python 3 support). Only Python 3 is supported; Python 2.x support has been dropped. Beyond the port, this fork adds new features and fixes, such as "lead" and "main_image_url".

Installation:

pip install git+https://github.com/stalkerg/python-readability

Usage:

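from readability.readability import Document
import urllib.request

# Placeholder URL: replace with the page you want to extract
url = "https://example.com/article.html"

html = urllib.request.urlopen(url).read()
doc = Document(html)
# Parse the requested parts, then read them back
doc.parse(["summary", "short_title"])
readable_article = doc.summary()
readable_title = doc.short_title()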

Document() __init__ arguments:

input: input HTML as text
base_url: base URL used to make links absolute
debug: output debug messages
min_text_length: minimum text size
retry_length: acceptable length of the text
positive_keywords: list of positive search patterns matched against classes and ids, for example: ["news-item", "block"]
negative_keywords: list of negative search patterns matched against classes and ids, for example: ["mysidebar", "related", "ads"]
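
To make base_url and the keyword lists more concrete, here is a minimal standard-library sketch of the ideas behind them. It does not use this package and is not its implementation: the helper names make_absolute and keyword_weight are hypothetical, substring matching against a combined class/id string is an assumption, and the +25 / -25 weights are arbitrary illustrative values.

```python
import re
from urllib.parse import urljoin


def make_absolute(href, base_url):
    """Resolve a possibly relative link against the page's base URL."""
    return urljoin(base_url, href)


def keyword_weight(class_and_id, positive_keywords=(), negative_keywords=()):
    """Score an element's class/id string against the keyword lists.

    The +25 / -25 weights are illustrative values, not the library's.
    """
    weight = 0
    if positive_keywords:
        positive = re.compile("|".join(re.escape(k) for k in positive_keywords))
        if positive.search(class_and_id):
            weight += 25
    if negative_keywords:
        negative = re.compile("|".join(re.escape(k) for k in negative_keywords))
        if negative.search(class_and_id):
            weight -= 25
    return weight


positive = ["news-item", "block"]
negative = ["mysidebar", "related", "ads"]

print(make_absolute("/img/a.png", "https://example.com/news/item.html"))
print(keyword_weight("news-item main", positive, negative))
print(keyword_weight("mysidebar", positive, negative))
```

In this sketch an element whose class or id matches a positive keyword is favored as article content, while one matching a negative keyword is penalized; how the package actually weighs matches may differ.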

Document() parse arguments:

params_list: list of parts to parse. Accepted values: ["content", "title", "short_title", "summary", "lead", "first_image_url", "main_image_url"]
html_partial: if True, produce HTML without the enclosing html/body tags.
