NAME

Goodscrapes - Goodreads.com HTML API

VERSION

  • Updated: 2022-01-21

  • Since: 2014-11-05

COMPARED TO THE OFFICIAL API

  • focuses on analysing info on Goodreads, not on updating it

  • less limited, e.g., reading shelves and reviews of other members: Goodscrapes can scrape thousands of full-text reviews.

  • the official API is slow too, and API users are even treated as second-class citizens

  • theoretically this library is more likely to break, but Goodreads changes very slowly: nothing actually broke between 2014 (when I started this) and 2019; their API actually seems to change more often than their web pages; they can and do disable API functions without most people noticing, but they cannot easily disable important web pages that we use too. There are unit tests that detect markup changes on the scraped Goodreads.com website.

  • this library grew with every new use case and program; it retries operations when Goodreads.com errors occur, which is not rare (over capacity, exceptions etc.); it has seen a lot of flawed data such as wrong review dates ("Jan 01, 1010"), which broke Time::Piece.

  • Goodreads "isn't eating its own dog food" https://www.goodreads.com/topic/show/18536888-is-the-public-api-maintained-at-all#comment_number_1

LIMITATIONS

  • slow: a version with concurrent AnyEvent::HTTP requests was only marginally faster, so I stuck with the simpler code; it doesn't actually matter much given Amazon's and Goodreads' request throttling. You can only speed things up significantly with a pool of work-sharing computers and unique IP addresses...

  • just text pattern matching, no ECMAScript execution and no DOM parsing with a headless renderer (so far sufficient and faster). Regex is not meant for HTML parsing, and an HTML parser would have been easier from time to time; I would use one today. However, regular expressions proved good enough for Goodreads.com, given that user-generated content is very restricted and cannot easily confuse the regex patterns. The regex code is small, too. We just look at the server response as text with some features that mark the start and end of a value of interest.

HOW TO USE

  • for real-world usage examples see Andre's Goodreads Toolbox. There are unit tests in the "t" directory, too. Tests are good (up-to-date) tutorials and might help in comprehending the rather terse API documentation.

  • _ prefix means private function or constant (use in module only)

  • ra prefix means array reference, rh prefix means hash reference

  • on prefix or fn suffix means function variable

  • constants are uppercase, functions lowercase

  • Goodscrapes code in your program is usually recognizable by the 'g' or 'GOOD' prefix in the function or constant name

  • common internal abbreviations: pfn = progress function, bfn = book handler function, pag = page number, nam = name, au = author, bk = book, uid = user id, bid = book id, aid = author id, rat = rating, tit = title, q = query string, slf = shelf name, shv = shelves names, t0 = start time of an operation, ret = return code, tmp = temporary helper variable, gp = group, gid = group id, us = user

AUTHOR

https://github.com/andre-st/

DATA STRUCTURES

Note

  • never cast 'id' to int or use a %d format string, despite it containing digits only; compare IDs as strings

  • don't expect all attributes to be set (they can be undef); this depends on the info available on the scraped page
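
Both rules in a minimal sketch (the ID value is hypothetical; field names are from %book below):

  # IDs are strings: compare with 'eq', never with '==':
  print "same book\n"  if $book{id} eq '11870085';

  # any attribute might be undef, depending on the scraped page:
  my $pages = defined $book{num_pages} ? $book{num_pages} : 'unknown';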

%book

  • id => string

  • title => string

  • isbn => string

  • isbn13 => string

  • num_pages => int

  • num_reviews => int

  • num_ratings => int 103 for example

  • avg_rating => float 4.36 for example, 0 if no rating

  • stars => int rounded avg_rating, e.g., 4

  • format => string (binding)

  • user_rating => int number of stars 1,2,3,4 or 5 (program user)

  • user_read_count => int (program user)

  • user_num_owned => int (program user)

  • user_date_read => Time::Piece (program user)

  • user_date_added => Time::Piece (program user)

  • ra_user_shelves => string[] reference

  • url => string

  • img_url => string

  • review_id => string

  • year => int (original publishing date)

  • year_edit => int (edition publishing date)

  • rh_author => %user reference

%user

  • id => string

  • name => string "Firstname Lastname"

  • name_lf => string "Lastname, Firstname"

  • residence => string (might require login)

  • age => int (might require login)

  • num_books => int books shelved, not books written (even if is_author == 1)

  • is_friend => bool

  • is_author => bool

  • is_female => bool

  • is_private => bool

  • is_staff => bool true if user is a Goodreads.com employee

  • is_mainstream => bool currently guessed from the number of ratings for any book; requires is_author == 1

  • url => string URL to the user's profile page

  • works_url => string URL to the author's distinct works (is_author == 1)

  • img_url => string

  • user_min_rating => int requires is_author == 1

  • user_max_rating => int requires is_author == 1

  • user_avg_rating => float, e.g., 3.3 (program user); requires is_author == 1; value depends on the shelves involved

  • _seen => int incremented if user already exists in a load-target structure

%review

  • id => string

  • rh_user => %user reference

  • book_id => string

  • rating => int with 0 meaning no rating, i.e., just "added", "marked it as abandoned" or something similar

  • rating_str => string representation of the rating, e.g., 3/5 as "[*** ]", or "[TTT ]" if there is additional review text, or "[ttt ]" if that text is not longer than 160 chars

  • text => string

  • date => Time::Piece

  • url => string full text review

%group

  • id => string

  • name => string

  • url => string

  • img_url => string

  • num_members => int

%comment

  • text => string

  • rh_to_user => %user reference, addressed user

  • rh_review => %review reference, addressed review, undefined if not a comment on a review (but on a group, another user's status, a book list, ...)

  • rh_book => %book reference, undefined if rh_review is undefined and vice versa

PUBLIC ROUTINES

string gverifyuser( $user_id_to_verify )

  • returns a sanitized, valid Goodreads user id or kills the current process with an error message

string gverifyshelf( $name_to_verify )

  • returns the given shelf name if valid

  • returns a shelf which includes all books if no name given

  • kills the current process with an error message if name is malformed

bool gisbaduser( $user_or_author_id )

  • returns true if the given user or author is blacklisted and would slow down any analysis

sub gmeter( $unit_str = '' )

  • generates and returns a CLI progress-indicator function $f, with $f->( 20 ) adding 20 to the last value and printing the sum like "40 unit_str". Given a second (max value) argument, $f->( 10, 100 ), it prints a percentage without any unit: "10%". Given a modern terminal, the text remains at the same position if the progress function is called multiple times.
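
Example, a minimal sketch (unit string and values are arbitrary):

  my $meter = gmeter( 'books' );
  $meter->( 20 );      # prints "20 books"
  $meter->( 20 );      # updates the same line to "40 books"

  my $pct = gmeter();
  $pct->( 10, 100 );   # prints "10%"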

void glogin({ ... })

  • some Goodreads.com pages are only accessible by authenticated members

  • some Goodreads.com pages are optimized for authenticated members (e.g. get 200 books vs 30 books per request)

  • usermail => string

  • userpass => string

  • r_userid => string ref, set to the user ID if the referenced variable is empty/undef [optional]
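
Example, a minimal sketch following the { ... } signature above (credentials are placeholders):

  my $userid;
  glogin({ usermail => 'name@example.com',    # placeholder credentials
           userpass => 'mypassword',
           r_userid => \$userid });           # fills $userid if it is empty/undef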

void gsetopt({ ... })

  • change one or multiple library-scope parameters

  • ignore_errors => bool disables retries for [ERROR] and [CRIT]; the process just keeps going with the next step

  • maxretries => int sets the number of retries on error; critical issues are retried indefinitely (if ignore_errors is false)

  • retrydelay_secs => int

  • cache_days => int sets the number of days that a resource can be loaded from local storage. Scraping Goodreads.com is a very slow process; scraped documents can be cached if you don't need them "fresh", e.g., during development, during long-running sessions (cheap recovery on crash, power blackout or pauses), or when experimenting with parameters
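
Example, a sketch with the parameters listed above (values are arbitrary):

  gsetopt({ ignore_errors   => 0,
            maxretries      => 3,
            retrydelay_secs => 60,
            cache_days      => 7 });   # reuse scraped documents for up to a week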

%book greadbook( $book_id )

%user greaduser( $user_id, $prefer_author = 0 )

  • there can be a different user and author with the same ID (2456: Joana vs Chuck Palahniuk); if there is no user but an author, Goodreads redirects to the author page with the same ID and this function returns the author

  • if ambiguous you can set the $prefer_author flag
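
Example, a sketch with hypothetical IDs, reading some of the %book and %user fields documented above:

  my %book = greadbook( '11870085' );               # hypothetical book ID
  printf( "%s: %.2f stars, %d ratings\n",
          $book{title}, $book{avg_rating}, $book{num_ratings} );

  my %user = greaduser( '2456', 1 );                # prefer the author on ambiguous IDs
  print $user{name}, "\n";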

void greadusergp({ ... })

  • reads all group memberships of the given user into rh_into

  • from_user_id => string

  • rh_into => hash reference (id => %group,...)

  • on_group => sub( %group ) [optional]

  • on_progress => sub see gmeter() [optional]

void greadshelf({ ... })

  • reads a list of books (and/or authors) present in the given shelves of the given user

  • from_user_id => string

  • ra_from_shelves => string-array reference with shelf names

  • rh_into => hash reference (id => %book,...) [optional]

  • rh_authors_into => hash reference (id => %user,...) [optional]; this parameter is for convenience and also replaces the former greadauthors() function. It's not required for accessing author data, as that is also available from the book data: $book->{rh_author}->{...}

  • on_book => sub( %book ) [optional]

  • on_progress => sub see gmeter() [optional]

  • doesn't add users to rh_authors_into when gisbaduser() is true

  • sets the user_XXX and is_mainstream fields in each author item
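
Example, a sketch reading two shelves into hashes as described above (user ID is a placeholder):

  my( %books, %authors );
  greadshelf({ from_user_id    => '18418712',       # hypothetical user ID
               ra_from_shelves => [ 'read', 'to-read' ],
               rh_into         => \%books,
               rh_authors_into => \%authors,
               on_progress     => gmeter( 'books' ) });

  printf( "\n%d books by %d authors\n", scalar keys %books, scalar keys %authors );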

void greadshelfnames({ ... })

  • reads the names of all shelves of the given user

  • from_user_id => string

  • ra_into => array reference

  • ra_exclude => array reference won't add given names to the result [optional]

  • Precondition: glogin()

  • Postcondition: result includes 'read', 'to-read', 'currently-reading', but doesn't include '#ALL#'

void _update_author_stats( $rh_from_books )

  • sets the user_XXX and is_mainstream fields in each author item

void greadauthors({ ... })

  • DEPRECATED: use greadshelf() with rh_authors_into parameter

  • gets a list of authors whose books are present in the given shelves of the given user

  • from_user_id => string

  • ra_from_shelves => string-array reference with shelf names

  • rh_into => hash reference (id => %user,...) [optional]

  • on_progress => sub see gmeter() [optional]

  • If you need both author and book data, use greadshelf(), which also populates the author property of every book

  • skips authors where gisbaduser() is true

  • sets the user_XXX and is_mainstream fields in each author item

void greadauthorbk({ ... })

  • reads the Goodreads.com list of books written by the given author

  • author_id => string

  • limit => int number of books to read into rh_into

  • rh_into => hash reference (id => %book,...)

  • on_book => sub( %book ) [optional]

  • on_progress => sub see gmeter() [optional]
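
Example, a sketch (author ID is a placeholder):

  my %books;
  greadauthorbk({ author_id   => '3389',            # hypothetical author ID
                  limit       => 100,
                  rh_into     => \%books,
                  on_progress => gmeter( 'books' ) });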

void greadreviews({ ... })

  • loads ratings (no text), reviews (text), "to-read", "added" etc; you can filter later or via on_filter parameter

  • rh_for_book => hash reference %book, see greadbook()

  • rh_into => hash reference (id => %review,...)

  • since => Time::Piece [optional]

  • on_filter => sub( %review ), return 0 to drop [optional]

  • on_progress => sub see gmeter() [optional]

  • dict_path => string path to a dictionary file (1 word per line) [optional]

  • text_minlen => int overrides the on_filter argument [optional, default 0]

    0  =  no text filtering
    n  =  specified minimum length (see also GOOD_USEFUL_REVIEW_LEN constant)
  • rigor => int [optional, default 2]

    level 0   = search newest reviews only (max 300 ratings)
    level 1   = search with a combination of filters (max 5400 ratings)
    level 2   = like 1 plus dict-search if more than 3000 ratings with stall-time of 2 minutes
    level n   = like 1 plus dict-search with stall-time of n minutes
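
Example, a sketch collecting reviews with a minimum text length for a book loaded via greadbook() (IDs and values are arbitrary):

  use Time::Piece;

  my %book = greadbook( '11870085' );               # hypothetical book ID
  my %reviews;
  greadreviews({ rh_for_book => \%book,
                 rh_into     => \%reviews,
                 since       => Time::Piece->strptime( '2018-01-01', '%Y-%m-%d' ),
                 text_minlen => 120,
                 rigor       => 2,
                 on_progress => gmeter( 'reviews' ) });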

void greadfolls({ ... })

  • queries Goodreads.com for the friends and followees list of the given user

  • rh_into => hash reference (id => %user,...)

  • from_user_id => string

  • on_user => sub( %user ) return false to exclude user from $rh_into [optional]

  • on_progress => sub see gmeter() [optional]

  • discard_threshold => number, don't add anything to $rh_into if the number of friends/followees exceeds this limit [optional]; use this to drop degenerated accounts which would just add noise to the data

  • incl_authors => bool [optional, default 1]

  • incl_friends => bool [optional, default 1]

  • incl_followees => bool [optional, default 1]

  • Precondition: glogin()
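
Example, a sketch loading friends and followees (requires a prior glogin(); user ID is a placeholder):

  my %members;
  greadfolls({ from_user_id      => '18418712',     # hypothetical user ID
               rh_into           => \%members,
               incl_authors      => 1,
               incl_friends      => 1,
               incl_followees    => 1,
               discard_threshold => 10000,
               on_progress       => gmeter( 'members' ) });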

void greadcomments({ ... })

  • reads a list of all comments posted by the given user on Goodreads.com; it does not read a conversation by multiple users on some topic

  • from_user_id => string

  • ra_into => array reference (%comment,...) [optional]

  • limit => int stop after reading N comments [optional, default 0 ]

  • on_progress => sub see gmeter() [optional]

void gsocialnet({ ... })

  • from_user_id => string

  • rh_into_nodes => hash reference (id => %user,...)

  • ra_into_edges => array reference ({from => id, to => id},...)

  • ignore_nhood_gt => int ignore users with a neighbourhood > N [optional, default 1000]; such users just add noise to the data and waste computing time

  • depth => int [optional, default 1]

  • incl_authors => bool [optional, default 0]

  • incl_friends => bool [optional, default 1]

  • incl_followees => bool [optional, default 1]

  • on_progress => sub({ done => int, count => int, perc => int, depth => int }) [optional]

  • on_user => sub( %user ) return false to exclude user [optional]

  • Precondition: glogin()
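
Example, a sketch building a small social graph (requires a prior glogin(); user ID is a placeholder):

  my %nodes;
  my @edges;
  gsocialnet({ from_user_id  => '18418712',         # hypothetical user ID
               rh_into_nodes => \%nodes,
               ra_into_edges => \@edges,
               depth         => 2 });

  print "$_->{from} -> $_->{to}\n"  for @edges;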

void greadsimilaraut({ ... })

  • reads the Goodreads.com list of authors who are similar to the given author

  • rh_into => hash reference (id => %user,...)

  • author_id => string

  • on_progress => sub see gmeter() [optional]

  • increments '_seen' counter of each author if already in %$rh_into

void gsearch({ ... })

  • searches the Goodreads.com database for books that match a given phrase

  • ra_into => array reference (%book,...)

  • phrase => string with space separated keywords

  • is_exact => bool [optional, default 0]

  • ra_order_by => array reference property names from %book [optional, default: 'stars', 'num_ratings', 'year']

  • num_ratings => int only list books with at least N ratings [optional, default 0]

  • on_progress => sub see gmeter() [optional]
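
Example, a search sketch (phrase and threshold are arbitrary):

  my @books;
  gsearch({ ra_into     => \@books,
            phrase      => 'linux kernel',
            is_exact    => 0,
            num_ratings => 50,
            ra_order_by => [ 'stars', 'num_ratings', 'year' ],
            on_progress => gmeter( 'books' ) });

  # assuming each element of @books is a %book reference:
  printf( "%-60s  %d ratings\n", $_->{title}, $_->{num_ratings} )  for @books;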

string amz_book_html( %book )

  • HTML body of an Amazon article page

PUBLIC REPORT-GENERATION HELPERS

string ghtmlhead( $title, $ra_cols )

  • returns a string with HTML boilerplate code for a table-based report

  • $title: HTML document title and table caption

  • $ra_cols: [ "Normal", ">Sort ASC", "<Sort DESC", "!Not sortable/searchable", "Right-Aligned:", ">Sort ASC, right-aligned:", ":Centered:" ]

string ghtmlfoot()

  • returns a string with HTML boilerplate code for a table-based report

string ghtmlsafe($string)

  • always use this when generating HTML reports in order to prevent cross-site scripting (XSS) attacks through malicious text on the Goodreads.com website
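
Example, a sketch combining the three helpers into a small report; it assumes that ghtmlhead() opens the table and ghtmlfoot() closes it, and that %books was filled by greadshelf() as shown earlier:

  open( my $fh, '>:utf8', 'report.html' ) or die $!;
  print {$fh} ghtmlhead( 'My Books', [ 'Title', '>Avg Rating:', '!Cover' ] );
  for my $b (values %books)
  {
      print {$fh} '<tr>'
                . '<td>' . ghtmlsafe( $b->{title} ) . '</td>'
                . '<td>' . $b->{avg_rating}         . '</td>'
                . '<td><img src="' . $b->{img_url} . '"></td>'
                . '</tr>';
  }
  print {$fh} ghtmlfoot();
  close( $fh );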

void ghistogram({ ... })

  • prints a year-based histogram for the given hash on the terminal

  • rh_from => hash reference (id => %any,...)

  • date_key => string name of the Time::Piece component of any hash item [optional, default 'date']

  • start_year => int [optional, default 2007]

  • title => string [optional, default '...reviews...']

  • bar_width => int [optional, default 40]

  • bar_char => char [optional, default '#']
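
Example, a sketch printing a per-year histogram of the %reviews collected with greadreviews() above:

  ghistogram({ rh_from    => \%reviews,
               date_key   => 'date',
               start_year => 2007,
               title      => 'Reviews per year',
               bar_width  => 40,
               bar_char   => '#' });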

PRIVATE URL-GENERATION ROUTINES

string _amz_url( %book )

  • Requires at least {isbn=>string}

string _shelf_url( $user_id, $shelf_name, $page_number = 1 )

  • URL for a page with a list of books (not all books)

  • "&print=true" allows 200 items per page with a single request, which is a huge speed improvement over loading books from the "normal" view with max 20 books per request. Showing 100 books in normal view is oddly realized by 5 AJAX requests on the Goodreads.com website.

  • "&per_page" in print-view can be any number if you work with your own shelf, otherwise max 200 if print view; ignored in non-print view; per_page>20 requires access with a cookie, see glogin()

  • "&view=table" puts all book data in code, although invisible (display=none)

  • "&sort=rating" is important for `friendrated.pl` with its book limit: Some users read 9000+ books and scraping would take forever. We sort lower-rated books to the end and could just scrape the first pages: Even those with 9000+ books haven't top-rated more than 2700 books.

  • "&shelf" supports intersection "shelf1%2Cshelf2" (comma)

  • Warning: changes to the URL structure will bust the file-cache

string _followees_url( $user_id, $page_number = 1 )

  • URL for a page with a list of the people $user is following

  • Warning: changes to the URL structure will bust the file-cache

string _friends_url( $user_id, $page_number = 1 )

  • URL for a page with a list of people befriended to $user_id

  • "&sort=date_added" (as opposed to 'last online') avoids moving targets while reading page by page

  • "&skip_mutual_friends=false" because we're not doing this just for me

  • Warning: changes to the URL structure will bust the file-cache

string _book_url( $book_id )

string _user_url( $user_id, $is_author = 0 )

string _revs_url( $book_id, $str_sort_newest_oldest = undef, $search_text = undef, $rating = undef, $is_text_only = undef, $page_number = 1 )

string _rev_url( $review_id )

string _author_books_url( $user_id, $page_number = 1 )

string _author_followings_url( $author_id, $page_number = 1 )

string _similar_authors_url( $author_id )

  • page number > N just returns the same page, so there is no easy stop criterion; not sure if there is more than one page, though

string _search_url( $phrase_str, $page_number = 1 )

  • "&q=" URL-encoded, e.g., linux+%40+"hase (linux @ "hase)

string _user_groups_url( $user_id, $page_number = 1 )

string _group_url( $group_id )

string _comments_url( $user_id, $page_number = 1 )

PRIVATE HTML-EXTRACTION ROUTINES

%book _extract_book( $book_page_html_str )

%user _extract_user( $user_page_html_str )

%user _extract_author( $user_page_html_str )

bool _extract_books( $rh_books, $rh_authors, $on_book_fn, $on_progress_fn, $shelf_tableview_html_str )

  • $rh_books: (id => %book,...)

  • $rh_authors: (id => %user,...)

  • returns 0 if no books, 1 if books, 2 if error

bool _extract_author_books( $rh_books, $r_limit, $on_book_fn, $on_progress_fn, $html_str )

  • $rh_books: (id => %book,...)

  • $r_limit: is counted down to zero

  • returns 0 if no books, 1 if books, 2 if error

bool _extract_followees( $rh_users, $on_progress_fn, $incl_authors, $discard_threshold, $following_page_html_str )

  • $rh_users: (user_id => %user,...)

  • returns 0 if no followees, 1 if followees, 2 if error

bool _extract_friends( $rh_users, $on_progress_fn, $incl_authors, $discard_threshold, $friends_page_html_str )

  • $rh_users: (user_id => %user,...)

  • returns 0 if no friends, 1 if friends, 2 if error

bool _extract_comments( $ra, $on_progress, $comment_history_html_str )

string _conv_uni_codepoints( $string )

Converts Unicode codepoint escapes such as \u003c to characters

string _dec_entities( $string )

$value _require_arg( $name, $value )

string _trim( $string )

bool _extract_revs( $rh_revs, $on_progress_fn, $filter_fn, $since_time_piece, $reviews_xhr_html_str )

  • $rh_revs: (review_id => %review,...)

  • returns 0 if no reviews, 1 if reviews, 2 if error

bool _extract_similar_authors( $rh_into, $author_id_to_skip, $on_progress_fn, $similar_page_html_str )

  • returns 0 if no authors, 1 if authors, 2 if error

bool _extract_search_books( $ra_books, $on_progress_fn, $search_result_html_str )

  • result pages sometimes have different number of items: P1: 20, P2: 16, P3: 19

  • the website says "about 75 results" but shows 70 (checked manually), so we fake "100%" to the progress-indicator function at the end; otherwise it would stop at "93%"

  • $ra_books: (%book,...)

  • returns 0 if no books, 1 if books, 2 if error

bool _extract_user_groups( $rh_into, $on_group_fn, $on_progress_fn, $groups_html_str )

  • returns 0 if no groups, 1 if groups, 2 if error

string _extract_csrftok( $html )

Example:

  my $csrftok = _extract_csrftok( _html( _user_url( $uid )));
  $curl->setopt( $curl->CURLOPT_HTTPHEADER, [ "X-CSRF-Token: ${csrftok}", ... ]);

PRIVATE I/O PLUMBING SUBROUTINES

int _check_page( $any_html_str )

void _updcookie( $string_with_changed_fields )

updates "_session_id2" for X-CSRF-Token, "csid", "u" (user?). "p" (password?)

void _setcurlopts( $curl_ref , $url_str )

Sets default options for GET, POST, PUT, DELETE

string _html( $url, $warn_level = $_ENO_WARN, $can_cache = 1 )

  • HTML body of a web document

  • caches documents (if $can_cache is true)

  • retries on errors