Normalized sort key for sorting Library of Congress call numbers.
Sorting Library of Congress call numbers is tricky. This library generates a sort key for an LC call number such that, for a list of call numbers, sorting the keys in natural byte order puts the call numbers in the order they should sort in.
# It's often useful to store the sort_key in a db
sort_key = Lcsort.normalize(callnum)
If the input can't be recognized as an LC call number, nil will be returned.
This code is intended for ASCII-only input; if you have non-ASCII UTF-8 characters in your call numbers, we don't know what will happen.
# Or if you have a list of call numbers in memory, easy
# enough to just sort them in memory:
call_num_array.sort_by {|callnum| Lcsort.normalize(callnum) }
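Because normalize returns nil for input it can't recognize, a plain sort_by will raise as soon as the list contains one unrecognized call number (nil can't be compared with a String). One possible way around that, as a minimal sketch: the helper name `sort_callnums` and its `normalizer` parameter are made up for this illustration and are not part of Lcsort.

```ruby
# Sort call numbers by a key function, placing any call number the key
# function rejects (nil key) at the end, in original order.
#
# `normalizer` is any callable returning a String key or nil. This is a
# hypothetical helper for illustration, not part of the Lcsort API.
def sort_callnums(callnums, normalizer)
  keyed = callnums.map { |callnum| [callnum, normalizer.call(callnum)] }
  recognized, unrecognized = keyed.partition { |_callnum, key| key }
  recognized.sort_by { |_callnum, key| key }.map(&:first) +
    unrecognized.map(&:first)
end

# With the gem loaded, usage would presumably look like:
#   sort_callnums(call_num_array, Lcsort.method(:normalize))
```

Whether unrecognized call numbers belong at the end, at the start, or in a separate report is a local decision; the partition step makes that easy to change.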
Call numbers are diverse, both in standard LC and local practice. We wouldn't have the hubris to say we can properly recognize and sort EVERY possible LC call number including local practices. But we sure can handle a lot, including:
- Typical call numbers like:
R 169.1 .B59 1990
- Can handle variations in spacing/punctuation, such as:
R 169.B59.C39
R169 B59C39 1990
- Can handle properly sorting the dreaded 'date or other number':
KF 4558 15th .G6 sorts after KF 4558 1st .G6
- Will generally sort volume/number info in call number suffix properly:
Q11 .P6 vol. 4 no. 4 sorts before Q11 .P6 vol. 12 no. 1
- Can handle 1-2 letter suffixes on the end of cutters:
R 179 .C79ab
Common local practice, and also used in NLM call numbers. (No guarantee that every NLM call number can be handled by this library for LC call numbers, but it seems to work okay for NLM.)
OCLC's docs on MARC 050 includes some information on possible LC call number components.
Once you have a bunch of Lcsort keys in your database, you may want to search to find all call numbers beginning with, say, EG 101. So that might include EG 101.5, EG 101 .C23 1990, etc.
The truncated_range_end method gives you a proper ending range to get what you want, say:
sort_key >= #{Lcsort.normalize("EG 101")} AND sort_key <= #{Lcsort.truncated_range_end('EG 101')}
This can also be used for finding a range of call numbers. Say you want all call numbers from those beginning with AB 101 to AB 500:
sort_key >= #{Lcsort.normalize("AB 101")} AND sort_key <= #{Lcsort.truncated_range_end('AB 500')}
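The conditions above interpolate the keys straight into the SQL text. A safer pattern is to pass them as bound parameters. The following is a minimal sketch only: the helper name `sort_key_range_condition` is made up for this example, and the `sort_key` column name is simply the one used in the conditions above.

```ruby
# Build a parameterized range condition instead of interpolating the
# keys directly into the SQL string.
#
# Hypothetical helper, not part of Lcsort. `column` must be a trusted
# identifier (never user input), since it is interpolated as-is.
def sort_key_range_condition(range_start_key, range_end_key, column = "sort_key")
  ["#{column} >= ? AND #{column} <= ?", range_start_key, range_end_key]
end

# With the gem loaded, the arguments would come from Lcsort, for example:
#   sort_key_range_condition(Lcsort.normalize("AB 101"),
#                            Lcsort.truncated_range_end("AB 500"))
```

The returned array has the shape that ActiveRecord's `where` accepts (a SQL string with `?` placeholders followed by the values), so it could be splatted into such a call; other database libraries have their own binding syntax.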
truncated_range_end works with as many or as few call number components as you want. Lcsort.truncated_range_end('AB 101.1') will find AB 101.123 or AB 101.1 .A5 too. Lcsort.truncated_range_end("AB 101 .C45") will find AB 101 .C456, AB 101 .C45 .B5, etc.
At the moment, truncated_range_end pretty much just adds a ~ onto the end of the normalized sort key. But it did more complicated things in past versions of the normalization algorithm, and we do have tests ensuring it finds what is expected.
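To see why a trailing ~ works as a range end under byte-order comparison, here is a small in-memory sketch. The keys in the usage comment are invented for illustration and are not real Lcsort output, and `keys_in_prefix_range` is a hypothetical helper, not part of the library.

```ruby
# Select every key that sorts at or after `prefix_key` and no later than
# `prefix_key + "~"`.
#
# "~" is byte 0x7E, the highest printable ASCII character, so a key that
# continues the prefix with any lower-valued character still compares
# less than or equal to the range end.
def keys_in_prefix_range(keys, prefix_key)
  range_end = prefix_key + "~"
  keys.select { |key| key >= prefix_key && key <= range_end }
end

# Example with made-up keys (NOT the real Lcsort key format):
#   keys_in_prefix_range(
#     ["AB.0101", "AB.0101.C45", "AB.0101.C456", "AB.0102"], "AB.0101")
#   keeps the first three keys and drops "AB.0102".
```

This mirrors the SQL range conditions shown earlier, just applied to an array of strings instead of a database column.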
Sometimes you want to add something on to the end of a normalized call number, as a payload, or to ensure normalized sort key uniqueness.
You can pass an :append_suffix to have it appended in a way that won't otherwise change the sort order of the original call number.
I use this to add the bib ID to the end of the normalized sort key: if two bibs have identical call numbers, their sort keys would otherwise collide, and my functions work better when all sort keys are unique.
sortkey = Lcsort.normalize(callnumber, :append_suffix => bibID)
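If you are unsure whether your stored keys are actually unique, a quick check for collisions can help decide whether a suffix is needed. This is a minimal sketch; `duplicate_sort_keys` is a made-up helper name and works on any array of strings, not specifically on Lcsort output.

```ruby
# Return the sort keys that occur more than once in `keys`.
# Hypothetical helper for illustration, not part of Lcsort.
def duplicate_sort_keys(keys)
  keys.tally.select { |_key, count| count > 1 }.keys
end

# With the gem loaded, the input would presumably be built like:
#   keys = callnumbers.map { |callnum| Lcsort.normalize(callnum) }.compact
```

An empty result means every key is already distinct; a non-empty one lists the keys where something like :append_suffix would be needed to tell records apart.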
Original regex and code by Bill Dueber. Original port to Ruby by Nikitas Tampakis. LC handling advice from Naomi Dushay and her code.