A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener.
Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
datasource and write solr documents into a solr instance, you need to use an implementation specific gem, such as
solrizer-fedora, which provides the
mechanics for reading from a fedora repository and writing to a solr instance.
The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following:
source "http://rubygems.org" gem 'solrizer'
Then:
bundle install
The code snippets in the following sections can be cut/paste into your console, giving you the opportunity to play with Solrizer
and demonstrate the functionality underlying the implementation-specific gems, such as solrizer-fedora.
Start up a console and load solrizer:
irb require "rubygems" require "solrizer"
The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings (which are defined in the config/solr_mappings.yml):
default_mapper = Solrizer::FieldMapper::Default # some of the default mappings in solrizer default_mapper.solr_name("foo",:string) # returns foo_t default_mapper.solr_name("foo",:date) # returns foo_dt default_mapper.solr_name("foo",:integer) # returns foo_i default_mapper.solr_name("foo",:string,:facetable) # returns foo_facet default_mapper.solr_name("foo",:text,:facetable) # returns foo_facet default_mapper.solr_name("foo",:integer,:facetable) # returns foo_facet
FieldMapper provides some defaults:
default_mapper.solr_names_and_values(“foo”,“bar”,:string,[:facetable]) # returns searchable and facetable by default => {"foo_facet"=>[“bar”], “foo_t”=>[“bar”]}
default_mapper.solr_names_and_values(“foo”,“bar”,:string,[:not_searchable, :facetable]) # returns just facetable => {"foo_facet"=>[“bar”]}
Which can be tweaked:
default_mapper.default_index_types << :facetable
default_mapper.solr_names_and_values(“foo”,“bar”,:string,[]) # returns searchable and facetable by default => {"foo_facet"=>[“bar”], “foo_t”=>[“bar”]}
Custom Mappings can also be provided (with custom converters):
class CustomMapper < FieldMapper index_as :searchable, :suffix => "_search" do |type| type.reversed :suffix => "_reverse" do |value| value.reverse end end end
custom_mapper = CustomMapper.new custom_mapper.solr_names_and_values("foo","bar",:string,[:searchable]) # returns {"foo_search"=>["bar"]} custom_mapper.solr_names_and_values("foo","bar",:reversed,[:searchable]) # returns {"foo_reverse"=>["rab"]}
For more detailed information on custom mappings, see the documetnation for the FieldMapper class.
Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
extractor = Solrizer::Extractor.new extractor.format_node_value(["foo ","\n bar"]) # returns "foo bar" solr_doc = Hash.new extractor.insert_node_value(solr_doc, "foo","bar") # solr_doc is now {"foo" => ["bar"]} extractor.insert_solr_field_value(solr_doc,"foo","baz") # solr_doc is now {"foo" => ["bar","baz"]} extractor.insert_node_value(solr_doc, "boo","hoo") # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
- Solrizer::HTML::Extractor -=> provides html_to_solr method
- Solrizer::XML::Extractor -=> provides xml_to_solr method
xml = "<fields><foo>bar</foo><bar>baz</bar></fields>" extractor.xml_to_solr(xml) # returns {:foo_t=>"bar", :bar_t=>"baz"}
Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer.
The methods provided by this module map provides a robust way of mapping terms and solr fields via om terminologies. A notable example
can be found in ActiveFedora::NokogiriDatatstream.
The solrizer gem provides two executables:
- solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
- solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
The usage for solrizerd is as follows:
solrizerd command --hydra_home PATH [options]
The commands are as follows:
- start start an instance of the application
- stop stop all instances of the application
- restart stop all instances and restart them afterwards
- status show status (PID) of application instances
Required parameters:
—hydra_home: this is the path to your hydra rails applications’ root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
The options:
- -p, —port Stomp port 61613
- -o, —host Host to connect to localhost
- -u, —user User name for stomp listener
- -w, —password Password for stomp listener
- -d, —destination Topic to listen to (default: /topic/fedora.apim.update)
- -h, —help Display this screen
Note:
Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
- Fork the project.
- Make your feature addition or bug fix.
- Add tests for it. This is important so I don’t break it in a
future version unintentionally. - Commit, do not mess with rakefile, version, or history.
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull) - Send me a pull request. Bonus points for topic branches.
Technical Lead: Matt Zumwalt (MediaShelf)
Thanks to
Douglas Kim, who created the initial code base for Solrizer.
Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
Copyright © 2010 Matt Zumwalt. See LICENSE for details.