Jakob Voss edited this page May 8, 2014 · 7 revisions

This page introduces Catmandu stores and shows how to use them from the command line and from Perl programs.

Using stores on the command line

Stores can be used with the command line client to import, export, count, delete, and move records.

catmandu import [-?hLv] [long options...]

examples:

catmandu import YAML --file books.yml to MongoDB 
    --database_name items --bag book

options:

        -? -h --help        this usage screen
        -L --load_path
        -v --verbose

import

$ catmandu import MARC --type USMARC
    to CouchDB --database_name marc --bag marc 
    < ./data/camel.mrc

$ catmandu import MAB --fix ./fix/zdb_bibliographic.fix
    to MongoDB --database_name mab --bag mab 
    < ./data/journals_mab2.dat

$ catmandu import MAB --fix ./fix/zdb_bibliographic.fix
     to ElasticSearch --index_name mab --bag mab
     < ./data/journals_mab2.dat

export

catmandu export [-?hLqv] [long options...]

examples:

catmandu export MongoDB --database_name items --bag book to YAML

options:

        -? -h --help        this usage screen
        -L --load_path
        -v --verbose
        -q --query
        --limit

Examples:

$ catmandu export CouchDB --database_name marc --bag marc to JSON

$ catmandu export MongoDB --database_name mab --bag mab to JSON

$ catmandu export ElasticSearch --index_name mab --bag mab to JSON

count

catmandu count [-?hLq] [long options...]

examples:

catmandu count ElasticSearch --index_name shop --bag products 
    --query 'brand:Acme'

options:

        -? -h --help        this usage screen
        -L --load_path
        -q --query

Examples:

$ catmandu count CouchDB --database_name marc --bag marc

$ catmandu count MongoDB --database_name mab --bag mab

$ catmandu count MongoDB --database_name mab --bag mab 
    --query "{\"dc.publisher\": \"Heise\"}"

$ catmandu count ElasticSearch --index_name mab --bag mab

$ catmandu count ElasticSearch --index_name mab --bag mab 
    --query 'dc.publisher:"Heise"'

delete

catmandu delete [-?hLq] [long options...]

examples:

catmandu delete ElasticSearch --index_name items 
    --bag book -q 'title:"Programming Perl"'

options:

        -? -h --help        this usage screen
        -L --load_path
        -q --query

Examples:

$ catmandu delete CouchDB --database_name marc --bag marc

$ catmandu delete MongoDB --database_name mab --bag mab

$ catmandu delete ElasticSearch --index_name mab --bag mab

$ catmandu delete MongoDB --database_name mab --bag mab 
    -q '{"_id":"1262750"}'

$ catmandu delete ElasticSearch --index_name mab --bag mab 
    -q '_id:"1262750"'

move

catmandu move [-?hLqv] [long options...]

examples:

catmandu move MongoDB --database_name items --bag book 
    to ElasticSearch --index_name items --bag book

options:

        -? -h --help        this usage screen
        -L --load_path
        -v --verbose
        -q --query
        --limit

Examples:

$ catmandu move MongoDB --database_name mab --bag mab 
    to ElasticSearch --index_name mab --bag mab

$ catmandu move MongoDB --database_name mab --bag mab 
    to CouchDB --database_name mab --bag mab

$ catmandu move CouchDB --database_name marc --bag marc
    to MongoDB --database_name marc --bag marc

$ catmandu move MongoDB --database_name mab --bag mab 
    --query "{\"dcterms.spatial\": \"XA-DE\"}" 
    to ElasticSearch --index_name moved --bag moved

$ catmandu move ElasticSearch --index_name moved --bag moved 
    --query "dc.identifier:\"47918-4\"" 
    to ElasticSearch --index_name selected --bag selected

Using stores in Perl programs

As explained in the introduction, one of the rationales for creating Catmandu is to ease the serialization of records in our database of choice. The introduction of schemaless databases has made the storage of complex records quite easy. Before we delve into this type of database, we need to show you what syntax Catmandu uses to store data.

As an example, let's create the simplest storage mechanism possible: an in-memory hash. We use this mock 'database' to show some of the features that every Catmandu::Store has. First we create a YAML importer as shown above to import records into an in-memory hash store:

use Catmandu::Importer::YAML;
use Catmandu::Store::Hash;
use Data::Dumper;

my $importer = Catmandu::Importer::YAML->new(file => "./test.yaml");
my $store    = Catmandu::Store::Hash->new();

# Store an iterable
$store->bag->add_many($importer);

# Store an array of hashes
$store->bag->add_many([ { name => 'John' } , { name => 'Peter' }]);

# Store one hash
$store->bag->add( { name => 'Patrick' });

# Commit all changes
$store->bag->commit;
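The file test.yaml itself is not shown on this page. Judging from the Dumper output later in this section, it could look something like this (each record is a separate YAML document, separated by '---'):

```yaml
---
first: Charly
last: Parker
job: Artist
---
first: Joseph
last: Ratzinger
job: Pope
---
first: Albert
last: Einstein
job: Physicist
```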

Each Catmandu::Store has one or more compartments (e.g. tables) to store data, called 'bags'. We use the 'add_many' method to store each item of the importer Iterable into the store. The same method can also store an array of Perl hashes, and the 'add' method stores a single hash.

Each bag is an Iterator, so you can apply any of the 'each', 'any', 'all', ... methods shown above to read data from a bag.

$store->bag->take(3)->each(sub {
    my $obj = shift;
    # ... your code
});

When you store a Perl hash into a Catmandu::Store, an identifier field '_id' gets added to your hash that can be used to retrieve the item at a later stage. Let's take a look at the identifier and how it can be used.

# First store a Perl hash; the returned item includes the generated identifier
my $item = $store->bag->add( { name => 'Patrick' });

# This will show you a UUID like '414003DC-9AD0-11E1-A3AD-D6BEE5345D14'...
print $item->{_id} , "\n";

# Now you can use this identifier to retrieve the object from the store
my $item2 = $store->bag->get('414003DC-9AD0-11E1-A3AD-D6BEE5345D14');

And that is how it works. Catmandu::Store has some more functionality to delete items and query the store (if the backend supports it), but this is how you can store very complex Perl structures in memory or on disk with just a few lines of code. As a complete example we can show how easy it is to store data in a full-text search engine like ElasticSearch.
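As a sketch of that extra functionality, every bag also provides 'get', 'delete' and 'delete_all' methods, shown here against the in-memory Hash store from above:

```perl
use Catmandu::Store::Hash;

my $store = Catmandu::Store::Hash->new();

# Adding returns the stored item, including its generated '_id'
my $item = $store->bag->add({ name => 'Patrick' });

# Retrieve the item again by its identifier
my $copy = $store->bag->get($item->{_id});

# Delete one item by identifier ...
$store->bag->delete($item->{_id});

# ... or empty the whole bag
$store->bag->delete_all;
```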

In this example we will download ElasticSearch version 0.19.3 and install it on our system:

$ wget https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.3.tar.gz
$ tar zxvf elasticsearch-0.19.3.tar.gz
$ cd elasticsearch-0.19.3
$ bin/elasticsearch

The last command, bin/elasticsearch, starts the search daemon. Now we can index some data with Catmandu:

use Catmandu::Importer::YAML;
use Catmandu::Store::ElasticSearch;

my $importer = Catmandu::Importer::YAML->new(file => './test.yaml');
my $store    = Catmandu::Store::ElasticSearch->new(index_name => 'demo');

$store->bag->add_many($importer);

$store->bag->commit;

All records in the file 'test.yaml' should now be indexed. We can test this by executing a new script that reads all records stored in the store:

use Catmandu::Store::ElasticSearch;
use Data::Dumper;

my $store = Catmandu::Store::ElasticSearch->new(index_name => 'demo');

$store->bag->each(sub {
   my $obj = shift;
   print Dumper($obj);
});

If everything works correctly, you should see something like this:

$VAR1 = {
        'first' => 'Charly',
        '_id' => '96CA6692-9AD2-11E1-8800-92A3DA44A36C',
        'last' => 'Parker',
        'job' => 'Artist'
      };
$VAR1 = {
        'first' => 'Joseph',
        '_id' => '96CA87F8-9AD2-11E1-B760-84F8F47D3A65',
        'last' => 'Ratzinger',
        'job' => 'Pope'
      };
$VAR1 = {
        'first' => 'Albert',
        '_id' => '96CA83AC-9AD2-11E1-B1CD-CC6B8E6A771E',
        'last' => 'Einstein',
        'job' => 'Physicist'
      };

The ElasticSearch store even provides an implementation of the Lucene and CQL query languages:

my $hits = $store->bag->searcher(query => 'first:Albert');

$hits->each(sub {
   my $obj = shift;
   printf "%s %s\n" , $obj->{first} , $obj->{last};
});

This last example will print 'Albert Einstein'. Clinton Gormley did some great work providing a Perl client for ElasticSearch. Searching complex objects can be done using a dot syntax, e.g. 'record.titles.0.subtitle:"My Funny Valentine"'. The beauty of ElasticSearch is that it is completely painless to set up and requires no schema: indexing data is simply done using JSON over HTTP, and all your fields are indexed automatically.
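Building on the searcher call shown above, such a nested-field query could look like this (the record structure and field names here are made up for illustration and assume a running ElasticSearch instance):

```perl
use Catmandu::Store::ElasticSearch;

my $store = Catmandu::Store::ElasticSearch->new(index_name => 'demo');

# Query a nested field using the dot syntax
my $hits = $store->bag->searcher(
    query => 'record.titles.0.subtitle:"My Funny Valentine"'
);

$hits->each(sub {
    my $obj = shift;
    print $obj->{_id}, "\n";
});
```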
