OpeNER Daemons

This Gem lets OpeNER components run as daemons using Amazon SQS and Amazon S3. SQS is used for job input, while S3 is used for storing results. Daemons only take URLs as input; text cannot be specified directly because of the SQS message size limit (a maximum of 256 KB).

Usage

Create an executable file bin/<component>-daemon, for example bin/language-identifier-daemon, with the following content:

#!/usr/bin/env ruby
require 'opener/daemons'

controller = Opener::Daemons::Controller.new(
  :name      => 'opener-<component>',
  :exec_path => File.expand_path('../../exec/<component>.rb', __FILE__)
)

controller.run

Replace <component> with the name of the component. For example, for the language identifier this would result in the following:

#!/usr/bin/env ruby
require 'opener/daemons'

controller = Opener::Daemons::Controller.new(
  :name      => 'opener-language-identifier',
  :exec_path => File.expand_path('../../exec/language-identifier.rb', __FILE__)
)

controller.run

Next, create an executable file exec/<component>.rb, for example exec/language-identifier.rb, with the following content:

#!/usr/bin/env ruby
require 'opener/daemons'

require_relative '../lib/opener/<component>'

daemon = Opener::Daemons::Daemon.new(Opener::<constant>)

daemon.start

Replace <component> with the component name and <constant> with the corresponding constant. For example, for the language identifier:

#!/usr/bin/env ruby
require 'opener/daemons'

require_relative '../lib/opener/language_identifier'

daemon = Opener::Daemons::Daemon.new(Opener::LanguageIdentifier)

daemon.start

Extra arguments for the component can be specified as a Hash in the second argument of the Daemon.new method:

daemon = Opener::Daemons::Daemon.new(Opener::LanguageIdentifier, :kaf => false)

These options will be passed to every individual instance of the component.

Requirements

A supported Ruby version (see below)
Amazon SQS
Amazon S3
libarchive (for running the tests and such). On Debian/Ubuntu based systems this can be installed using sudo apt-get install libarchive-dev

The following Ruby versions are supported:

Ruby	Required	Recommended
MRI	>= 1.9.3	>= 2.1.4
Rubinius	>= 2.2	>= 2.3.0
JRuby	>= 1.7	>= 1.7.16

Installation

Install it from RubyGems:

gem install opener-daemons

Or using Bundler:

# add this to your Gemfile
gem 'opener-daemons'

# then run this
bundle install

Job Format

Jobs should be serialized as JSON and should adhere to the JSON schema defined in schema/sqs_input.json. In short, a job is a JSON object with the following fields:

input_url: the input URL
callbacks: an array of URLs
identifier: a unique identifier to use for the file stored in S3. If no value is given, an identifier is generated automatically
metadata: an object containing arbitrary metadata, which is passed to every callback URL

An example:

{
    "input_url": "http://example.com/my-kaf.xml",
    "callbacks": ["http://example.com/my-callback"],
    "identifier": "foo123",
    "metadata": {
        "customer_id": 123
    }
}

For more specific details see the schema.
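As an illustration of the fields listed above, the following is a minimal sketch of how a client might build a job payload and verify that it fits within the 256 KB SQS limit before sending it. The build_job helper and the MAX_SQS_BYTES constant are hypothetical names that are not part of this Gem, and treating every field except input_url as optional is an assumption; the schema remains the authoritative definition. Sending the payload to SQS is left out on purpose.

```ruby
require 'json'

# SQS message size limit mentioned in this README (256 KB).
MAX_SQS_BYTES = 256 * 1024

# Hypothetical helper: builds the JSON payload for a single job.
# Which fields are mandatory is an assumption of this sketch, see
# schema/sqs_input.json for the real rules.
def build_job(input_url, options = {})
  job = {
    'input_url' => input_url,
    'callbacks' => options.fetch(:callbacks, []),
    'metadata'  => options.fetch(:metadata, {})
  }

  # Leave the identifier out entirely so that one can be generated.
  job['identifier'] = options[:identifier] if options[:identifier]

  payload = JSON.generate(job)

  if payload.bytesize > MAX_SQS_BYTES
    raise ArgumentError, "job is #{payload.bytesize} bytes, the limit is #{MAX_SQS_BYTES}"
  end

  payload
end

puts build_job(
  'http://example.com/my-kaf.xml',
  :callbacks  => ['http://example.com/my-callback'],
  :identifier => 'foo123',
  :metadata   => { 'customer_id' => 123 }
)
```

Checking the size on the client side gives an early, readable error instead of a rejected SQS message.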

Output

Daemon output is stored in an Amazon S3 bucket. Output files are named <identifier>.xml, where <identifier> is the unique identifier of the document. The content type of these documents is set to application/xml. Metadata associated with the job (as specified in the metadata field) is saved as metadata of the S3 object.

Callback URLs will receive the URL of an uploaded document, not the actual content itself. The S3 URLs are only valid for a limited time (currently 1 hour) so callbacks must ensure they can process the input within that time limit.
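Because of the time limit described above, a callback that queues documents for later processing may want to check whether a URL is still likely to be valid before fetching it. The following is a rough sketch only: the url_usable? helper, the constants and the five minute safety margin are all hypothetical, and it assumes the callback records the time at which it received the URL. Since the URL is signed before the callback receives it, the margin is there to keep the estimate on the cautious side.

```ruby
# Lifetime of the S3 URLs as stated in this README (currently 1 hour).
URL_LIFETIME = 60 * 60

# Arbitrary margin chosen for this sketch, in seconds.
SAFETY_MARGIN = 5 * 60

# Hypothetical helper: returns true when a URL received at `received_at`
# should still be safe to download at `now`.
def url_usable?(received_at, now = Time.now)
  (now - received_at) < (URL_LIFETIME - SAFETY_MARGIN)
end

received_at = Time.now

puts url_usable?(received_at) ? 'download the document' : 'request the job again'
```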

Monitoring

Components using this Gem can measure performance using New Relic and report errors using Rollbar. To enable these features, set the following two environment variables:

NEWRELIC_TOKEN
ROLLBAR_TOKEN

For New Relic the application names will be opener-<component>, where <component> is the component name as defined by the component itself. If one of these environment variables is not set, the corresponding feature is disabled.
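The toggle behaviour described above can be pictured with the small sketch below. It is not the Gem's own code: monitoring_features is a hypothetical helper, and treating an empty value the same as an unset variable is an assumption made here.

```ruby
# Hypothetical helper: reports which monitoring features would be active
# for the given environment (defaults to the process environment).
def monitoring_features(env = ENV)
  {
    :new_relic => !env['NEWRELIC_TOKEN'].to_s.empty?,
    :rollbar   => !env['ROLLBAR_TOKEN'].to_s.empty?
  }
end

monitoring_features.each do |feature, enabled|
  puts "#{feature}: #{enabled ? 'enabled' : 'disabled'}"
end
```

Running a check like this before starting a daemon makes it easier to notice a missing token than discovering later that no metrics or errors were reported.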

CLI Options

Each daemon takes a set of options that can be used to configure the input queue, the S3 bucket and so forth. For an up-to-date list of these options and their descriptions, run a daemon using the --help option.

Some of these options set environment variables that can be used by components. These are as follows:

input: sets the input queue in the INPUT_QUEUE variable
threads: sets the number of threads to use in the DAEMON_THREADS variable
bucket: sets the S3 bucket to use for output documents in the OUTPUT_BUCKET variable
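
A component that wants to use these variables could read them along the lines of the sketch below. The daemon_settings helper is hypothetical, and the queue and bucket names in the usage example are made-up values. Missing variables are returned as nil rather than replaced with a default, since this README does not state any defaults.

```ruby
# Hypothetical helper: collects the settings exported by the daemon.
def daemon_settings(env = ENV)
  threads = env['DAEMON_THREADS']

  {
    :input_queue   => env['INPUT_QUEUE'],
    # Environment values are Strings, so convert the thread count.
    :threads       => threads ? Integer(threads) : nil,
    :output_bucket => env['OUTPUT_BUCKET']
  }
end

# Example values only, a real daemon sets these through its CLI options.
settings = daemon_settings(
  'INPUT_QUEUE'    => 'opener-language-identifier',
  'DAEMON_THREADS' => '10',
  'OUTPUT_BUCKET'  => 'opener-output'
)

puts "Reading from #{settings[:input_queue]} using #{settings[:threads]} threads"
```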

Amazon Environment Variables

To properly configure the daemons for Amazon you should set the following environment variables:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION

If you're running this daemon on an EC2 instance that has an associated IAM profile, the first two environment variables can be omitted because the credentials are provided automatically. The AWS_REGION variable must always be set.
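
The rules above can be turned into a small pre-flight check, sketched below. The missing_aws_variables helper and its iam_profile flag are hypothetical; the sketch does not detect an IAM profile by itself, the caller has to say whether one is present.

```ruby
CREDENTIAL_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY].freeze

# Hypothetical helper: returns the names of the variables that still need
# to be set. Pass `true` as the second argument when running on an EC2
# instance with an associated IAM profile.
def missing_aws_variables(env = ENV, iam_profile = false)
  required = ['AWS_REGION']
  required = CREDENTIAL_VARS + required unless iam_profile

  required.select { |name| env[name].to_s.empty? }
end

missing = missing_aws_variables

warn "Missing AWS settings: #{missing.join(', ')}" unless missing.empty?
```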
