Running Hive, Pig or Streaming jobs
Running Hive, Pig or Streaming scripts is possible, but since we haven't needed it ourselves, there is no declarative interface for them. The suggestions below are not difficult, but they do assume a fair understanding of Clojure (and Java interoperability).
Steps created with defstep are maps which are eventually used to construct StepConfig objects; the entity defined by defstep is passed to (fire!) to make this happen. But (fire!) can also accept StepConfig instances directly.
So, in brief, to make a StepConfig for Hive or Pig, use the AWS SDK's StepFactory to construct a HadoopJarStepConfig object, then call the StepConfig constructor with a name (a String) and the HadoopJarStepConfig instance.
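For example, here is a minimal sketch for Hive (not part of lemur; it assumes the AWS Java SDK is on your classpath, and the bucket/script names are placeholders):

```clojure
(import '[com.amazonaws.services.elasticmapreduce.util StepFactory]
        '[com.amazonaws.services.elasticmapreduce.model StepConfig])

(def step-factory (StepFactory.))

;; Hive must be installed on the jobflow before a script can run.
(def install-hive
  (StepConfig. "install-hive" (.newInstallHiveStep step-factory)))

;; Run the script; the empty array stands in for optional script arguments.
(def run-hive
  (StepConfig. "run-hive-script"
               (.newRunHiveScriptStep step-factory
                                      "s3://my-bucket/scripts/my-query.q"
                                      (into-array String []))))
```

Both values are StepConfig instances, so they can be passed to (fire!) alongside (or instead of) steps defined with defstep.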
You have several options for a Streaming job:
- Create a StepConfig similar to the Hive/Pig description above, but build the HadoopJarStepConfig with the SDK's StreamingStep helper (its toHadoopJarStepConfig method produces the config); see the sketch after the code below.
- At Climate Corporation, we wrap many of our Streaming jobs in Cascalog queries.
- Here is some Clojure code using the helper com.climate.services.aws.emr/step-config:
```clojure
(emr/step-config
  "stream-step"                                          ; step name
  false
  "/home/hadoop/contrib/streaming/hadoop-streaming.jar"  ; streaming jar preinstalled on EMR nodes
  nil                                                    ; nil main-class: the jar's manifest Main-Class is used
  ["-input"  (format "s3://%s/data/simple.txt" bucket)   ; CLI args for hadoop-streaming;
   "-output" "/out"                                      ; `bucket` holds your S3 bucket name
   "-mapper" (format "s3://%s/scripts/wc.sh" bucket)])
```