I saw this example in the documentation on how to implement multiple steps with mrjob:
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")


class MRMostUsedWord(MRJob):

    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # sum the words we've seen so far
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer.
        # num_occurrences is so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # discard the key; it is just None
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding one results in key=counts, value=word
        yield max(word_count_pairs)

    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word)
        ]


if __name__ == '__main__':
    MRMostUsedWord.run()
If I have many steps and want to write them in another class or in separate files, how can I import them and define the steps() function?
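One possible approach, sketched here as an assumption rather than as mrjob's recommended layout, is to put the step methods in plain mixin classes that live in their own modules and have the job class inherit from them. The module names (`word_steps.py`, `max_steps.py`) and class names (`WordCountSteps`, `MaxWordSteps`) are hypothetical. The mixins need no mrjob import, since they are ordinary classes whose generator methods follow the mapper/combiner/reducer signatures from the example:

```python
# --- word_steps.py (hypothetical module name) ---
import re

WORD_RE = re.compile(r"[\w']+")


class WordCountSteps:
    """Mixin holding the methods used by the first step."""

    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # sum the counts seen so far for this word
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer
        yield None, (sum(counts), word)


# --- max_steps.py (hypothetical module name) ---
class MaxWordSteps:
    """Mixin holding the method used by the second step."""

    def reducer_find_max_word(self, _, word_count_pairs):
        # each item is (count, word); max() picks the highest count
        yield max(word_count_pairs)


# --- job.py: the job file would then only wire the steps together ---
# from mrjob.job import MRJob
# from mrjob.step import MRStep
# from word_steps import WordCountSteps
# from max_steps import MaxWordSteps
#
# class MRMostUsedWord(WordCountSteps, MaxWordSteps, MRJob):
#     def steps(self):
#         return [
#             MRStep(mapper=self.mapper_get_words,
#                    combiner=self.combiner_count_words,
#                    reducer=self.reducer_count_words),
#             MRStep(reducer=self.reducer_find_max_word),
#         ]
#
# if __name__ == '__main__':
#     MRMostUsedWord.run()
```

Because the methods end up on the job class through inheritance, `steps()` can keep referring to them as `self.<method>` exactly as in the single-file version. When running somewhere other than locally, the extra modules presumably have to be shipped alongside the job script; the mrjob documentation on uploading support files should cover how to do that for a given runner.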