mhernandeza/MapReduce-test

MapReduce example

Basic MapReduce implementation for Hadoop data processing. It takes a text file as input and outputs each word found in that file along with the number of times it occurs.
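To make the word-count idea concrete, here is a minimal, Hadoop-free sketch of the two phases in plain Java. It is not the code in this repository: the class name WordCountSketch, its methods, and the whitespace tokenization are assumptions chosen for illustration, and the real job may split words differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; it does not use the Hadoop API.
class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: sum the values emitted for each distinct word.
    // A TreeMap keeps the words in sorted order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs both phases over a list of input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // Made-up sample lines, not the contents of votecount-in.txt.
        List<String> lines = List.of("winner second", "winner third second");
        countWords(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In a real Hadoop job the framework performs the grouping between the two phases and distributes the work across nodes; the sketch collapses all of that into a single in-memory list.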

I tested this implementation using the Hortonworks Sandbox VirtualBox VM. If you decide to use it, you can familiarize yourself with it using this tutorial.

Follow the wiki created for this repository to learn more about Hadoop, MapReduce and working with the Hortonworks Sandbox.

Requirements

  • Single node Hadoop setup
  • Apache Maven (3.3.9)

Expected output

  • Generate a jar file by running mvn package in the project's root directory.
  • Run the MapReduce job using hadoop jar yourJarFile.jar [input file path] [output directory path]. For example:
hadoop jar /path/to/jar/file/test-1.0-SNAPSHOT.jar \
/path/to/input/file/votecount-in.txt \
/path/to/output/directory/

Using the votecount-in.txt file as input, you should find a file with the following content in your output directory:

one    1
same   1
second 3
third  2
winner 5

You can also use multiple input files. To do so, pass the path to the directory containing the input files instead of a single file, and Hadoop will process every file in it. Run:

hadoop jar /path/to/jar/file/test-1.0-SNAPSHOT.jar \
/path/to/input/directory/ \
/path/to/output/directory/

Using both the votecount-in.txt and the additional-in.txt files, you should find the following in your output directory:

four   4
one    1
same   1
second 3
third  2
two    2
winner 6
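The effect of passing a directory can be sketched the same way: every regular file in the directory contributes to one shared set of counts. The following is a rough, Hadoop-free illustration in plain Java. The class DirectoryCountSketch, its methods, the file names a.txt and b.txt, and their contents are all hypothetical; they are not taken from this repository or its input files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; it does not use the Hadoop API.
class DirectoryCountSketch {

    // Counts whitespace-separated words across every regular file in inputDir.
    static Map<String, Integer> countDirectory(Path inputDir) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            List<Path> files;
            try (Stream<Path> entries = Files.list(inputDir)) {
                files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                for (String line : Files.readAllLines(file)) {
                    for (String token : line.trim().split("\\s+")) {
                        if (!token.isEmpty()) {
                            counts.merge(token, 1, Integer::sum);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Builds a temporary directory with two made-up input files and counts them.
    static Map<String, Integer> demo() {
        try {
            Path dir = Files.createTempDirectory("wordcount-in");
            Files.write(dir.resolve("a.txt"), List.of("winner second"));
            Files.write(dir.resolve("b.txt"), List.of("winner two two"));
            return countDirectory(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

A word that appears in more than one file ends up with a single combined count, which is why winner goes from 5 to 6 in the listing above once the second input file is included.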
