** shakespeare_100.txt - The input data file; the job counts the occurrences of each word in it.
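The two Python scripts named in the streaming command below are not reproduced in this handout. A minimal sketch of what they might look like, assuming a plain whitespace tokenizer (the actual lab scripts may differ):

** mapper_word_counts.py (sketch)
#!/usr/bin/env python
# Emit "<word>\t1" for every whitespace-separated token read from stdin.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("{0}\t1".format(word))

** reducer_word_counts.py (sketch)
#!/usr/bin/env python
# Hadoop Streaming delivers the mapper output sorted by key, so all
# lines for the same word arrive together; sum them as they stream by.
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, _, count = line.rstrip("\n").partition("\t")
    try:
        count = int(count)
    except ValueError:
        continue  # skip malformed lines
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print("{0}\t{1}".format(current_word, current_count))
        current_word = word
        current_count = count

if current_word is not None:
    print("{0}\t{1}".format(current_word, current_count))

Remember to make both scripts executable (chmod +x) so the streaming tasks can launch them.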
[root@sandbox lab]# hadoop jar /usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.0.0-169.jar -file /root/lab/mapper_word_counts.py -mapper mapper_word_counts.py -file /root/lab/reducer_word_counts.py -reducer reducer_word_counts.py -input /user/root/shakespeare_100.txt -output /user/root/shakespeare_streaming_out
[root@sandbox lab]# hadoop fs -ls /user/root/shakespeare_streaming_out
[root@sandbox lab]# hadoop fs -cat /user/root/shakespeare_streaming_out/part-00000 | tail -n 15
** u.data - The dataset has 100,000 ratings by 943 users on 1,682 movies.
The file has 4 tab-separated ("\t") columns: user id, movie id, rating, and timestamp.
** u.item - Information about the items (movies); this is a tab-separated file with 3 columns.
The first column is movie id, the second column is movie name, and the third column is release date.
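For reference, a single record from each file can be unpacked like this (the sample rows are hypothetical, made up to match the layouts described above):

** peek_records.py (illustrative sketch)
#!/usr/bin/env python
# Unpack one tab-separated record from each file (hypothetical sample rows).
u_data_row = "196\t242\t3\t881250949"            # user, movie, rating, timestamp
user_id, movie_id, rating, timestamp = u_data_row.split("\t")

u_item_row = "242\tKolya (1996)\t24-Jan-1997"    # movie, name, release date
item_id, movie_name, release_date = u_item_row.split("\t")

print("user {0} rated movie {1} ({2}) as {3}".format(user_id, item_id, movie_name, rating))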
** u.join - Data from u.data and u.item combined.
The first column has "A" or "B": "A" marks a row that came from u.data and "B" marks a row that came from u.item.
You can use this file for the Map-Reduce job in question 2.
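The two streaming scripts named in the command below could implement a reduce-side join along these lines. Note the record layout assumed here (the tag followed by the original file's columns) and the output chosen (rating count and average per movie) are assumptions; the actual lab scripts may differ.

** wc_mapper_movie_rating.py (sketch)
#!/usr/bin/env python
# Re-key every u.join record by movie id so that a movie's title (B)
# and its ratings (A) meet in the same reducer.
# Assumed input:  A<TAB>user<TAB>movie<TAB>rating<TAB>timestamp
#                 B<TAB>movie<TAB>title<TAB>release date
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0] == "A" and len(fields) >= 4:
        print("{0}\tA\t{1}".format(fields[2], fields[3]))   # movie id, rating
    elif fields and fields[0] == "B" and len(fields) >= 3:
        print("{0}\tB\t{1}".format(fields[1], fields[2]))   # movie id, title

** wc_reducer_movie_rating.py (sketch)
#!/usr/bin/env python
# Input arrives sorted by movie id (but A and B lines may interleave
# within a key), so buffer the title and ratings for the current id and
# emit title, rating count, and average rating on each key change.
import sys

def flush(title, ratings):
    if ratings:
        avg = float(sum(ratings)) / len(ratings)
        print("{0}\t{1}\t{2:.2f}".format(title or "UNKNOWN", len(ratings), avg))

current_id = None
title = None
ratings = []

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        continue  # skip malformed lines
    movie_id, tag, payload = fields[0], fields[1], fields[2]
    if movie_id != current_id:
        flush(title, ratings)
        current_id, title, ratings = movie_id, None, []
    if tag == "B":
        title = payload
    else:
        try:
            ratings.append(int(payload))
        except ValueError:
            pass

flush(title, ratings)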
[root@sandbox lab]# hadoop jar /usr/hdp/2.4.0.0-169/hadoop-mapreduce/hadoop-streaming-2.7.1.2.4.0.0-169.jar -file /root/lab/wc_mapper_movie_rating.py -mapper wc_mapper_movie_rating.py -file /root/lab/wc_reducer_movie_rating.py -reducer wc_reducer_movie_rating.py -input /user/root/u.join -output /user/root/join_streaming_out
[root@sandbox lab]# hadoop fs -ls /user/root/join_streaming_out
[root@sandbox lab]# hadoop fs -cat /user/root/join_streaming_out/part-00000 | tail -n 15