** integer_list - The input data file used to count the odd and even numbers in the integer list
spark-submit --master yarn-client --executor-memory 512m --num-executors 3 --executor-cores 1 --driver-memory 512m Odd_Even_Count.py
# Count the odd and even integers in integer_list.txt using PySpark.
# Assumes `sc` (SparkContext) is provided by the Spark runtime.
textFile = sc.textFile("integer_list.txt")
# Strip surrounding whitespace from each line, then parse it as an integer.
ints = textFile.map(lambda x: x.strip()).map(lambda x: int(x))
# Count elements by parity; count() is an action, so each line triggers a job.
odd_count = ints.filter(lambda x: x % 2 != 0).count()
even_count = ints.filter(lambda x: x % 2 == 0).count()
# Parenthesized print works under both Python 2 and Python 3.
print("even number -> %s" % even_count)
print("odd number -> %s" % odd_count)
** dept_salary - The input data file used to compute the salary sum per department
spark-submit --master yarn-client --executor-memory 512m --num-executors 3 --executor-cores 1 --driver-memory 512m Dept_Average_Salary.py
[root@sandbox lab]# hadoop fs -ls /user/root/dept_sum.txt [dept_sum output file initialized from PySpark Script]
** shakespeare_100.txt - The input data file used to count the occurrences of each word
spark-submit --master yarn-client --executor-memory 512m --num-executors 3 --executor-cores 1 --driver-memory 512m Top_Words_Count.py
[root@sandbox lab]# hadoop fs -ls /user/root/Top_word_count_result.txt [output file initialized from PySpark Script]