Using the PySpark framework (a distributed cluster-computing framework) to answer queries over a massive dataset.

swetanjal/PySpark-Queries

Answering Queries using PySpark Framework

Usage:

python3 pyspark_no.py <Output CSV File Name> <Number of CPUs>

where no is 1, 2, or 3 for Questions 1, 2, and 3 respectively (for example, pyspark_1.py answers Question 1).
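
As a rough illustration of how a script with this command line could be laid out, the sketch below parses the two positional arguments and starts a local Spark session with the requested number of worker threads. It is not the repository's actual code: the helper names (`parse_args`, `master_url`, `build_session`, `main`), the application name, and the placeholder result are assumptions made for the example. The real queries are described in Assignment-5.pdf.

```python
import sys

USAGE = "usage: python3 pyspark_no.py <Output CSV File Name> <Number of CPUs>"


def parse_args(argv):
    """Return (output_csv, num_cpus) from the two positional arguments."""
    if len(argv) != 3:
        raise SystemExit(USAGE)
    output_csv, num_cpus = argv[1], int(argv[2])
    if num_cpus < 1:
        raise SystemExit("Number of CPUs must be a positive integer")
    return output_csv, num_cpus


def master_url(num_cpus):
    """Spark master URL for local mode with a fixed number of worker threads."""
    return f"local[{num_cpus}]"


def build_session(num_cpus):
    """Create a local SparkSession (hypothetical helper, not from the repo)."""
    # Imported lazily so the argument helpers work without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master_url(num_cpus))
        .appName("pyspark-queries-sketch")  # assumed name
        .getOrCreate()
    )


def main(argv):
    output_csv, num_cpus = parse_args(argv)
    spark = build_session(num_cpus)
    # Placeholder result standing in for a real query answer.
    result = spark.createDataFrame([(1, "example")], ["id", "value"])
    # Spark writes a directory of part files at this path, not a single file.
    result.coalesce(1).write.mode("overwrite").csv(output_csv, header=True)
    spark.stop()
```

Calling `main(sys.argv)` from the script's entry point would wire this to the command line shown above. Mapping the CPU count to a `local[N]` master URL is one plausible way to honor the `<Number of CPUs>` argument. The original scripts may configure parallelism differently.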

Instructions:

  • Refer to Assignment-5.pdf for a detailed description of the queries a user can ask.
  • Refer to the Dataset directory for details on the dataset used to answer the user's queries.
