sustain-spark-connector

Provides an interface to run Spark jobs on, using MongoDB as a source of data.

Description

This project is written in Scala and built with sbt (the Scala Build Tool). It currently contains a single source file, src/main/scala/SustainConnector.scala, which imports a small dataset from MongoDB and trains and tests a simple linear regression model relating hospital area population to hospital bed counts. Ultimately, the project will provide an interface for users to request Apache Spark jobs over documents and collections pulled from the SUSTAIN database as DataFrames, so that valuable insights can be computed using the performance advantages of Spark's in-memory framework.
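
For orientation, here is a rough sketch of what a job like this looks like, assuming the MongoDB Spark Connector's "mongo" data source and placeholder database, collection, and column names (the project's actual URI and schema may differ):

    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.sql.SparkSession

    object SustainConnectorSketch {
      def main(args: Array[String]): Unit = {
        // Local Spark master with 4 cores, matching the project's default (see Usage).
        val spark = SparkSession.builder()
          .master("local[4]")
          .appName("sustain-spark-connector")
          // Placeholder URI: database "sustain", collection "hospitals".
          .config("spark.mongodb.input.uri", "mongodb://localhost:27017/sustain.hospitals")
          .getOrCreate()

        // Load the collection as a DataFrame via the MongoDB Spark Connector.
        val hospitals = spark.read.format("mongo").load()

        // Assemble the predictor (area population) into a feature vector and
        // treat bed count as the label; both column names are assumptions.
        val prepared = new VectorAssembler()
          .setInputCols(Array("population"))
          .setOutputCol("features")
          .transform(hospitals)
          .withColumnRenamed("beds", "label")

        // Simple train/test split and linear regression fit.
        val Array(train, test) = prepared.randomSplit(Array(0.8, 0.2), seed = 42)
        val model = new LinearRegression().fit(train)

        val eval = model.evaluate(test)
        println(s"RMSE = ${eval.rootMeanSquaredError}, r2 = ${eval.r2}")

        spark.stop()
      }
    }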

Usage

  • Compile the project: sbt compile
  • Run the project: sbt run (see the build definition sketch below)
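
These commands assume a standard sbt build definition. As a point of reference, a minimal build.sbt for a project like this might look as follows; the version numbers are illustrative, not the repository's pinned ones:

    name := "sustain-spark-connector"
    scalaVersion := "2.12.15"

    // Spark SQL and MLlib for DataFrames and linear regression,
    // plus the MongoDB Spark Connector as the data source.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-sql"             % "3.1.2",
      "org.apache.spark"  %% "spark-mllib"           % "3.1.2",
      "org.mongodb.spark" %% "mongo-spark-connector" % "3.0.1"
    )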

Note: By default, the project runs against the local Spark master with 4 cores allocated. To use a managed cluster instead, change the .master() setting on the SparkSession builder to point at your cluster manager.
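
For example, a cluster deployment might look like the following sketch, where the master URL is a placeholder for your own cluster manager:

    import org.apache.spark.sql.SparkSession

    // Point .master() at a cluster manager instead of local mode.
    // "spark://cluster-master:7077" is a placeholder standalone-master URL;
    // "yarn" would target a YARN cluster instead.
    val spark = SparkSession.builder()
      .master("spark://cluster-master:7077")
      .appName("sustain-spark-connector")
      .getOrCreate()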
