
Run Spark Cluster within Docker


This is an implementation of an Apache Spark cluster (with PySpark and Jupyter Notebook) on top of a Hadoop cluster (1 master node, 2 slave nodes) using Docker.

Follow these steps on Windows 10:

1. Clone the GitHub repo

# Step 1
git clone https://github.com/nghoanglong/spark-cluster-with-docker.git

# Step 2
cd spark-cluster-with-docker

2. Pull the Docker image

docker pull ghcr.io/nghoanglong/spark-cluster-with-docker/spark-cluster:1.0

3. Start the cluster

docker-compose up

4. Access the web UIs (a PySpark smoke test follows this list)

  1. Hadoop cluster (NameNode UI): http://localhost:50070/
  2. Hadoop cluster (ResourceManager UI): http://localhost:8088/
  3. Spark cluster (master UI): http://localhost:8080/
  4. Jupyter Notebook: http://localhost:8888/
  5. Spark history server: http://localhost:18080/
  6. Spark job monitoring: http://localhost:4040/
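
Once the cluster is up, you can verify it end to end from a Jupyter notebook. The following is a minimal PySpark sketch, assuming the notebook container can reach the Spark master at spark://master:7077 and the HDFS NameNode at hdfs://master:9000; those hostnames and ports are assumptions, so adjust them to match the service names in this repo's docker-compose.yml and Hadoop configuration.

# Minimal PySpark smoke test (hostnames/ports below are assumptions, not taken from this repo)
from pyspark.sql import SparkSession

# Connect to the standalone Spark master; "spark://master:7077" is an assumed address
spark = (SparkSession.builder
         .appName("cluster-smoke-test")
         .master("spark://master:7077")
         .getOrCreate())

# Run a trivial distributed job to confirm executors are reachable
df = spark.range(1000)
print(df.selectExpr("sum(id)").collect())  # expect [Row(sum(id)=499500)]

# Optionally round-trip through HDFS; "hdfs://master:9000" is an assumed NameNode URI
df.write.mode("overwrite").parquet("hdfs://master:9000/tmp/smoke_test")
print(spark.read.parquet("hdfs://master:9000/tmp/smoke_test").count())  # expect 1000

spark.stop()

While the job runs it should show up in the Spark job monitoring UI at http://localhost:4040/, and after it finishes, in the Spark history server at http://localhost:18080/.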
