A web crawler is a program that systematically browses the internet to discover web pages. The goal of web crawling is to analyze the discovered pages and persist the results.
This project addresses that problem. A clear understanding of the problem is therefore essential and feeds directly into every stage of the software development process. The objective of the project is to develop a scalable, high-performance architecture for a web crawler that enables efficient processing of large data volumes.
The overall architecture is based on microservices to allow easy horizontal and vertical scaling of the system. The resulting architecture concept is then implemented using modules of the Spring framework, which already provide extensive mechanisms for building a stable and scalable application.
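The core step every crawler service revolves around is extracting outgoing links from a fetched page. The sketch below illustrates that step in plain Java; the class and method names are illustrative and not taken from the project, and a production crawler would use a real HTML parser (e.g. jsoup) rather than a regular expression:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the core crawl step: extracting outgoing links
// from fetched HTML. A real crawler would use a proper HTML parser
// (e.g. jsoup) instead of a regular expression.
public class LinkExtractor {

    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"");

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"https://example.com/a\">A</a>"
                + "<a href=\"https://example.com/b\">B</a>";
        System.out.println(extractLinks(page));
    }
}
```

In a microservice architecture, a step like this would live in its own service so it can be scaled independently of fetching and persistence.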
Chapter after chapter, you'll build, containerize, and deploy cloud native applications. Along the journey, you will need the following software installed.
- Java 17+
  - OpenJDK: Eclipse Temurin
  - GraalVM
  - JDK management: SDKMAN
- Docker 20.10+
- Kubernetes 1.27+
- Other tools used below: minikube, Tilt
Clone the project:

```shell
git clone --recursive https://github.com/avollmaier/hypercrawler.git
```

Go to the project directory:

```shell
cd hypercrawler
```

Create a minikube Kubernetes cluster (see the hypercrawler-deployment project):

```shell
cd hypercrawler-deployment/kubernetes/platform/
./create-cluster.sh
```

Start the Tilt server for fast local deployment:

```shell
cd ../development
tilt up
```