A web crawler is a program that systematically browses the internet to discover web pages. The goal of web crawling is to analyze the discovered pages and persist the results.
This project addresses that problem. A clear understanding of the problem is therefore essential and feeds directly into every stage of the software development process. The objective of the project is to develop a scalable, high-performance architecture for a web crawler that enables efficient processing of large data volumes.
The overall architecture is based on microservices to allow easy horizontal and vertical scaling of the system. The resulting architecture concept is then implemented using modules of the Spring framework, which already provide extensive mechanisms for building a stable and scalable application.
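The core step every crawler service revolves around is extracting outgoing links from a fetched page. The sketch below illustrates that step in plain Java; the class and method names are illustrative and not taken from the project, and a production crawler would use a real HTML parser (e.g. jsoup) rather than a regular expression:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the core crawl step: extracting outgoing links
// from fetched HTML. A real crawler would use a proper HTML parser
// (e.g. jsoup) instead of a regular expression.
public class LinkExtractor {

    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"");

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"https://example.com/a\">A</a>"
                + "<a href=\"https://example.com/b\">B</a>";
        System.out.println(extractLinks(page));
    }
}
```

In a microservice architecture, a step like this would live in its own service so it can be scaled independently of fetching and persistence.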
Chapter after chapter, you'll build, containerize, and deploy cloud native applications. Along the journey, you will need the following software installed.
- Java 17+
  - OpenJDK: Eclipse Temurin
  - GraalVM
  - JDK management: SDKMAN
- Docker 20.10+
- Kubernetes 1.27+
- Other tools used below: minikube, Tilt
Clone the project:

```shell
git clone --recursive https://github.com/avollmaier/hypercrawler.git
```

Go to the project directory:

```shell
cd hypercrawler
```

Create a minikube Kubernetes cluster (see the hypercrawler-deployment project):

```shell
cd hypercrawler-deployment/kubernetes/platform/
./create-cluster.sh
```

Start the Tilt server for fast local deployment:

```shell
cd ../development
tilt up
```