Alert Simulation and Remediation is an advanced monitoring and alerting system for managing alerts from deployments. It simulates a range of system conditions, evaluates the resulting alerts, recommends remediation, and delivers real-time notifications and insights.
Simulator
: Simulates system conditions such as high CPU load, network load, low memory availability, and high disk usage by creating multiple goroutines, sending HTTP requests, allocating memory, and writing files.
Rule Engine
: Evaluates alerts based on predefined rules and provides remediation recommendations.
Prometheus and Grafana Stack
: Fetches and visualizes system metrics using Prometheus and Grafana.
Kafka Integration
: Utilizes Kafka for communication between the rule engine and simulator.
Mail Server
: Sends email notifications for critical alerts.
Real-time Notifications
: Leverages Redis Streams and Server-Sent Events (SSE) to deliver real-time alert notifications to the frontend dashboard.
ASMR QueryBot
: Implements a chatbot powered by Large Language Models (LLMs) to provide interactive insights and answer user queries related to alerts and system performance.
MongoDB Vector Search
: Stores alert data as vectors using MongoDB, enabling efficient searching and querying with LLMs using the LlamaIndex framework.
Dockerized
: The entire project is containerized using Docker, ensuring consistent deployment across different environments.
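
The Rule Engine entry above describes evaluating alerts against predefined rules and returning remediation recommendations. One way this could look is sketched below. The `Alert` and `Rule` types, the metric names, the thresholds, and the remediation strings are all hypothetical and chosen only to illustrate threshold-based matching. The project's actual rule format may differ.

```go
package main

import "fmt"

// Alert is a hypothetical shape for an alert produced by the simulator.
type Alert struct {
	Metric string  // e.g. "cpu_usage_percent"
	Value  float64 // observed value of the metric
}

// Rule is a hypothetical predefined rule: it matches when an alert for
// Metric reports a Value strictly above Threshold.
type Rule struct {
	Metric      string
	Threshold   float64
	Remediation string
}

// Evaluate returns the remediation text of every rule the alert violates,
// in the order the rules were supplied.
func Evaluate(a Alert, rules []Rule) []string {
	var recommendations []string
	for _, r := range rules {
		if r.Metric == a.Metric && a.Value > r.Threshold {
			recommendations = append(recommendations, r.Remediation)
		}
	}
	return recommendations
}

func main() {
	// Assumed example rules; real thresholds would come from configuration.
	rules := []Rule{
		{Metric: "cpu_usage_percent", Threshold: 90, Remediation: "scale out or restart the busiest process"},
		{Metric: "disk_usage_percent", Threshold: 85, Remediation: "rotate logs or expand the volume"},
	}

	alert := Alert{Metric: "cpu_usage_percent", Value: 97.5}
	for _, rec := range Evaluate(alert, rules) {
		fmt.Println(rec)
	}
}
```

Keeping rules as plain data makes it possible to load them from a file or a message rather than hard-coding them.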