AWS NLP Data Pipeline

Ingest real-time streaming text data and automatically append NLP metadata

Figures: Architecture, Kibana Dashboard

Overview

This project implements a mostly serverless data engineering architecture that ingests real-time streaming text and automatically appends NLP metadata using managed AWS services. It can serve as a baseline for more complex ingestion pipelines that power NLP services.

The following AWS services are leveraged:

Deployment

This project uses GitHub Actions for its CI/CD pipeline. If you fork the repository, you can deploy through your own Actions workflows by adding the following Secrets to your repository:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION_ID
  • IP_ADDRESS
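
Before triggering a deployment, it can help to confirm that all four values are present. The snippet below is a minimal sketch of such a pre-flight check, not part of this project: it assumes the workflow exposes each Secret as an environment variable under the same name, and the helper `missing_secrets` is a hypothetical name introduced here for illustration.

```python
import os
from typing import List, Mapping

# Secret names as listed in this README. Assumption: the workflow exports
# each one as an environment variable with the same name.
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_ID",
    "IP_ADDRESS",
)


def missing_secrets(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are absent or blank in `env`."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


# Check the current process environment and report anything missing.
missing = missing_secrets(os.environ)
print("Missing secrets:", ", ".join(missing) if missing else "none")
```

The check only tests for presence; it does not validate that the credentials are correct or that the region identifier exists.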

Example

A demonstration dataset is provided. Run the following script to send the example data to the Ingest Lambda for processing:

python stream.py
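
The contents of `stream.py` are not shown here, so the following is only a sketch of how a script of this kind might work, under several assumptions: that each record is a line of text, that the payload is a JSON object with a `text` field, and that the Ingest Lambda is invoked directly through boto3 rather than through another entry point. The function names `build_payloads`, `make_lambda_sender`, and `stream` are hypothetical.

```python
import json
from typing import Callable, Iterable, Iterator


def build_payloads(lines: Iterable[str]) -> Iterator[bytes]:
    """Turn raw text lines into JSON payloads, skipping blank lines."""
    for line in lines:
        text = line.strip()
        if text:
            # The {"text": ...} shape is an assumption, not the project's schema.
            yield json.dumps({"text": text}).encode("utf-8")


def make_lambda_sender(function_name: str) -> Callable[[bytes], None]:
    """Return a sender that invokes the named Lambda asynchronously."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("lambda")

    def send(payload: bytes) -> None:
        # InvocationType="Event" queues the invocation and returns immediately.
        client.invoke(
            FunctionName=function_name,
            InvocationType="Event",
            Payload=payload,
        )

    return send


def stream(lines: Iterable[str], send: Callable[[bytes], None]) -> int:
    """Send every non-blank line through `send`; return the number sent."""
    count = 0
    for payload in build_payloads(lines):
        send(payload)
        count += 1
    return count
```

With AWS credentials configured, a call such as `stream(open("data.txt"), make_lambda_sender("ingest-lambda"))` would push the file line by line; both the file name and the function name are placeholders. Passing the sender in as a parameter keeps the payload logic testable without touching AWS.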