steren/datastore-cleaner

datastore-cleaner

Automatically clean up old Google Cloud Datastore entities.

A serverless microservice that deletes Cloud Datastore entities older than a certain age (a year by default). It is meant to run as a daily cron job, triggered by Cloud Scheduler.

+-------------------+    +-------------+    +-------------------+
|  Cloud Scheduler  | -> |  Cloud Run  | -> |  Cloud Datastore  |
+-------------------+    +-------------+    +-------------------+

A Cloud Scheduler job privately invokes a Cloud Run microservice every day. When invoked, the service queries Cloud Datastore for entities older than the configured age (a year by default) and deletes them. The entity name and the attribute containing the date must be provided as parameters. The number of entities deleted per invocation and the age threshold can be customized.
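The age rule described above can be sketched in shell. This only illustrates the "older than N days" check and is not the service's actual code; the `is_expired` helper and the use of epoch seconds are assumptions made for the example.

```shell
# Default age threshold, as documented in the Parameters section.
DAYS=365

# Current time and cutoff, both in seconds since the Unix epoch (UTC).
NOW=$(date -u +%s)
CUTOFF=$((NOW - DAYS * 86400))

# is_expired EPOCH_SECONDS
# Hypothetical helper: succeeds when the given timestamp is older than the cutoff.
is_expired() {
  [ "$1" -lt "$CUTOFF" ]
}

# An entity dated 400 days ago falls before the cutoff; one dated 30 days ago does not.
is_expired "$((NOW - 400 * 86400))" && echo "400 days old: would be deleted"
is_expired "$((NOW - 30 * 86400))" || echo "30 days old: would be kept"
```

Setting `DAYS` to a smaller value moves the cutoff closer to the present, so more entities qualify for deletion.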

Deploy in your project

  1. Export your project ID as an environment variable. The rest of this setup assumes this environment variable is set.

    export PROJECT_ID="my-project"
  2. Build into a container image using Cloud Build:

    gcloud builds submit . --tag gcr.io/$PROJECT_ID/datastore-cleaner
  3. Deploy the image to Cloud Run:

    gcloud run deploy datastore-cleaner --image gcr.io/$PROJECT_ID/datastore-cleaner --platform managed --region us-central1 --no-allow-unauthenticated
  4. Create a service account with permission to invoke the Cloud Run service:

    gcloud iam service-accounts create "datastore-cleaner-invoker" \
      --project "${PROJECT_ID}" \
      --display-name "datastore-cleaner-invoker"
    gcloud run services add-iam-policy-binding "datastore-cleaner" \
      --project "${PROJECT_ID}" \
      --platform "managed" \
      --region "us-central1" \
      --member "serviceAccount:datastore-cleaner-invoker@${PROJECT_ID}.iam.gserviceaccount.com" \
      --role "roles/run.invoker"
  5. Create a Cloud Scheduler HTTP job to invoke the microservice every day:

    1. Capture the URL of the Cloud Run service:

      export SERVICE_URL=$(gcloud run services describe datastore-cleaner --project ${PROJECT_ID} --platform managed --region us-central1 --format 'value(status.url)')
    2. Create the job that will run every day at 3am.

      Replace "myentity" with the name of the Datastore entity you wish to clean up, e.g. "Events", and replace "createdOn" with the attribute that contains the date.

      gcloud scheduler jobs create http "datastore-cleaner-myentity" \
        --uri "${SERVICE_URL}?entity=myentity&attribute=createdOn" \
        --http-method POST \
        --project "${PROJECT_ID}" \
        --description "Cleanup myentity" \
        --oidc-service-account-email "datastore-cleaner-invoker@${PROJECT_ID}.iam.gserviceaccount.com" \
        --oidc-token-audience "${SERVICE_URL}" \
        --schedule "0 3 * * *"

    You can create multiple Cloud Scheduler jobs against the same Cloud Run service, each with different parameters, to clean up different entities.

  6. (Optional) Run the scheduled job now:

    gcloud scheduler jobs run "datastore-cleaner-myentity" \
      --project "${PROJECT_ID}"

    Note: for initial job deployments, you must wait a few minutes before invoking.
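Step 5 mentions that several Cloud Scheduler jobs can target the same Cloud Run service. The dry-run sketch below prints one `gcloud scheduler jobs create http` command per entity, reusing the flags from step 5. The `print_job_commands` function, the `Events`/`Sessions` entity names, the `lastSeen` attribute, and the fallback URL are hypothetical; the sketch only prints the commands so they can be reviewed before being run.

```shell
# Fallback values so the sketch runs standalone; the setup steps export the real ones.
PROJECT_ID="${PROJECT_ID:-my-project}"
SERVICE_URL="${SERVICE_URL:-https://datastore-cleaner-example.a.run.app}"  # placeholder URL
INVOKER="datastore-cleaner-invoker@${PROJECT_ID}.iam.gserviceaccount.com"

# print_job_commands ENTITY:ATTRIBUTE...
# Prints (does not execute) one job-creation command per entity:attribute pair.
print_job_commands() {
  for pair in "$@"; do
    entity="${pair%%:*}"     # text before the first colon
    attribute="${pair#*:}"   # text after the first colon
    printf 'gcloud scheduler jobs create http "datastore-cleaner-%s" \\\n' "$entity"
    printf '  --uri "%s?entity=%s&attribute=%s" \\\n' "$SERVICE_URL" "$entity" "$attribute"
    printf '  --http-method POST \\\n'
    printf '  --project "%s" \\\n' "$PROJECT_ID"
    printf '  --description "Cleanup %s" \\\n' "$entity"
    printf '  --oidc-service-account-email "%s" \\\n' "$INVOKER"
    printf '  --oidc-token-audience "%s" \\\n' "$SERVICE_URL"
    printf '  --schedule "0 3 * * *"\n\n'
  done
}

# Hypothetical entities, each with its own date attribute.
print_job_commands "Events:createdOn" "Sessions:lastSeen"
```

Each printed command can be pasted into a terminal once the entity and attribute names have been checked.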

Run as a script

In addition to running as a service, datastore-cleaner can be executed as a one-off script that does not listen for web requests.

  1. Build the image:

    docker build -f run.Dockerfile -t datastore-cleaner .
  2. Run it, replacing entity attribute days limit with the parameters described below:

    docker run datastore-cleaner entity attribute days limit

Parameters

  • entity: Name of the Datastore entity to clean up, e.g. Book
  • attribute: Name of the entity attribute that contains the date to check, e.g. createdOn
  • days (Optional, default: 365): Number of days after which the data should be deleted
  • limit (Optional, default: 10): Number of entities to delete each time the service is invoked
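As an illustration, the snippet below assembles the four parameters in both forms: a request URI for the service and a positional argument list for the script. The values are hypothetical, and passing `days` and `limit` as query parameters in the same way as `entity` and `attribute` is an assumption that the source does not spell out.

```shell
# Hypothetical values for illustration.
ENTITY="Book"
ATTRIBUTE="createdOn"
DAYS=30      # overrides the default of 365
LIMIT=100    # overrides the default of 10

# Placeholder URL, used only when SERVICE_URL is not already exported.
SERVICE_URL="${SERVICE_URL:-https://datastore-cleaner-example.a.run.app}"

# Service form. ASSUMPTION: days and limit are query parameters, like entity and attribute.
JOB_URI="${SERVICE_URL}?entity=${ENTITY}&attribute=${ATTRIBUTE}&days=${DAYS}&limit=${LIMIT}"
echo "${JOB_URI}"

# Script form: positional arguments in the documented order (entity attribute days limit).
SCRIPT_CMD="docker run datastore-cleaner ${ENTITY} ${ATTRIBUTE} ${DAYS} ${LIMIT}"
echo "${SCRIPT_CMD}"
```

Because limit caps the deletions per invocation, a daily job with a limit of 100 removes at most 100 entities per day, so a larger backlog is cleared over several runs.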

Cost

Because Cloud Run bills only while a request is being handled, and this microservice runs once a day for less than a second, the cost is close to zero.