Automatically clean up old Google Cloud Datastore entities.
A serverless microservice that deletes Cloud Datastore entities older than a certain age (one year by default), designed to be run as a daily cron job using Cloud Scheduler.
+-------------------+ +-------------+ +-------------------+
| Cloud Scheduler | -> | Cloud Run | -> | Cloud Datastore |
+-------------------+ +-------------+ +-------------------+
This uses a Cloud Run microservice that is privately invoked every day by a Cloud Scheduler job. When invoked, it queries your Cloud Datastore for entities older than a year and deletes them. The entity name and the attribute containing the date must be provided as parameters; the number of entities deleted per run and the age threshold can also be customized.
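The service's source isn't shown here, but the core cleanup amounts to a keys-only query filtered on the date attribute, followed by a batched delete. Below is a minimal illustrative sketch in Python using the google-cloud-datastore client, assuming the entity/attribute/days/limit parameters described later in this README; the actual implementation may differ.

# Illustrative sketch only -- not the service's actual source code.
from datetime import datetime, timedelta, timezone

from google.cloud import datastore


def cleanup(entity, attribute, days=365, limit=10):
    """Delete up to `limit` entities of kind `entity` whose `attribute`
    date is older than `days` days. Returns the number of keys deleted."""
    client = datastore.Client()
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    # Keys-only query: only the keys are needed to issue the deletes.
    query = client.query(kind=entity)
    query.add_filter(attribute, "<", cutoff)
    query.keys_only()

    keys = [result.key for result in query.fetch(limit=limit)]
    if keys:
        client.delete_multi(keys)
    return len(keys)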
- Export your project ID as an environment variable. The rest of this setup assumes this environment variable is set.
export PROJECT_ID="my-project"
- Build the service into a container image using Cloud Build:
gcloud builds submit . --tag gcr.io/$PROJECT_ID/datastore-cleaner
- Deploy the image to Cloud Run:
gcloud run deploy datastore-cleaner --image gcr.io/$PROJECT_ID/datastore-cleaner --platform managed --region us-central1 --no-allow-unauthenticated
- Create a service account with permission to invoke the Cloud Run service:
gcloud iam service-accounts create "datastore-cleaner-invoker" \
  --project "${PROJECT_ID}" \
  --display-name "datastore-cleaner-invoker"
gcloud run services add-iam-policy-binding "datastore-cleaner" \
  --project "${PROJECT_ID}" \
  --platform "managed" \
  --region "us-central1" \
  --member "serviceAccount:datastore-cleaner-invoker@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role "roles/run.invoker"
- Create a Cloud Scheduler HTTP job to invoke the microservice every day:
- Capture the URL of the Cloud Run service:
export SERVICE_URL=$(gcloud run services describe datastore-cleaner --project ${PROJECT_ID} --platform managed --region us-central1 --format 'value(status.url)')
- Create the job that will run every day at 3am.
Make sure to replace "myentity" with the name of the Datastore entity you wish to clean up (e.g. "Events"), and "createdOn" with the attribute that contains the date.
gcloud scheduler jobs create http "datastore-cleaner-myentity" \
  --uri "${SERVICE_URL}?entity=myentity&attribute=createdOn" \
  --http-method POST \
  --project ${PROJECT_ID} \
  --description "Cleanup myentity" \
  --oidc-service-account-email "datastore-cleaner-invoker@${PROJECT_ID}.iam.gserviceaccount.com" \
  --oidc-token-audience "${SERVICE_URL}" \
  --schedule "0 3 * * *"
You can create multiple Cloud Scheduler jobs against the same Cloud Run service, each with different parameters, to clean up different entities.
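For example, a second job for a hypothetical "Orders" entity with a "placedOn" date attribute (both names are illustrative) could target the same service:
gcloud scheduler jobs create http "datastore-cleaner-orders" \
  --uri "${SERVICE_URL}?entity=Orders&attribute=placedOn" \
  --http-method POST \
  --project ${PROJECT_ID} \
  --description "Cleanup Orders" \
  --oidc-service-account-email "datastore-cleaner-invoker@${PROJECT_ID}.iam.gserviceaccount.com" \
  --oidc-token-audience "${SERVICE_URL}" \
  --schedule "0 3 * * *"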
- (Optional) Run the scheduled job now:
gcloud scheduler jobs run "datastore-cleaner-myentity" \
  --project "${PROJECT_ID}"
Note: after a job is first deployed, you must wait a few minutes before invoking it.
In addition to being available as a service, datastore-cleaner can be executed as a script (not listening for web requests).
- Build with
docker build -f run.Dockerfile -t datastore-cleaner .
- Run with
docker run datastore-cleaner entity attribute days limit
replacing entity, attribute, days, and limit with the parameters described below.
- entity: Name of the Datastore entity to clean up, e.g. Book
- attribute: Name of the entity attribute that contains the date to check on, e.g. createdOn
- days (Optional, default: 365): Number of days after which the data should be deleted
- limit (Optional, default: 10): Number of entities to delete each time the service is invoked.
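For example, to delete up to 50 Book entities whose createdOn date is more than 730 days old (the numbers are illustrative; the entity and attribute names are the examples above):
docker run datastore-cleaner Book createdOn 730 50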
Because Cloud Run bills only while a request is being processed, and this microservice runs once a day for less than a second, the cost is essentially zero.