From 07c1062aa84271c730c236f402dbb2180ab95262 Mon Sep 17 00:00:00 2001
From: Alex Steel <130377221+asteel-gsa@users.noreply.github.com>
Date: Tue, 24 Sep 2024 08:13:29 -0400
Subject: [PATCH 1/2] Bump automerge to v0.16.4 (#4311)

---
 .github/workflows/auto-merge-staging-pr.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/auto-merge-staging-pr.yml b/.github/workflows/auto-merge-staging-pr.yml
index 2368a5abaf..d3e3e56901 100644
--- a/.github/workflows/auto-merge-staging-pr.yml
+++ b/.github/workflows/auto-merge-staging-pr.yml
@@ -12,7 +12,7 @@ jobs:
     steps:
       - id: automerge
         name: Auto Merge a PR with the correct labels
-        uses: pascalgn/automerge-action@v0.16.3
+        uses: pascalgn/automerge-action@v0.16.4
         env:
           GITHUB_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
           MERGE_LABELS: "automerge,autogenerated"

From a4dabf08b782440cb5a527ff1524c12fb03326d1 Mon Sep 17 00:00:00 2001
From: "Hassan D. M. Sambo"
Date: Tue, 24 Sep 2024 10:17:11 -0400
Subject: [PATCH 2/2] #2848 Converted ticket 2848 to an ADR (#4315)

---
 ...ing-logging-and-transformation-tracking.md | 106 ++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100644 docs/architecture/decisions/0040-data-migration-implementing-logging-and-transformation-tracking.md

diff --git a/docs/architecture/decisions/0040-data-migration-implementing-logging-and-transformation-tracking.md b/docs/architecture/decisions/0040-data-migration-implementing-logging-and-transformation-tracking.md
new file mode 100644
index 0000000000..c11596fc67
--- /dev/null
+++ b/docs/architecture/decisions/0040-data-migration-implementing-logging-and-transformation-tracking.md
@@ -0,0 +1,106 @@
+# 40. Data Migration: Implementing Strategic Logging and Transformation Tracking
+
+Date: 2024-09-24
+
+## Status
+
+Accepted
+
+## Areas of impact
+
+- [ ] Compliance
+- [ ] Content
+- [ ] CX
+- [ ] Design
+- [x] Engineering
+- [ ] Policy
+- [ ] Product
+- [x] Process
+- [ ] UX
+
+## Related documents/links
+
+[ADR for migrating data from Census to GSA](https://github.com/GSA-TTS/FAC/issues/2789)
+[ADR for Census Data Migration: Iterative Approach](https://github.com/GSA-TTS/FAC/issues/2919)
+[#2816](https://github.com/GSA-TTS/FAC/issues/2816)
+[#2815](https://github.com/GSA-TTS/FAC/issues/2815)
+
+## Context
+The FAC data migration from Census to GSA follows an iterative approach summarized as follows: The migration algorithm is executed, logging details of both successful and unsuccessful migrations. The team then examines these logs, updates the algorithm to address failures, and re-runs the migration for the failed reports. If data transformation is needed for these reports to pass, the algorithm is updated accordingly, and the process is repeated until the migration is complete. This ADR focuses primarily on logging (successful/failed migrations and transformations). For a broader view of the migration process, refer to the ADRs linked in the previous section.
+
+## Decision
+There are three key logging aspects in the historical data migration process:
+
+1. **General Logging:**
+   This includes standard logs such as info and debug logs, which are common in most algorithms to provide useful insights during execution. Given the critical nature of this migration task, it's essential to ensure these logs are well-placed and written to aid debugging when needed.
+
+2. **Logging Migration Attempts (Success/Failure):**
+   Each migration run will result in some reports successfully migrating and others failing. It's important to record adequate details of both to filter out successfully migrated reports in subsequent iterations. To track this, we'll use a Django model similar to the following:
+
+   ```python
+   class ReportMigrationStatus(models.Model):
+       audit_year = models.TextField(blank=True, null=True)
+       dbkey = models.TextField(blank=True, null=True)
+       run_datetime = models.DateTimeField(default=timezone.now)
+       migration_status = models.TextField(blank=True, null=True)
+   ```
+   `audit_year` and `dbkey` will store the AUDITYEAR and DBKEY, respectively. `run_datetime` will record the migration attempt's timestamp, and `migration_status` will indicate whether the report was successfully migrated.
+
+   **MigrationErrorDetail Model:**
+   This model will link to `ReportMigrationStatus` via a foreign key relationship, enabling a direct association between a migration attempt and its corresponding error details.
+   ```python
+   class MigrationErrorDetail(models.Model):
+       report_migration_status_id = models.ForeignKey(ReportMigrationStatus, on_delete=models.CASCADE)
+       tag = models.TextField(blank=True, null=True)
+       exception_class = models.TextField(blank=True, null=True)
+       detail = models.TextField(blank=True, null=True)
+   ```
+   `tag` will provide a tag for the error, `exception_class` will provide the class of the error, and `detail` will contain the full description of the error.
+
+3. **Logging Data Transformations (Change Logs):**
+   This logging is crucial for documenting any modifications to the data throughout the migration process. It is vital to record the state of the data both before and after transformation, along with any other pertinent details for auditing purposes. The following Django model is utilized to track these changes:
+   ```python
+   class MigrationInspectionRecord(models.Model):
+       audit_year = models.TextField(blank=True, null=True)
+       dbkey = models.TextField(blank=True, null=True)
+       report_id = models.TextField(blank=True, null=True)
+       run_datetime = models.DateTimeField(default=timezone.now)
+       finding_text = JSONField(blank=True, null=True)
+       additional_uei = JSONField(blank=True, null=True)
+       additional_ein = JSONField(blank=True, null=True)
+       finding = JSONField(blank=True, null=True)
+       federal_award = JSONField(blank=True, null=True)
+       cap_text = JSONField(blank=True, null=True)
+       note = JSONField(blank=True, null=True)
+       passthrough = JSONField(blank=True, null=True)
+       general = JSONField(blank=True, null=True)
+       secondary_auditor = JSONField(blank=True, null=True)
+   ```
+   The team has opted for this model as it appears more flexible and allows the use of nested JSON objects within `census_data` to represent the relationships between tables, columns, and data values. Below is a sample:
+
+   ```json
+   {
+     "census_data": [
+       {
+         "column": "sample_col1",
+         "value": "sample_data1"
+       },
+       {
+         "column": "sample_col2",
+         "value": "sample_data2"
+       }
+     ],
+     "gsa_fac_data": {
+       "field": "sample_gsa_fac_field",
+       "value": "sample_value"
+     },
+     "transformation_functions": "function_name"
+   }
+   ```
+
+**Database Tables Strategy**
+The GSA/FAC application relies on a default database for all its data persistence requirements. During the migration process, there was a consideration to integrate an additional database into the application's infrastructure. This new database was intended to host tables that might not stay active post-migration. However, anticipating scenarios where re-running the data migration could become necessary, the decision was made to maintain all data in a live state for the time being. Consequently, the plan to introduce a second database was set aside.
+
+## Consequences
+- The `ReportMigrationStatus` model tracks migration status and makes it possible to compute the success-to-failure ratio after each iteration. Ideally, the failure rate should be 0% after all iterations for a given audit year.
+- The `MigrationInspectionRecord` model provides a comprehensive record of data changes for audit purposes.
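
The failure-rate check described under Consequences, and the filtering of already-migrated reports described under Decision, could look roughly like the sketch below. It is not part of either patch. It uses plain dataclasses in place of the Django model so it runs on its own, and `AttemptRecord`, `latest_attempts`, `failure_rate`, `pending_reports`, and the `"SUCCESS"`/`"FAILURE"` status strings are all assumptions made for illustration; the ADR does not specify which values `migration_status` holds.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AttemptRecord:
    """Hypothetical plain stand-in for one ReportMigrationStatus row."""
    audit_year: str
    dbkey: str
    run_datetime: datetime
    migration_status: str  # assumed values: "SUCCESS" or "FAILURE"


def latest_attempts(records):
    """Keep only the most recent attempt for each (audit_year, dbkey)."""
    latest = {}
    for rec in records:
        key = (rec.audit_year, rec.dbkey)
        if key not in latest or rec.run_datetime > latest[key].run_datetime:
            latest[key] = rec
    return latest


def failure_rate(records, audit_year):
    """Fraction of reports in audit_year whose latest attempt did not succeed."""
    latest = [r for r in latest_attempts(records).values()
              if r.audit_year == audit_year]
    if not latest:
        return 0.0
    failed = sum(1 for r in latest if r.migration_status != "SUCCESS")
    return failed / len(latest)


def pending_reports(records, audit_year):
    """(audit_year, dbkey) pairs that still need to be re-run."""
    return sorted(
        key for key, rec in latest_attempts(records).items()
        if rec.audit_year == audit_year and rec.migration_status != "SUCCESS"
    )
```

Only the latest attempt per report is counted, so a report that failed in one iteration and succeeded in the next no longer contributes to the failure rate. In the real application the same grouping would presumably be expressed as a query over `ReportMigrationStatus` rather than in Python.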