This repository has been archived by the owner on May 24, 2024. It is now read-only.

Data Migration from v1 to v2 #14

Open
13 of 18 tasks
dealako opened this issue Nov 18, 2019 · 0 comments
Labels
04 - Med Low Medium-Low Priority (Lower than Medium; higher than Low)

dealako commented Nov 18, 2019

Summary

Data migration from the existing DynamoDB database to the v2 RDS instance.

Background

The EasyCLA v2 system must migrate existing data from the v1 system into the new database model. This requires a migration script that can read from the existing DynamoDB database in DEV, STAGING, and PROD and write to the Aurora RDS PostgreSQL database tables in DEV, STAGING, and PROD. We will initially test by exporting data from DynamoDB DEV to a local PostgreSQL instance on the developer's machine. We will migrate the CLA-specific data, including signatures, permissions, and other metadata. We will not transfer data that is already duplicated in our 'system of record' database, Salesforce.
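As a rough illustration of the read side, the sketch below pages through a DynamoDB table using the `scan` / `LastEvaluatedKey` / `ExclusiveStartKey` pattern that boto3 Table resources expose. The function name `scan_all_items` and the `FakeTable` stand-in are hypothetical and exist only so the example runs without AWS credentials; the real script may be structured differently.

```python
from typing import Any, Dict, Iterator


def scan_all_items(table: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item in a DynamoDB table, following scan pagination.

    `table` is assumed to behave like a boto3 DynamoDB Table resource:
    scan() returns a dict with "Items" and, when more pages remain,
    "LastEvaluatedKey".
    """
    kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 Table (local testing only)."""

    def __init__(self, items, page_size=2):
        self._items = list(items)
        self._page_size = page_size

    def scan(self, ExclusiveStartKey=None, **_ignored):
        start = ExclusiveStartKey["offset"] if ExclusiveStartKey else 0
        end = start + self._page_size
        page = {"Items": self._items[start:end]}
        if end < len(self._items):
            page["LastEvaluatedKey"] = {"offset": end}
        return page
```

Against a real environment, the `table` argument would presumably come from something like `boto3.resource("dynamodb", region_name=...).Table(...)`, with the actual v1 table names taken from the existing deployment rather than from this sketch.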

User Stories

  1. As a developer, I want to leverage a script to export data from DynamoDB and import the data into AWS RDS PostgreSQL.
  2. As a developer, I want to re-run the export multiple times without duplicating records in the RDS database.
  3. As a developer, I want to specify the STAGE to export/import - e.g. LOCAL, DEV, STAGING, PROD environments. This will allow us to run a migration in each environment.
  4. As a developer, I want to specify the RDS host, port, user, password and database details. Deployment of the RDS system will require separate connection information for each environment.
  5. As a developer, I want to see a report of what was exported/imported. A summary should be provided describing how many records were processed and any errors that occurred. The report should also include how long the process took.
  6. As a developer, I want to run the migration in --dry-run mode which exercises the code but does not import the data.
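Stories 3, 4, and 6 could map onto command-line options roughly as follows. This is only a sketch using the click library: the option names (`--stage`, `--rds-host`, `--rds-port`, `--rds-user`, `--rds-password`, `--rds-database`), the `RDS_PASSWORD` environment variable, and the `migrate` function name are assumptions, not the final interface.

```python
import click


@click.command()
@click.option("--stage", type=click.Choice(["LOCAL", "DEV", "STAGING", "PROD"]),
              required=True, help="Environment to export from / import into.")
@click.option("--rds-host", default="localhost", show_default=True)
@click.option("--rds-port", default=5432, type=int, show_default=True)
@click.option("--rds-user", required=True)
@click.option("--rds-password", envvar="RDS_PASSWORD", default="",
              help="Read from RDS_PASSWORD if not given (hypothetical variable name).")
@click.option("--rds-database", required=True)
@click.option("--dry-run", is_flag=True,
              help="Exercise the code path without importing any data.")
def migrate(stage, rds_host, rds_port, rds_user, rds_password, rds_database, dry_run):
    """Hypothetical entry point for the v1 to v2 migration."""
    mode = "DRY RUN (no writes)" if dry_run else "LIVE"
    # The password is deliberately never echoed.
    click.echo(
        f"stage={stage} target={rds_user}@{rds_host}:{rds_port}/{rds_database} mode={mode}"
    )
    # The extract and import steps would be called here; in dry-run mode
    # the import step would be skipped.
```

To run it as a script, the file would end with the usual `if __name__ == "__main__": migrate()` guard. Reading the password from an environment variable keeps credentials out of shell history and out of README examples.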

Tasks

  • Extend the existing tools folder scripts to include a Python migration script
  • Include any additional libraries in the requirements.txt file (e.g. DynamoDB and PostgreSQL drivers)
  • Provide the main routine with command-line options via the click library
  • Allow the user to specify the AWS region/credentials
  • Allow the user to specify the PostgreSQL database connection details
  • provide a README with documentation on how to set up and run the tool with working examples (don't show credentials)
  • Extract the data from the signatures table
  • Extract the data from the companies table
  • Extract the data from the projects table
  • Extract the data from the users table
  • Extract the data from the user-permissions table
  • Extract the data from the repositories table
  • Extract the data from the company-invitations table
  • Extract the data from the github-orgs table
  • Extract the data from the gerrit-instances table
  • Import data into the relevant RDS tables (schema is TODO for some of the tables)
  • Provide a migration report of what was exported/imported
  • Provide a report indicating how long the process took (this will give us a gauge on how long it will take for other environments).
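The import and reporting tasks, together with the re-run requirement from the user stories, might be sketched as below. To keep the example runnable without a database server it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL; the `INSERT ... ON CONFLICT ... DO UPDATE` form is also accepted by PostgreSQL, although a PostgreSQL driver such as psycopg2 uses `%s` placeholders rather than `?`. The `companies` table, its columns, and the `MigrationReport` and `import_companies` names are hypothetical, since the v2 schema is still TODO for some tables.

```python
import sqlite3
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class MigrationReport:
    """Counts and timing for one import run (hypothetical structure)."""
    processed: int = 0
    errors: List[str] = field(default_factory=list)
    elapsed_seconds: float = 0.0

    def summary(self) -> str:
        return (f"processed={self.processed} errors={len(self.errors)} "
                f"elapsed={self.elapsed_seconds:.3f}s")


# Upsert keyed on the primary key, so re-running does not duplicate rows.
UPSERT_SQL = """
INSERT INTO companies (company_id, company_name)
VALUES (?, ?)
ON CONFLICT (company_id) DO UPDATE SET company_name = excluded.company_name
"""


def import_companies(conn: sqlite3.Connection,
                     records: Iterable[Dict[str, str]],
                     dry_run: bool = False) -> MigrationReport:
    report = MigrationReport()
    started = time.monotonic()
    for record in records:
        try:
            params = (record["company_id"], record["company_name"])
            if not dry_run:
                conn.execute(UPSERT_SQL, params)
            report.processed += 1
        except (KeyError, sqlite3.Error) as exc:
            # Record the failure and keep going so one bad row
            # does not abort the whole run.
            report.errors.append(f"{record!r}: {exc!r}")
    if not dry_run:
        conn.commit()
    report.elapsed_seconds = time.monotonic() - started
    return report
```

Calling `import_companies` twice with the same records leaves the row count unchanged, and `report.summary()` gives a one-line result suitable for printing at the end of a run. Collecting errors per record instead of raising is one possible design choice; a stricter variant could roll back on the first failure.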

Acceptance Criteria

The "done" criteria:

  1. DEV data is migrated from v1 to v2.
  2. Demonstrate the set of capabilities to the product team while the code is running in the DEV environment.

References

See @dealako for script setup examples and usage of existing v1 python models.

@dealako dealako added the 04 - Med Low Medium-Low Priority (Lower than Medium; higher than Low) label Nov 18, 2019
@dealako dealako added this to the Sprint 02 milestone Nov 27, 2019
@dealako dealako modified the milestones: Sprint 02, Sprint 03 Dec 12, 2019