(Mostly) Automate data migration fixture import process (#1012)
* (Mostly) Automate data migration fixture import process

Also, improve documentation of the process.

* Update docs/test_data_in_production.md

Co-authored-by: Chuck McCallum <mccalluc@users.noreply.github.com>

* Update docs/test_data_in_production.md

Co-authored-by: Chuck McCallum <mccalluc@users.noreply.github.com>

---------

Co-authored-by: Tyler Wade <81339170+Twade968@users.noreply.github.com>
Co-authored-by: Chuck McCallum <mccalluc@users.noreply.github.com>
3 people authored Feb 27, 2023
1 parent 17e1015 commit 963609e
Showing 7 changed files with 31 additions and 79 deletions.
1 change: 0 additions & 1 deletion app/models/pdc_metadata/resource.rb
@@ -79,7 +79,6 @@ class << self
def new_from_jsonb(jsonb_hash)
resource = PDCMetadata::Resource.new
return resource if jsonb_hash.blank?

set_basics(resource, jsonb_hash)
set_curator_controlled_metadata(resource, jsonb_hash)
set_additional_metadata(resource, jsonb_hash)
3 changes: 3 additions & 0 deletions app/services/s3_query_service.rb
@@ -119,6 +119,9 @@ def find_s3_file(filename:)
   # Retrieve the S3 resources uploaded to the S3 Bucket
   # @return [Array<S3File>]
   def client_s3_files(reload: false, bucket_name: self.bucket_name)
+    # Allow migration objects to load locally with an AWS key
+    return [] if Rails.env.development?
+
     @client_s3_files = nil if reload # force a reload
     @client_s3_files ||= begin
       start = Time.zone.now
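The guard added in this hunk returns an empty list in development before the memoized S3 listing is ever reached. A minimal sketch of the same pattern outside Rails follows. `FakeQueryService`, its `env` string (standing in for `Rails.env`) and its `lister` lambda (standing in for the AWS client call) are all hypothetical names, not part of the repository.

```ruby
# Hypothetical stand-in for S3QueryService. `env` replaces Rails.env and
# `lister` replaces the real AWS client call; both are assumptions.
class FakeQueryService
  attr_reader :remote_calls

  def initialize(env:, lister:)
    @env = env
    @lister = lister
    @remote_calls = 0
  end

  # Same shape as client_s3_files: return early in development,
  # otherwise memoize the (potentially slow) remote listing.
  def client_s3_files(reload: false)
    return [] if @env == "development"

    @client_s3_files = nil if reload # force a reload
    @client_s3_files ||= begin
      @remote_calls += 1
      @lister.call
    end
  end
end

# In development the lister would fail, but it is never called.
dev = FakeQueryService.new(env: "development", lister: -> { raise "no AWS credentials" })
staging = FakeQueryService.new(env: "staging", lister: -> { ["a.csv", "b.csv"] })

dev_files = dev.client_s3_files          # early return, no remote call
staging_files = staging.client_s3_files  # calls the lister once
staging.client_s3_files                  # served from the memoized value
```

One consequence of this design is that anything in development that depends on the file list will see no files at all, even when credentials are present.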
73 changes: 19 additions & 54 deletions docs/test_data_in_production.md
@@ -19,57 +19,22 @@ Work with RDOS to ensure we're identifying representative samples. We want to co
 These will need to be kept updated with changes to the UI, changes to the metadata schema, etc.

 ### 3. Create or refresh the work in staging
-Enter a `byebug` at the end of the system spec for the sample work you want to refresh, then run the test. It will drop you into a prompt where you can get the JSON export of the work.
-```ruby
-(byebug) bitklavier_work.to_json
-"{\"titles\":[{\"title\":\"bitKlavier Grand Sample Library—Binaural Mic Image\",\"title_type\":null}],\"description\":\"The bitKlavier Grand consists of sample collections of a new Steinway D grand piano from nine different stereo mic images, with: 16 velocity layers, at every minor 3rd (starting at A0); Hammer release samples; Release resonance samples; Pedal samples. Release packages at 96k/24bit, 88.2k/24bit, 48k/24bit, 44.1k/16bit are available for various applications.\\r\\n Piano Bar: Earthworks—omni-directionals. This microphone system suspends omnidirectional microphones within the piano. The bar is placed across the harp near the hammers and provides a low string / high string player’s perspective. It also produces a close sound without room or lid interactions. It can be panned across an artificial stereophonic perspective effectively in post-production. File Naming Convention: C4 = middle C. Main note names: [note name][octave]v[velocity].wav -- e.g., “D#5v13.wav”. Release resonance notes: harm[note name][octave]v[velocity].wav -- e.g., “harmC2v2.wav”. Hammer samples: rel[1-88].wav (one per key) -- e.g., “rel23.wav”. Pedal samples: pedal[D/U][velocity].wav -- e.g., “pedalU2.wav” =\\u003e pedal release (U = up), velocity = 2 (quicker release than velocity = 1).\\r\\n This dataset is too large to download directly from this item page. You can access and download the data via Globus (See https://www.youtube.com/watch?v=uf2c7Y1fiFs for instructions on how to use Globus).\",\"collection_tags\":[],\"creators\":[{\"value\":\"Trueman, Daniel\",\"name_type\":\"Personal\",\"given_name\":\"Daniel\",\"family_name\":\"Trueman\",\"identifier\":null,\"affiliations\":[],\"sequence\":1},{\"value\":\"Wang, Matthew\",\"name_type\":\"Personal\",\"given_name\":\"Matthew\",\"family_name\":\"Wang\",\"identifier\":null,\"affiliations\":[],\"sequence\":2},{\"value\":\"Villalta, Andrés\",\"name_type\":\"Personal\",\"given_name\":\"Andrés\",\"family_name\":\"Villalta\",\"identifier\":null,\"affiliations\":[],\"sequence\":3},{\"value\":\"Chou, Katie\",\"name_type\":\"Personal\",\"given_name\":\"Katie\",\"family_name\":\"Chou\",\"identifier\":null,\"affiliations\":[],\"sequence\":4},{\"value\":\"Ayres, Christien\",\"name_type\":\"Personal\",\"given_name\":\"Christien\",\"family_name\":\"Ayres\",\"identifier\":null,\"affiliations\":[],\"sequence\":5}],\"resource_type\":\"Dataset\",\"resource_type_general\":\"DATASET\",\"publisher\":\"Princeton University\",\"publication_year\":\"2021\",\"ark\":\"88435/dsp015999n653h\",\"doi\":\"10.34770/r75s-9j74\",\"rights\":{\"identifier\":\"CC BY\",\"uri\":\"https://creativecommons.org/licenses/by/4.0/\",\"name\":\"Creative Commons Attribution 4.0 International\"},\"version_number\":\"1\",\"keywords\":[]}"
-```
-
-Then, ssh to the server where you want to create this sample work as the `deploy` user. Open a rails console and make the object there, using the JSON you can copy and paste from your `byebug` prompt:
-
-```ruby
-ssh deploy@pdc-discovery-staging1.princeton.edu
-Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 4.15.0-39-generic x86_64)
-
-* Documentation: https://help.ubuntu.com
-* Management: https://landscape.canonical.com
-* Support: https://ubuntu.com/advantage
-Last login: Fri Sep 16 19:39:27 2022 from 172.20.203.93
-deploy@pdc-discovery-staging1:~$ cd /opt/pdc_discovery/current
-deploy@pdc-discovery-staging1:/opt/pdc_discovery/current$ RAILS_ENV=staging bundle exec rails c
-Loading staging environment (Rails 6.1.5.1)
-irb(main):001:0>
-irb(main):002:0> bitklavier_resource = PDCMetadata::Resource.new_from_jsonb(JSON.parse("{\"titles\":[{\"title\":\"bitKlavier Grand Sample Library—Binaural Mic Image\",\"title_type\
-":null}],\"description\":\"The bitKlavier Grand consists of sample collections of a new Steinway D grand piano from nine different stereo mic images, with: 16 velocity
-layers, at every minor 3rd (starting at A0); Hammer release samples; Release resonance samples; Pedal samples. Release packages at 96k/24bit, 88.2k/24bit, 48k/24bit, 44
-.1k/16bit are available for various applications.\\r\\n Piano Bar: Earthworks—omni-directionals. This microphone system suspends omnidirectional microphones within the
-piano. The bar is placed across the harp near the hammers and provides a low string / high string player’s perspective. It also produces a close sound without room or
-lid interactions. It can be panned across an artificial stereophonic perspective effectively in post-production. File Naming Convention: C4 = middle C. Main note names:
-[note name][octave]v[velocity].wav -- e.g., “D#5v13.wav”. Release resonance notes: harm[note name][octave]v[velocity].wav -- e.g., “harmC2v2.wav”. Hammer samples: rel[
-1-88].wav (one per key) -- e.g., “rel23.wav”. Pedal samples: pedal[D/U][velocity].wav -- e.g., “pedalU2.wav” =\\u003e pedal release (U = up), velocity = 2 (quicker rele
-ase than velocity = 1).\\r\\n This dataset is too large to download directly from this item page. You can access and download the data via Globus (See https://www.yout
-ube.com/watch?v=uf2c7Y1fiFs for instructions on how to use Globus).\",\"collection_tags\":[],\"creators\":[{\"value\":\"Trueman, Daniel\",\"name_type\":\"Personal\",\"g
-iven_name\":\"Daniel\",\"family_name\":\"Trueman\",\"identifier\":null,\"affiliations\":[],\"sequence\":1},{\"value\":\"Wang, Matthew\",\"name_type\":\"Personal\",\"giv
-en_name\":\"Matthew\",\"family_name\":\"Wang\",\"identifier\":null,\"affiliations\":[],\"sequence\":2},{\"value\":\"Villalta, Andrés\",\"name_type\":\"Personal\",\"give
-n_name\":\"Andrés\",\"family_name\":\"Villalta\",\"identifier\":null,\"affiliations\":[],\"sequence\":3},{\"value\":\"Chou, Katie\",\"name_type\":\"Personal\",\"given_n
-ame\":\"Katie\",\"family_name\":\"Chou\",\"identifier\":null,\"affiliations\":[],\"sequence\":4},{\"value\":\"Ayres, Christien\",\"name_type\":\"Personal\",\"given_name\":\"Christien\",\"family_name\":\"Ayres\",\"identifier\":null,\"affiliations\":[],\"sequence\":5}],\"resource_type\":\"Dataset\",\"resource_type_general\":\"DATASET\",\"publisher\":\"Princeton University\",\"publication_year\":\"2021\",\"ark\":\"88435/dsp015999n653h\",\"doi\":\"10.34770/r75s-9j74\",\"rights\":{\"identifier\":\"CC BY\",\"uri\":\"https://creativecommons.org/licenses/by/4.0/\",\"name\":\"Creative Commons Attribution 4.0 International\"},\"version_number\":\"1\",\"keywords\":[]}"))
-=>
-#<PDCMetadata::Resource:0x000055ad4be0e428
-...
-irb(main):003:0> work = Work.new(resource: bitklavier_resource)
-=>
-#<Work:0x000055ad4cea22a8
-...
-irb(main):004:0> work.collection = Collection.research_data
-=>
-#<Collection:0x000055ad4d2497d0
-...
-irb(main):013:0> work.created_by_user_id = User.find_by_uid('bs3097').id
-=> 3
-work.state = 'draft'
-=> "draft"
-irb(main):005:0> work.save
-=> true
-```
-
-The work should now be visible in the application.
+1. Regenerate the json representations of the migration data by running:
+```
+DATA_MIGRATION=true bundle exec rspec spec/system/data_migration
+```
+
+2. Copy the newly created .json files to the server where you want to load them:
+```
+scp tmp/data_migration/*.json deploy@pdc_describe_staging1.princeton.edu:/tmp
+```
+
+3. Then, ssh to the server where you want to create this sample work as the `deploy` user.
+4. Run the import rake task, specifying the location of the .json files and the netid of the user they should import as:
+```
+bundle exec rake works:import_works\[/path/to/json/files,bs3097]
+```
+Note the backslash before the square brackets, and no space after the comma.
+
+The work should now be visible in the application.
+3. The works will be in a draft state. If you are testing the migration process, at this point you should attach the payload files and mark the record "ready for review." Then our QA checkers in RDOS and PPPL should be able to review them.
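The steps above hand a directory of exported `.json` files to the import task. The sketch below shows roughly what reading those files involves, without Rails. The top-level `"resource"` and `"collection"` keys are taken from the `works:import_works` task in this commit. The `read_exports` helper, the sample payload and the `"RD"` collection code are illustrative assumptions.

```ruby
require "json"
require "tmpdir"

# Reads every *.json file in `dir` and returns the pieces the import task
# uses: the "resource" hash and the collection code. Work, Collection and
# User are Rails models and are deliberately left out of this sketch.
def read_exports(dir)
  Dir.glob(File.join(dir, "*.json")).sort.map do |file_name|
    hash = JSON.parse(File.read(file_name))
    {
      file: File.basename(file_name),
      resource: hash["resource"],
      collection_code: hash.dig("collection", "code"),
      state: "draft" # imported works start as drafts
    }
  end
end

# Illustrative payload only; real exports carry the full metadata, and
# "RD" is a made-up collection code.
sample = {
  "resource" => { "titles" => [{ "title" => "Example" }] },
  "collection" => { "code" => "RD" }
}

# Write one sample export to a temporary directory and read it back.
imported = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "example.json"), JSON.generate(sample))
  read_exports(dir)
end
```

Using `dig` here means a file without a `"collection"` key yields `nil` rather than raising, which may or may not be the behaviour you want in a real import.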
18 changes: 0 additions & 18 deletions lib/tasks/sample_data.rake

This file was deleted.

13 changes: 7 additions & 6 deletions lib/tasks/works.rake
@@ -61,22 +61,23 @@ namespace :works do

   # See https://github.com/pulibrary/pdc_describe/blob/main/docs/test_data_in_production.md for
   # more information.
+  # Example: rake works:import_works\[/Users/bess/projects/pdc_describe/tmp/data_migration,bs3097]
   desc "Imports works from JSON data"
-  task :import_works, [:path] => :environment do |_, args|
+  task :import_works, [:path, :uid] => :environment do |_, args|
     if args[:path].blank?
-      puts "Usage: bundle exec rake works:import_works\\[path_to_json_files]"
+      puts "Usage: bundle exec rake works:import_works\\[path_to_json_files,uid]"
       exit 1
     end
     path = File.join(args[:path], "*.json")
-    approver = User.first
+    approver = User.find_or_create_by uid: args[:uid]
     puts "Importing files from: #{path}"
     Dir.glob(path).each do |file_name|
       hash = JSON.parse(File.read(file_name))
-      resource = PDCMetadata::Resource.new_from_jsonb(hash)
+      resource = PDCMetadata::Resource.new_from_jsonb(hash["resource"])
       work = Work.new(resource: resource)
-      work.collection = Collection.research_data
+      work.collection = Collection.where(code: hash["collection"]["code"]).first
       work.created_by_user_id = approver.id
-      work.state = "approved"
+      work.state = "draft"
       work.save
       puts "\t#{file_name}"
     end
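The task now takes two positional arguments, `:path` and `:uid`. The following sketch shows how Rake hands such arguments to a task body, using a hypothetical `import_demo` task with no Rails `:environment` prerequisite. The task in this commit only checks `:path`; the extra check on `:uid` and the use of `ArgumentError` instead of `exit 1` are additions made for this example.

```ruby
require "rake"

IMPORT_CALLS = [] # records what the demo task received

# Hypothetical task name; the real task is works:import_works.
Rake::Task.define_task(:import_demo, [:path, :uid]) do |_, args|
  # Arguments that were not supplied come back as nil.
  missing = [:path, :uid].select { |key| args[key].to_s.empty? }
  raise ArgumentError, "Usage: rake import_demo\\[path_to_json_files,uid]" unless missing.empty?

  IMPORT_CALLS << { glob: File.join(args[:path], "*.json"), uid: args[:uid] }
end

demo = Rake::Task[:import_demo]
demo.invoke("/tmp/data_migration", "bs3097")

demo.reenable # a Rake task runs only once unless re-enabled
usage_error =
  begin
    demo.invoke("/tmp/data_migration") # uid omitted
    nil
  rescue ArgumentError => e
    e.message
  end
```

Without a check of this kind, an omitted uid reaches the task body as `nil`, so it may be worth validating both arguments before creating any records.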
@@ -71,6 +71,7 @@
       datacite = PDCSerialization::Datacite.new_from_work(attention_work)
       expect(datacite.valid?).to eq true
       expect(datacite.to_xml).to be_equivalent_to(File.read("spec/system/data_migration/attention.xml"))
+      export_spec_data("attention.json", attention_work.to_json)
     end
   end
 end
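The `export_spec_data` helper called in these specs is not part of the hunks shown here. Based on the docs change (`DATA_MIGRATION=true` and `tmp/data_migration/*.json`), a helper along the following lines would fit. The signature, the `dir:` and `env:` parameters and the return value are assumptions, not the project's actual implementation.

```ruby
require "fileutils"
require "json"
require "tmpdir"

# Hypothetical helper: writes `data` to <dir>/<name>, but only when the
# DATA_MIGRATION environment variable is set. The real helper is not in
# the hunks shown here, so this behaviour is assumed from the docs change.
def export_spec_data(name, data, dir: File.join("tmp", "data_migration"), env: ENV)
  return unless env["DATA_MIGRATION"]

  FileUtils.mkdir_p(dir)
  path = File.join(dir, name)
  File.write(path, data)
  path
end

payload = JSON.generate({ "resource" => { "doi" => "10.34770/r75s-9j74" } })

# Exercise the helper against a temporary directory with the flag off and on.
skipped, written, round_trip = Dir.mktmpdir do |dir|
  off = export_spec_data("bitklavier.json", payload, dir: dir, env: {})
  on = export_spec_data("bitklavier.json", payload, dir: dir, env: { "DATA_MIGRATION" => "true" })
  [off, File.basename(on), JSON.parse(File.read(on))]
end
```

Gating the write on an environment variable keeps ordinary spec runs free of side effects while letting the same specs double as the export step.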
@@ -68,6 +68,7 @@
       expect(bitklavierimage_work.resource.collection_tags).to eq collection_tags
       expect(bitklavierimage_work.collection).to eq Collection.research_data
       expect(bitklavierimage_work.ark).to eq ark
+      export_spec_data("baldwin.json", bitklavierimage_work.to_json)

       # Ensure the datacite record produced validates against our local copy of the datacite schema.
       # This will allow us to evolve our local datacite standards and test our records against them.
