Matching geography codes with the area names #18

jvillacampa · 2020-01-24T11:07:13Z

As in issue #13. Function that matches a column containing geography codes with their respective area names. I don't think I have done a very good job with the tests and covering potential problems with user's inputs, so any feedback very welcomed.

fixes #13.

jackhannah95 · 2020-01-29T16:52:35Z

Hi @jvillacampa. Just to say that @davidc92 and @lucindalawrie are going to do a review of this and get back to you soon.

davidc92

Hi Jaime!

Have a few initial comments. I'll take another look once you've made these changes and then maybe start looking at the documentation and tests.

Can you add some error handling at the start of the function? Checks that whatever is input is the correct object e.g. is dataset a tibble? If not, throw out an error. Might also be worth running a check on the format of the (presumably) string from code_var. The geography codes are all a standard format so I don't think it would be too difficult to check that the string matches regular expressions for these standard formats.
You need to add the package names before function calls. That way the user doesn't have to have the packages loaded and also mitigates the risk of one packaging masking a function from another. e.g. you would type janitor::clean_names().
Can you be picky over spacing? I.e. add a space after a hash for comments and add blank lines before comments and try and break your code up with blank lines so that it isn't all clumped together? Can you also be stricter with the 80 character limit per line?

Just to note - I looked at the travis output for the checks and it looks like the checks are failing because the unit tests aren't passing. Might be worth a look.

R/match_geo_names.R

jvillacampa · 2020-01-30T13:19:29Z

Cheers David, I will take a look to your comments in the next few days

davidc92

Hi Jaime! Just a couple of small comments here. I tried to test the package on some of my own data, but it errored because you are using the %<>% operator without importing it. You can import it by editing the R/phsmethods.R script and adding @importFrom magrittr %<>% below the other import line. Otherwise the user needs to have magrittr loaded for the function to work. Can you also make sure any package imports you add to the DESCRIPTION file are in alphabetical order?

I'll take a closer look at the code this afternoon :)

DESCRIPTION

R/match_geo_names.R

jvillacampa · 2020-02-06T12:04:17Z

Hi @davidc92, sorry I pushed some changes, but I still need to deal with some of your comments from before! I'll let you know when I have got everything ready. Cheers.

jvillacampa · 2020-02-06T16:54:20Z

Hi @davidc92, thank you for all the comments. I think I have dealt with all of them now and is ready for a new round of review.
A couple of points:

The code extracting the info from the API is in the function at the moment, but it won't run as the reference file exists. Not sure how to ensure this file is kept up to date in a low-maintenance way.
The devtools::check fails as during the tests (test_that) it fails as it can't open the reference files I use. Slightly confused about why is this. Could this be solved using the package here?

davidc92 · 2020-02-06T17:08:06Z

Hi @davidc92, thank you for all the comments. I think I have dealt with all of them now and is ready for a new round of review.
A couple of points:

The code extracting the info from the API is in the function at the moment, but it won't run as the reference file exists. Not sure how to ensure this file is kept up to date in a low-maintenance way.

The devtools::check fails as during the tests (test_that) it fails as it can't open the reference files I use. Slightly confused about why is this. Could this be solved using the package here?

Adding here() might actually solve some of the tests failing - give that a go and see if it works. Also, you'll have to update one of the unit tests for file_size(). Line 8 is testing that the number of files in the tests directory is equal to 8, but you have also added a test file so the 8 should become 9 :)

jvillacampa · 2020-02-07T11:07:40Z

Using "here" doesn't help in this case. Apparently this is a well known issue. I have tried the proposed workaround but still fails. I have done a workaround replicating one of the files and taking out one of the tests that I don't think added much.

However, after doing these changes still fails. The problem is that when the function runs it doesn't recognize that the lookup in the reference files folder exists and then it tries to connect to the open data platform, which could be fine, but for some reason it doesn't find the function getURL from RCurl (which is a dependency of SPARQL). I have tried a couple of things, but a bit stuck here.

The tests run perfectly fine manually, fail as described above for devtools::checks(), and for devtools::test() they connect to the open data platform but fail when it comes to the line where it saves the data in the reference file folder.

I have pushed the latest version of what I was trying, but any help/advice welcome.

jackhannah95 · 2020-04-06T17:05:05Z

Hi @jvillacampa,

Sorry, I have gone overboard with this. The more I thought about it, the more I thought that requiring a data frame to be provided to the function probably meant that it wasn't any less effort than just doing your own join to a lookup file. Some of the commits relate to small changes to the other functions at least.

I think the ease with a function like this would come with being able to just input directly anything from a single code to a column with a million of them and have it return the names in the same format. Something like:

df %>%
  mutate(area_name = match_area(geo_code))

I also thought it should be able to do the reverse, since both are available in the lookup. Please let me know if you're not happy with anything or have any improvements to suggest. Or if it's all terrible and you preferred the original way.

I took the same approach to saving, storing and calling data as the janeaustenr package. It seems to be working fine, and has the handy benefit that R >= 3.5 isn't required and no duplicates need to be created for unit tests. Again though, let me know if you think anything could be done better.

The SPARQL code extracts codes that have no accompanying area name, but I had to drop them as otherwise entering an NA name returned all those codes. In practice dropping them doesn't make a difference to the function as you intended it, because entering any code that isn't in the lookup dataset returns an NA anyway. It'd be more efficient to edit the SPARQL query to only extract codes with an accompanying area name, rather than bringing them all in and dropping all NA values. I tried editing the query and it was a disaster. Any help you're able to provide with that would be much appreciated.

Cheers,

Jack

jvillacampa · 2020-04-09T10:12:25Z

Hi @jackhannah95, thanks for doing that.

Linking name to code - I thought about it, but the main problem is that names are not unequivocal. I really don't think the way the function works would provide useful output. Or at least I don't think I would use to get codes from names as it would require me doing quite a bit of extra work for those cases with multiple codes, which are pretty common. I started taking all this functionality out the code, but I thought I better check it with you first.

I like what you have done so it takes a vector/variable instead of a data frame. Much neater.

I have a bunch of small edits that I will be doing, but I'll wait to hear from you before pushing stuff.
Thanks,
Jaime

jvillacampa · 2020-04-09T10:15:22Z

Just to say, that I think the feature of names to codes would be very useful, but it would need some more control, so users can decide the area type of their input (e.g. HB and they get S08 codes), and we will need a more complete lookup so more combinations of names are included (e.g. NHS Highland and not only Highland or Dumfries and Galloway and Dumfries & Galloway)

jackhannah95 · 2020-04-09T11:29:02Z

Hi @jvillacampa,

You know better than I do what'd be useful functionality in practice so yeah by all means make any changes you want to. I'm not precious about stuff being deleted.

jvillacampa · 2020-04-09T13:51:08Z

@jackhannah95 I uploaded some changes, all cosmetic apart from taking the "code" functionality. I think that would be very useful to have, but maybe for a future release I can figure out based on what you started. I tried to figure out how to drop the NAs from the SPARQL query but I failed, but your solution works well so not sure we need to look at it more. Thanks a lot for the work you have put into this, it's much better now!

jackhannah95 · 2020-04-09T16:41:34Z

Looks good @jvillacampa 👍 there are a few small things to do like delete some old tests and re-word the odd bit of documentation but nothing major.

I guess the only bits I would still quibble over are some of the names. code_var is more descriptive than x, granted, but I don't think many people will ever actually type out the variable name. We've called the first argument x in every other function (apart from file_size() but it takes a filepath rather than a value) and for the sake of consistency I'd prefer it to be the same in this one too, if that's okay.

Similarly, within the function itself, no_9char_codes is definitely more descriptive than n, but it's invisible to the user and I just wanted a concise way of getting that value into the warning message. I don't think it matters too much that it doesn't really describe what that value is (I think the comment is probably descriptive enough) so ideally I'd like to shorten that back to n too if that'd be alright.

Chances are I'll have to do something at the end to make travis build anyway so I'm happy to do those bits (it'll be one or two commits max) and give you another look before this is finally merged. Would that be alright?

jvillacampa · 2020-04-09T16:49:52Z

I hate undescriptive object names in functions, just because it makes much harder to fix it and figure out what the function is doing, but you can change them as your leaving gift from me ☺ I saw that Travis was having a tantrum again, but not sure why it was, thanks for looking into it. Cheers and enjoy the weekend! Jaime From: Jack Hannah <notifications@github.com> Sent: 09 April 2020 17:42 To: Health-SocialCare-Scotland/phsmethods <phsmethods@noreply.github.com> Cc: VILLACAMPA, Jaime (PUBLIC HEALTH SCOTLAND) <jaime.villacampa@nhs.net>; Mention <mention@noreply.github.com> Subject: Re: [Health-SocialCare-Scotland/phsmethods] Matching geography codes with the area names (#18) Looks good @jvillacampa<https://github.com/jvillacampa> 👍 there are a few small things to do like delete some old tests and re-word the odd bit of documentation but nothing major. I guess the only bits I would still quibble over are some of the names. code_var is more descriptive than x, granted, but I don't think many people will ever actually type out the variable name. We've called the first argument x in every other function (apart from file_size() but it takes a filepath rather than a value) and for the sake of consistency I'd prefer it to be the same in this one too, if that's okay. Similarly, within the function itself, no_9char_codes is definitely more descriptive than n, but it's invisible to the user and I just wanted a concise way of getting that value into the warning message. I don't think it matters too much that it doesn't really describe what that value is (I think the comment is probably descriptive enough) so ideally I'd like to shorten that back to n too if that'd be alright. Chances are I'll have to do something at the end to make travis build anyway so I'm happy to do those bits (it'll be one or two commits max) and give you another look before this is finally merged. Would that be alright? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#18 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AHHPIVZWYBFHEWXQWBLYMLLRLX3EZANCNFSM4KLEP63Q>. ******************************************************************************************************************** This message may contain confidential information. If you are not the intended recipient please inform the sender that you have received the message in error before deleting it. Please do not disclose, copy or distribute information in this e-mail or take any action in relation to its contents. To do so is strictly prohibited and may be unlawful. Thank you for your co-operation. NHSmail is the secure email and directory service available for all NHS staff in England and Scotland. NHSmail is approved for exchanging patient data and other sensitive information with NHSmail and other accredited email services. For more information and to find out how you can switch, https://portal.nhs.net/help/joiningnhsmail

jackhannah95 · 2020-04-09T17:05:57Z

Haha thanks! Sorry for bugging you. If you ever become a maintainer in the future you're welcome to change all the argument names. But then I guess you'd have to explain to everyone why their code doesn't work anymore 🤷

Hopefully these last two commits take care of everything and we can finally merge this.

jackhannah95

It's time

jvillacampa · 2020-04-14T10:18:57Z

I think now @davidc92 needs to approve before we can merge

jvillacampa added 4 commits January 23, 2020 15:58

packages for match_geo_names

1cfae48

match_geo_names

dceb086

match_geo_names function, tests, documentation

f585996

Comments

f531575

jvillacampa added the enhancement New feature or request label Jan 24, 2020

jvillacampa requested review from davidc92 and lucindalawrie January 24, 2020 11:07

jvillacampa self-assigned this Jan 24, 2020

davidc92 suggested changes Jan 30, 2020

View reviewed changes

jvillacampa added 5 commits January 31, 2020 12:14

function name, packages, tidying

df0f180

rename test

97b4f03

renaming function

70e4c4a

extracting from SG open data

a97c80f

area name lookup

c1b9803

davidc92 suggested changes Feb 6, 2020

View reviewed changes

DESCRIPTION Outdated Show resolved Hide resolved

R/match_geo_names.R Outdated Show resolved Hide resolved

R/match_geo_names.R Outdated Show resolved Hide resolved

jvillacampa added 8 commits February 6, 2020 12:20

sparql and %<>%

fad2bb5

adding tests and stops

bb42c1d

rename file

3ee0c3b

magrittr operator

9495f9e

fix parameter and example

ae5bc6d

adding some more tests

1146a2a

fixing function checks

397f1e6

documentation match area names

677b268

fixing tests

8adcbad

jackhannah95 and others added 8 commits April 6, 2020 16:02

add .data prefix to variable names

fb25346

add some tests

d213328

remove forcats dependency

99bda83

simplify

6f779fb

add extra edge case test

616c9d1

add warning tests

d170bda

add extra warning tests

494e30a

Merge branch 'master' into geo_names

65e11d7

not really working for names match_area("Highland", "code")

c80b1a6

jvillacampa added 2 commits April 9, 2020 11:20

documentation and meaneingful object/parameters names

a2cecd9

comments

55dbd78

tidying

08a2146

jvillacampa and others added 2 commits April 9, 2020 14:55

documentation

c4083ce

Merge branch 'master' into geo_names

2af98b5

jackhannah95 added 2 commits April 9, 2020 18:03

travis fixes

243d0d9

final edits

c33396d

fix typo

f9fa483

jackhannah95 approved these changes Apr 14, 2020

View reviewed changes

davidc92 approved these changes Apr 15, 2020

View reviewed changes

jackhannah95 merged commit fe12c7a into master Apr 15, 2020

jackhannah95 deleted the geo_names branch April 15, 2020 10:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Matching geography codes with the area names #18

Matching geography codes with the area names #18

jvillacampa commented Jan 24, 2020 •

edited by davidc92

Loading

jackhannah95 commented Jan 29, 2020

davidc92 left a comment

jvillacampa commented Jan 30, 2020

davidc92 left a comment

jvillacampa commented Feb 6, 2020

jvillacampa commented Feb 6, 2020

davidc92 commented Feb 6, 2020

jvillacampa commented Feb 7, 2020

jackhannah95 commented Apr 6, 2020

jvillacampa commented Apr 9, 2020

jvillacampa commented Apr 9, 2020

jackhannah95 commented Apr 9, 2020 •

edited

Loading

jvillacampa commented Apr 9, 2020

jackhannah95 commented Apr 9, 2020

jvillacampa commented Apr 9, 2020 via email

jackhannah95 commented Apr 9, 2020

jackhannah95 left a comment

jvillacampa commented Apr 14, 2020

Matching geography codes with the area names #18

Matching geography codes with the area names #18

Conversation

jvillacampa commented Jan 24, 2020 • edited by davidc92 Loading

jackhannah95 commented Jan 29, 2020

davidc92 left a comment

Choose a reason for hiding this comment

jvillacampa commented Jan 30, 2020

davidc92 left a comment

Choose a reason for hiding this comment

jvillacampa commented Feb 6, 2020

jvillacampa commented Feb 6, 2020

davidc92 commented Feb 6, 2020

jvillacampa commented Feb 7, 2020

jackhannah95 commented Apr 6, 2020

jvillacampa commented Apr 9, 2020

jvillacampa commented Apr 9, 2020

jackhannah95 commented Apr 9, 2020 • edited Loading

jvillacampa commented Apr 9, 2020

jackhannah95 commented Apr 9, 2020

jvillacampa commented Apr 9, 2020 via email

jackhannah95 commented Apr 9, 2020

jackhannah95 left a comment

Choose a reason for hiding this comment

jvillacampa commented Apr 14, 2020

jvillacampa commented Jan 24, 2020 •

edited by davidc92

Loading

jackhannah95 commented Apr 9, 2020 •

edited

Loading