-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matching geography codes with the area names #18
Conversation
Hi @jvillacampa. Just to say that @davidc92 and @lucindalawrie are going to do a review of this and get back to you soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jaime!
Have a few initial comments. I'll take another look once you've made these changes and then maybe start looking at the documentation and tests.
- Can you add some error handling at the start of the function? Checks that whatever is input is the correct object e.g. is
dataset
atibble
? If not, throw out an error. Might also be worth running a check on the format of the (presumably) string fromcode_var
. The geography codes are all a standard format so I don't think it would be too difficult to check that the string matches regular expressions for these standard formats. - You need to add the package names before function calls. That way the user doesn't have to have the packages loaded and also mitigates the risk of one packaging masking a function from another. e.g. you would type
janitor::clean_names()
. - Can you be picky over spacing? I.e. add a space after a hash for comments and add blank lines before comments and try and break your code up with blank lines so that it isn't all clumped together? Can you also be stricter with the 80 character limit per line?
Just to note - I looked at the travis output for the checks and it looks like the checks are failing because the unit tests aren't passing. Might be worth a look.
Cheers David, I will take a look to your comments in the next few days |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jaime! Just a couple of small comments here. I tried to test the package on some of my own data, but it errored because you are using the %<>%
operator without importing it. You can import it by editing the R/phsmethods.R script and adding @importFrom magrittr %<>%
below the other import line. Otherwise the user needs to have magrittr loaded for the function to work. Can you also make sure any package imports you add to the DESCRIPTION file are in alphabetical order?
I'll take a closer look at the code this afternoon :)
Hi @davidc92, sorry I pushed some changes, but I still need to deal with some of your comments from before! I'll let you know when I have got everything ready. Cheers. |
Hi @davidc92, thank you for all the comments. I think I have dealt with all of them now and is ready for a new round of review.
|
Adding |
Using "here" doesn't help in this case. Apparently this is a well known issue. I have tried the proposed workaround but still fails. I have done a workaround replicating one of the files and taking out one of the tests that I don't think added much. However, after doing these changes still fails. The problem is that when the function runs it doesn't recognize that the lookup in the reference files folder exists and then it tries to connect to the open data platform, which could be fine, but for some reason it doesn't find the function getURL from RCurl (which is a dependency of SPARQL). I have tried a couple of things, but a bit stuck here. The tests run perfectly fine manually, fail as described above for devtools::checks(), and for devtools::test() they connect to the open data platform but fail when it comes to the line where it saves the data in the reference file folder. I have pushed the latest version of what I was trying, but any help/advice welcome. |
Hi @jvillacampa, Sorry, I have gone overboard with this. The more I thought about it, the more I thought that requiring a data frame to be provided to the function probably meant that it wasn't any less effort than just doing your own join to a lookup file. Some of the commits relate to small changes to the other functions at least. I think the ease with a function like this would come with being able to just input directly anything from a single code to a column with a million of them and have it return the names in the same format. Something like: df %>%
mutate(area_name = match_area(geo_code)) I also thought it should be able to do the reverse, since both are available in the lookup. Please let me know if you're not happy with anything or have any improvements to suggest. Or if it's all terrible and you preferred the original way. I took the same approach to saving, storing and calling data as the janeaustenr package. It seems to be working fine, and has the handy benefit that R >= 3.5 isn't required and no duplicates need to be created for unit tests. Again though, let me know if you think anything could be done better. The SPARQL code extracts codes that have no accompanying area name, but I had to drop them as otherwise entering an Cheers, Jack |
Hi @jackhannah95, thanks for doing that. Linking name to code - I thought about it, but the main problem is that names are not unequivocal. I really don't think the way the function works would provide useful output. Or at least I don't think I would use to get codes from names as it would require me doing quite a bit of extra work for those cases with multiple codes, which are pretty common. I started taking all this functionality out the code, but I thought I better check it with you first. I like what you have done so it takes a vector/variable instead of a data frame. Much neater. I have a bunch of small edits that I will be doing, but I'll wait to hear from you before pushing stuff. |
Just to say, that I think the feature of names to codes would be very useful, but it would need some more control, so users can decide the area type of their input (e.g. HB and they get S08 codes), and we will need a more complete lookup so more combinations of names are included (e.g. NHS Highland and not only Highland or Dumfries and Galloway and Dumfries & Galloway) |
Hi @jvillacampa, You know better than I do what'd be useful functionality in practice so yeah by all means make any changes you want to. I'm not precious about stuff being deleted. |
@jackhannah95 I uploaded some changes, all cosmetic apart from taking the "code" functionality. I think that would be very useful to have, but maybe for a future release I can figure out based on what you started. I tried to figure out how to drop the NAs from the SPARQL query but I failed, but your solution works well so not sure we need to look at it more. Thanks a lot for the work you have put into this, it's much better now! |
Looks good @jvillacampa 👍 there are a few small things to do like delete some old tests and re-word the odd bit of documentation but nothing major. I guess the only bits I would still quibble over are some of the names. Similarly, within the function itself, Chances are I'll have to do something at the end to make travis build anyway so I'm happy to do those bits (it'll be one or two commits max) and give you another look before this is finally merged. Would that be alright? |
I hate undescriptive object names in functions, just because it makes much harder to fix it and figure out what the function is doing, but you can change them as your leaving gift from me ☺
I saw that Travis was having a tantrum again, but not sure why it was, thanks for looking into it.
Cheers and enjoy the weekend!
Jaime
From: Jack Hannah <notifications@github.com>
Sent: 09 April 2020 17:42
To: Health-SocialCare-Scotland/phsmethods <phsmethods@noreply.github.com>
Cc: VILLACAMPA, Jaime (PUBLIC HEALTH SCOTLAND) <jaime.villacampa@nhs.net>; Mention <mention@noreply.github.com>
Subject: Re: [Health-SocialCare-Scotland/phsmethods] Matching geography codes with the area names (#18)
Looks good @jvillacampa<https://github.com/jvillacampa> 👍 there are a few small things to do like delete some old tests and re-word the odd bit of documentation but nothing major.
I guess the only bits I would still quibble over are some of the names. code_var is more descriptive than x, granted, but I don't think many people will ever actually type out the variable name. We've called the first argument x in every other function (apart from file_size() but it takes a filepath rather than a value) and for the sake of consistency I'd prefer it to be the same in this one too, if that's okay.
Similarly, within the function itself, no_9char_codes is definitely more descriptive than n, but it's invisible to the user and I just wanted a concise way of getting that value into the warning message. I don't think it matters too much that it doesn't really describe what that value is (I think the comment is probably descriptive enough) so ideally I'd like to shorten that back to n too if that'd be alright.
Chances are I'll have to do something at the end to make travis build anyway so I'm happy to do those bits (it'll be one or two commits max) and give you another look before this is finally merged. Would that be alright?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#18 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AHHPIVZWYBFHEWXQWBLYMLLRLX3EZANCNFSM4KLEP63Q>.
********************************************************************************************************************
This message may contain confidential information. If you are not the intended recipient please inform the
sender that you have received the message in error before deleting it.
Please do not disclose, copy or distribute information in this e-mail or take any action in relation to its contents. To do so is strictly prohibited and may be unlawful. Thank you for your co-operation.
NHSmail is the secure email and directory service available for all NHS staff in England and Scotland. NHSmail is approved for exchanging patient data and other sensitive information with NHSmail and other accredited email services.
For more information and to find out how you can switch, https://portal.nhs.net/help/joiningnhsmail
|
Haha thanks! Sorry for bugging you. If you ever become a maintainer in the future you're welcome to change all the argument names. But then I guess you'd have to explain to everyone why their code doesn't work anymore 🤷 Hopefully these last two commits take care of everything and we can finally merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's time
I think now @davidc92 needs to approve before we can merge |
As in issue #13. Function that matches a column containing geography codes with their respective area names. I don't think I have done a very good job with the tests and covering potential problems with user's inputs, so any feedback very welcomed.
fixes #13.