Releases: USFWS/migbirdHIP
v1.2.8
migbirdHIP 1.2.8
Major changes & new features
- Added a `NEWS.md` file to track changes to the package.
- Added package documentation page `man/migbirdHIP-package.Rd`
- New `fileCheck()` function: checks whether any files in the input folder have already been written to the processed folder.
- New `shiftCheck()` function: finds and prints any rows that have a line shift error, along with the number of positions shifted.
- New `identicalBags()` function: returns output if any columns are exactly the same in a file; does not return "no season" matches.
- New `glyphCheck()` function: pulls and displays any non-UTF-8 characters in the raw data; helps guide manual fixes so the HIP files can be read in without line shifts. `glyphFinder()` is no longer exported and is now used internally inside of `glyphCheck()`.
- Added 3 new internal package functions (`errorLevel_errors_field()`, `errorLevel_errors_state()`, and `recordLevel_errors_state()`), which are used inside `redFlags()`, `errorPlot_fields()`, and `errorPlot_states()`. They reduce code redundancy and ensure updates happen universally.
- Added 2 new internal package functions (`issueAssign()` and `issuePlot`), which are used inside of `issueCheck()` and by the download report (`dl_report.qmd`).
- Added internal function `strataFix()` to be used inside of `clean()` to resolve false permit labels. This function edits strata values for `band_tailed_pigeon` and `crane` from states that submit permit files for crane and band-tailed pigeons; values changed from `"2"` to `"0"`.
- Edited `writeReport()` to render Quarto documents.
- Edited `issueCheck()` to place more emphasis on `issue_date` to determine relevancy of a record. The function no longer exports future and past data as `.csv` files. Past data are still filtered out from the returned tibble. Output messages indicate if future data exist.
- Edited `clean()` function:
  - Filter out any rows that contain a bag value other than a single digit
  - Eliminated address cleaning
  - Moved zip code checking and messaging to `clean()` from `proof()`; now checks on entire zip code, not just prefix. Remove ending `0` when `zip` value is 10 digits long.
  - Changed Oregon solo permit `hunt_mig_birds` field when it equals `"0"` to `"2"`. For context, a solo permit contains a `"2"` in at least one of the `band_tailed_pigeon`, `brant`, or `seaduck` fields and contains `"0"` in all other bag fields.
- Edited `correct()` to remove any records with a value of `"0"` or `NA` in every bag field; improved `email` field cleaning and repair.
- Edited `strataCheck()` to return two additional fields in output: 1) number of bad strata and 2) proportion of bad strata. The function now checks for permit species coming during regular HIP and returns them as erroneous (e.g. NM `band_tailed_pigeon` = `"2"`).
- Edited `write_hip()` to set any state/species combinations without a season to have strata of `"0"`; bad bag values remain NA.
- Edited `sumLines()` to improve speed and efficiency. In addition, the function now returns a data table with the sum of lines per file instead of a single number. No longer exported; set as internal function.
- Edited `read_hip()` to eliminate the encoding check and optionally use the `sumLines()` function to ensure all lines were read in. Returns a message if any records contain a bag value other than a single digit. In addition, now converts blank strings to `NA`.
- Edited `validate()` to return `source_file` field and filter out states and species with no season from function output.
- Edited `investigate()` to no longer be exported; it works inside of `validate()` to return a more detailed output. This replaces the previous workflow of running `investigate()` separately.
- Removed `manualFix()` function because it is no longer relevant to the package.
- Removed `shiftFix()` because line shift errors cannot be fixed programmatically on a reliable basis.
- Templates
  - New Quarto `dl_report.qmd` replaced RMarkdown `dl_report.Rmd`.
    - The new Quarto layout allows tabset panels, which divide content into sections that the user can read and focus on more easily. Tabset panels were also incorporated for before and after plots to show the proportion of errors that are corrected during pre-processing.
    - A new summary section distills the overall findings of the functions so the user can discern the most important issues in the HIP files that were processed. This is partly accomplished with a `catch_messages()` function created only for use in `dl_report.qmd`; it is not exported or contained within the `migbirdHIP` package internally. The `catch_messages()` function wraps around pre-processing functions (such as `read_hip()`, `clean()`, `issueCheck()`, etc.) and captures messages in a list so that they can be returned as readable bullet points.
    - A new map displays time lag of files received from 49 states in a hexagonal representation of the continental US.
    - Emojis are printed with output text to quickly indicate to readers whether issues ❌ need attention or ✔️ are not concerning.
    - Sections added as needed to report on new function output (see above for which new functions were added).
    - A new section lists any states that were excluded from the output when they submitted data for that download (e.g. all records were issued in the past and are not eligible for the current season; perhaps sent by mistake).
  - Eliminated `season_report.Rmd` template
- Imports
  - Removed `magrittr` and `rmarkdown`
  - Added `quarto` and `sf`
- Suggests
  - Added `spelling`
- Internal package data (`sysdata.rda`)
  - Added vectors of abbreviated US territories and Canadian provinces/territories, both updated to include missing abbreviations from previous versions and remove redundant abbreviations
  - Added vector of bag field names
  - Added vector of two-season states
  - Added vectors of seaduck and brant states, seaduck-only states, and two-season states
  - Added hexmap grid for download report
  - Added tibbles of permit file states/species and states/species of permits received inline
  - Updated zip code reference table, bag reference table, license window reference table, and MS reference dates
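
The message-capturing wrapper described under Templates can be illustrated with base R condition handling. This is only a minimal sketch of the general idea, not the code used in `dl_report.qmd`: the names `capture_messages()` and `noisy_step()` are hypothetical stand-ins, and the real `catch_messages()` function may work differently.

```r
# Hypothetical sketch: run an expression, collect every message() it emits
# in a character vector, and keep the messages off the console.
capture_messages <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    message = function(m) {
      # message() appends a newline to the text; trim it before storing
      msgs <<- c(msgs, trimws(conditionMessage(m)))
      # Stop the message from being printed
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, messages = msgs)
}

# Stand-in for a pre-processing step that reports issues via message()
noisy_step <- function(x) {
  message("2 records had a bad zip code")
  message("Future data detected")
  x * 2
}

res <- capture_messages(noisy_step(21))

# Turn the captured messages into markdown-style bullet points
bullets <- paste0("- ", res$messages)
cat(bullets, sep = "\n")
```

Because the handler muffles each message after storing it, the wrapped function still returns its normal value while the report decides where and how the text is shown.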
Minor changes / bug fixes
- License changed to CC0; the previous value (Public Domain) was causing a warning in `devtools::check()`
- Refactored `write_hip()` to eliminate redundancy; replaced repeated `left_join()` with a for loop
- Refactored `findDuplicates()` by throwing an error message for a bad string supplied to the `return` parameter at the start, which reduces wait time for failure.
  - Investigated replacing `findDuplicates()` redundancy of searching for duplicate fields using a `for` loop or `purrr::map()`, but this change added 20+ seconds of processing time so left the redundancy as-is.
- Refactored all functions that take a path parameter to add a forward slash to the end of each supplied path if not included by the user.
- Replaced superseded `tidyr::separate()` with `tidyr::separate_wider_delim()` or `tidyr::separate_wider_position()`
- Replaced `dplyr::summarize()` with `dplyr::reframe()` since returning more than 1 row per group was deprecated in `dplyr 1.1.0`
- Replaced `ggplot2::stat()` with `ggplot2::after_stat()`, since the former was deprecated in `ggplot2 3.4.0`
- Replaced tidy pipes `%>%` and `%<>%` with base R pipe `|>` for increased speed and reduced dependency on tidyverse packages.
- Edited `DESCRIPTION` file:
  - Changed package description
  - Set language to `en-US`
  - Added a URL to the Harvest Information Program website
- Incorporated `usethis::use_spell_check()` to package checking workflow, which added an `inst/WORDLIST` file (whitelisted words) to the package.
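
The path handling change listed above (appending a forward slash when the user omits it) could look roughly like the following in base R. The helper name `add_trailing_slash()` and the example paths are hypothetical; the package's internal implementation is not shown in these notes and may differ.

```r
# Hypothetical helper: ensure each supplied path ends with a forward slash.
add_trailing_slash <- function(path) {
  # Leave paths that already end in "/" untouched; append "/" otherwise
  ifelse(grepl("/$", path), path, paste0(path, "/"))
}

# Works on a single path or a vector of paths
add_trailing_slash(c("C:/HIP/raw", "C:/HIP/processed/"))
```

Normalizing the path once at the top of a function means later calls such as `paste0(path, filename)` can assume the separator is present.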
Full Changelog: v1.2.7...v1.2.8
v1.2.7
- Updated strata reference table in sysdata.rda
- Added kable summary tables to horizontal repetition checks in the download report template
- Due to new kables, updated DESCRIPTION to include kableExtra as a Suggests (and added other packages used in the dl_template.Rmd not previously included)
- Updated .Rbuildignore to reduce R CMD check notes
v1.2.6
v1.2.5
v1.2.4
v1.2.3
v1.2.2
Full Changelog: v1.2.1...v1.2.2