create_trip_id #26

Open
einarhjorleifsson opened this issue May 10, 2024 · 1 comment
Labels
Workflow Block 1 Data preprocissing workflow code

Comments

@einarhjorleifsson
Collaborator

I am a bit confused by create_trip_id. As currently defined in 0_global.R, we have:

# Define a function to create a unique trip identifier
create_trip_id <- function(eflalo) {
  paste(eflalo$LE_ID, eflalo$LE_CDAT, sep="-")
}

Now, these variables are associated with a "log event", with the following meanings:

  • LE_ID is Log event ID
  • LE_CDAT is Catch date

That is, these variables identify log events, not trips. Should we not be using FT_REF and some of the associated FT_** date-time variables instead?

Keep in mind that the eflalo format is a bit alien to me, so this may actually be a bug in my head.
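For illustration, the kind of trip-level key suggested above could be sketched as follows. This is only a sketch: the function name create_trip_id_ft is hypothetical, the toy values are made up, and it assumes the eflalo table carries the trip columns FT_REF, FT_DDAT (departure date) and FT_DTIME (departure time).

```r
# Hypothetical alternative: build the key from trip-level (FT_*) fields.
# Assumes eflalo has FT_REF, FT_DDAT and FT_DTIME columns.
create_trip_id_ft <- function(eflalo) {
  paste(eflalo$FT_REF, eflalo$FT_DDAT, eflalo$FT_DTIME, sep = "-")
}

# Made-up toy data: the first two log events belong to the same trip
toy <- data.frame(
  FT_REF   = c("T1", "T1", "T2"),
  FT_DDAT  = c("01/05/2024", "01/05/2024", "03/05/2024"),
  FT_DTIME = c("06:00", "06:00", "07:30"),
  LE_ID    = c("L1", "L2", "L3"),
  LE_CDAT  = c("01/05/2024", "02/05/2024", "03/05/2024"),
  stringsAsFactors = FALSE
)

trip_id_ft <- create_trip_id_ft(toy)

# Number of distinct trips in the toy data
length(unique(trip_id_ft))
```

With a key like this, rows that share a value belong to the same trip, which is what the name create_trip_id would suggest.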

@einarhjorleifsson einarhjorleifsson added the Workflow Block 1 Data preprocissing workflow code label May 10, 2024
@neilcampbelll
Contributor

That's a good point. The code here is just a reworking of what was in the previous workflow script so that this...

# 2.3.3  Remove non-unique trip numbers -----------------------------------------------------------------------------

eflalo <-
  eflalo[
    !duplicated(paste(eflalo$LE_ID, eflalo$LE_CDAT, sep = "-")),
  ]
remrecsEflalo["duplicated", ] <-
  c(
    nrow(eflalo),
    100 +
      round(
        (nrow(eflalo) - as.numeric(remrecsEflalo["total", 1])) /
          as.numeric(remrecsEflalo["total", 1]) * 100,
        2
      )
  )

is replaced by this:

  # Apply the trip ID function to the eflalo data frame
  trip_id <- create_trip_id(eflalo)
  
  # Remove records with non-unique trip identifiers
  eflalo <- eflalo[!duplicated(trip_id), ]

I think this is a hangover from the very early days of the process, when FT_REF wasn't as unique an identifier as it is supposed to be.
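To make the behaviour of the current key concrete, here is a minimal sketch on made-up data (the toy table and the FT_REF-only comparison are assumptions for illustration, not part of the workflow). It suggests that the step removes repeated log events rather than repeated trips, and that swapping FT_REF into the same de-duplication step as-is would keep only one log event per trip.

```r
# Current definition, as quoted from 0_global.R
create_trip_id <- function(eflalo) {
  paste(eflalo$LE_ID, eflalo$LE_CDAT, sep = "-")
}

# Made-up toy data: trip T1 has two log events, one of them recorded twice
toy <- data.frame(
  FT_REF  = c("T1", "T1", "T1", "T2"),
  LE_ID   = c("L1", "L2", "L2", "L3"),
  LE_CDAT = c("01/05/2024", "02/05/2024", "02/05/2024", "03/05/2024"),
  stringsAsFactors = FALSE
)

# De-duplicating on the current key drops only the repeated log event
by_log_event <- toy[!duplicated(create_trip_id(toy)), ]
nrow(by_log_event)  # 3

# De-duplicating on FT_REF alone keeps a single row per trip
by_ft_ref <- toy[!duplicated(toy$FT_REF), ]
nrow(by_ft_ref)     # 2
```

If this reading is right, the function may be better described as a log-event key than as a trip identifier, whichever columns end up being used.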
