Update 06_Data_Validation.Rmd
hdvincelette committed Oct 23, 2024
1 parent 4d8c95f commit cd16b5f
Showing 1 changed file with 24 additions and 20 deletions.
44 changes: 24 additions & 20 deletions vignettes/06_Data_Validation.Rmd
@@ -11,12 +11,16 @@ output:
number_sections: yes
vignette: "%\\VignetteIndexEntry{mdJSONdictio} %\\VignetteEncoding{UTF-8} %\\VignetteEngine{knitr::rmarkdown}\n"
---

```{=html}
<style type="text/css">
body{
font-size: 12pt;
}
</style>
---
```

------------------------------------------------------------------------

```{r, include = FALSE}
knitr::opts_chunk$set(
@@ -41,37 +45,40 @@ library(utils)
library(rjson)
```

```validate.mdJSON()```/```validate.table()``` compare mdJSON and tabular data dictionaries against a corresponding dataset and output a warnings table, as shown below. The tabular data dictionary must be formatted to the mdJSONdictio [template](https://hdvincelette.github.io/mdJSONdictio/articles/02_Dictionary_Template.html). These functions are intended to be used as a Quality Assurance step in the data management process.
`validate.mdJSON()`/`validate.table()` compare mdJSON and tabular data dictionaries against a corresponding dataset and output a warnings table, as shown below. The tabular data dictionary must be formatted to the mdJSONdictio [template](https://hdvincelette.github.io/mdJSONdictio/articles/02_Dictionary_Template.html). These functions are intended to be used as a Quality Assurance step in the data management process.

##### <span style="color: grey;">Warnings table in Excel</span>
![](https://github.com/hdvincelette/mdJSONdictio/raw/master/man/figures/Warnings_Table.png)
<br />
<br />
##### [Warnings table in Excel]{style="color: grey;"}

![](https://github.com/hdvincelette/mdJSONdictio/raw/master/man/figures/Warnings_Table.png) <br /> <br />

# Validation procedures
```validate.mdJSON()``` and ```validate.table()``` can detect numerous discrepancies between a dataset and dictionary, as shown below. "category" corresponds to field(s) in the tabular data dictionary or mdEditor Dictionary record which failed one or more logical tests. "discrepancy" describes the cause of that failure. The output is called a "warnings" table because some discrepancies may not warrant an action (e.g., the data type for the attribute "TagID" appears to be "integer" in the dataset, but the data type is described as "character varying" in the dictionary because entry values can contain letters). It is ultimately up to the data steward to decide what needs to be corrected to most accurately describe the associated dataset.

`validate.mdJSON()` and `validate.table()` can detect numerous discrepancies between a dataset and dictionary, as shown below. "category" corresponds to field(s) in the tabular data dictionary or mdEditor Dictionary record which failed one or more logical tests. "discrepancy" describes the cause of that failure. The output is called a "warnings" table because some discrepancies may not warrant an action (e.g., the data type for the attribute "TagID" appears to be "integer" in the dataset, but the data type is described as "character varying" in the dictionary because entry values can contain letters). It is ultimately up to the data steward to decide what needs to be corrected to most accurately describe the associated dataset.

Refer to the [mdEditor Reference Manual](https://guide.mdeditor.org/reference/reference-manual.html) for complete definitions and constraints of all mdEditor Attribute, Domain, and Domain Item fields.
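
The "TagID" case above can be illustrated with a minimal base R sketch. This is not the package's implementation: `infer_apparent_type()`, the `codeName` column, and the layout of `warnings_table` are hypothetical names chosen for illustration, and the type guess uses simple pattern matching rather than the rules mdJSONdictio applies.

```r
# Hypothetical helper (not part of mdJSONdictio): guess an apparent data type
# from the non-missing values of one dataset column.
infer_apparent_type <- function(x) {
  x <- as.character(x[!is.na(x)])
  if (length(x) == 0) return(NA_character_)
  if (all(grepl("^[+-]?[0-9]+$", x))) return("integer")
  if (all(grepl("^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)$", x))) return("numeric")
  "character varying"
}

# Toy dataset: every TagID recorded so far happens to be a whole number.
dataset <- data.frame(
  TagID = c(1001, 1002, 1003),
  Site = c("A", "B", "C")
)

# Toy dictionary (assumed column names): TagID is declared as text
# because future values may contain letters.
dictionary <- data.frame(
  codeName = c("TagID", "Site"),
  dataType = c("character varying", "character varying")
)

# Compare the declared type with the type the values appear to have.
apparent <- vapply(dataset[dictionary$codeName], infer_apparent_type, character(1))

warnings_table <- data.frame(
  attribute = dictionary$codeName,
  dictionary_dataType = dictionary$dataType,
  apparent_dataType = unname(apparent),
  stringsAsFactors = FALSE
)

# Keep only the attributes where the two disagree.
warnings_table <- warnings_table[
  warnings_table$dictionary_dataType != warnings_table$apparent_dataType,
]
```

Here only "TagID" is flagged. Whether that row calls for a correction is the data steward's decision, which is why the output is treated as a warning rather than an error.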

##### <span style="color: grey;">Potential dataset-dictionary discrepancies</span>
<font size="2.5">
##### [Potential dataset-dictionary discrepancies]{style="color: grey;"}

<font size="2.5">

```{r, echo=FALSE}
kable(read.csv(path <-
system.file("extdata", "validation_procedures.csv", package = "mdJSONdictio")
),
align = "l") %>%
kable_styling(full_width = F, position = "left")
```
*see data type rules
</font>
<br />
<br />

# Data type contraints
\*see data type rules </font> <br /> <br />

# Data type constraints

"dataType" values undergo a series of tests based on [Structured Query Language (SQL)](https://www.iso.org/obp/ui/en/#iso:std:iso-iec:9075:-1:ed-6:v1:en) data constraints to ensure the associated dataset attribute is described accurately. Data types are described as follows. Definitions are from the [mdCodes Viewer](https://adiwg.github.io/mdTools/#codes-page) in the mdTools interface, and constraints are from the [International Organization for Standardization (ISO)](https://www.iso.org/obp/ui/en/#iso:std:iso-iec:9075:-1:ed-6:v1:en).

##### <span style="color: grey;">Data type rules</span>
<font size="2.5">
##### [Data type rules]{style="color: grey;"}

<font size="2.5">

```{r, echo=FALSE}
kable(read.csv(path<-system.file("extdata", "datatype.rules.csv", package = "mdJSONdictio"), fileEncoding="latin1", na.strings = NA), escape = F, align = "l") %>% kable_styling(full_width = F, position = "left") %>%
column_spec(1, width_min = '2in') %>%
@@ -83,8 +90,5 @@ kable(read.csv(path<-system.file("extdata", "datatype.rules.csv", package = "mdJ
# kable(read.csv(path<-system.file("extdata", "datatype.rules_definitions.csv", package = "mdJSONdictio"), fileEncoding="latin1", na.strings = NA), align = "l") %>% column_spec(2, width = "50em") %>%
# kable_styling(full_width = F,position = "left")
```
</font>
<br />
<br />


</font> <br /> <br />
