version 0.3.5

michalovadek · Mar 15, 2021 · 9ee0eeb
1 parent 319de1d
commit 9ee0eeb

Showing 26 changed files with 492 additions and 314 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -32,7 +32,6 @@ Suggests:
     tidytext,
     wordcloud,
     purrr,
-    ggplot2,
-    glue
+    ggplot2
 URL: https://michalovadek.github.io/eurlex/
 VignetteBuilder: knitr
diff --git a/NEWS.md b/NEWS.md
@@ -4,7 +4,7 @@
 
 - it is now possible to select all resource types available with `elx_make_query(resource_type = "any")`. Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times
 - results can be restricted to a particular directory code with `elx_make_query(directory = "18")` (directory code "18" denotes Common Foreign and Security Policy)
-- results can be restricted to a particular sector with `elx_make_query(sector = 2)` (sector code 3 denotes EU international agreements)
+- results can be restricted to a particular sector with `elx_make_query(sector = 2)` (sector code 2 denotes EU international agreements)
 
 ## Minor changes
 

diff --git a/R/elx_make_query.R b/R/elx_make_query.R
@@ -1,8 +1,8 @@
-#' Create SPARQL quries
+#' Create SPARQL queries
 #'
 #' Generates pre-defined or manual SPARQL queries to retrieve document ids from Cellar.
 #' List of available resource types: http://publications.europa.eu/resource/authority/resource-type .
-#' Note that not all resource types are compatible with the pre-defined query.
+#' Note that not all resource types are compatible with default parameter values.
 #'
 #' @importFrom magrittr %>%
 #'
@@ -46,6 +46,7 @@ elx_make_query <- function(resource_type = c("directive","regulation","decision"
                            include_directory = FALSE, include_sector = FALSE,
                            order = FALSE, limit = NULL){
 
+  if (missing(resource_type)) stop("'resource_type' must be defined")
   if (!resource_type %in% c("any","directive","regulation","decision","recommendation","intagr","caselaw","manual","proposal","national_impl")) stop("'resource_type' must be defined")
 
   if (resource_type == "manual" & nchar(manual_type) < 2){

diff --git a/README.md b/README.md
@@ -26,11 +26,11 @@ For the moment, it is recommended to retrieve metadata one variable at a time. F
 2. `dates <- elx_make_query("directive", include_date_transpos = TRUE) %>% elx_run_query()`
 3. `ids %>% dplyr::left_join(lbs) %>% dplyr::left_join(dates)`
 
-rather than `elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE)`. This approach should make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data.
+rather than `elx_make_query("directive", include_lbs = TRUE, include_date_transpos = TRUE)`. This approach should make it easier to understand the returned data frame(s), especially when some variables contain missing or duplicated data. Always keep an eye on whether the `work` and `celex` columns identify rows uniquely or not.
 
 One of the main contributions of the SPARQL requests is that we obtain a comprehensive list of identifiers that we can subsequently use to obtain more data relating to the document in question. While the results of the SPARQL queries are useful also for webscraping (with the `rvest` package), the function `elx_fetch_data()` enables us to fire GET requests to retrieve data on documents with known identifiers (including Cellar URI). The function currently enables downloading the title and the full text of a document in all available languages.
 
-See the [vignette](https://michalovadek.github.io/eurlex/articles/eurlexpkg.html) for a walkthrough on how to use the package. Check function documentation for most up-to-date overview of features.
+See the [vignette](https://michalovadek.github.io/eurlex/articles/eurlexpkg.html) for a walkthrough on how to use the package. Check function documentation for most up-to-date overview of features. Example use cases are shown in this [paper](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150).
 
 ## Cite
 Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: [10.1080/2474736X.2020.1870150](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150)
@@ -40,6 +40,19 @@ This package nor its author are in any way affiliated with the EU Publications O
 
 Please consider contributing to the maintanance and development of the package by reporting bugs or suggesting new features.
 
+## Latest changes
+
+### eurlex 0.3.5
+
+- it is now possible to select all resource types available with `elx_make_query(resource_type = "any")`. Since there are nearly 1 million CELEX codes, use with discretion and expect long execution times
+- results can be restricted to a particular directory code with `elx_make_query(directory = "18")` (directory code "18" denotes Common Foreign and Security Policy)
+- results can be restricted to a particular sector with `elx_make_query(sector = 2)` (sector code 2 denotes EU international agreements)
+
+- new feature: request date of court case submission `elx_make_query(include_date_lodged = TRUE)`
+- new feature: request type of court procedure and outcome `elx_make_query(include_court_procedure = TRUE)`
+- new feature: request directory code of legal act `elx_make_query(include_directory = TRUE)`
+- `elx_curia_list()` has a new default parameter `parse = TRUE` which creates separate columns for `ecli`, `see_case`, `appeal` applying regular expressions on `case_info`
+
 ## Useful resources
 Guide to CELEX numbers: https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html
 

diff --git a/doc/eurlexpkg.R b/doc/eurlexpkg.R
@@ -17,27 +17,36 @@ results <- dirs %>% select(-force,-date)
 
 ## -----------------------------------------------------------------------------
 query_dir %>% 
-  glue::as_glue() # for nicer printing
+  cat() # for nicer printing
 
 elx_make_query(resource_type = "caselaw") %>% 
-  glue::as_glue()
+  cat()
 
 elx_make_query(resource_type = "manual", manual_type = "SWD") %>% 
-  glue::as_glue()
+  cat()
 
 
 ## -----------------------------------------------------------------------------
 elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% 
-  glue::as_glue()
+  cat()
 
 # minimal query: elx_make_query(resource_type = "directive")
 
 elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% 
-  glue::as_glue()
+  cat()
 
 # minimal query: elx_make_query(resource_type = "recommendation")
 
 
+## -----------------------------------------------------------------------------
+# request documents from directory 18 ("Common Foreign and Security Policy")
+# and sector 3 ("Legal acts")
+
+elx_make_query(resource_type = "any",
+               directory = "18",
+               sector = 3) %>% 
+  cat()
+
 ## ----runquery, eval=FALSE-----------------------------------------------------
 #  results <- elx_run_query(query = query_dir)
 #  
@@ -65,18 +74,14 @@ rec_eurovoc %>%
 
 
 ## ----eurovoctable-------------------------------------------------------------
-
 eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc)
 
 print(eurovoc_lookup)
 
-
 ## ----appendlabs---------------------------------------------------------------
-
 rec_eurovoc %>% 
   left_join(eurovoc_lookup)
 
-
 ## -----------------------------------------------------------------------------
 eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc,
                                     alt_labels = TRUE,
@@ -86,7 +91,6 @@ rec_eurovoc %>%
   left_join(eurovoc_lookup) %>% 
   select(celex, eurovoc, labels)
 
-
 ## ----getdatapur, message = FALSE, warning=FALSE, error=FALSE------------------
 # the function is not vectorized by default
 elx_fetch_data(results$work[1],"title")
@@ -117,7 +121,9 @@ dirs %>%
 
 ## -----------------------------------------------------------------------------
 dirs %>% 
-  ggplot(aes(x = as.Date(date), y = celex)) +
+  filter(!is.na(force)) %>% 
+  mutate(date = as.Date(date)) %>% 
+  ggplot(aes(x = date, y = celex)) +
   geom_point(aes(color = force), alpha = 0.1) +
   theme(axis.text.y = element_blank(),
         axis.line.y = element_blank(),

diff --git a/doc/eurlexpkg.Rmd b/doc/eurlexpkg.Rmd
@@ -2,7 +2,7 @@
 title: "eurlex: Retrieve data on European Union law in R"
 output: rmarkdown::html_vignette
 description: >
-  Retrieve efficiently tidy data on European Union law in R with
+  Retrieve data on European Union law in R with
   pre-defined SPARQL and REST queries.
 vignette: >
   %\VignetteIndexEntry{eurlex: Retrieve data on European Union law in R}
@@ -29,6 +29,8 @@ The `eurlex` R package attempts to significantly reduce the overhead associated
 
 The `eurlex` package currently envisions the typical use-case to consist of getting bulk information about EU legislation into R as fast as possible. The package contains three core functions to achieve that objective: `elx_make_query()` to create pre-defined or customized SPARQL queries; `elx_run_query()` to execute the pre-made or any other manually input query; and `elx_fetch_data()` to fire GET requests for certain metadata to the REST API.
 
+The package also contains largely self-explanatory functions for retrieving data on EU court cases (`elx_curia_list()`) and Council votes (`elx_council_votes()`) from outside Eur-Lex.
+
 ## `elx_make_query()`: Generate SPARQL queries
 
 The function `elx_make_query` takes as its first argument the type of resource to be retrieved from the semantic database that powers Eur-Lex (and other publications) called Cellar.
@@ -55,35 +57,47 @@ The choice of resource type is then reflected in the SPARQL query generated by t
 
 ```{r}
 query_dir %>% 
-  glue::as_glue() # for nicer printing
+  cat() # for nicer printing
 
 elx_make_query(resource_type = "caselaw") %>% 
-  glue::as_glue()
+  cat()
 
 elx_make_query(resource_type = "manual", manual_type = "SWD") %>% 
-  glue::as_glue()
+  cat()
 
 ```
 
 There are various ways of querying the same information in the Cellar database due to the existence of several overlapping classes and identifiers describing the same resources. The queries generated by the function should offer a reliable way of obtaining exhaustive results, as they have been validated by the helpdesk of the Publication Office. At the same time, it is always possible there will be issues either on the query or the database side; please report any you encounter through Github.
 
 The other arguments in `elx_make_query()` relate to additional metadata to be returned. The results include by default the [CELEX number](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html) and exclude corrigenda (corrections of errors in legislation). Other data needs to be opted into. Make sure to select ones that are logically compatible (e.g. case law does not have a legal basis). More options should be added in the future.
 
-Note that availability of data for each variable has an impact on the results. The data frame returned by the query will be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids.
+Note that availability of data for each variable might have an impact on the results. The data frame returned by the query might be shrunken to the size of the variable with most missing data. It is recommended to always compare results from a desired query to a minimal query requesting only celex ids.
 
 ```{r}
 elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% 
-  glue::as_glue()
+  cat()
 
 # minimal query: elx_make_query(resource_type = "directive")
 
 elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% 
-  glue::as_glue()
+  cat()
 
 # minimal query: elx_make_query(resource_type = "recommendation")
 
 ```
 
+You can also decide to not specify any resource types, in which case all types of documents will be returned. As there are over a million documents with a CELEX identifier, this is likely not efficient for a majority of users. But since version 0.3.5 it is possible to request documents belonging to a particular ["sector"](https://eur-lex.europa.eu/content/tools/TableOfSectors/types_of_documents_in_eurlex.html) or [directory code](https://eur-lex.europa.eu/browse/directories/legislation.html).
+
+```{r}
+# request documents from directory 18 ("Common Foreign and Security Policy")
+# and sector 3 ("Legal acts")
+
+elx_make_query(resource_type = "any",
+               directory = "18",
+               sector = 3) %>% 
+  cat()
+```
+
 Now that we have a query, we are ready to run it.
 
 ## `elx_run_query()`: Execute SPARQL queries
@@ -135,20 +149,16 @@ rec_eurovoc %>%
 By default, the endpoint returns the EuroVoc concept codes rather than the labels (keywords). The function `elx_label_eurovoc()` needs to be called to obtain a look-up table with the labels.
 
 ```{r eurovoctable}
-
 eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc)
 
 print(eurovoc_lookup)
-
 ```
 
 The results include labels only for unique identifiers, but with `dplyr::left_join()` it is straightforward to append the labels to the entire dataset.
 
 ```{r appendlabs}
-
 rec_eurovoc %>% 
   left_join(eurovoc_lookup)
-
 ```
 
 As elsewhere in the API, we can tap into the multilingual nature of EU documents also when it comes to the EuroVoc keywords. Moreover, most concepts in the thesaurus are associated with alternative labels; these can be returned as well (separated by a comma).
@@ -161,7 +171,6 @@ eurovoc_lookup <- elx_label_eurovoc(uri_eurovoc = rec_eurovoc$eurovoc,
 rec_eurovoc %>% 
   left_join(eurovoc_lookup) %>% 
   select(celex, eurovoc, labels)
-
 ```
 
 ## `elx_fetch_data()`: Fire GET requests
@@ -186,7 +195,7 @@ print(dir_titles)
 
 ```
 
-Note that text requests are by far the most time-intensive; requesting the full text for thousands of documents is liable to extend the run-time into hours. Currently, no method for downloading text in non-html/plain formats is implemented, which means pdf-only texts will be missing from the results.^[It is worth pointing out that the html and pdf contents of older case law differs. Whereas typically the html file is only going to contain a summary and grounds of a judgment, the pdf should also contain background to the dispute.]
+Note that text requests are by far the most time-intensive; requesting the full text for thousands of documents is liable to extend the run-time into hours. Texts are retrieved from html by priority, but methods for pdfs and .docs are also implemented.^[It is worth pointing out that the html and pdf contents of older case law differs. Whereas typically the html file is only going to contain a summary and grounds of a judgment, the pdf should also contain background to the dispute.] The function even handles multi-document resources (by pasting them together).
 
 # Application
 
@@ -213,7 +222,9 @@ Directives become naturally outdated with time. It might be all the more interes
 
 ```{r}
 dirs %>% 
-  ggplot(aes(x = as.Date(date), y = celex)) +
+  filter(!is.na(force)) %>% 
+  mutate(date = as.Date(date)) %>% 
+  ggplot(aes(x = date, y = celex)) +
   geom_point(aes(color = force), alpha = 0.1) +
   theme(axis.text.y = element_blank(),
         axis.line.y = element_blank(),
@@ -251,6 +262,6 @@ dirs_1970_title %>%
 
 I use term-frequency inverse-document frequency (tf-idf) to weight the importance of the words in the wordcloud. If we used pure frequencies, the wordcloud would largely consist of words conveying little meaning ("the", "and", ...).
 
-This is an extremely basic application of the `eurlex` package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing it.
+This is an extremely basic application of the `eurlex` package. Much more sophisticated methods can be used to analyse both the content and metadata of European Union legislation. If the package is useful for your research, please consider citing the [accompanying paper](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150).^[Michal Ovádek (2021) Facilitating access to data on European Union laws, Political Research Exchange, 3:1, DOI: [10.1080/2474736X.2020.1870150](https://www.tandfonline.com/doi/full/10.1080/2474736X.2020.1870150)]
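
The README change in this commit advises checking whether the `work` and `celex` columns identify rows uniquely after joining metadata. A minimal base R sketch of such a check follows. The data frames are invented stand-ins for query results, not real output of `elx_run_query()`, and the helper `rows_unique()` is hypothetical and not part of the package.

```r
# Toy stand-ins for query results; all values are invented for illustration
ids <- data.frame(work  = c("w1", "w2", "w3"),
                  celex = c("c1", "c2", "c3"),
                  stringsAsFactors = FALSE)

# One document ("w1") has two legal bases, one ("w2") has none
lbs <- data.frame(work = c("w1", "w1", "w3"),
                  lbs  = c("basis A", "basis B", "basis C"),
                  stringsAsFactors = FALSE)

# Base R equivalent of a left join: all.x = TRUE keeps every row of ids
joined <- merge(ids, lbs, by = "work", all.x = TRUE)

# Hypothetical helper: TRUE when no combination of the key columns repeats
rows_unique <- function(df, keys = c("work", "celex")) {
  anyDuplicated(df[keys]) == 0L
}

rows_unique(ids)     # TRUE
rows_unique(joined)  # FALSE, because "w1" now appears twice
nrow(joined)         # 4
```

When the check returns `FALSE`, the join has multiplied rows. Counts taken over `celex` will then overstate the number of documents unless the data are de-duplicated or aggregated first.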