-
Notifications
You must be signed in to change notification settings - Fork 13
/
README.Rmd
173 lines (109 loc) · 5.91 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
output: md_document
---
[![Build
Status](https://travis-ci.org/DCMSstats/eesectors.svg?branch=master)](https://travis-ci.org/DCMSstats/eesectors)
[![Build
status](https://ci.appveyor.com/api/projects/status/vulmerft4p30339l/branch/master?svg=true)](https://ci.appveyor.com/project/ivyleavedtoadflax/eesectors/branch/master)
[![codecov.io](http://codecov.io/github/DCMSstats/eesectors/coverage.svg?branch=master)](http://codecov.io/github/DCMSstats/eesectors?branch=master)
[![GitHub
tag](https://img.shields.io/github/tag/DCMSstats/eesectors.svg)](https://github.com/DCMSstats/eesectors/releases)
```{r setup, echo=FALSE, include=FALSE}
# Note to compile this file to README.mb, run the following:
# rmarkdown::render('README.Rmd',output_format = 'md_document')
knitr::opts_chunk$set( echo = TRUE, warning = FALSE, message = FALSE, error =
FALSE )
```
# eesectors
## Functions for producing the Economic Estimates for DCMS Sectors
First Statistical Release
**This is a prototype and subject to constant development**
This package provides functions used in the creation of a Reproducible
Analytical Pipeline (RAP) for the Economic Estimates for DCMS sectors
publication.
See the [eesectorsmarkdown](https://github.com/DCMSstats/eesectorsmarkdown) repository for an example of implementing these functions in the context of a Statistical First Release (SFR).
## Installation
The package can then be installed using `devtools::install_github('DCMSstats/eesectors')`.
Some users may not be able to use the `devtools::install_github()` commands as a result of network security settings.
If this is the case, `eesectors` can be installed by downloading the [zip of the repository](https://github.com/DCMSstats/eesectors/archive/master.zip) and installing the package locally using `devtools::install_local(<path to zip file>)`.
## Quick start
This package provides functions to recreate Chapter three -- Gross Value Added
(GVA) of the [Economic estimates of DCMS Sectors](https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/544103/DCMS_Sectors_Economic_Estimates_-_August_2016.pdf).
### Extracting data from underlying spreadsheets
The data are provided to DCMS as spreadsheets provided by the Office for
National Statistics (ONS). Hence, the first set of functions in the package are
designed to extract the data from these spreadsheets, and combine the data into
a single dataset, ready to be checked, and converted into tables and figures.
There are four `extract_` functions:
* `extract_ABS_data`
* `extract_DCMS_sectors`
* `extract_GVA_data`
* `extract_SIC91_data`
* `extract_tourism_data`
---
**Note: that with the exception of `extract_DCMS_sectors`, the data extracted by these functions is potentially disclosive, and should therefore be handled with care and considered to be OFFICIAL-SENSITIVE. Steps must be taken to prevent the accidental disclosure of these data.**
**These should include (but not be limited to):**
* Not storing the output of these functions in git repositories
* Labelling of official data with a prefix/suffix of 'OFFICIAL'
* The use of githooks to search for potentially disclosive files at time of commit and push (e.g. https://github.com/ukgovdatascience/dotfiles)
* Suitable 2i/QA steps
---
The extract functions will return a `data.frame`, and can be called as follows (see individual function documentation for more information about each of the arguments).
```r
# path to spreadsheet containing the underlying data
input <- 'working_file_YYYY.xlsm'
# eesectors has a built in example spreadsheet
input <- example_working_file("example_working_file.xlsx")
extract_ABS_data(input)
```
The various datasets used in the GVA chapter can be combined with the `combine_GVA()` function, which will return a `data.frame` of the combined data.
```r
combine_GVA(
ABS = extract_ABS_data(input),
GVA = extract_GVA_data(input),
SIC91 = extract_SIC91_data(input),
tourism = extract_tourism_data(input)
)
```
### Automated checking
The GVA chapter is built around the `year_sector_data` class. To create a
`year_sector_data` object, a `data.frame` must be passed to it which contains
all the data required to produce the tables and charts in Chapter three.
An example of how this dataset will need to look is bundled with the package:
`GVA_by_sector_2016`. These data were extracted directly from the 2016 SFR which
is in the public domain, and provide a test case for evaluating the data.
```{r}
library(eesectors)
GVA_by_sector_2016
```
When an object is instantiated into the `year_sector_data` class, a number of checks
are run on the data passed as the first argument. These are explained in more
detail in the help `?year_sector_data()`
```{r warning=TRUE, message=TRUE}
gva <- year_sector_data(GVA_by_sector_2016)
```
Any failed checks are raised as warnings, not errors, and so the user is able to continue.
However it is also possible to log these warnings as github issues by setting `log_issues=TRUE`.
This is a prototype feature that needs additional work to increase the usefulness of these issues, see below for details on environmental variables that are required for this functionality to work.
### Creating tables and charts
Tables and charts for Chapter three can be reproduced simply by running the relevant functions:
```{r}
year_sector_table(gva)
```
```{r figure3.1}
figure3.1(gva)
```
Note that figures produced remain `ggplot2` objects, and can therefore be edited
in the following way:
```{r figure3.2}
p <- figure3.2(gva)
p
```
Titles, and other layers can then be added simply:
```{r figure3.2edited}
library(ggplot2)
p + ggtitle('Figure 3.2: Indexed growth in GVA (2010 =100)\n in DCMS sectors and UK: 2010-2015')
```
Note that figures make use of the [govstyle](https://github.com/ukgovdatascience/govstyle) package.
See the
[vignette](https://github.com/ukgovdatascience/govstyle/blob/master/vignettes/absence_statistics.md) for more information on how to use this package.