This repository has been archived by the owner on Apr 7, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
14-final.Rmd
71 lines (57 loc) · 5.7 KB
/
14-final.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# Final Thoughts {#final}
It's called Data Science for a reason, record all the data handling, experimentation and analysis in your [lab-notebook](http://colinpurrington.com/tips/lab-notebooks).
Version control is your digital lab notebook.
## A Structured RAP Course
You're welcome to dip into the previous chapters as and when you need, but you may prefer a more comprehensive grounding in the principles of reproducibility. We provide a sequenced list of lessons here to help you on your journey in becoming a RAP champion. We suggest you work through the list. The links designated as HELP are provided as comprehensive resources if you get stuck, and can be otherwise skipped. The estimated completion time for a resource is given in parentheses.
* The Unix Shell or terminal
+ [Interactive lesson that prepares you for git](http://swcarpentry.github.io/shell-novice) (3 hours)
* Version Control with git
+ [Quick overview for those without a computing background](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004668) (30 mins)
+ [Interactive lesson](http://swcarpentry.github.io/git-novice/) (3 hours)
+ HELP: [Comprehensive git book](https://git-scm.com/book/en/v2)
* git and Github
+ [The difference between git and Github](https://stackoverflow.com/questions/13321556/difference-between-git-and-github) (5 mins)
+ ADVANCED: [Github workflow written](http://scottchacon.com/2011/08/31/github-flow.html) (1 hour)
+ [Github workflow visual](https://guides.github.com/introduction/flow/)
* RStudio, R projects and R fundamentals
+ [Interactive lesson](http://swcarpentry.github.io/r-novice-gapminder/) (12 hours)
+ [Interactive lesson (some overlap with above lesson)](http://swcarpentry.github.io/r-novice-inflammation/) (7 hours)
+ HELP:
* Rmarkdown
+ [Passive lesson](http://rmarkdown.rstudio.com/lesson-1.html) (3 hours)
+ [MoJ tutorial](https://github.com/moj-analytical-services/rmarkdown_training)
+ HELP: Write a book in Rmarkdown using [bookdown](https://bookdown.org/yihui/bookdown/)
* R packages
+ HELP: [Hadley Wickham's R packages](http://r-pkgs.had.co.nz/)
* Prevent dependency hell with packrat
+ [packrat setup](https://rstudio.github.io/packrat/walkthrough.html) (1 hour)
* Data concerns
+ [Tidy data](https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjqp8KdjK_XAhXJmBoKHeFMDGEQFggqMAA&url=https%3A%2F%2Fwww.jstatsoft.org%2Farticle%2Fview%2Fv059i10%2Fv59i10.pdf&usg=AOvVaw2vJ6CHw9RT8m_noVUfoeP6) (1 hour)
+ [The minimal tidy data set](https://ukgovdatascience.github.io/rap_companion/exemplar.html#tidy-data) (1 hour)
+ HELP: [thinking about data from spreadsheets](http://www.datacarpentry.org/spreadsheet-ecology-lesson/)
+ HELP: [data management with SQL SQL](http://www.datacarpentry.org/sql-ecology-lesson/)
+ HELP: [organising data](http://kbroman.org/dataorg/)
* Unit tests
+ [Hadley Wickham's chapter](http://r-pkgs.had.co.nz/tests.html) (2 hours)
* Automated testing; detects problems you might miss.
+ [Using RMD check](http://r-pkgs.had.co.nz/check.html)
+ [Getting started with Travis](https://docs.travis-ci.com/user/getting-started/)
+ [Building an R project with Travis](https://docs.travis-ci.com/user/languages/r/)
+ [Look at an example RAP .travis.yml](https://github.com/DCMSstats/eesectors/blob/master/.travis.yml)
* Further reading
+ [Software carpentry](https://software-carpentry.org/reading/)
+ [Data carpentry](http://www.datacarpentry.org/)
+ [DevOps](https://en.wikipedia.org/wiki/DevOps)
+ [DataOps](https://en.wikipedia.org/wiki/DataOps)
+ [Awesome R packages for DevOps](https://awesome-r.com/#awesome-r-r-development)
+ [R for Data Science](http://r4ds.had.co.nz/)
+ [Evidence based data analysis](http://www.pnas.org/content/112/6/1645.full)
It's also worthwhile looking at other Department's RAP efforts. For a good open example see [DCMS's eesectors package](https://github.com/DCMSstats/eesectors).
## RAP MOOC
To complement this book, one of our RAPpers has developed a [Massive Online Open Course](https://www.udemy.com/reproducible-analytical-pipelines/) to share an approach to learning this technical skill-set. This course is an informal introduction and describes the best practices through the use of screencasts and assignments. It is currently available on Udemy and takes you through the RAP journey using a simple RAP example.
## User feedback
The RAP companion is intended to point data scientists in the Civil Service towards the Data Ops toolkit that should be used when attempting to automate some of the boring stuff for their colleagues. Your feedback is welcome and important. It will help us improve the RAP companion through further iterations.
You can feedback through completing this [Google form](https://docs.google.com/forms/d/e/1FAIpQLSeVYmjJIPm-YJ_lgKu0JdIiUwc2glSLtfGFQxKdW1cMmRwbCQ/viewform?usp=pp_url&entry.1747016377=4&entry.305553560=4&entry.349499540&entry.1168732002=Column+4&entry.1948461863=Column+4&entry.1262699325=Column+4&entry.1033407422=Column+4&entry.900063493=Column+4&entry.811492760=Column+4&entry.2083454847&entry.2141214542=Yes&entry.1340586078&entry.879050699&entry.1223500353) which allows us to [measure your satisfaction](https://www.gov.uk/service-manual/measuring-success/measuring-user-satisfaction).
Or, by raising an [issue on the RAP companion repo page](https://github.com/ukgovdatascience/rap_companion/issues).
## Just the beginning...
We've introduced you to the basics of reproducibility and Data Ops. For further development ideas and inspiration consider the [Data Ops manifesto](http://dataopsmanifesto.org/).