-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
139 lines (99 loc) · 3.88 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# readnet
The goal of readnet is to provide tools for preprocessing network data (a companion to igraph, statnet, netdiffuseR, etc.)
Some of the features:
* Automatically handle coding for nodes, time and groups.
* If either `group.id` or `time` are specified, the function makes recursive calls, returning nested edgelists with the following structure:
```
object
--- group1:
--- edgelist in time1
--- ...
--- edgelist in time n
...
--- group n:
--- edgelist in time1
--- ...
--- edgelist in time n
```
* Includes functions to coerce the resulting edgelists into igraph objects and viceversa.
This project is on development and not published on CRAN yet.
## Installation
You can get `readnet` using devtools:
```r
devtools::install_github("USCCANA/readnet")
```
## Example
## Survey to Edgelist
This is a basic example which shows you how to solve a common problem. Suppose that you have a dataset that looks something like this:
```{r fakedata, echo=FALSE}
dat0 <- data.frame(
ego = c("a", "b", "c"),
alter1 = c("b", NA, NA),
alter2 = NA,
time = 0
)
dat1 <- data.frame(
ego = c("a", "b", "c"),
alter1 = c("b", "a", NA),
alter2 = c("c", NA, NA),
time = 1
)
dat <- rbind(dat0, dat1)
```
```{r}
dat
```
This is, a survey with network nominations, and in this case, including a time variable. We can use the function `survey_to_edgelist` to process this data
```{r example}
library(readnet)
(ans <- survey_to_edgelist(dat, "ego", c("alter1", "alter2"), time = "time"))
```
The previous returned a list with elements of class `rn_edgelist`. Each element contains an edgelist, a vector of labels, and a dataframe with data. We can use the function `as_igraph` to turn this into an igraph object
```{r}
as_igraph(ans)
```
and as a `rn_edgelist` back
```{r}
as_rn_edgelist(as_igraph(ans))
```
## Setting up data for Siena (or alike :))
Here is an example that I built for [Prof. Kayla de la Haye](http://profiles.sc-ctsi.org/kayla.delahaye/). The idea here is that we have an edgelist with (finite) timestamps, a dataset with node attributes, and a list of ids that we would like to include in the data. The final output looks like lists of adjacency matrices of $n\times n$ (so we coded the edgelist as 1 through `n`) and `data.frame` of the attributes matching the positions of nodes in the edgelist (you may be able to do something similar with igraph, but the key difference here is that this function allows you to include time and returns the attribute dataset sorted accordignly):
```r
library(readnet)
library(dplyr)
edgelist <- readxl::read_excel("playground/kayla/GROW_Edgelist clean for George.xlsx")
dat <- readr::read_tsv("playground/kayla/GROW Attribute Data for George.csv")
ids2work <- readxl::read_excel("playground/kayla/GROW_Edgelist clean for George.xlsx", "NetMembersN400")
# Catched an error:
# Error: Missing ids in `data`. There are 610 observations in `data` and 613
# identified ids. The observations that are missing in `data` are: '812398',
# '812660', '831086'.
# The same error shows up when I run it filtering the data:
#
# Error: Missing ids in `data`. There are 394 observations in `data` and 397
# identified ids. The observations that are missing in `data` are: '812398',
# '812660', '831086'. So
#
# So I'll just drop it!
dids <- unique(dat$study_id)
edgelist <- filter(edgelist, (Sender %in% dids) & (Receiver %in% dids))
ans <- edgelist_to_adjmat_w_attributes(
edgelist = edgelist,
data = dat,
data.idvar = "study_id",
edgelist.timevar = "wave",
ids.to.keep = ids2work$study_id
)
```