Commit

Merge pull request #39 from olivierlabayle/release_0.21
Release 0.21
olivierlabayle authored Jan 13, 2022
2 parents b5d4741 + 3ee808e commit a838180
Showing 19 changed files with 433 additions and 5,198 deletions.
16 changes: 7 additions & 9 deletions Project.toml
@@ -5,19 +5,17 @@ version = "0.2.0"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
HypothesisTests = "09f84164-cd44-5f33-b23f-e6b0d136a0d5"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJGLMInterface = "caf8df21-4939-456d-ac9c-5fefbfb04c0c"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

[compat]
CategoricalArrays = "0.10"
Distributions = "0.25"
GLM = "1.5"
Tables = "1.5"
MLJ = "0.16"
MLJBase = "0.18"
HypothesisTests = "0.10"
MLJBase = "0.19"
MLJGLMInterface = "0.1"
julia = "1.1"
MLJModels = "0.15"
Tables = "1.5"
julia = "1.6"
3 changes: 2 additions & 1 deletion docs/Project.toml
@@ -15,8 +15,9 @@ CategoricalArrays = "0.10"
Distributions = "0.25"
Documenter = "0.27"
GLM = "1.5"
MLJ = "0.16"
MLJ = "0.17"
MLJDecisionTreeInterface = "0.1"
NearestNeighborModels = "0.1"
MLJLinearModels = "0.5"
Tables = "1.5"
julia = "1.1"
72 changes: 65 additions & 7 deletions docs/src/index.md
@@ -128,10 +128,12 @@ fit!(mach)
briefreport(mach)
```

The content of the brief report is:
The content of the brief report is a Tuple that for each query presents a NamedTuple containing the following fields:
- query: The associated query
- pvalue: The p-value
- confint: A 95% confidence interval around the estimated quantity
- estimate: An estimate of the quantity of interest
- initial_estimated: The initial estimate that we would have reached without applying the tmle step
- stderror: The estimate of the standard error
- mean_inf_curve: The empirical mean of the influence curve

@@ -186,29 +188,69 @@ LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0
query = (T=[1, 0],)
Q = MLJ.DeterministicConstantRegressor()
G = LogisticClassifier()
tmle = TMLEstimator(, G, query)
tmle = TMLEstimator(Q, G, query)
```

Now, all there is to do is to fit the estimator:

```julia
mach = machine(tmle, T, W, y)
fit!(mach)
```

Their are various ways in which you can investigate the results:
#### Regular MLJ entrypoints: `fitted_params` and `report`

- `fitted_params`

all_results = fitted_params(mach)
```julia
fitted_params(mach)
```

The `fitted_params` function gives access to a `NamedTuple` that contains all results from the fit, including:
The `fitted_params` function is the regular `MLJ` entrypoint to retrieve all fitted parameters for all submachines in our TMLEstimator machine. It gives access to a `NamedTuple` that contains all results from the fit, including:
- A fitresult for Q
- A fitresult for G
- A fitresult for the fluctuation denoted F
- A report R containing values for all: estimate, stderror and mean_inf_curve

A simplified way to access the report values only is using `briefreport`:
- `report`

```julia
report(mach)
```

The full report of the fitted machine includes an entry for each query, stored in the fields `queryreport_$i` where `i` is the query index. Each of these entries is a `QueryReport` entity that contains all the information you might need to extract for this specific query.

#### TMLE.jl Specific entrypoints

- `getqueryreport`

```julia
qr = getqueryreport(mach, 1)
```

This gives you easy access to the `QueryReport` structure.

- `ztest`

It can be called either on the machine by providing a sequence of indices (see the [multiple-queries section](#multiple-queries) for an example with more than one query) or on the query report itself.

```julia
ztest(mach, 1) == ztest(qr)
ztest(qr)
```

It is a simple wrapper over the `OneSampleZTest` from the [HypothesisTests.jl](https://juliastats.org/HypothesisTests.jl/stable/) package and provides a confidence interval, a p-value, and the other quantities that test exposes.
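
As a rough illustration of what such a wrapper relies on, here is a minimal sketch that calls `OneSampleZTest` from HypothesisTests.jl directly. The values `est`, `se` and `n` are hypothetical placeholders, not output from TMLE.jl, and the way `ztest` actually builds its test internally is an assumption that this page does not confirm.

```julia
using HypothesisTests

# Hypothetical values standing in for one query's results (not produced by TMLE.jl)
est = 0.8    # point estimate
se = 0.1     # standard error of the estimate
n = 1000     # sample size

# OneSampleZTest(xbar, stddev, n, μ0) expects a standard deviation,
# so the standard error is rescaled by sqrt(n)
test = OneSampleZTest(est, se * sqrt(n), n, 0.0)

pvalue(test)    # two-sided p-value for H0: mean == 0
confint(test)   # confidence interval (95% by default)
```

Both `pvalue` and `confint` are part of HypothesisTests.jl, so the same calls should apply to whatever test object `ztest` returns, assuming it is a plain `OneSampleZTest`.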

- `briefreport`

```julia
briefreport(mach)
```

Finally, the `briefreport` function provides an easy way to access most relevant information in usual cases.

#### Conclusion

We can see that even if one nuisance parameter is misspecified, the double robustness of TMLE enables correct estimation of our target.

### IATE
@@ -280,6 +322,7 @@ fit!(mach)
briefreport(mach)
```


### Multiple queries

We have seen that we need to estimate nuisance parameters as well as possible and this is usually where the performance bottleneck lies because we are using stacking and many learning algorithms. We might also be interested in multiple questions all related to the same dataset setting. In such a situation, nuisance parameters can be estimated only once while
@@ -323,7 +366,22 @@ fit!(mach)
briefreport(mach)
```

The report is now a vector of `NamedTuple` as described in [the getting started section](#quick-start).
The report contains a `QueryReport` for each query.

One can, for instance, perform a paired Z-Test to check whether the estimates resulting from two different queries are significantly different. Here we compare the first and third queries:

```julia
ztest(mach, 1 => 3)
```

Or perform a simple Z-Test for each query:

```julia
ztest(mach, 1, 2, 3)
```

which will output a Tuple of three tests.
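
For readers unfamiliar with paired Z-tests, the sketch below shows the idea using HypothesisTests.jl directly on made-up vectors. It is only an illustration: the data are invented, and which per-sample quantities TMLE.jl pairs inside `ztest(mach, 1 => 3)` is not shown in this page.

```julia
using HypothesisTests

# Made-up paired observations, for illustration only
x = [1.2, 0.9, 1.1, 1.4, 1.0, 1.3]
y = [1.0, 0.8, 1.2, 1.1, 0.9, 1.0]

# The two-vector form is a paired test on the element-wise differences
paired = OneSampleZTest(x, y)

# It matches a one-sample test applied to x .- y
onesample = OneSampleZTest(x .- y)

pvalue(paired) ≈ pvalue(onesample)
```

The null hypothesis in both cases is that the mean of the differences is zero.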

## API


31 changes: 0 additions & 31 deletions notebooks/Project.toml

This file was deleted.
