To Dos
Nima Hejazi edited this page Aug 11, 2017 · 18 revisions
- Move duplicationCheck into its own function
  - Easier to test
  - Easier to try different versions
  - No real performance impact
- Alternative makeSparseMat implementations:
  - C++
    - Armadillo sparseMat
    - Eigen?
    - Eventually: OpenMP and OpenACC (GPU) support
  - R
    - dplyr
    - data.table
- Alternative duplicationCheck implementations
  - C++
    - Armadillo sparseMat
    - Eigen?
- Early stopping on interactions
  - So we can avoid fitting very high-order interactions
- Interaction restarting
  - Resume adding interactions after an initial partial fit to see whether it improves performance
- More extensive performance profiling (RAM and CPU)
- Clear written description of all parts of the algorithm
  - Solid examples in "On Adaptive Propensity Score Truncation in Causal Inference" (Cheng Ju et al.)
- Alternative lasso implementations
  - h2o
  - C++
    - MLPACK
    - RcppMLPACK2
- Alternative prediction implementations
  - R
    - dplyr
    - data.table
  - C++
    - Armadillo / MLPACK
- Larger algorithm re-implementation
  - Save indicator functions in a list with two vectors: variables used (e.g. x1, x3) and cutoffs (1.5, 10.2)
- Wider R ML framework support
  - mlr wrapper
  - caret wrapper
- Python implementation based on C++ core (à la xgboost, arborist, etc.)
  - Then a scikit-learn wrapper
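
Two of the items above, storing each indicator function as a pair of vectors (variables used and cutoffs) and moving the duplicate check into its own function, can be illustrated together. The following is only a minimal C++ sketch under stated assumptions, not the package's implementation. The names `IndicatorBasis`, `evaluate`, and `findDuplicateColumns` are hypothetical. It assumes that an indicator function is a product of terms of the form I(x >= cutoff), and that "duplicate" means two indicator functions yield the same 0/1 column on the observed data.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical representation of one indicator basis function:
// the product over k of I(row[variables[k]] >= cutoffs[k]).
struct IndicatorBasis {
    std::vector<std::size_t> variables;  // zero-based column indices, e.g. {0, 2} for x1, x3
    std::vector<double> cutoffs;         // one cutoff per variable, e.g. {1.5, 10.2}
};

// Evaluate one basis function on one observation (a row of covariates).
// The ">=" direction is an assumption of this sketch.
bool evaluate(const IndicatorBasis& basis, const std::vector<double>& row) {
    for (std::size_t k = 0; k < basis.variables.size(); ++k) {
        if (row[basis.variables[k]] < basis.cutoffs[k]) {
            return false;
        }
    }
    return true;
}

// Standalone duplicate check (hypothetical): for each basis, return the index
// of the first basis that produces the same 0/1 column on `data`.
// Basis j is a duplicate exactly when result[j] != j.
std::vector<std::size_t> findDuplicateColumns(
        const std::vector<IndicatorBasis>& bases,
        const std::vector<std::vector<double>>& data) {
    std::map<std::vector<bool>, std::size_t> firstSeen;
    std::vector<std::size_t> result(bases.size());
    for (std::size_t j = 0; j < bases.size(); ++j) {
        std::vector<bool> column;
        column.reserve(data.size());
        for (const auto& row : data) {
            column.push_back(evaluate(bases[j], row));
        }
        // emplace keeps the existing entry when this column was seen before.
        result[j] = firstSeen.emplace(column, j).first->second;
    }
    return result;
}
```

Because the check is a free function that only needs the bases and the data, it can be unit-tested on tiny inputs and swapped for another version (for example one built on a sparse-matrix library) without touching the code that builds the bases. The ordered map keyed on the full column is chosen here for simplicity; a hash of the column would likely be faster on large samples.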