To Dos
Nima Hejazi edited this page Aug 11, 2017 · 18 revisions
- Move duplicationCheck into its own function
  - Easier to test
  - Easier to try different versions
  - No real performance impact
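
To illustrate the testing benefit, a standalone check can be exercised on a tiny hand-built matrix. The sketch below is in Python for brevity and assumes the duplicate check means finding columns of the 0/1 basis matrix that are identical to an earlier column. The name `find_duplicate_columns` and the list-of-columns representation are hypothetical and not taken from the package.

```python
def find_duplicate_columns(columns):
    """Return indices of columns identical to an earlier column.

    `columns` is a list of equal-length sequences of 0/1 values
    (hypothetical representation; the package's own storage may differ).
    """
    seen = {}        # column contents -> index of first occurrence
    duplicates = []
    for idx, col in enumerate(columns):
        key = tuple(col)
        if key in seen:
            duplicates.append(idx)
        else:
            seen[key] = idx
    return duplicates


cols = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 1],  # same as column 0
    [0, 1, 1],  # same as column 1
]
dupes = find_duplicate_columns(cols)
```

Because the function takes plain data in and returns plain data out, swapping in a different implementation only requires that it pass the same small cases.
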
- Alternative makeSparseMat implementations:
  - C++
    - Armadillo sparseMat
    - Eigen?
    - Eventually: OpenMP and OpenACC (GPU) support
  - R
    - dplyr
    - data.table
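
When comparing implementations, a small reference version helps confirm that each candidate produces the same nonzero pattern. The following Python sketch is only an assumption about what such a reference could look like: it treats the matrix as main-term indicators, where column `(k, j)` holds `I(X[i][j] >= X[k][j])`, and stores the result in coordinate form. The function name and the output layout are hypothetical.

```python
def make_sparse_indicators(X):
    """Build main-term indicator columns in coordinate (COO) form.

    Assumption: column k * d + j holds I(X[i][j] >= X[k][j]) for each row i.
    Returns (n_rows, n_cols, entries), where entries is a sorted list of
    (row, col) pairs whose value is 1; all other cells are implicitly 0.
    """
    n = len(X)
    d = len(X[0]) if n else 0
    entries = []
    for k in range(n):          # observation supplying the cutoff
        for j in range(d):      # covariate
            col = k * d + j
            cutoff = X[k][j]
            for i in range(n):
                if X[i][j] >= cutoff:
                    entries.append((i, col))
    entries.sort()
    return n, n * d, entries


X = [[1.0, 5.0],
     [2.0, 3.0]]
n_rows, n_cols, entries = make_sparse_indicators(X)
```

Only the positions of the ones are stored, which is the property any Armadillo, Eigen, or R variant would need to reproduce.
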
- Alternative duplicationCheck implementations
  - C++
    - Armadillo sparseMat
    - Eigen?
- Early stopping on interactions
  - So we can avoid fitting very high-order interactions
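
The simplest form of this idea is a hard cap on interaction degree. The Python sketch below assumes that interpretation; a data-driven stopping rule could replace the fixed cap. The function name `interaction_sets` and the parameter `max_degree` are hypothetical.

```python
from itertools import combinations


def interaction_sets(n_vars, max_degree):
    """Yield variable-index tuples for all interactions up to max_degree."""
    top = min(max_degree, n_vars)
    for degree in range(1, top + 1):
        yield from combinations(range(n_vars), degree)


# With 4 variables, capping at two-way interactions keeps 10 of the 15 subsets.
capped = list(interaction_sets(4, max_degree=2))
full = list(interaction_sets(4, max_degree=4))
```

Since the number of subsets grows as 2^d - 1 without a cap, even a modest limit on degree removes most of the candidate terms when there are many variables.
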
- Interaction restarting
  - Resume adding interactions after an initial partial fit to see whether doing so improves performance
- More extensive performance profiling (RAM and CPU)
- Clear written description of all parts of the algorithm
  - Solid examples in "On Adaptive Propensity Score Truncation in Causal Inference" (Cheng Ju et al.)
- Alternative lasso implementations
  - h2o
  - C++
    - MLPACK
- RcppMLPACK2
  - CK: this version doesn't include the MLPACK libraries, and therefore has a much higher installation burden, esp. for Windows users. Perhaps there is a pre-built bundled version somewhere.
  - CK: Also "Loading the package will crash an RStudio binary that is older than the daily build version 1.1.129."
  - NH: the project itself seems like it's an early-stage kind of thing (based on a quick glance) and they don't support Windows themselves, other than recommending building from source. If RcppMLPACK is for some reason more robust (other than including the actual mlpack source), then I can see a good argument for using that; otherwise, there's not much sense in using a significantly older mlpack version just to cater to some users. Aside: if we're going to comment here, can we move this to an issue -- it's really annoying to cut my comment, reload the page, and paste it, just due to the page being edited.
- Alternative prediction implementations
  - R
    - dplyr
    - data.table
  - C++
    - Armadillo / MLPACK
- Larger algorithm re-implementation
  - Save indicator functions in a list with two vectors: variables used (e.g. x1, x3) and cutoffs (1.5, 10.2)
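
The storage idea above can be sketched as follows, in Python for brevity. Each indicator function is kept as a pair of parallel vectors, and evaluating it means checking every listed variable against its cutoff. The names `make_basis` and `evaluate_basis`, the 0-based indices, and the choice of `>=` over `>` are assumptions made for illustration.

```python
def make_basis(variables, cutoffs):
    """Store one indicator function as two parallel vectors."""
    if len(variables) != len(cutoffs):
        raise ValueError("variables and cutoffs must have the same length")
    return {"variables": list(variables), "cutoffs": list(cutoffs)}


def evaluate_basis(basis, row):
    """Return 1 if every listed variable meets its cutoff, else 0.

    Assumption: the indicator uses >=; the package may use a strict
    inequality instead.
    """
    pairs = zip(basis["variables"], basis["cutoffs"])
    return int(all(row[v] >= c for v, c in pairs))


# x1 and x3 from the example above become 0-based indices 0 and 2.
basis = make_basis([0, 2], [1.5, 10.2])
inside = evaluate_basis(basis, [2.0, 0.0, 11.0])
outside = evaluate_basis(basis, [2.0, 0.0, 9.0])
```

A representation like this stores two short vectors per basis function rather than a full column, so the design matrix can be rebuilt on demand for new data.
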
- Wider R ML framework support
  - mlr wrapper
  - caret wrapper
- Python implementation based on C++ core (à la xgboost, arborist, etc.)
  - then scikit-learn wrapper