To Dos
Chris Kennedy edited this page Aug 12, 2017 · 18 revisions
-
Move duplicationCheck into its own function
- Easier to test
- Easier to try different versions
- No real performance impact
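A minimal sketch of what a standalone, easily tested duplicate check could look like. It assumes the check's job is to find columns of a binary design matrix with identical nonzero patterns; the page does not spell out that contract, and the function name and sparse representation here are hypothetical, not the package's.

```python
from collections import defaultdict


def find_duplicate_columns(columns):
    """Group column indices whose nonzero-row patterns are identical.

    `columns[j]` is an iterable of the row indices at which binary
    column j equals 1 (a hypothetical sparse representation).
    Returns a dict mapping the first column of each duplicate group
    to the list of later columns that repeat it.
    """
    groups = defaultdict(list)
    for j, rows in enumerate(columns):
        # Sorting makes the key independent of the order rows were listed in.
        groups[tuple(sorted(rows))].append(j)
    return {idx[0]: idx[1:] for idx in groups.values() if len(idx) > 1}


if __name__ == "__main__":
    cols = [[0, 2], [1], [2, 0], [1], [3]]
    print(find_duplicate_columns(cols))
```

Because the check is a pure function of its input, alternative versions (hashing, sorting, a compiled implementation) can be compared against the same small fixtures.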
-
Alternative makeSparseMat implementations:
- C++
  - Armadillo sparseMat
  - Eigen?
  - Eventually: OpenMP and OpenACC (GPU) support
- R
  - dplyr
  - data.table
-
Alternative duplicationCheck implementations
- C++
  - Armadillo sparseMat
  - Eigen?
-
Early stopping on interactions
- So we can avoid fitting the very high-order interactions
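One way early stopping could work, as a rough sketch: raise the interaction order one step at a time and stop once a validation loss stops improving. The `score_at_order` callable, the `tol` and `patience` parameters, and the loss-based criterion are all assumptions for illustration, not something this page specifies.

```python
def select_interaction_order(score_at_order, max_order, tol=0.0, patience=1):
    """Increase the interaction order until the validation loss stalls.

    `score_at_order(k)` is a hypothetical callable that fits the model with
    interactions up to order k and returns a validation loss (lower is better).
    Stops after `patience` consecutive orders fail to improve by more than `tol`.
    """
    best_order, best_score = None, float("inf")
    stale = 0
    for order in range(1, max_order + 1):
        score = score_at_order(order)
        if score < best_score - tol:
            best_order, best_score = order, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # higher orders are never fitted
    return best_order, best_score


if __name__ == "__main__":
    # Made-up losses, purely to exercise the stopping rule.
    losses = {1: 0.9, 2: 0.5, 3: 0.55, 4: 0.1}
    print(select_interaction_order(lambda k: losses[k], max_order=4))
```

With `patience=1` the search can miss a later improvement (order 4 in the made-up losses above), which is the case that resuming interactions after a partial fit would address.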
-
Interaction restarting
- Resume adding interactions after initial partial fit to see if it improves performance
-
More extensive performance profiling (RAM and CPU)
- Also compare LAPACK/BLAS implementations: default, Intel MKL, or OpenBLAS
- Improved compilation may also be helpful, e.g. -O3 optimization in gcc.
-
Clear written description of all parts of the algorithm
- Solid examples in "On Adaptive Propensity Score Truncation in Causal Inference" (Cheng Ju et al.)
-
Alternative lasso implementations
- h2o
- C++
  - MLPACK
    - RcppMLPACK2
      - CK: This version doesn't include the MLPACK libraries, and therefore has a much higher installation burden, especially for Windows users. Perhaps there is a pre-built bundled version somewhere.
      - CK: Also, "Loading the package will crash an RStudio binary that is older than the daily build version 1.1.129."
      - NH: The project itself seems to be at an early stage (based on a quick glance), and they don't support Windows themselves, other than recommending building from source. If RcppMLPACK is for some reason more robust (other than including the actual mlpack source), then I can see a good argument for using that; otherwise, there's not much sense in using a significantly older mlpack version just to cater to some users.
      - NH: Aside: if we're going to comment extensively here, can we move this to an issue? It's really annoying to cut my comment, reload the page, and paste it, just due to the page being edited.
      - CK: Sure, sounds good.
  - shogun seems like a nice option as well: we could either access the library via the existing R wrappers, or perhaps use Rcpp to access the C++ implementations directly. Building a core around this would also allow easier porting to Python (since shogun has Python wrappers as well).
-
Alternative prediction implementations
- R
  - dplyr
  - data.table
- C++
  - Armadillo / MLPACK
-
Larger algorithm re-implementation
- Save indicator functions in a list with two vectors: variables used (e.g. x1, x3) and cutoffs (1.5, 10.2)
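A small sketch of the proposed storage: each basis function keeps only the variables it uses and the matching cutoffs. It assumes a basis function is a product of indicators of the form x[variable] >= cutoff; the direction of the inequality, the class name, and the zero-based column positions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class IndicatorBasis:
    """One basis function stored as two parallel vectors."""
    variables: Tuple[int, ...]  # column positions, e.g. (0, 2) for x1 and x3
    cutoffs: Tuple[float, ...]  # matching cutoffs, e.g. (1.5, 10.2)

    def __post_init__(self):
        if len(self.variables) != len(self.cutoffs):
            raise ValueError("variables and cutoffs must have the same length")

    def evaluate(self, x: Sequence[float]) -> int:
        # Assumed form: the product of indicators I(x[v] >= c).
        return int(all(x[v] >= c for v, c in zip(self.variables, self.cutoffs)))


if __name__ == "__main__":
    basis = IndicatorBasis(variables=(0, 2), cutoffs=(1.5, 10.2))
    print(basis.evaluate([2.0, 0.0, 11.0]))
    print(basis.evaluate([2.0, 0.0, 10.0]))
```

Storing only the pairs means prediction on new data needs no design matrix from training, just a loop over the saved basis functions.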
-
Wider R ML framework support
- mlr wrapper
- caret wrapper
-
Python implementation based on C++ core (à la xgboost, arborist, etc.)
- then scikit-learn wrapper