Chris Kennedy edited this page Aug 25, 2017 · 18 revisions
  • Move duplicationCheck into its own function

    • Easier to test
    • Easier to try different versions
    • No real performance impact
  • Alternative makeSparseMat implementations:

  • Alternative duplicationCheck implementations

    • C++
      • Armadillo sparseMat
      • Eigen?
  • Early stopping on interactions

    • So we can avoid fitting very high-order interactions
  • Interaction restarting

    • Resume adding interactions after an initial partial fit to see whether it improves performance
  • More extensive performance profiling (RAM and CPU)

    • Also compare LAPACK/BLAS implementations: default, Intel MKL, or OpenBLAS
    • Improved compilation may also be helpful, e.g. -O3 optimization in gcc.
  • Clear written description of all parts of the algorithm

    • Solid examples in "On Adaptive Propensity Score Truncation in Causal Inference" (Cheng Ju et al.)
  • Alternative lasso implementations

    • h2o
    • C++
      • MLPACK
    • RcppMLPACK2
      • CK: This version doesn't include the MLPACK libraries, and therefore has a much higher installation burden, especially for Windows users. Perhaps there is a pre-built bundled version somewhere.
      • CK: Also "Loading the package will crash an RStudio binary that is older than the daily build version 1.1.129."
      • NH: The project itself seems to be at an early stage (based on a quick glance), and they don't support Windows themselves, other than recommending building from source. If RcppMLPACK is for some reason more robust (other than including the actual mlpack source), then I can see a good argument for using it; otherwise, there's not much sense in using a significantly older mlpack version just to cater to some users.
      • NH: Aside: if we're going to comment extensively here, can we move this to an issue? It's really annoying to cut my comment, reload the page, and paste it, just due to the page being edited.
      • CK: Sure, sounds good
    • shogun seems like a nice option as well: we could either access the library via the existing R wrappers, or use Rcpp to call the C++ implementations directly. Building a core around this would also make porting to Python easier, since shogun has Python wrappers as well.
  • Alternative prediction implementations

    • R
      • dplyr
      • data.table
    • C++
      • Armadillo / MLPACK
  • Larger algorithm re-implementation

    • Save indicator functions in a list with two vectors: variables used (e.g. x1, x3) and cutoffs (1.5, 10.2)
  • Wider R ML framework support

    • mlr wrapper
    • caret wrapper
  • Python implementation based on a C++ core (à la xgboost, arborist, etc.)

    • then scikit-learn wrapper
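
A minimal sketch of two of the ideas above: storing each indicator function as a pair of vectors (variables used and cutoffs), and keeping the duplicate check as its own small, testable function. It is written in Python only for brevity. The names `IndicatorBasis`, `design_columns`, and `drop_duplicate_columns` are hypothetical and not part of the package, and the `>=` comparison is an assumed convention rather than the package's documented behavior.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndicatorBasis:
    """One indicator function: the variables it uses and one cutoff per variable."""
    variables: Tuple[int, ...]  # column indices, e.g. (0, 2) for x1 and x3
    cutoffs: Tuple[float, ...]  # matching cutoffs, e.g. (1.5, 10.2)

    def evaluate(self, row: Sequence[float]) -> int:
        # Assumed convention: 1 when every listed variable is >= its cutoff.
        return int(all(row[v] >= c for v, c in zip(self.variables, self.cutoffs)))


def design_columns(bases: List[IndicatorBasis],
                   rows: List[Sequence[float]]) -> List[Tuple[int, ...]]:
    """Return one 0/1 column (as a tuple) per basis function."""
    return [tuple(b.evaluate(r) for r in rows) for b in bases]


def drop_duplicate_columns(bases: List[IndicatorBasis],
                           rows: List[Sequence[float]]) -> List[IndicatorBasis]:
    """Keep the first basis function for each distinct 0/1 column."""
    seen = set()
    kept = []
    for basis, column in zip(bases, design_columns(bases, rows)):
        if column not in seen:
            seen.add(column)
            kept.append(basis)
    return kept


# Toy data: three observations of (x1, x2, x3).
rows = [
    (1.0, 0.0, 5.0),
    (2.0, 0.0, 12.0),
    (3.0, 0.0, 11.0),
]
bases = [
    IndicatorBasis(variables=(0, 2), cutoffs=(1.5, 10.2)),  # x1 and x3
    IndicatorBasis(variables=(2,), cutoffs=(10.2,)),        # x3 only
    IndicatorBasis(variables=(0,), cutoffs=(3.0,)),         # x1 only
]
columns = design_columns(bases, rows)
unique_bases = drop_duplicate_columns(bases, rows)
```

On this toy data the first two basis functions produce the same column, so only the first and third are kept. Because each basis function carries its own variables and cutoffs, prediction on new data only needs the kept list, not the original training matrix.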