Nima Hejazi edited this page Aug 11, 2017 · 18 revisions
  • Move duplicationCheck into its own function

    • Easier to test
    • Easier to try different versions
    • No real performance impact
  • Alternative makeSparseMat implementations:

    • C++
      • Armadillo sparseMat
      • Eigen?
      • Eventually: OpenMP and OpenACC (GPU) support
    • R
      • dplyr
      • data.table
  • Alternative duplicationCheck implementations

    • C++
      • Armadillo sparseMat
      • Eigen?
  • Early stopping on interactions

    • So we can avoid very high-order interactions
  • Interaction restarting

    • Resume adding interactions after an initial partial fit to see if it improves performance
  • More extensive performance profiling (RAM and CPU)

  • Clear written description of all parts of the algorithm

    • Solid examples in "On Adaptive Propensity Score Truncation in Causal Inference" (Cheng Ju et al.)
  • Alternative lasso implementations

    • h2o
    • C++
      • MLPACK
    • RcppMLPACK2
      • CK: this version doesn't include the MLPACK libraries, and therefore has a much higher installation burden, especially for Windows users. Perhaps there is a pre-built bundled version somewhere.
      • CK: Also "Loading the package will crash an RStudio binary that is older than the daily build version 1.1.129."
      • NH: the project itself seems to be at an early stage (based on a quick glance), and they don't support Windows themselves, other than recommending building from source. If RcppMLPACK is for some reason more robust (other than including the actual mlpack source), then I can see a good argument for using that; otherwise, there's not much sense in using a significantly older mlpack version just to cater to some users. Aside: if we're going to comment here, can we move this to an issue -- it's really annoying to cut my comment, reload the page, and paste it, just due to the page being edited.
  • Alternative prediction implementations

    • R
      • dplyr
      • data.table
    • C++
      • Armadillo / MLPACK
  • Larger algorithm re-implementation

    • Save indicator functions in a list with two vectors: variables used (e.g. x1, x3) and cutoffs (1.5, 10.2)
  • Wider R ML framework support

    • mlr wrapper
    • caret wrapper
  • Python implementation based on C++ core (à la xgboost, arborist, etc.)

    • then scikit-learn wrapper
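
To illustrate the "Move duplicationCheck into its own function" item above, here is a minimal Python sketch of a standalone duplicate-column check. It is not the package's implementation: the name `find_duplicate_columns` and the column-wise sparse input (each column given as the row indices where its indicator is 1) are assumptions made for this example.

```python
from typing import List, Sequence


def find_duplicate_columns(columns: Sequence[Sequence[int]]) -> List[bool]:
    """Flag every column that repeats an earlier column.

    Hypothetical helper for illustration only. Each column is given as the
    row indices where its indicator equals 1 (an assumed sparse layout).
    """
    seen = set()
    is_duplicate = []
    for rows in columns:
        # Sort so that the order of the row indices does not matter.
        key = tuple(sorted(rows))
        is_duplicate.append(key in seen)
        seen.add(key)
    return is_duplicate


# Five indicator columns; the third repeats the first, the fifth the second.
columns = [[0, 2], [1], [2, 0], [1, 2], [1]]
flags = find_duplicate_columns(columns)
unique_columns = [c for c, dup in zip(columns, flags) if not dup]
```

Because the check lives in one small function, it can be unit-tested on tiny inputs like the one above, and an alternative version can be swapped in behind the same signature.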
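
The "Save indicator functions in a list with two vectors" item above can be sketched as follows. This is only an illustration under stated assumptions: the class name `IndicatorBasis`, the zero-based column indices, and the convention that a basis function equals 1 when every listed variable is greater than or equal to its cutoff are choices made for this example, not details confirmed by the page.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndicatorBasis:
    """One indicator function stored as two parallel vectors (hypothetical)."""

    variables: Tuple[int, ...]   # columns used, e.g. (0, 2) for x1 and x3
    cutoffs: Tuple[float, ...]   # one cutoff per variable, e.g. (1.5, 10.2)

    def evaluate(self, row: Sequence[float]) -> int:
        # Assumed convention: 1 if every listed variable meets its cutoff.
        return int(all(row[j] >= c for j, c in zip(self.variables, self.cutoffs)))


def design_matrix(bases: Sequence[IndicatorBasis],
                  rows: Sequence[Sequence[float]]) -> List[List[int]]:
    """Evaluate every basis function on every observation."""
    return [[basis.evaluate(row) for basis in bases] for row in rows]


bases = [
    IndicatorBasis(variables=(0, 2), cutoffs=(1.5, 10.2)),  # x1 and x3
    IndicatorBasis(variables=(1,), cutoffs=(0.0,)),         # x2 alone
]
rows = [
    [2.0, -1.0, 11.0],
    [1.0, 3.0, 11.0],
    [2.0, 0.0, 10.2],
]
matrix = design_matrix(bases, rows)
```

Storing only the variables and cutoffs keeps each basis function compact, and prediction on new data reduces to calling `evaluate` on each new row instead of rebuilding the training design matrix.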