Nima Hejazi edited this page Aug 11, 2017 · 18 revisions

HAL todo list

  • Move duplicationCheck into its own function

    • Easier to test
    • Easier to try different versions
    • No real performance impact
  • Alternative makeSparseMat implementations:

    • C++
      • Armadillo sparseMat
      • Eigen?
      • Eventually: OpenMP and OpenACC (GPU) support
    • R
      • dplyr
      • data.table
  • Alternative duplicationCheck implementations

    • C++
      • Armadillo sparseMat
      • Eigen?
  • Early stopping on interactions

    • Avoids fitting very high-order interactions
  • Interaction restarting

    • Resume adding interactions after an initial partial fit to see whether it improves performance
  • More extensive performance profiling (RAM and CPU)

  • Clear written description of all parts of the algorithm

    • Solid examples in "On Adaptive Propensity Score Truncation in Causal Inference" (Cheng Ju et al.)
  • Alternative lasso implementations

  • Alternative prediction implementations

    • R
      • dplyr
      • data.table
    • C++
      • Armadillo / mlpack
  • Larger algorithm re-implementation

    • Save indicator functions in a list with two vectors: variables used (e.g., x1, x3) and cutoffs (e.g., 1.5, 10.2)
  • Wider R ML framework support

    • mlr wrapper
    • caret wrapper
  • Python implementation based on a C++ core (à la xgboost, arborist, etc.)

    • then scikit-learn wrapper
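
Rough sketch for two of the items above: storing each indicator function as a pair of vectors (variables used, cutoffs) and keeping duplicationCheck as its own function. This is illustrative Python, not the package's R/C++ code. The names `IndicatorBasis`, `design_columns`, and `duplication_check` are hypothetical, and the `>=` direction of the indicator is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndicatorBasis:
    """One basis function: product over j of 1{row[variables[j]] >= cutoffs[j]}.

    Assumption: the indicator uses >=; flip the comparison if the
    package defines it the other way.
    """
    variables: Tuple[int, ...]  # column indices used, e.g. (0, 2) for x1, x3
    cutoffs: Tuple[float, ...]  # one cutoff per variable, e.g. (1.5, 10.2)

    def evaluate(self, row: Sequence[float]) -> int:
        # 1 only if every listed variable clears its cutoff
        return int(all(row[v] >= c for v, c in zip(self.variables, self.cutoffs)))


def design_columns(bases: List[IndicatorBasis],
                   X: List[Sequence[float]]) -> List[Tuple[int, ...]]:
    """Evaluate every basis on every row; one 0/1 column (as a tuple) per basis."""
    return [tuple(b.evaluate(row) for row in X) for b in bases]


def duplication_check(columns: List[Tuple[int, ...]]) -> List[int]:
    """Return the index of the first occurrence of each distinct column."""
    first_seen = {}
    for i, col in enumerate(columns):
        first_seen.setdefault(col, i)  # later duplicates are ignored
    return sorted(first_seen.values())


# Toy data: 3 observations, 3 covariates (x1, x2, x3 -> indices 0, 1, 2)
X = [
    [1.0, 0.0, 9.0],
    [2.0, 0.0, 11.0],
    [3.0, 0.0, 12.0],
]
bases = [
    IndicatorBasis((0,), (1.5,)),          # main term on x1
    IndicatorBasis((0, 2), (1.5, 10.2)),   # x1-by-x3 interaction
    IndicatorBasis((0,), (3.0,)),          # another main term on x1
]
columns = design_columns(bases, X)
keep = duplication_check(columns)  # the interaction duplicates the first column here
```

With the check isolated like this, alternative versions (C++, data.table, etc.) can be swapped in and tested against the same small inputs.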