Chris Kennedy edited this page Aug 11, 2017 · 18 revisions

HAL todo list

  • Move duplicationCheck into its own function
    • Easier to test
    • Easier to try different versions
    • No real performance impact
  • Alternative makeSparseMat implementations
    • C++
      • Armadillo sparseMat
      • Eigen?
      • Eventually: OpenMP and OpenACC (GPU) support
    • R
      • dplyr
      • data.table
  • Alternative duplicationCheck implementations
    • C++
      • Armadillo sparseMat
      • Eigen?
  • Early stopping on interactions
    • So we can avoid computing very high-order interactions
  • Interaction restarting
    • Resume adding interactions after an initial partial fit to see whether doing so improves performance
  • More extensive performance profiling (RAM and CPU)
  • Clear written description of all parts of the algorithm
    • Solid examples in Cheng Ju et al.’s “On Adaptive Propensity Score Truncation in Causal Inference”
  • Alternative lasso implementations
    • h2o
    • C++
      • MLPACK
  • Alternative prediction implementations
    • R
      • dplyr
    • C++
      • Armadillo / MLPACK
  • Larger algorithm re-implementation
    • Save each indicator function as a list with two vectors: the variables used (e.g. x1, x3) and their cutoffs (e.g. 1.5, 10.2)
  • Wider R ML framework support
    • mlr wrapper
    • caret wrapper
  • Python implementation based on a C++ core (à la xgboost, arborist, etc.)
    • Then a scikit-learn wrapper
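
Two of the items above, storing each indicator function as a pair of vectors (variables used and cutoffs) and keeping the duplicate check in its own function, could be sketched roughly as follows. This is only an illustration in plain Python, not the package's code: the names `Basis`, `evaluate`, `design_columns`, and `duplication_check` are hypothetical, and the convention that an indicator fires when every listed variable is at or above its cutoff is an assumption.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Basis(NamedTuple):
    """One indicator function (hypothetical representation)."""
    variables: Tuple[int, ...]  # column indices used, e.g. (0, 2) for x1 and x3
    cutoffs: Tuple[float, ...]  # one cutoff per variable, e.g. (1.5, 10.2)


def evaluate(basis: Basis, row: Sequence[float]) -> int:
    """Return 1 if every listed variable is >= its cutoff (assumed convention)."""
    return int(all(row[j] >= c for j, c in zip(basis.variables, basis.cutoffs)))


def design_columns(bases: Sequence[Basis],
                   rows: Sequence[Sequence[float]]) -> List[Tuple[int, ...]]:
    """Build one 0/1 column per indicator function, evaluated on every row."""
    return [tuple(evaluate(b, r) for r in rows) for b in bases]


def duplication_check(columns: Sequence[Tuple[int, ...]]) -> List[int]:
    """Return the index of the first occurrence of each distinct column."""
    first_seen = {}
    for i, col in enumerate(columns):
        first_seen.setdefault(col, i)  # keep only the earliest index per column
    return sorted(first_seen.values())


# Toy data: three observations of (x1, x2, x3).
rows = [(1.0, 0.0, 5.0), (2.0, 0.0, 11.0), (3.0, 0.0, 12.0)]
bases = [
    Basis(variables=(0,), cutoffs=(2.0,)),          # x1 >= 2.0
    Basis(variables=(0, 2), cutoffs=(1.5, 10.2)),   # x1 >= 1.5 and x3 >= 10.2
    Basis(variables=(2,), cutoffs=(12.0,)),         # x3 >= 12.0
]
columns = design_columns(bases, rows)
keep = duplication_check(columns)  # the second column duplicates the first here
```

Because `duplication_check` only takes the columns as input, it can be tested on small hand-built cases and swapped for another implementation without touching the code that builds the columns. Prediction on new data would only need the stored `variables` and `cutoffs`, not the training matrix.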