Hi,

As a follow-up to the discussion I had with @PhilippBach, I'm posting a few questions about an ongoing analysis.
Context:
I'm running an experiment where we split a set of routes into two groups (A and B) and then apply an intervention on group B routes only.
Questions:
Since we split the routes into the two groups ourselves, I'm wondering whether we can skip the propensity score model in our case (i.e. there should be no selection bias) and set `score='experimental'` - is that assumption correct?
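For context, here is the kind of sanity check I have in mind: under randomized assignment, a plain unadjusted 2x2 DiD mean comparison should land in the same ballpark as the covariate-adjusted estimate. A minimal stdlib sketch (all numbers hypothetical):

```python
# Classic 2x2 difference-in-differences on toy numbers (hypothetical values):
# change in the treated group minus change in the control group.

def did_2x2(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """Unadjusted 2x2 DiD estimate from four lists of outcomes."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(y_treat_post) - mean(y_treat_pre)) - (
        mean(y_ctrl_post) - mean(y_ctrl_pre)
    )

# Hypothetical route-level outcomes (e.g. delay in minutes)
treat_pre = [10.0, 12.0, 11.0]   # group B before the intervention
treat_post = [8.0, 9.0, 10.0]    # group B after the intervention
ctrl_pre = [10.5, 11.5, 12.0]    # group A before
ctrl_post = [10.0, 11.0, 12.5]   # group A after

print(did_2x2(treat_pre, treat_post, ctrl_pre, ctrl_post))
```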
With cross-sectional data, how should we decide how granular to be with time (we're using days at the moment)? E.g. if we go down to 1-second resolution we'll probably get very few events per row.
How should we evaluate the RMSE figures for the chosen model and parameters? What counts as "good enough" for a DiD analysis? Should we check the fit summary and coefficients across a few different types of models/params?
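One heuristic we've been using (our own assumption, not something from the DoubleML docs) is to compare the learner's RMSE against a trivial predict-the-mean baseline, rather than judging the number in isolation:

```python
# RMSE of a model vs. a predict-the-mean baseline (toy numbers, hypothetical).
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y = [3.0, 5.0, 4.0, 6.0]
model_pred = [2.8, 5.2, 4.1, 5.9]           # hypothetical learner predictions
baseline_pred = [sum(y) / len(y)] * len(y)  # always predict the mean of y

# A learner that doesn't clearly beat the baseline is a red flag for the
# nuisance fit, even if its absolute RMSE looks "small".
print(rmse(y, model_pred), rmse(y, baseline_pred))
```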
We have a few categorical features which we'd like to use with a CatBoost regressor, since it handles categorical features without any pre-processing, but it looks like `DoubleMLData` doesn't support non-numerical features, so we used one-hot encoding. Ideally we'd like to avoid that - is there any workaround to use categorical features with CatBoost in DoubleML DiD without pre-processing?
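For reference, the workaround we're currently using is the usual one-hot expansion (in practice via `pandas.get_dummies`); a minimal stdlib sketch of the idea, with hypothetical column and category names:

```python
# One-hot encode a single categorical column in a list of row dicts.
def one_hot(rows, col):
    cats = sorted({r[col] for r in rows})  # stable column order
    out = []
    for r in rows:
        r2 = {k: v for k, v in r.items() if k != col}
        for c in cats:
            # one indicator column per category, e.g. route_type_urban
            r2[f"{col}_{c}"] = 1.0 if r[col] == c else 0.0
        out.append(r2)
    return out

rows = [{"route_type": "urban", "y": 1.2},
        {"route_type": "rural", "y": 0.7}]
print(one_hot(rows, "route_type"))
```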
Finally, we use the sensitivity analysis in a very basic way with the default params, i.e. `.sensitivity_analysis(cf_y=0.04, cf_d=0.03)`. That works nicely out of the box, and when zero is included in the interval we follow up with benchmarking, but so far we haven't been able to identify (through trial and error) any covariates with a strong confounding effect. Is there anything else we can try in that space? (I understand that in some cases it might be impossible to identify the unobserved confounder, e.g. because it's not feasible to measure or not available in the data.)
Cheers,
Nik