Add confidence interval for MWU #226

Open
wants to merge 3 commits into
base: main

kschuerholt

raphaelvallat#225 
Implemented the CI from 'Calculating confidence intervals for some non-parametric analyses' (Campbell and Gardner, 1988). The CI style is adapted from ttest. The same publication offers a solution for wilcoxon, which is not yet implemented but could be added fairly easily.
@raphaelvallat raphaelvallat self-requested a review January 22, 2022 01:55
@raphaelvallat raphaelvallat added the feature request 🚧 New feature or request label Jan 22, 2022

codecov bot commented Jan 22, 2022

Codecov Report

Merging #226 (f31b2e5) into master (b1c334d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #226   +/-   ##
=======================================
  Coverage   98.99%   99.00%           
=======================================
  Files          19       19           
  Lines        3290     3304   +14     
  Branches      527      531    +4     
=======================================
+ Hits         3257     3271   +14     
  Misses         17       17           
  Partials       16       16           
Impacted Files Coverage Δ
pingouin/nonparametric.py 94.11% <100.00%> (+0.52%) ⬆️


conf = confidence
N = scipy.stats.norm.ppf(conf)
ct1, ct2 = len(x),len(y) # count samples
diffs = sorted([i-j for i in x for j in y]) # get ct1xct2 difference
raphaelvallat (Owner):

@kschuerholt could we use a numpy function / numpy broadcasting here to avoid the nested for loop?

kschuerholt (Author):

Sure, that's easy enough. I'll add it in a new commit promptly.
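A minimal sketch of the broadcasting approach suggested above, assuming `x` and `y` are one-dimensional numeric sequences. `pairwise_differences` is a hypothetical helper name used only for illustration and is not part of the PR.

```python
import numpy as np


def pairwise_differences(x, y):
    """Return all len(x) * len(y) differences x_i - y_j, sorted ascending.

    Hypothetical helper: replaces the nested list comprehension with
    NumPy broadcasting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # x[:, None] has shape (n1, 1) and y[None, :] has shape (1, n2),
    # so the subtraction broadcasts to an (n1, n2) matrix of differences.
    return np.sort((x[:, None] - y[None, :]).ravel())
```

`np.subtract.outer(x, y)` builds the same matrix and could be used instead of the explicit indexing.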

MWU   97.0   two-sided  0.00556  0.515  0.2425
>>> pg.mwu(x, y, alternative='two-sided',confidence=0.95)
     U-val alternative    p-val    RBC    CLES  CI95%
MWU   97.0   two-sided  0.00556  0.515  0.2425  [-0.39290395101879694, -0.09400270319896187]
raphaelvallat (Owner):

Is this the actual output that you get? The CI should normally be rounded to two decimals by the _postprocess_dataframe function

kschuerholt (Author):

That's the actual output I get. I was wondering about that, too. But the t-test also gives me full floats (at least when confidence != 0.95), so I thought that was intentional.
I can of course round it in MWU, or do you want to address that elsewhere?

kschuerholt (Author):

Here's an example of the t-test showing that behavior.
[Screenshot attachment: grafik]
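One way to round inside the test function itself, sketched under the assumption that the CI is held as a pair of floats before it is stored in the output dataframe. `round_ci` is a hypothetical helper, and this does not describe how `_postprocess_dataframe` works internally.

```python
import numpy as np


def round_ci(ci, decimals=2):
    """Round both CI bounds to a fixed number of decimals.

    Hypothetical helper for illustration only.
    """
    return np.round(np.asarray(ci, dtype=float), decimals)


# Bounds taken from the docstring output discussed in this thread.
ci = round_ci([-0.39290395101879694, -0.09400270319896187])
```

With the bounds from this thread, the rounded interval is -0.39 to -0.09.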

@@ -222,16 +225,20 @@ def mwu(x, y, alternative='two-sided', **kwargs):
Association and the American Statistical Association, 25(2),
101–132. https://doi.org/10.2307/1165329

.. [5] Campbell, M. J. & Gardner, M. J. (1988). Calculating confidence
raphaelvallat (Owner):

Could you add in the "Notes" section a one line explanation of the CI method?

kschuerholt (Author):

Sure, I'll give that a go. Like I said, I'm not a statistician, so that'll have to be proof-read by someone
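As a possible starting point for that note, the rank index computed in the diff can be written out as below. This is a transcription of the code in this PR, not a verified statement of the published method. The symbols $n_1$, $n_2$ and $z$ are shorthand introduced here for `ct1`, `ct2` and `N`.

```latex
k = \operatorname{round}\!\left(
      \frac{n_1 n_2}{2}
      - z \sqrt{\frac{n_1 n_2 \,(n_1 + n_2 + 1)}{12}}
    \right)
```

Here $z$ is a standard normal quantile and $k$ is used as a rank into the sorted $n_1 n_2$ pairwise differences $x_i - y_j$.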

N = scipy.stats.norm.ppf(conf)
ct1, ct2 = len(x),len(y) # count samples
diffs = sorted([i-j for i in x for j in y]) # get ct1xct2 difference
k = int(round(ct1*ct2/2 - (N * (ct1*ct2*(ct1+ct2+1)/12)**0.5)))
raphaelvallat (Owner):

Please make sure that the code follows the flake8 guidelines, i.e. there must be whitespace around arithmetic operators.

kschuerholt (Author):

Sorry about that. I was editing the file on the fly directly in GitHub, and unfortunately there is no auto linting/formatting there yet. The next commit will be formatted accordingly.
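For reference, a flake8-friendly sketch of the whole computation, combining broadcasting with the rank formula from the diff. Several parts are assumptions rather than the PR's code: the function name `mwu_ci_sketch` is hypothetical, the quantile is taken at `1 - (1 - confidence) / 2` for a two-sided interval (the diff uses `norm.ppf(conf)`), and the interval is read as the k-th smallest and k-th largest difference.

```python
import numpy as np
from scipy.stats import norm


def mwu_ci_sketch(x, y, confidence=0.95):
    """Sketch of a CI for the location shift between x and y.

    Hypothetical function, not the PR's implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    # All n1 * n2 pairwise differences, sorted ascending.
    diffs = np.sort(np.subtract.outer(x, y).ravel())
    # Assumption: two-sided quantile, e.g. about 1.96 for confidence=0.95.
    z = norm.ppf(1 - (1 - confidence) / 2)
    # Same rounding as in the diff; clamp so the index stays valid.
    k = int(round(n1 * n2 / 2 - z * (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5))
    k = max(k, 1)
    # k-th smallest and k-th largest difference (1-based rank k).
    return diffs[k - 1], diffs[n1 * n2 - k]
```

The clamp to `k >= 1` only matters for very small samples, where the formula can produce zero or a negative rank.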

@raphaelvallat raphaelvallat mentioned this pull request Feb 20, 2022
18 tasks
raphaelvallat (Owner):

Hi @kschuerholt,

FYI, I have just released a minor version of Pingouin (https://github.com/raphaelvallat/pingouin/releases/tag/v0.5.1) to fix some urgent dependency bugs. Could you please make sure to update the PR to the new master and resolve any conflicts that may arise?

Thank you,
Raphael

kschuerholt (Author):

Hi @raphaelvallat

Thanks for the heads-up. It's still on the to-do list, but other things currently have to come first. I'm trying to get hold of an original source for the CI computation of nonparametric tests. Or did you find something?

Cheers,
Konstantin

@raphaelvallat raphaelvallat mentioned this pull request Jun 18, 2022
11 tasks
Labels
feature request 🚧 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return Confidence Interval for nonparametric Mann Whitney U Test
2 participants