Fit discrete distributions to a given data set

For a given set of univariate integer data, fit several discrete distributions and report information about each fit. The distributions currently used are:

discrete uniform (not really a fit then)
beta binomial
zipfian

as implemented by scipy

Requirements

scipy 1.9.0 or later

Usage

> python fit_discrete.py test_data.txt

Successfully fitted the discrete uniform distribution:
  the fit parameters are: FitParams(low=1.0, high=4.0, loc=0.0)
  the negative log likelihood is: 6.591673732008659 

Successfully fitted the beta binomial distribution:
  the fit parameters are: FitParams(n=2.0, a=0.9999990554067835, b=1.9999982152999867, loc=1.0)
  the negative log likelihood is: 6.068425588244196 

Successfully fitted the zipfian distribution:
  the fit parameters are: FitParams(a=2.2195448757323435, loc=0.0)
  the negative log likelihood is: 7.86098064513523

Notes

The goodness of fit is determined here using the maximum likelihood approach. The fit itself is done by scipy.stats.fit, which searches for the parameters that maximise the likelihood. Note that scipy.stats.fit is a new feature introduced in scipy 1.9.0 that allows seamless fitting of both discrete and continuous distributions.
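
As a rough illustration of the call described above, the sketch below fits a beta binomial with scipy.stats.fit. The sample and the bounds are assumptions made up for this example; they are not the contents of test_data.txt or the bounds hardcoded in fit_discrete.py.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of small positive integers (not the real test data).
data = np.array([1, 1, 1, 2, 2, 3])

# Illustrative bounds only; each parameter gets a finite (min, max) range.
bounds = {"n": (0, 10), "a": (1e-3, 10.0), "b": (1e-3, 10.0), "loc": (0, 3)}

# stats.fit searches within the bounds for the maximum likelihood parameters.
result = stats.fit(stats.betabinom, data, bounds)

nll = result.nllf()    # negative log likelihood at the fitted parameters
print(result.success)
print(result.params)   # a FitParams namedtuple with fields n, a, b, loc
print(nll)
```

The default optimiser is scipy's differential evolution, which is stochastic, so the fitted parameters may differ slightly from run to run.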

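The goodness of fit metric shown here is the negative log likelihood, i.e. the negative sum of the log of the Probability Mass Function (PMF) evaluated at each data point. Lower values indicate a better fit to the same data.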
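
To make the metric concrete, here is a minimal sketch that computes the negative log likelihood by hand for two fixed parameter sets. The six-point sample is hypothetical, and the parameters are picked for illustration rather than fitted.

```python
import numpy as np
from scipy import stats

data = np.array([1, 1, 1, 2, 2, 3])  # hypothetical sample

# Discrete uniform on {1, 2, 3}: scipy's randint excludes the upper end `high`.
nll_uniform = -stats.randint.logpmf(data, 1, 4).sum()

# Beta binomial with n=2, a=1, b=2, shifted by loc=1 so the support is {1, 2, 3}.
nll_betabinom = -stats.betabinom.logpmf(data, 2, 1, 2, loc=1).sum()

print(nll_uniform)    # 6 * ln(3), about 6.5917
print(nll_betabinom)  # 4 * ln(2) + 3 * ln(3), about 6.0684
```

The beta binomial gives the smaller value here because it puts more probability on the frequent low values of the sample.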

Bounds

Fitting requires some rough information about the parameter bounds, or a guess value to start from. Here, I hardcoded very rough bounds that could very well fail for a variety of data edge cases. These bounds were only tested on datasets sampled from uniform distributions.
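
One possible way to make the bounds less brittle is to derive them from the data instead of hardcoding them. The sketch below is not what fit_discrete.py does: the helper name `rough_bounds`, the `pad` margin, and the ranges for `a` and `b` are all assumptions made for illustration.

```python
import numpy as np

def rough_bounds(data, pad=5):
    """Hypothetical helper: derive loose (min, max) parameter ranges from the data."""
    data = np.asarray(data)
    lo, hi = int(data.min()), int(data.max())
    span = hi - lo
    return {
        # randint(low, high) covers the integers low .. high - 1
        "randint": {"low": (lo - pad, lo), "high": (hi + 1, hi + 1 + pad)},
        # betabinom shifted by loc covers the integers loc .. loc + n
        "betabinom": {
            "n": (span, span + pad),
            "a": (1e-3, 100.0),   # arbitrary positive range
            "b": (1e-3, 100.0),   # arbitrary positive range
            "loc": (lo - pad, lo),
        },
    }

bounds = rough_bounds([1, 1, 1, 2, 2, 3])
print(bounds["randint"])  # {'low': (-4, 1), 'high': (4, 9)}
```

Each inner dictionary has the shape that scipy.stats.fit accepts for its bounds argument, so it could be passed straight through for the matching distribution.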

Future work

The script could easily be extended to fit any other scipy discrete distribution. If a distribution that scipy does not provide is required, one would have to write the distribution function manually, and then find the parameters that maximise the likelihood.

Ideally, the choice of distributions would not be hardcoded, and the user could choose which distributions to try after looking at their data.
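
A minimal sketch of how that choice could be exposed on the command line is shown below. The `--dist` option and the `discrete_distribution` helper are hypothetical and not part of the current script; the defaults simply mirror the three distributions listed at the top.

```python
import argparse
from scipy import stats

def discrete_distribution(name):
    """Hypothetical helper: look up a scipy.stats discrete distribution by name."""
    dist = getattr(stats, name, None)
    if not isinstance(dist, stats.rv_discrete):
        raise argparse.ArgumentTypeError(
            f"{name!r} is not a scipy discrete distribution"
        )
    return dist

parser = argparse.ArgumentParser(description="Fit discrete distributions to integer data")
parser.add_argument("data_file")
parser.add_argument(
    "--dist",
    type=discrete_distribution,
    nargs="+",
    default=[stats.randint, stats.betabinom, stats.zipfian],
    help="names of scipy.stats discrete distributions to try",
)

# Example invocation, parsed from a list instead of sys.argv.
args = parser.parse_args(["test_data.txt", "--dist", "poisson", "binom"])
print([d.name for d in args.dist])  # ['poisson', 'binom']
```

Each chosen distribution would still need its own parameter bounds before it could be fitted.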
