BUG: fix scatter plots and regression lines in Moran plots #187

Merged
merged 3 commits into from
Sep 1, 2024

Conversation

martinfleis
Member

Fixes #178

The regressions were all just wrong. I have fixed that and manually verified that all of them are in line with what GeoDa is producing.

There was also a wrong description of the axes in the notebook, which I believe resulted from using x and y as variable names and passing those to bivariate Moran as y=x, x=y. So I changed those to explicitly mention the variable name.

Fixing here because it may take a bit before we'll be able to steer people towards plotting within esda and we should not produce erroneous outputs.

I suggest doing a patch release with this and #185 as soon as this is merged.

@jGaboardi jGaboardi added the bug label Sep 1, 2024
@knaaptime
Member

nice; good find. It still feels mildly strange that the plots occasionally miss the origin

but the implementation looks right now 🤷‍♂️

@martinfleis
Member Author

That's the same in GeoDa. And given this is a regression model on real world data, it is not even surprising.

@knaaptime
Member

well, since most moran plots do hit the origin, it is kinda surprising :)

@knaaptime knaaptime merged commit 4f77024 into pysal:main Sep 1, 2024
8 checks passed
@knaaptime
Member

knaaptime commented Sep 1, 2024

plus, like, conceptually, I think of that line as originating at 0,0 in a moran plot... We intentionally center the scatter on z=0,0 because the quadrants are meaningful. It's the slope of the line that's conceptually meaningful, not the intercept in this plot.

it almost feels reasonable to me that we'd shift the intercept to be 0 in the event that it's not there already
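To make the "slope matters, intercept doesn't" point concrete, here is a minimal numpy sketch. It is not the code from this PR: the data are made up and the weights (a row-standardized path graph) are a hypothetical stand-in for a real `W`. Because z(y) is centered, the least-squares slope is the same whether or not an intercept is fitted, so forcing the line through 0,0 would change only the intercept.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30
# Made-up values that vary smoothly along a line of observations
y = np.cumsum(rng.normal(size=n))

# Hypothetical weights: neighbours on a path graph, row-standardized
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)

z = (y - y.mean()) / y.std()  # x-axis of the Moran plot
lag_z = W @ z                 # y-axis of the Moran plot: lag(z(y))

# Ordinary least squares with an intercept
slope_ols, intercept_ols = np.polyfit(z, lag_z, 1)

# Least squares forced through the origin
slope_origin = (z @ lag_z) / (z @ z)

print("slope with intercept:   ", slope_ols)
print("slope through origin:   ", slope_origin)
print("fitted intercept:       ", intercept_ols)
```

With row-standardized weights both slopes also equal Moran's I, since I reduces to z'Wz / z'z in that case.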

@martinfleis martinfleis deleted the ols branch September 1, 2024 18:49
@martinfleis
Member Author

I'll leave this for someone more knowledgeable to decide :). For now I'm satisfied that we match GeoDa. And I think that spdep follows the same model, but I'm too lazy to test that, so I'm guessing from the code.

@knaaptime
Member

agree. still curious about this issue though. I wonder if @sjsrey or @ljwolf have thoughts? Is the 'natural' intercept of a moran plot ==0? It feels to me like it is

@knaaptime
Member

tldr: the reason the intercept is not zero is that the y-axis is lag(z(y)), not z(lag(y)), so zero on the y-axis is not the mean of the distribution on the y-axis; it's the mean of the distribution on the x-axis. So when we classify an observation into a hotspot quadrant (even along the vertical axis for Wy), we're using mean(y) to define high/low (not mean(lag(y))). I.e. a "large lag value" means above the mean of y, not above the mean of lag(y). GeoDa and the original papers are clear about this, though I don't think it's always obvious from the plots

so the intercept will move around based on whether lag(z(y)) sits, on average, above or below the mean of z(y) (which is zero). I'm square on what's happening, though I still think you could argue for z(lag(y)) on the y-axis :)
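A quick numerical check of the lag(z(y)) vs z(lag(y)) distinction, as a sketch only. The points, the values, and the k-nearest-neighbour weights below are all invented for illustration and are not taken from the notebook or from esda. With lag(z(y)) on the y-axis the fitted intercept equals the mean of the lag, which is generally not zero; standardizing the lag itself, z(lag(y)), pulls the intercept to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4

# Invented point locations and a variable with a west-east trend
coords = rng.random((n, 2))
y = 3 * coords[:, 0] + rng.normal(size=n)

# Hypothetical row-standardized k-nearest-neighbour weights
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
nearest = np.argsort(d, axis=1)[:, :k]
W = np.zeros((n, n))
W[np.arange(n)[:, None], nearest] = 1.0 / k

z = (y - y.mean()) / y.std()

# Option 1: lag of the standardized variable, lag(z(y))
lag_of_z = W @ z
slope_1, intercept_1 = np.polyfit(z, lag_of_z, 1)

# Option 2: standardized lag, z(lag(y))
lag_y = W @ y
z_of_lag = (lag_y - lag_y.mean()) / lag_y.std()
slope_2, intercept_2 = np.polyfit(z, z_of_lag, 1)

print("lag(z(y)): intercept", intercept_1, "mean of lag", lag_of_z.mean())
print("z(lag(y)): intercept", intercept_2)
```

Note that the two options also give different slopes, so switching to z(lag(y)) would change more than where the line crosses the axis.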

Successfully merging this pull request may close these issues.

BUG: plot_local_autocorrelation colors do not match between subplots
3 participants