BUG: fix scatter plots and regression lines in Moran plots #187
That's the same in GeoDa. And given this is a regression model on real-world data, it is not even surprising.
Well, since most Moran plots do hit the origin, it is kinda surprising :)
Plus, like, conceptually, I think of that line as originating at (0, 0) in a Moran plot... We intentionally center the scatter on z = (0, 0) because the quadrants are meaningful. In this plot it's the slope of the line that's conceptually meaningful, not the intercept. It almost feels reasonable to me that we'd shift the intercept to 0 when it's not there already.
I'll leave this decision to someone more knowledgeable :). Now I'm satisfied that we match GeoDa. I think spdep follows the same model, but I'm too lazy to test that, so I'm guessing from the code.
tl;dr: the intercept is not zero because the y-axis is lag(z(y)), not z(lag(y)), so zero on the y-axis is not the mean of the distribution on the y-axis; it's the mean of the distribution on the x-axis. So when we classify an observation into a hotspot quadrant (even along the vertical axis for Wy), we're using mean(y) to define high/low, not mean(lag(y)). I.e. a "large lag value" means above the mean of y, not above the mean of lag(y). GeoDa and the original papers are clear about this, though I don't think it's always obvious from the plots. As a result, the intercept will move around depending on whether lag(z(y)) sits above or below z(y) on average. I'm square on what's happening, though I still think you could argue for z(lag(y)) on the y-axis :)
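A minimal NumPy sketch of the point above, assuming a row-standardized rook-contiguity weights matrix on a toy 5x5 grid and random data. `rook_weights` and `moran_scatter_fit` are hypothetical helpers written for illustration, not esda or splot API. Because z(y) has mean zero, the OLS slope of lag(z(y)) on z(y) equals Moran's I, while the intercept reduces to mean(lag(z(y))), which is generally nonzero because the column sums of a row-standardized W differ between cells (corners, edges, interior). Putting z(lag(y)) on the y-axis instead forces the intercept to zero by construction.

```python
import numpy as np


def rook_weights(nrows: int, ncols: int) -> np.ndarray:
    """Hypothetical helper: row-standardized rook-contiguity matrix for a regular grid."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[i, rr * ncols + cc] = 1.0
    # Row-standardize: every row sums to 1, so the total weight S0 equals n.
    return w / w.sum(axis=1, keepdims=True)


def standardize(v: np.ndarray) -> np.ndarray:
    """z-score with the population std (ddof=0), so z @ z == len(z)."""
    return (v - v.mean()) / v.std()


def moran_scatter_fit(y: np.ndarray, w: np.ndarray):
    """Hypothetical helper: OLS line of a Moran plot, z(y) on x and lag(z(y)) on y."""
    z = standardize(y)
    lag_z = w @ z                               # y-axis is lag(z(y)), NOT z(lag(y))
    slope, intercept = np.polyfit(z, lag_z, 1)  # polyfit returns highest degree first
    return z, lag_z, slope, intercept


w = rook_weights(5, 5)
y = np.random.default_rng(42).normal(size=25)
z, lag_z, slope, intercept = moran_scatter_fit(y, w)

# With row-standardized W, Moran's I = z'Wz / z'z, which matches the OLS slope.
moran_i = (z @ w @ z) / (z @ z)
# Since mean(z) == 0, the OLS intercept is just mean(lag(z(y))), usually not 0.
print(f"slope={slope:.4f}  I={moran_i:.4f}  "
      f"intercept={intercept:.4f}  mean(lag_z)={lag_z.mean():.4f}")

# Alternative y-axis z(lag(y)): centered by construction, so the intercept is ~0.
z_lag = standardize(w @ y)
alt_slope, alt_intercept = np.polyfit(z, z_lag, 1)
print(f"z(lag(y)) axis: slope={alt_slope:.4f}  intercept={alt_intercept:.2e}")

# Quadrants use mean(y) on both axes: lag_z > 0 means "lag above mean(y)".
high_high = (z > 0) & (lag_z > 0)
```

One consequence relevant to the suggestion of forcing the line through the origin: because z is centered, a line through (0, 0) fitted by least squares has the same slope, z'Wz / z'z, as the OLS line; only the intercept differs.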
Fixes #178
The regressions were all just wrong. I have fixed that and manually verified that all of them are in line with what GeoDa is producing.
There was also an incorrect description of the axes in the notebook, which I believe resulted from confusion caused by using `x` and `y` as variable names and passing those to bivariate Moran as `y=x, x=y`. So I changed those to explicitly mention the variable names.
Fixing here because it may take a while before we'll be able to steer people towards plotting within esda, and we should not produce erroneous outputs.
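To make the axis convention concrete, a minimal NumPy sketch of a bivariate Moran scatter: the first variable is standardized on the x-axis and the spatial lag of the standardized second variable goes on the y-axis. The helper names, the toy 5x5 rook grid, and the random variables `var_a`/`var_b` are assumptions for illustration, not the esda or splot API. Because a row-standardized W is generally not symmetric, swapping the two variables (the `y=x, x=y` mix-up) changes the plotted points and, in general, the slope.

```python
import numpy as np


def rook_weights(nrows: int, ncols: int) -> np.ndarray:
    """Hypothetical helper: row-standardized rook-contiguity matrix for a regular grid."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[i, rr * ncols + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def standardize(v: np.ndarray) -> np.ndarray:
    """z-score with the population std (ddof=0), so z @ z == len(z)."""
    return (v - v.mean()) / v.std()


def bivariate_moran_fit(x: np.ndarray, y: np.ndarray, w: np.ndarray):
    """Hypothetical helper: z(x) on the x-axis, lag(z(y)) on the y-axis."""
    zx = standardize(x)
    lag_zy = w @ standardize(y)
    # Since mean(z(x)) == 0, the OLS slope reduces to z(x)'W z(y) / z(x)'z(x).
    slope = (zx @ lag_zy) / (zx @ zx)
    return zx, lag_zy, slope


rng = np.random.default_rng(7)
var_a = rng.normal(size=25)                  # hypothetical first variable
var_b = 0.5 * var_a + rng.normal(size=25)    # hypothetical second variable
w = rook_weights(5, 5)

# Intended call: var_a on the x-axis, spatial lag of var_b on the y-axis.
zx, lag_zy, s_xy = bivariate_moran_fit(var_a, var_b, w)
# Swapped call: now var_b is on the x-axis and lag(var_a) on the y-axis.
zy, lag_zx, s_yx = bivariate_moran_fit(var_b, var_a, w)
print(f"x=var_a, y=lag(var_b): slope={s_xy:.4f}")
print(f"x=var_b, y=lag(var_a): slope={s_yx:.4f}")
```

The two slopes only coincide when W is symmetric, which is why axis labels in the notebook need to name the variables explicitly rather than relying on `x` and `y`.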
I suggest doing a patch release with this and #185 as soon as this is merged.