Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LineCollection plot #7173

Open
wants to merge 128 commits into
base: main
Choose a base branch
from
Open

Add LineCollection plot #7173

wants to merge 128 commits into from

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Oct 16, 2022

This adds a line plotter based on LineCollections, called .lines.

I wanted to replace darray.plot() with using LineCollection instead. But unfortunately due to how many cases are supported (and tested in xarray) darray.plot() will continue using plt.plot.

Examples:

https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/

Got this working with xarray:

# https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/#lightbox
# https://github.com/afinemax/climate_change_bot/blob/main/bot.py


import requests

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr


def download_json(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for unsuccessful responses (4xx, 5xx)
        json_data = response.json()
        return json_data
    except requests.exceptions.RequestException as e:
        print(f"Error downloading JSON: {e}")
        return None


url = r"https://climatereanalyzer.org/clim/sst_daily/json/oisst2.1_natlan1_sst_day.json"
sst_daily_data = download_json(url)


data = []
years = []
for i in range(len(sst_daily_data) - 3):
    years.append((sst_daily_data[i]["name"]))
    data.append(sst_daily_data[i]["data"])

# convert data from list into nd arr
data = np.asarray(data, dtype=float)  # first index is year, 2nd index is day

# Datetimes not working for some unrelated reason:
# ds = xr.Dataset(
#     data_vars=dict(temperature=(("year", "day"), data)),
#     coords=dict(
#         year=np.array(dict_names, dtype="datetime64[D]"),
#         day=np.arange(data.shape[-1], dtype="timedelta64[D]"),
#     ),
# )
ds = xr.Dataset(
    data_vars=dict(
        temperature=(
            ("year", "day"),
            data,
            dict(
                long_name="Daily North Atlantic (0-60N) Sea Surface Temperature (SST)",
                units="degC",
            ),
        )
    ),
    coords=dict(
        year=np.array(years, dtype=int),
        day=np.arange(data.shape[-1], dtype=int),
    ),
)

plt.figure()
ds.plot.lines(x="day", y="temperature", hue="year")

image

xref:
#4820
#5622

@Illviljan
Copy link
Contributor Author

Illviljan commented Oct 17, 2022

Scatter vs. Lines:

image

ds = xr.tutorial.scatter_example_dataset(seed=42)
hue_ = "y"
x_ = "y"
size_="y"
z_ = "z"

fig = plt.figure()
ax = fig.add_subplot(1, 2, 1, projection='3d')
ds.A.sel(w="one").plot.lines(x=x_, z=z_, hue=hue_, linewidth=size_, ax=ax)
ax = fig.add_subplot(1, 2, 2, projection='3d')
ds.A.sel(w="one").plot.scatter(x=x_, z=z_, hue=hue_, markersize=size_, ax=ax)

@Illviljan Illviljan marked this pull request as ready for review May 24, 2024 18:16
Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my taste it uses a bit too much private matplotlib API, hopefully that doesn't make things complicated in the long run. Maybe you can ask over at matplotlib to make some of these things public API?

With so many special cases it is always difficult to figure out if everything was thought of. My checks did not reveal any critical problems, so I guess we leave it like this and wait for bug reports.

Thanks!

xarray/plot/utils.py Outdated Show resolved Hide resolved
xarray/plot/utils.py Outdated Show resolved Hide resolved
@Illviljan
Copy link
Contributor Author

For my taste it uses a bit too much private matplotlib API, hopefully that doesn't make things complicated in the long run. Maybe you can ask over at matplotlib to make some of these things public API?

I agree, but it should work fine to reference plt.scatter if anything breaks. The code follows scatter very closely besides for 3d, having to make mypy happy and the black code style.

In my dream world matplotlib would add a similar function. plt.scatter is preferred over PathCollection and the same arguments can be used for adding a plt.lines over using LineCollection.

@Illviljan
Copy link
Contributor Author

Illviljan commented May 30, 2024

Here's an interesting plot we should be easily be able to do with Lines:
image
https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/

Would probably require fixing #8831 to get that nice looking datetime colorbar.

Haven't found the data yet, noting it for the future.

@headtr1ck
Copy link
Collaborator

Add an entry in whats-new and this is good to go! Thanks!

@headtr1ck headtr1ck added the plan to merge Final call for comments label Jun 18, 2024
@dcherian
Copy link
Contributor

dcherian commented Jun 18, 2024

This looks nice. I have two requests:

  1. Can we add some info to the docstring about the difference between lines and line and how to choose between the two?
  2. Can we please consider calling it line_collection instead? The one character difference between two very similar but slightly different functions is not friendly to users.

On second thought, there is indeed a lot of private API here. Are you sure this is a good idea? It feels like major technical debt. Perhaps this is best suited to an external MPL extension?

@Illviljan
Copy link
Contributor Author

Can we add some info to the docstring about the difference between lines and line and how to choose between the two?

Done, I think?

Can we please consider calling it line_collection instead? The one character difference between two very similar but slightly different functions is not friendly to users.

Sure I don't mind a different name, but line_collection is also misleading, mpl's LineCollection and PathCollection is as basic as it gets and there's a lot of functionality xarray must have (datetime support for example) that they don't support. Therefore I was thinking to follow mpl's direction with a more distinct name, as scatter/_line do quite a bit more than just call the Collection:

PathCollection -> plt.scatter -> xr.plot.scatter
LineCollection -> _line -> xr.plot.?

On second thought, there is indeed a lot of private API here. Are you sure this is a good idea? It feels like major technical debt. Perhaps this is best suited to an external MPL extension?

_line is as close to a copy/paste of plt.scatter as it can be right now. Sure maybe I can rewrite it in a more public way, but then it'll be harder to use plt.scatter as a reference, which probably will be needed whenever someone finds a bug.

I think 3d plotting is the sort of thing that xarray should be good at, with scatter and lines a lot of the base needs should be covered.

@headtr1ck
Copy link
Collaborator

I'm fine with leaving it like this because currently there is really no nice way of rewriting it in a more "public" way.

@dcherian you can also discuss this in the bi-weekly if you feel like it.

@Illviljan
Copy link
Contributor Author

Illviljan commented Jul 9, 2024

https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/

Got this working with xarray:

# https://www.reddit.com/r/dataisbeautiful/comments/1d4ahr1/oc_north_atlantic_sea_surface_temperature_anomaly/#lightbox
# https://github.com/afinemax/climate_change_bot/blob/main/bot.py


import requests

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr


def download_json(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for unsuccessful responses (4xx, 5xx)
        json_data = response.json()
        return json_data
    except requests.exceptions.RequestException as e:
        print(f"Error downloading JSON: {e}")
        return None


url = r"https://climatereanalyzer.org/clim/sst_daily/json/oisst2.1_natlan1_sst_day.json"
sst_daily_data = download_json(url)


data = []
years = []
for i in range(len(sst_daily_data) - 3):
    years.append((sst_daily_data[i]["name"]))
    data.append(sst_daily_data[i]["data"])

# convert data from list into nd arr
data = np.asarray(data, dtype=float)  # first index is year, 2nd index is day

# ds = xr.Dataset(
#     data_vars=dict(temperature=(("year", "day"), data)),
#     coords=dict(
#         year=np.array(dict_names, dtype="datetime64[D]"),
#         day=np.arange(data.shape[-1], dtype="timedelta64[D]"),
#     ),
# )
ds = xr.Dataset(
    data_vars=dict(
        temperature=(
            ("year", "day"),
            data,
            dict(
                long_name="Daily North Atlantic (0-60N) Sea Surface Temperature (SST)",
                units="degC",
            ),
        )
    ),
    coords=dict(
        year=np.array(years, dtype=int),
        day=np.arange(data.shape[-1], dtype=int),
    ),
)

plt.figure()
ds.plot.lines(x="day", y="temperature", hue="year")

image

I couldn't get it working with datetimes though, but is unrelated to this pr.

@headtr1ck
Copy link
Collaborator

Some want to merge this finally?
Or is there some discussion still required?

@Illviljan
Copy link
Contributor Author

Felt the need for this one again. I'm going to self-merge next week unless someone has any further thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants