ENH: support Arrow PyCapsule Interface on Series for export (#59587)
* ENH: support Arrow PyCapsule Interface on Series for export

* simplify

* simplify
MarcoGorelli authored Aug 26, 2024
1 parent d31aa83 commit bb4ab4f
Showing 3 changed files with 51 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -43,6 +43,7 @@ Other enhancements
- Users can globally disable any ``PerformanceWarning`` by setting the option ``mode.performance_warnings`` to ``False`` (:issue:`56920`)
- :meth:`Styler.format_index_names` can now be used to format the index and column names (:issue:`48936` and :issue:`47489`)
- :class:`.errors.DtypeWarning` improved to include column names when mixed data types are detected (:issue:`58174`)
- :class:`Series` now supports the Arrow PyCapsule Interface for export (:issue:`59518`)
- :meth:`DataFrame.to_excel` argument ``merge_cells`` now accepts a value of ``"columns"`` to only merge :class:`MultiIndex` column header cells (:issue:`35384`)
- :meth:`DataFrame.corrwith` now accepts ``min_periods`` as an optional argument, as in :meth:`DataFrame.corr` and :meth:`Series.corr` (:issue:`9490`)
- :meth:`DataFrame.cummin`, :meth:`DataFrame.cummax`, :meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods now have a ``numeric_only`` parameter (:issue:`53072`)
27 changes: 27 additions & 0 deletions pandas/core/series.py
@@ -34,6 +34,7 @@
from pandas._libs.lib import is_range_indexer
from pandas.compat import PYPY
from pandas.compat._constants import REF_COUNT
from pandas.compat._optional import import_optional_dependency
from pandas.compat.numpy import function as nv
from pandas.errors import (
    ChainedAssignmentError,
@@ -558,6 +559,32 @@ def _init_dict(

    # ----------------------------------------------------------------------

    def __arrow_c_stream__(self, requested_schema=None):
        """
        Export the pandas Series as an Arrow C stream PyCapsule.

        This relies on pyarrow to convert the pandas Series to the Arrow
        format (and follows the default behaviour of ``pyarrow.Array.from_pandas``
        in its handling of the index, i.e. to ignore it).
        This conversion is not necessarily zero-copy.

        Parameters
        ----------
        requested_schema : PyCapsule, default None
            The schema to which the Series should be cast, passed as a
            PyCapsule containing a C ArrowSchema representation of the
            requested schema.

        Returns
        -------
        PyCapsule
            A capsule containing a C ArrowArrayStream struct.
        """
        pa = import_optional_dependency("pyarrow", min_version="16.0.0")
        # ``requested_schema`` is a PyCapsule, not a pyarrow DataType, so it
        # cannot be passed as ``type=``. Convert first (ignoring the index)
        # and let pyarrow apply the requested schema during the export.
        ca = pa.chunked_array([pa.Array.from_pandas(self)])
        return ca.__arrow_c_stream__(requested_schema)

    # ----------------------------------------------------------------------

    @property
    def _constructor(self) -> type[Series]:
        return Series
23 changes: 23 additions & 0 deletions pandas/tests/series/test_arrow_interface.py
@@ -0,0 +1,23 @@
import ctypes

import pytest

import pandas as pd

pa = pytest.importorskip("pyarrow", minversion="16.0")


def test_series_arrow_interface():
    s = pd.Series([1, 4, 2])

    capsule = s.__arrow_c_stream__()
    assert (
        ctypes.pythonapi.PyCapsule_IsValid(
            ctypes.py_object(capsule), b"arrow_array_stream"
        )
        == 1
    )

    ca = pa.chunked_array(s)
    expected = pa.chunked_array([[1, 4, 2]])
    assert ca.equals(expected)
