Error when calculating variance of series of ndarrays #25542

MakGre · 2019-03-05T09:18:36Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
# np.__version__ is 1.14.5
# pd.__version__ is 0.24.1

# create list of dicts with ndarrays
ll = []
for ii in range(2):
    dd = {0: np.ones(2)}
    ll += [dd]

# create data frame
df = pd.DataFrame(ll)

# print(df) looks as expected
#             0
# 0  [1.0, 1.0]
# 1  [1.0, 1.0]

m = df[0].mean() # works as expected
# equivalent to df[0].values.mean(axis=0)
print(m) # array([1., 1.])

v = df[0].values.var(axis=0) # yields expected result array([0., 0.])

v = df[0].var() # raises TypeError: setting an array element with a sequence.

Problem description

A pandas Series of dtype object can contain numpy.ndarrays. This is useful for storing high-dimensional data in DataFrames.
Calculating the mean of such a Series works as expected. Calculating the variance, however, raises an error. The calculation is easily performed by inserting .values between the Series and the var call (and passing axis=0), so this is not a fundamental problem.
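A minimal sketch of the same workaround that stays in float64 throughout: stack the per-row arrays into a regular 2-D array with `np.stack`, then reduce along axis 0. This assumes every cell holds a 1-D array of the same length, which is true for the example above; `np.stack` raises if the shapes differ.

```python
import numpy as np
import pandas as pd

# Rebuild the reporter's frame: one column (label 0) whose cells are ndarrays.
df = pd.DataFrame([{0: np.ones(2)} for _ in range(2)])

# Stack the per-row arrays into a 2-D float array of shape (rows, array_length).
stacked = np.stack(df[0].to_list())

# Reduce along the row axis, giving one value per array position.
m = stacked.mean(axis=0)
v = stacked.var(axis=0, ddof=1)  # ddof=1 matches the Series.var default

print(stacked.dtype, stacked.shape)  # float64 (2, 2)
print(m, v)                          # [1. 1.] [0. 0.]
```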

This is the traceback of df[0].var():

Traceback (most recent call last):
  File "<ipython-input-23-be47a51ab53b>", line 1, in <module>
    df[0].var()
  File "C:\Users\Maksim\WinPython\python-3.6.3.amd64\lib\site-packages\pandas\core\generic.py", line 10976, in stat_func
    skipna=skipna, ddof=ddof)
  File "C:\Users\Maksim\WinPython\python-3.6.3.amd64\lib\site-packages\pandas\core\series.py", line 3626, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "C:\Users\Maksim\WinPython\python-3.6.3.amd64\lib\site-packages\pandas\core\nanops.py", line 76, in _f
    return f(*args, **kwargs)
  File "C:\Users\Maksim\WinPython\python-3.6.3.amd64\lib\site-packages\pandas\core\nanops.py", line 138, in f
    raise TypeError(e)
TypeError: setting an array element with a sequence.
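The last frame shows nanops re-raising an underlying error as `TypeError` (`raise TypeError(e)`), and the message matches what NumPy reports when an object array whose slots hold sequences is cast to a numeric dtype. The snippet below reproduces such a cast in isolation; it illustrates a plausible origin of the message and is not a confirmed trace through pandas' `nanops` code.

```python
import numpy as np

# Object-dtype array whose slots are 1-D float arrays, like df[0].values above.
values = np.empty(2, dtype=object)
values[0] = np.ones(2)
values[1] = np.ones(2)

# A float64 cast needs exactly one scalar per slot, but each slot holds a sequence,
# so NumPy refuses the conversion.
try:
    values.astype(np.float64)
    cast_error = None
except (ValueError, TypeError) as exc:
    cast_error = exc

print(type(cast_error).__name__, cast_error)
```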

Expected Output

I expect df[0].var() to yield the same as df[0].values.var(axis=0).
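One detail about this expectation: `Series.var` defaults to `ddof=1` (sample variance), while NumPy's `ndarray.var` defaults to `ddof=0` (population variance). In the example above the two coincide only because every array is constant, so the variance is 0 either way. The helper `elementwise_var` below is hypothetical (not part of pandas) and simply makes `ddof` explicit, again assuming equal-length 1-D cells.

```python
import numpy as np
import pandas as pd


def elementwise_var(series, ddof=1):
    """Hypothetical helper: variance across rows of a Series of equal-length arrays.

    ddof defaults to 1 to mirror Series.var; ndarray.var defaults to ddof=0.
    """
    stacked = np.stack(series.to_list())  # shape (n_rows, array_length)
    return stacked.var(axis=0, ddof=ddof)


# Build an object-dtype Series holding one non-constant 1-D array per row.
values = np.empty(2, dtype=object)
values[0] = np.array([1.0, 2.0])
values[1] = np.array([3.0, 6.0])
s = pd.Series(values)

sample_var = elementwise_var(s)              # ddof=1, the Series.var convention
population_var = elementwise_var(s, ddof=0)  # ddof=0, the ndarray.var convention
print(sample_var)      # [2. 8.]
print(population_var)  # [1. 4.]
```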

Output of `pd.show_versions()`


INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.1
pytest: 3.2.3
pip: 18.1
setuptools: 39.2.0
Cython: 0.27.2
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.7.1
xarray: 0.9.6
IPython: 6.2.1
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: 1.5.1
bottleneck: 1.2.1
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.2
lxml.etree: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


WillAyd · 2019-03-06T18:42:30Z

Operations like these are supposed to reduce to a scalar, so I think it's purely happenstance that the mean works; it is not something we generally make guarantees about.

With that said, the error message isn't very helpful. Investigation into what's going on and PRs to make this more useful would certainly be welcome.
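Along the lines suggested here, a caller-side sketch of a more helpful message. `var_with_clear_error` is a hypothetical wrapper, not a pandas API and not a change pandas made; it checks for ndarray cells before delegating to `Series.var`, which on ordinary numeric data reduces to a plain scalar.

```python
import numpy as np
import pandas as pd


def var_with_clear_error(series, ddof=1):
    """Hypothetical wrapper: call Series.var, but explain failures on nested data."""
    if series.dtype == object and any(isinstance(x, np.ndarray) for x in series):
        raise TypeError(
            "Series.var reduces to a scalar, but this object Series holds ndarrays; "
            "stack them first, e.g. np.stack(series.to_list()).var(axis=0, ddof=ddof)"
        )
    return series.var(ddof=ddof)


# Numeric data: the reduction returns a single scalar.
scalar_var = var_with_clear_error(pd.Series([1.0, 3.0]))

# Nested data: build an object Series of arrays and capture the clearer error.
nested_values = np.empty(2, dtype=object)
nested_values[0] = np.ones(2)
nested_values[1] = np.ones(2)
try:
    var_with_clear_error(pd.Series(nested_values))
    message = None
except TypeError as exc:
    message = str(exc)

print(scalar_var)
print(message)
```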

MakGre · 2019-03-11T17:28:49Z

Thank you for your reply.

So using mean and var is not really supported for object-dtype columns, I guess?

For me it is not much trouble to perform the operations on the values instead. I just wanted to let you guys know.

If this is by design, then the issue can be closed as far as I am concerned.

mroeschke · 2024-08-24T19:54:00Z

Thanks for the issue, but it appears this hasn't gotten traction in a while, so closing.

mroeschke added the Error Reporting (incorrect or improved errors from pandas) and Numeric Operations (arithmetic, comparison, and logical operations) labels Nov 2, 2019

jbrockmendel added the Nested Data label (data where the values are collections: lists, sets, dicts, objects, etc.) Sep 22, 2020

mroeschke added the Enhancement and Reduction Operations (sum, mean, min, max, etc.) labels and removed the Numeric Operations label Jun 27, 2021

mroeschke closed this as completed Aug 24, 2024

