public inbox for git-commits@fedoraproject.org
From: Benjamin A. Beasley <code@musicinmybrain.net>
To: git-commits@fedoraproject.org
Subject: [rpms/python-hdmf] rawhide: Backport support for Pandas 3; fixes RHBZ#2481162
Date: Thu, 25 Jun 2026 10:30:18 GMT
Message-ID: <178238341899.1.5316945448875546916.rpms-python-hdmf-8b53fc51bb70@fedoraproject.org>
A new commit has been pushed.
Repo : rpms/python-hdmf
Branch : rawhide
Commit : 8b53fc51bb7013d8173218a6a45efbc2fdcf07a5
Author : Benjamin A. Beasley <code@musicinmybrain.net>
Date : 2026-06-25T11:04:57+01:00
Stats : +261/-1 in 2 file(s)
URL : https://src.fedoraproject.org/rpms/python-hdmf/c/8b53fc51bb7013d8173218a6a45efbc2fdcf07a5?branch=rawhide
Log:
Backport support for Pandas 3; fixes RHBZ#2481162
---
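Note on the backported helper: `coerce_pandas_data` rejects pandas input that contains missing values and converts nullable masked arrays through their backing NumPy dtype. The sketch below illustrates only those two pandas behaviours in isolation, assuming pandas and NumPy are installed. It does not import hdmf, and the variable names are made up for the illustration.

```python
import numpy as np
import pandas as pd

# A nullable integer array with no missing values.
ints = pd.array([1, 2, 3], dtype="Int64")

# Masked dtypes such as Int64 expose the NumPy dtype that backs them.
backing = ints.dtype.numpy_dtype

# Converting through the backing dtype gives a plain integer ndarray
# rather than an object array.
converted = ints.to_numpy(dtype=backing)

# The missing-value guard in the patch is built on pd.isna(...).any().
with_na = pd.array([1, None, 3], dtype="Int64")
has_missing = bool(pd.isna(with_na).any())
no_missing = bool(pd.isna(ints).any())

print(converted.dtype, converted.tolist(), has_missing, no_missing)
```

With the patch applied, an input like `with_na` is expected to raise `TypeError` instead of being stored, as the tests added to `test_utils.py` check.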
diff --git a/0001-Accept-pandas-Series-ExtensionArray-for-Data-lift-pa.patch b/0001-Accept-pandas-Series-ExtensionArray-for-Data-lift-pa.patch
new file mode 100644
index 0000000..0555d77
--- /dev/null
+++ b/0001-Accept-pandas-Series-ExtensionArray-for-Data-lift-pa.patch
@@ -0,0 +1,254 @@
+From 51efa8c9f0e56721c29cc64aacd1ee8c74e92876 Mon Sep 17 00:00:00 2001
+From: Ryan Ly <310197+rly@users.noreply.github.com>
+Date: Wed, 24 Jun 2026 19:13:28 -0700
+Subject: [PATCH] Accept pandas Series/ExtensionArray for Data; lift pandas<3
+ cap (#1469)
+
+Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
+---
+ docs/source/conf.py | 1 +
+ pyproject.toml | 2 +-
+ src/hdmf/container.py | 4 +-
+ src/hdmf/utils.py | 45 +++++++++++++-
+ tests/unit/utils_test/test_utils.py | 94 ++++++++++++++++++++++++++++-
+ 5 files changed, 142 insertions(+), 4 deletions(-)
+
+diff --git a/docs/source/conf.py b/docs/source/conf.py
+index 0ed71852..3b2b18fa 100644
+--- a/docs/source/conf.py
++++ b/docs/source/conf.py
+@@ -90,6 +90,7 @@ nitpick_ignore = [('py:class', 'Intracomm'),
+ ('py:class', 'h5py._hl.dataset.Dataset'),
+ ('py:class', 'function'),
+ ('py:class', 'unittest.case.TestCase'),
++ ('py:class', 'pandas.ExtensionArray'),
+ ]
+
+ suppress_warnings = ["config.cache"]
+diff --git a/pyproject.toml b/pyproject.toml
+index 13b7aaf9..15b89cf2 100644
+--- a/pyproject.toml
++++ b/pyproject.toml
+@@ -35,7 +35,7 @@ dependencies = [
+ "h5py>=3.1.0",
+ "jsonschema>=3.2.0",
+ 'numpy>=1.19.3',
+- "pandas>=1.2.0,<3",
++ "pandas>=1.2.0",
+ "ruamel.yaml>=0.16",
+ ]
+ dynamic = ["version"]
+diff --git a/src/hdmf/container.py b/src/hdmf/container.py
+index 7e6334f5..800668d1 100644
+--- a/src/hdmf/container.py
++++ b/src/hdmf/container.py
+@@ -12,7 +12,7 @@ import pandas as pd
+
+ from .data_utils import DataIO, append_data, extend_data, AbstractDataChunkIterator
+ from .utils import (docval, get_docval, getargs, ExtenderMeta, get_data_shape, popargs, LabelledDict,
+- get_basic_array_info, generate_array_html_repr)
++ get_basic_array_info, generate_array_html_repr, coerce_pandas_data)
+
+ from .term_set import TermSet, TermSetWrapper
+
+@@ -927,6 +927,7 @@ class Data(AbstractContainer):
+ data = popargs('data', kwargs)
+ super().__init__(**kwargs)
+
++ data = coerce_pandas_data(data)
+ self._validate_new_data(data)
+ self.__data = data
+
+@@ -1020,6 +1021,7 @@ class Data(AbstractContainer):
+
+ :param arg: The iterable to add to the end of this VectorData
+ """
++ arg = coerce_pandas_data(arg)
+ self._validate_new_data(arg)
+ self.__data = extend_data(self.__data, arg)
+
+diff --git a/src/hdmf/utils.py b/src/hdmf/utils.py
+index c7fe2b47..62969595 100644
+--- a/src/hdmf/utils.py
++++ b/src/hdmf/utils.py
+@@ -8,10 +8,12 @@ from enum import Enum
+
+ import h5py
+ import numpy as np
++import pandas as pd
++from pandas.api.extensions import ExtensionArray as _PandasExtensionArray
+
+
+ __macros = {
+- 'array_data': [np.ndarray, list, tuple, h5py.Dataset],
++ 'array_data': [np.ndarray, list, tuple, h5py.Dataset, pd.Series, _PandasExtensionArray],
+ 'scalar_data': [str, int, float, bytes, bool],
+ 'data': []
+ }
+@@ -26,6 +28,47 @@ except ImportError:
+ def is_zarr_array(value):
+ return ZARR_INSTALLED and isinstance(value, ZarrArray)
+
++
++def coerce_pandas_data(data):
++ """Convert a pandas Series or ExtensionArray to a numpy array for HDMF storage.
++
++ HDMF stores dataset values as numpy arrays (or array-likes such as h5py.Dataset).
++ Pandas Series and ExtensionArray inputs are normalized at the construction
++ boundary so that downstream code only has to handle numpy/list/tuple data.
++
++ Raises:
++ TypeError: if the input contains missing values (pd.NA / np.nan), which
++ cannot be serialized to HDF5 variable-length string datasets and which
++ HDMF does not support for other dtypes.
++ """
++ if isinstance(data, pd.Series):
++ underlying = data.array
++ elif isinstance(data, _PandasExtensionArray):
++ underlying = data
++ else:
++ return data
++
++ if pd.isna(underlying).any():
++ raise TypeError(
++ "Cannot construct an HDMF dataset from pandas data containing missing "
++ "values (pd.NA or NaN). HDF5 cannot serialize missing values in "
++ "variable-length string datasets, and HDMF does not yet support "
++ "missing values for other dtypes. Replace missing values with a "
++ "sentinel (e.g., empty string) before passing the data to HDMF."
++ )
++
++ # pandas nullable masked dtypes (e.g. Int64, boolean, Float64) expose the
++ # backing numpy dtype. Convert through it so the result keeps that dtype on
++ # all supported pandas versions; a plain to_numpy()/np.asarray() returns an
++ # object array on pandas < 2.2.
++ numpy_dtype = getattr(underlying.dtype, "numpy_dtype", None)
++ if numpy_dtype is not None:
++ return underlying.to_numpy(dtype=numpy_dtype)
++
++ if isinstance(data, pd.Series):
++ return data.to_numpy()
++ return np.asarray(data)
++
+ if ZARR_INSTALLED:
+ # optionally accept zarr.Array as array data to support conversion of data from Zarr to HDMF
+ __macros['array_data'].append(ZarrArray)
+diff --git a/tests/unit/utils_test/test_utils.py b/tests/unit/utils_test/test_utils.py
+index 3b8fb101..96b704c7 100644
+--- a/tests/unit/utils_test/test_utils.py
++++ b/tests/unit/utils_test/test_utils.py
+@@ -2,10 +2,11 @@ import os
+
+ import h5py
+ import numpy as np
++import pandas as pd
+ from hdmf.container import Data
+ from hdmf.data_utils import DataChunkIterator, DataIO
+ from hdmf.testing import TestCase
+-from hdmf.utils import get_data_shape, to_uint_array, is_newer_version
++from hdmf.utils import get_data_shape, to_uint_array, is_newer_version, coerce_pandas_data
+
+
+ class TestGetDataShape(TestCase):
+@@ -221,6 +222,97 @@ class TestToUintArray(TestCase):
+ with self.assertRaisesWith(ValueError, 'Cannot convert array of dtype float64 to uint.'):
+ to_uint_array(arr)
+
++class TestCoercePandasData(TestCase):
++ """Tests for coerce_pandas_data, which normalizes pandas Series/ExtensionArray to numpy."""
++
++ def test_passthrough_non_pandas(self):
++ arr = np.array([1, 2, 3])
++ self.assertIs(coerce_pandas_data(arr), arr)
++ lst = [1, 2, 3]
++ self.assertIs(coerce_pandas_data(lst), lst)
++
++ def test_string_array(self):
++ sa = pd.array(['a', 'b', 'c'], dtype='string')
++ out = coerce_pandas_data(sa)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(list(out), ['a', 'b', 'c'])
++
++ def test_arrow_string_array(self):
++ try:
++ asa = pd.array(['a', 'b', 'c'], dtype='string[pyarrow]')
++ except ImportError:
++ self.skipTest('pyarrow not installed')
++ out = coerce_pandas_data(asa)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(list(out), ['a', 'b', 'c'])
++
++ def test_series_string(self):
++ s = pd.Series(['a', 'b', 'c'], dtype='string')
++ out = coerce_pandas_data(s)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(list(out), ['a', 'b', 'c'])
++
++ def test_series_numeric_lossless(self):
++ s = pd.Series([1, 2, 3])
++ out = coerce_pandas_data(s)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(out.dtype, np.int64)
++ np.testing.assert_array_equal(out, [1, 2, 3])
++
++ def test_categorical(self):
++ cat = pd.Categorical(['x', 'y', 'x'])
++ out = coerce_pandas_data(cat)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(list(out), ['x', 'y', 'x'])
++
++ def test_string_array_with_na_raises(self):
++ sa = pd.array(['a', None, 'c'], dtype='string')
++ with self.assertRaisesRegex(TypeError, 'missing values'):
++ coerce_pandas_data(sa)
++
++ def test_series_object_with_nan_raises(self):
++ s = pd.Series(['a', np.nan, 'c'])
++ with self.assertRaisesRegex(TypeError, 'missing values'):
++ coerce_pandas_data(s)
++
++ def test_integer_array_lossless(self):
++ ia = pd.array([1, 2, 3], dtype='Int64')
++ out = coerce_pandas_data(ia)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(out.dtype, np.int64)
++ np.testing.assert_array_equal(out, [1, 2, 3])
++
++ def test_boolean_array_lossless(self):
++ ba = pd.array([True, False, True], dtype='boolean')
++ out = coerce_pandas_data(ba)
++ self.assertIsInstance(out, np.ndarray)
++ self.assertEqual(out.dtype, np.bool_)
++ np.testing.assert_array_equal(out, [True, False, True])
++
++ def test_integer_array_with_na_raises(self):
++ ia = pd.array([1, None, 3], dtype='Int64')
++ with self.assertRaisesRegex(TypeError, 'missing values'):
++ coerce_pandas_data(ia)
++
++
++class TestDataAcceptsPandas(TestCase):
++ """Verify pandas Series/ExtensionArray flow through Data construction."""
++
++ def test_vector_data_from_arrow_string_values(self):
++ from hdmf.common import VectorData
++ df = pd.DataFrame({'animal': ['cat', 'dog', 'bird']})
++ vd = VectorData(name='animal', description='', data=df['animal'].values)
++ self.assertIsInstance(vd.data, np.ndarray)
++ self.assertEqual(list(vd.data), ['cat', 'dog', 'bird'])
++
++ def test_vector_data_from_series(self):
++ from hdmf.common import VectorData
++ s = pd.Series(['a', 'b', 'c'])
++ vd = VectorData(name='s', description='', data=s)
++ self.assertIsInstance(vd.data, np.ndarray)
++ self.assertEqual(list(vd.data), ['a', 'b', 'c'])
++
++
+ class TestVersionComparison(TestCase):
+ """Test the version comparison functionality in NamespaceCatalog."""
+
+--
+2.54.0
+
diff --git a/python-hdmf.spec b/python-hdmf.spec
index 2fc2e0e..b55d465 100644
--- a/python-hdmf.spec
+++ b/python-hdmf.spec
@@ -48,6 +48,10 @@ URL: %forgeurl
Source0: %forgesource
# Man page hand-written for Fedora in groff_man(7) format based on help output
Source1: validate_hdmf_spec.1
+# Accept pandas Series/ExtensionArray for Data; lift pandas<3 cap (#1469)
+# https://github.com/hdmf-dev/hdmf/commit/744cf1971f92f34673c41b55376952f8ffe4707f
+# Backported to 4.3.1, without modifications to CHANGELOG.md
+Patch: 0001-Accept-pandas-Series-ExtensionArray-for-Data-lift-pa.patch
BuildArch: noarch
@@ -83,7 +87,9 @@ Obsoletes: python3-hdmf+zarr < 4.1.0-2
rm -vrf src/hdmf/common/hdmf-common-schema/
# Upstream pins numcodecs because “numcodecs 0.16.0 is not compatible with
# zarr<3,” but we cannot respect this.
-sed -r -i 's/("numcodecs)<[^"]+"/\1"/' pyproject.toml
+%pyproject_patch_dependency numcodecs:drop_upper
+# Allow pandas 3
+%pyproject_patch_dependency pandas:set_upper:4.0
%generate_buildrequires
%pyproject_buildrequires -x tqdm%{?with_zarr:,zarr},sparse%{?with_termset:,termset}
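Note on the spec change: the removed `sed` line stripped an upper bound from the `numcodecs` requirement in `pyproject.toml`. As a rough illustration of what that expression matches, here is a Python sketch using the same regular expression. The function name and the sample requirement strings are hypothetical and are not taken from upstream's `pyproject.toml`. The sketch says nothing about how `%pyproject_patch_dependency` works internally.

```python
import re

# Same pattern as the removed sed expression: s/("numcodecs)<[^"]+"/\1"/
PATTERN = re.compile(r'("numcodecs)<[^"]+"')


def drop_numcodecs_upper_bound(text: str) -> str:
    """Drop a '<...' bound that directly follows "numcodecs in double quotes.

    Unlike sed without the g flag, re.sub replaces every match, not just the
    first one on each line. The two agree when a line has at most one match.
    """
    return PATTERN.sub(r'\1"', text)


# Hypothetical dependency lines, for illustration only.
sample = '    "numcodecs<0.16.0",\n    "zarr>=2.12.0",\n'
result = drop_numcodecs_upper_bound(sample)
print(result)
```

The pattern only matches when `<` comes immediately after the package name inside double quotes, so a requirement written with single quotes, or with a lower bound before the upper bound, would pass through unchanged.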