QDB-10908 - Add Arrow query API support to Python API #108
Conversation
tests/conftest.py (Outdated)

```python
):

index = pd.Index(
    # pd.date_range(start_date, periods=row_count, freq="s"), name="$timestamp"
```
```
tests/test_numpy.py: 634 warnings
D:\work\quasar\qdb-api-python\tests\conftest.py:685: FutureWarning: 'S' is deprecated and will be removed in a future version, please use 's' instead.
  pd.date_range(start_date, periods=row_count, freq="S"), name="$timestamp"
```
tests/conftest.py (Outdated)

```python
return request.param


# @pytest.fixture(params=["s"], ids=["frequency=s"])
```
same
solatis left a comment
I think this is good as a demonstration of the concept, but it lacks proper integration with the existing abstractions and, even more importantly, doesn't handle object/pointer ownership correctly between Python and C++.
I think things can/should be simplified before merging this into master.
Is the intention for this to become the "standard" API? Did we run a benchmark comparing the performance of this insertion/reading method with the old one?
Is this intended only for the bulk reader, or will the query API be adapted as well?
I'm happy to pick things up from here!
```cpp
std::vector<any_column> _columns;

std::vector<qdb_exp_batch_push_column_t> _columns_data;
std::vector<const char *> _duplicate_ptrs;
```
This appears very fragile; what is the "ownership" of `_duplicate_ptrs`? Who "owns" the `const char *`, and when does it go out of scope?
What I do to circumvent this (see the sketch below) is:
- Copy all data into a `staged_table`, whose lifetime is coupled with the batch writer session (i.e. it outlives the Python function invocation);
- Then use pointers into the copied object.
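For illustration, a minimal sketch of that pattern. The `staged_table` name comes from the comment above; the member and method names are hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical staged_table: owns deep copies of the incoming strings, so any
// const char * handed to the C API points into storage whose lifetime is tied
// to the writer session, not to a single Python function invocation.
class staged_table
{
public:
    void stage_columns(std::vector<std::string> const & columns)
    {
        // Deep-copy into owned storage.
        columns_ = columns;

        // Rebuild the pointer view; these pointers target *our* copies and
        // stay valid as long as columns_ is not mutated again.
        ptrs_.clear();
        ptrs_.reserve(columns_.size());
        for (auto const & c : columns_)
        {
            ptrs_.push_back(c.c_str());
        }
    }

    // Stable for the lifetime of this staged_table; no release() or manual
    // delete required.
    const char * const * column_ptrs() const noexcept
    {
        return ptrs_.data();
    }

private:
    std::vector<std::string> columns_;
    std::vector<const char *> ptrs_;
};
```

Because the staged table is a member of the writer, the C API only ever borrows pointers whose owner outlives the push.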
```diff
     std::vector<std::string> const & columns,
-    qdb_exp_batch_push_table_t & out)
+    qdb_exp_batch_push_table_t & out,
+    std::vector<const char *> & duplicate_ptrs)
```
Why have 2 different output vectors? This seems like a fragile design.
In the original code you create a `unique_ptr`:

```cpp
auto where_duplicate = std::make_unique<char const *[]>(columns.size());
```

and then hand ownership of the C++ object over to a raw C pointer:

```cpp
out.where_duplicate = where_duplicate.release();
```

where the target field is declared as:

```cpp
const char ** where_duplicate;
```

Is this expected by design? Is this memory leak intentional?
It makes the ownership transfer a bit more explicit (the original `unique_ptr` is inaccessible after the `release()` call), but I agree it's not pretty. I'll make it prettier.
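One way to make it prettier, sketched under the assumption that the writer has a long-lived state object (names below are illustrative, not the actual fix in this PR): keep the allocation as a member instead of `release()`-ing it into the C struct, so ownership stays obvious and nothing leaks.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical long-lived writer state.
struct writer_state
{
    std::unique_ptr<char const *[]> where_duplicate_storage;
};

// Builds the C-style array without transferring ownership to a raw pointer.
const char ** make_where_duplicate(
    writer_state & state, std::vector<const char *> const & columns)
{
    state.where_duplicate_storage = std::make_unique<char const *[]>(columns.size());
    for (std::size_t i = 0; i < columns.size(); ++i)
    {
        state.where_duplicate_storage[i] = columns[i];
    }

    // The C struct merely borrows this pointer; the unique_ptr member frees
    // the array automatically when the writer state is destroyed.
    return state.where_duplicate_storage.get();
}
```

The caller would assign the returned pointer to `out.where_duplicate`; since `writer_state` outlives the push, nothing dangles and nothing leaks.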
What's the purpose of introducing an entirely new file? Wouldn't we want this to be in writer.cpp / writer.hpp?
```cpp
void exp_batch_push_arrow_with_options(handle_ptr handle,
    const std::string & table_name,
    const pybind11::object & reader,
    pybind11::kwargs args)
{
```
Is this a qdb-api-python-specific function, or something from our C API? It doesn't integrate well with / match the design of the existing writer.hpp / writer.cpp, and seems much more "C-style".
It seems like most of this code is duplicated from elsewhere?
Why did you introduce a whole new file for this? This is only for development, not for actual dependencies / releasing?
I think we dropped win32 support, does that make things easier?
I don't understand this idea of accumulating the table column by column at the C level. That is only needed when we work with the qdb table object.
Batch writers operate on whole tables, and we can prepare the table at the Python level.
The problem concerns memory ownership, native numpy arrays, and differences in representation.
Let's say I have this code:
```python
writer = quasar.Writer()
for row in range(rows):
    df = process_row(row)
    writer.add(df)
writer.push()
```

The lifetime of `df` (and all numpy arrays underneath it) is only scoped to the `for` loop.
Additionally, timestamps and string data are represented in entirely different ways between our C API and Python.
Additionally, Python offers no guarantees at all about memory stability between function invocations: all memory can be moved around in between multiple invocations.
The solution employed in the APIs is:
- Upon invocation, copy data into a native representation which we "own" and is stable;
- We couple the lifetime of these data structures with the batch writer;
- In practice, this means it's a member object of our batch writer object.
- When invoking functions of our C API, we avoid copies and provide a pointer to our data structures instead.
It is possible that with PyArrow we could avoid the copies, since the representation of the data is the same between Python and our C API; but even then we must tell the Python GC that our batch writer object references the PyArrow objects, otherwise we'll run into ownership issues if e.g. Python decides to garbage collect them. I.e. this is tricky and needs to be considered carefully. You can read more about this here: https://pybind11.readthedocs.io/en/stable/advanced/functions.html, especially the section "additional call policies".
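For illustration only, a minimal pybind11 sketch (hypothetical class and module names) of the `keep_alive` call policy that page describes; it ties the argument's lifetime to the writer so the GC cannot collect the Arrow data while the writer still points at it:

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hypothetical writer that stashes pointers into Python-owned (e.g. PyArrow)
// buffers instead of copying them.
struct batch_writer
{
    void add(py::object table)
    {
        // ... record pointers into `table`'s underlying buffers ...
    }
};

PYBIND11_MODULE(example, m)
{
    py::class_<batch_writer>(m, "BatchWriter")
        .def(py::init<>())
        // keep_alive<1, 2>: keep argument 2 (the table) alive at least as
        // long as argument 1 (`self`, the writer), so Python's GC cannot
        // reclaim the Arrow buffers while the writer still points into them.
        .def("add", &batch_writer::add, py::keep_alive<1, 2>());
}
```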