
Performance bottleneck in FluxCsvParser when parsing large CSV payloads (10MB+) #691

vessaldaneshvar opened this issue Apr 13, 2025 · 0 comments
Labels
bug Something isn't working

Specifications

  • Client Version: 1.48.0
  • InfluxDB Version: 2.7
  • Platform: macos

Code sample to reproduce problem

import influxdb_client

client = influxdb_client.InfluxDBClient(
    url="http://localhost:8086",
    token="TOKEN",
    org="organization",
)
query_api = client.query_api()
query = (
    'from(bucket: "sensors") '
    '|> range(start: 2025-04-13T14:18:11.036Z, stop: 2025-04-13T14:33:11.036Z)'
)
result = query_api.query(org="matna", query=query)

Expected behavior

The runtime of this query should be on the same order as running the same query in the InfluxDB UI.

Actual behavior

The runtime of this code is not on the same order of magnitude; it is dramatically slower than the UI.

Additional info

Hi InfluxDB team,

I've encountered a significant performance bottleneck in the FluxCsvParser class within the InfluxDB Python client when working with larger datasets.

🐞 Issue Description
When querying data (~10MB in size), the network call returns results in under 20 ms, which is excellent. However, the CSV parsing step takes over 5 seconds to complete. This introduces an unacceptable latency for high-throughput or low-latency use cases.

In contrast, using the Go client for the same query and dataset, the full query—including parsing—is completed in under 200 ms. This makes the Python client around 25x slower just in the parsing stage.

📈 Performance Benchmark
Data size: ~10MB (Flux CSV)

Query response time (network): < 20 ms

Parsing time (Python client): > 5000 ms

Parsing time (Go client): < 200 ms
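For reference, raw CSV tokenization itself need not be the bottleneck. The following standalone micro-benchmark (synthetic, hypothetical data — not the dataset from this report) parses a ~10MB Flux-style CSV payload with the stdlib csv module and typically completes in a small fraction of a second:

```python
import csv
import io
import time

# Synthetic ~10MB Flux-style CSV payload (hypothetical data, not the
# dataset from this report).
header = ",result,table,_time,_value,_field,_measurement\n"
row = ",_result,0,2025-04-13T14:18:11Z,23.5,temperature,sensors\n"
n_rows = (10 * 1024 * 1024) // len(row)
payload = header + row * n_rows

# Time only the parse, mirroring the network-vs-parse split above.
t0 = time.perf_counter()
records = list(csv.reader(io.StringIO(payload)))
elapsed_ms = (time.perf_counter() - t0) * 1000

print(f"parsed {len(records)} records in {elapsed_ms:.0f} ms")
```

This suggests the 5 s figure comes from per-record overhead layered on top of tokenization, not from the raw volume of CSV text.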

🔍 Root Cause
Profiling indicates that the performance degradation is centered in the FluxCsvParser implementation. The current parsing logic in Python seems to be inefficient for large responses due to overhead in string parsing, tokenization, and possibly memory management.
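If it helps triage, the hotspot can be pinned down with cProfile. The sketch below profiles a toy pure-Python line parser on synthetic data — the function name and data are illustrative stand-ins, not the client's actual code path:

```python
import cProfile
import io
import pstats

def parse_lines(text):
    # Toy stand-in for a pure-Python tokenizer; illustrative only.
    return [line.split(",") for line in text.splitlines()]

payload = "\n".join("a,b,c,%d" % i for i in range(100_000))

# Profile only the parse step, then rank by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
rows = parse_lines(payload)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(f"parsed {len(rows)} rows")
```

Running the same harness around query_api.query() would show which FluxCsvParser methods dominate the 5 s.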

💡 Suggested Improvement
To address this, I suggest reviewing the implementation of FluxCsvParser—specifically around how it handles tokenization, buffering, and line-by-line parsing. Additionally, performance could be dramatically improved by offloading the CSV parsing to a C extension (e.g., using cffi, cython, or ctypes) or integrating an existing optimized parser like libcsv or simdjson.

This would help close the gap with the Go client's performance while maintaining compatibility with the current interface.
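Part of that win may already be available from the stdlib: csv.reader is a C-accelerated parser, while naive per-line string splitting is both slower at scale and incorrect for quoted fields. A small illustrative comparison (helper names are hypothetical):

```python
import csv
import io

def parse_pure_python(text):
    # Naive pure-Python tokenizer: simple, but no quote handling.
    return [line.split(",") for line in text.splitlines()]

def parse_c_accelerated(text):
    # csv.reader is implemented as a C extension and handles quoting.
    return list(csv.reader(io.StringIO(text)))

sample = 'x,y\n1,"a,b"\n'
print(parse_pure_python(sample))    # mis-splits the quoted field
print(parse_c_accelerated(sample))  # [['x', 'y'], ['1', 'a,b']]
```

So even without a new native extension, routing tokenization through csv.reader (if FluxCsvParser does not already do so on this path) might recover much of the gap.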

✅ Request
Could the maintainers review the FluxCsvParser code path, especially the generate function?

Is there openness to rewriting this part as a performance-critical native extension, or at least modularizing it for optional native acceleration?
