Specifications
Code sample to reproduce problem
import influxdb_client

client = influxdb_client.InfluxDBClient(
    url="http://localhost:8086",
    token="TOKEN",
    org="organization",
)
query_api = client.query_api()

# This time range returns roughly 10 MB of annotated CSV.
query = 'from(bucket: "sensors") |> range(start: 2025-04-13T14:18:11.036Z, stop: 2025-04-13T14:33:11.036Z)'

# The org passed here should match the one the client was created with.
result = query_api.query(query=query, org="organization")
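To separate the network and transfer cost from the parsing cost, here is a rough timing sketch that reuses the setup above. It assumes query_raw() returns the unparsed annotated CSV (either as a str or as a lazy response object, depending on the client version):

import time

# Reuses `query_api` and `query` from the code sample above.
t0 = time.perf_counter()
raw = query_api.query_raw(query)
# Force the body to be fully read in case a lazy response object is returned.
_ = raw.data if hasattr(raw, "data") else raw
t1 = time.perf_counter()

# Same query again, this time run through FluxCsvParser via query().
tables = query_api.query(query)
t2 = time.perf_counter()

print(f"fetch only (query_raw): {t1 - t0:.3f} s")
print(f"fetch + parse (query):  {t2 - t1:.3f} s")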
Expected behavior
The end-to-end runtime of this query should be on the same order as running the same query in the InfluxDB UI (or via the Go client).
Actual behavior
The runtime of this code is not on the same order of magnitude; almost all of the extra time is spent in CSV parsing (see the details below).
Additional info
Hi InfluxDB team,
I've encountered a significant performance bottleneck in the FluxCsvParser class within the InfluxDB Python client when working with larger datasets.
🐞 Issue Description
When querying data (~10MB in size), the network call returns results in under 20 ms, which is excellent. However, the CSV parsing step takes over 5 seconds to complete. This introduces an unacceptable latency for high-throughput or low-latency use cases.
In contrast, using the Go client for the same query and dataset, the full query—including parsing—is completed in under 200 ms. This makes the Python client around 25x slower just in the parsing stage.
📈 Performance Benchmark
Data size: ~10MB (Flux CSV)
Query response time (network): < 20 ms
Parsing time (Python client): > 5000 ms
Parsing time (Go client): < 200 ms
🔍 Root Cause
Profiling indicates that the performance degradation is centered in the FluxCsvParser implementation. The current parsing logic in Python seems to be inefficient for large responses due to overhead in string parsing, tokenization, and possibly memory management.
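For anyone who wants to reproduce the profile, here is a minimal sketch using the standard-library profiler; it reuses query_api and query from the code sample above, and the cumulative-time view should show most of the wall time inside the FluxCsvParser code rather than in the HTTP layer:

import cProfile
import pstats

# Reuses `query_api` and `query` from the code sample above.
profiler = cProfile.Profile()
profiler.enable()
tables = query_api.query(query)
profiler.disable()

# Sort by cumulative time and show the 20 most expensive call sites.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)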
💡 Suggested Improvement
To address this, I suggest reviewing the FluxCsvParser implementation, specifically how it handles tokenization, buffering, and line-by-line parsing. Performance could also be improved dramatically by offloading the CSV parsing to a C extension (e.g., via CFFI, Cython, or ctypes) or by integrating an existing optimized CSV parser such as libcsv, or a SIMD-based approach in the spirit of simdjson.
This would help close the gap with the Go client's performance while maintaining compatibility with the current interface.
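To illustrate the headroom a C-backed parser gives, here is a rough workaround sketch (not a proposed fix) that fetches the annotated CSV with query_raw() and hands it to pandas' C engine. It assumes a single result table with one schema, skips the #datatype/#group/#default annotations, and therefore loses the type mapping that FluxCsvParser normally applies:

import io
import pandas as pd

# Reuses `query_api` and `query` from the code sample above.
raw = query_api.query_raw(query)
csv_text = raw.data.decode("utf-8") if hasattr(raw, "data") else raw

# Let pandas' C parser handle the data rows; the annotation lines are treated
# as comments. A real fix would still need to honour the annotations and
# multi-table responses.
df = pd.read_csv(io.StringIO(csv_text), comment="#")
print(len(df), "rows parsed")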
✅ Request
Could the maintainers review the FluxCsvParser code path, especially the generator function?
Is there openness to rewriting this part as a performance-critical native extension, or at least modularizing it for optional native acceleration?