Data loading is very slow when retrieving OHLCV for 600 instruments #1910

trungtv · 2025-04-13T02:55:29Z

❓ Questions and Help

Hi Qlib team,

Thank you for the great work on this project. I'm currently using Qlib to build a forecasting pipeline and have encountered a serious performance issue when loading data.

Specifically, when retrieving OHLCV data for ~600 instruments (using default Alpha158 features), the data loading process takes around 170 seconds, which is significantly longer than expected.

Here is a log snippet from my run:

[32077:MainThread](2025-04-13 09:41:09,724) INFO - qlib.timer - [log.py:127] - Time cost: 168.134s | Loading data Done
[32077:MainThread](2025-04-13 09:41:09,777) INFO - qlib.timer - [log.py:127] - Time cost: 0.038s | DropnaProcessor Done
[32077:MainThread](2025-04-13 09:41:10,816) INFO - qlib.timer - [log.py:127] - Time cost: 1.038s | FilterByInstrumentLengthProcessor Done
[32077:MainThread](2025-04-13 09:41:10,830) INFO - qlib.timer - [log.py:127] - Time cost: 0.009s | DropnaLabel Done
[32077:MainThread](2025-04-13 09:41:10,842) INFO - qlib.timer - [log.py:127] - Time cost: 0.011s | DropnaLabel Done
[32077:MainThread](2025-04-13 09:41:11,905) INFO - qlib.timer - [log.py:127] - Time cost: 1.063s | FilterByInstrumentLengthProcessor Done
[32077:MainThread](2025-04-13 09:41:11,907) INFO - qlib.timer - [log.py:127] - Time cost: 2.182s | fit & process data Done
[32077:MainThread](2025-04-13 09:41:11,907) INFO - qlib.timer - [log.py:127] - Time cost: 170.318s | Init data Done

This makes experimentation and model development inefficient. I've tried checking disk performance and system load, and everything seems normal.
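
To put the timer lines above in proportion, here is a minimal sketch in plain Python that parses `qlib.timer` output and computes how much of the total one stage accounts for. It is not part of Qlib: the helper names `parse_timer_lines` and `share_of_total` are hypothetical, and the regular expression assumes the `Time cost: <seconds>s | <stage> Done` layout seen in this log.

```python
import re

# Assumed layout, taken from the log above:
#   "... Time cost: 168.134s | Loading data Done"
TIMER_RE = re.compile(
    r"Time cost: (?P<seconds>\d+(?:\.\d+)?)s \| (?P<stage>.+?) Done\s*$"
)


def parse_timer_lines(log_text):
    """Return (stage, seconds) pairs in log order, skipping non-timer lines."""
    pairs = []
    for line in log_text.splitlines():
        match = TIMER_RE.search(line)
        if match:
            pairs.append((match.group("stage"), float(match.group("seconds"))))
    return pairs


def share_of_total(pairs, stage, total_stage):
    """Fraction of total_stage's time spent in stage (first occurrence of each)."""
    times = {}
    for name, seconds in pairs:
        times.setdefault(name, seconds)  # keep the first timing per stage name
    return times[stage] / times[total_stage]


# Three lines copied from the log in this issue.
LOG = """\
[32077:MainThread](2025-04-13 09:41:09,724) INFO - qlib.timer - [log.py:127] - Time cost: 168.134s | Loading data Done
[32077:MainThread](2025-04-13 09:41:11,907) INFO - qlib.timer - [log.py:127] - Time cost: 2.182s | fit & process data Done
[32077:MainThread](2025-04-13 09:41:11,907) INFO - qlib.timer - [log.py:127] - Time cost: 170.318s | Init data Done
"""

pairs = parse_timer_lines(LOG)
share = share_of_total(pairs, "Loading data", "Init data")
print(f"Loading data share of Init data: {share:.1%}")
```

On this log, the "Loading data" step comes out at roughly 98.7% of the total "Init data" time, so the processors (about 2 seconds combined) are not the bottleneck.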

Could you please help clarify:

Is this expected behavior with the current version of Qlib?

Are there any recommended configurations (e.g., cache setup, parallel loading, data format) to reduce the data loading time?

Would switching to a different storage format (e.g., parquet or Arrow) help here?

Are there any best practices when using a large number of instruments?

Qlib version: 0.9.6

Python version: 3.9

OS: MacOS

Data: custom dataset

trungtv added the "question" label (Further information is requested) on Apr 13, 2025