Bug Description
File "/llama_index/core/indices/keyword_table/base.py", line 90, in __init__
super().__init__(
File "/llama_index/core/indices/base.py", line 82, in __init__
self._storage_context.index_store.add_index_struct(self._index_struct)
File "/llama_index/core/storage/index_store/keyval_index_store.py", line 46, in add_index_struct
self._kvstore.put(key, data, collection=self._collection)
File "/llama_index/storage/kvstore/mongodb/base.py", line 136, in put
self.put_all([(key, val)], collection=collection)
File "/llama_index/storage/kvstore/mongodb/base.py", line 175, in put_all
self._db[collection].bulk_write(new_docs)
pymongo.errors.OperationFailure: BSONObj size: 18616930 (0x11C1262) is invalid. Size must be between 0 and 16793600(16MB) First element: q: { _id: "ffb66f08-ca96-4ab2-ad53-82444ee9295e" }, full error: {'ok': 0.0, 'errmsg': 'BSONObj size: 18616930 (0x11C1262) is invalid. Size must be between 0 and 16793600(16MB) First element: q: { _id: "ffb66f08-ca96-4ab2-ad53-82444ee9295e" }', 'code': 10334, 'codeName': 'BSONObjectTooLarge'}
Version
0.11.10
Steps to Reproduce
The error occurs because I generate a large number of nodes (~5000) and use a keyword table index to extract keywords with an LLM. The keyword_table index then writes the nodes (chunks) and keywords into MongoDB, but llama_index does not handle this well and the total size of the document exceeds 16 MB. I am wondering whether this case was considered.
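For reference, a minimal sketch of the kind of setup that triggers this, assuming a MongoDB-backed docstore/index store created via from_uri; the URI, database name, and data directory are placeholders rather than my exact code:

```python
# Rough reproduction sketch (URI, db name, and data path are placeholders).
from llama_index.core import KeywordTableIndex, SimpleDirectoryReader, StorageContext
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore

MONGO_URI = "mongodb://localhost:27017"

storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
    index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name="llama_index"),
)

# ~5000 nodes; the LLM extracts keywords per node, and the resulting keyword
# table is persisted through the index store when the index is built.
documents = SimpleDirectoryReader("./data").load_data()
index = KeywordTableIndex.from_documents(documents, storage_context=storage_context)
# -> pymongo.errors.OperationFailure (BSONObjectTooLarge) once the serialized
#    keyword table document crosses MongoDB's 16 MB limit.
```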
Relevant Logs/Tracebacks
The error you're encountering is due to MongoDB's BSON size limit of 16MB. When using LlamaIndex, there are no built-in mechanisms specifically mentioned for splitting or compressing data to avoid exceeding this limit [1]. However, you can manage large datasets by configuring parameters such as batch_size when using MongoDB as a storage backend [1].
To handle this issue, consider breaking down your data into smaller chunks before storing it in MongoDB. You can also adjust the batch_size parameter in the put_all method to ensure that data is processed in smaller batches, which might help in managing the size of each document being stored [2].
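A sketch of that suggestion, assuming MongoDBKVStore exposes from_uri and that put_all accepts a batch_size keyword; the collection name and payloads below are placeholders:

```python
# Sketch only: assumes MongoDBKVStore.from_uri exists and put_all accepts
# batch_size; collection name and payloads are placeholders.
from llama_index.storage.kvstore.mongodb import MongoDBKVStore

kvstore = MongoDBKVStore.from_uri(uri="mongodb://localhost:27017", db_name="llama_index")

kv_pairs = [(f"node-{i}", {"text": "..."}) for i in range(5000)]
kvstore.put_all(kv_pairs, collection="docstore/data", batch_size=100)
# batch_size splits the bulk_write into smaller batches, but any single
# (key, value) document still has to fit under MongoDB's 16 MB BSON limit.
```

Note that a smaller batch_size only reduces how many documents go into each bulk_write call; it cannot shrink an individual document that is already over the limit.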
I didn't find any similar threads related to this specific issue, but you might want to check issue #11958 that you mentioned for any related discussions or solutions.
Hmm, I guess put_all() should probably be batching, i.e. in File "/llama_index/storage/kvstore/mongodb/base.py", line 175, in put_all -- not terribly hard to fix, I welcome a PR
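In case it helps a would-be contributor, a rough sketch of that kind of batching; this is not the actual library code, and the write operation and defaults are assumptions:

```python
# Rough sketch of batched writes, not the actual implementation in
# llama_index/storage/kvstore/mongodb/base.py.
from pymongo import ReplaceOne


def put_all(self, kv_pairs, collection="data", batch_size=100):
    """Flush a bulk_write every `batch_size` documents instead of all at once."""
    new_docs = []
    for key, val in kv_pairs:
        new_docs.append(ReplaceOne({"_id": key}, {"_id": key, **val}, upsert=True))
        if len(new_docs) >= batch_size:
            self._db[collection].bulk_write(new_docs)
            new_docs = []
    if new_docs:  # flush the remainder
        self._db[collection].bulk_write(new_docs)
```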
Thanks Logan. I checked the put_all function and found that batched puts are already implemented. But I am still wondering what the cause could be, since I cannot imagine how a dictionary could be larger than 16 MB given that the full file I am using is smaller than 16 MB...
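One thing worth checking: the traceback shows put() forwarding a single (key, value) pair to put_all, so the oversized object looks like one document (the serialized keyword table index struct) rather than a large batch, which batching cannot split. A rough way to estimate its size, assuming index_struct.to_dict() approximates what the index store persists as a single MongoDB document:

```python
# Sketch: roughly estimate the size of the serialized keyword table before it
# is written. Assumes index.index_struct.to_dict() approximates the document
# that the index store sends to MongoDB.
import json

data = index.index_struct.to_dict()
size_mb = len(json.dumps(data, default=list).encode("utf-8")) / (1024 * 1024)
print(f"serialized keyword table: ~{size_mb:.1f} MB")  # anything near 16 MB is suspect
```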