Redis scores seems to be sorted from lowest to highest (with cosine) #8177
Comments
Answer generated by a 🤖

Answer

I understand that you're experiencing an issue with the Redis vector store in LangChain, where the scores seem to be sorted from lowest to highest when using cosine similarity. You're expecting the scores to be sorted from highest to lowest instead.

In LangChain, the Redis vector store sorts the results from lowest to highest. Cosine similarity ranges from -1 to 1: a score of 1 means the vectors point in the same direction, and -1 means they are diametrically opposed. However, the vector_score that Redis returns is a distance, so a lower score means the vectors are closer together, hence more similar.

The sorting is done in the _prepare_query method. If you want the results to be sorted from highest to lowest, you can change the sort order there:

    def _prepare_query(self, k: int) -> Query:
        ...
        return (
            Query(base_query)
            .return_fields(*return_fields)
            # redis-py's sort_by takes a boolean asc flag; asc=False sorts descending
            .sort_by("vector_score", asc=False)
            .paging(0, k)
            .dialect(2)
        )

Please note that this change will make the results return from least similar to most similar, which is the opposite of the current behavior.

This is one potential solution to your issue. However, please be aware that as an AI, I can only see text and cannot see images or links. If you believe this is a bug that could impact other users, you might want to consider making a pull request with your proposed changes. If this solution doesn't align with your experience, I would recommend upgrading to the latest version of LangChain to see if the issue persists.

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
damn this bot is awesome!

I've tried to make the score vs. distance distinction apparent in the new API for Redis. Distance should be used for vector distance, and score should be used for vector similarity. Is this still causing you issues, or can this be closed @FrancescoSaverioZuppichini?

not sure I understood - I am not using langchain anymore tbh, but out of curiosity, is it a distance or real cosine similarity?

Hi, @FrancescoSaverioZuppichini. I'm helping the LangChain team manage our backlog and am marking this issue as stale.

From what I understand, you reported an issue with Redis scores appearing to be sorted from lowest to highest when using cosine similarity, and I provided an explanation of how Redis interprets and sorts the scores, suggesting a potential solution to modify the sorting method. Spartee also mentioned an update to the API for Redis, asking if the issue can be closed. You expressed uncertainty about the update and whether the scores represent distance or real cosine similarity, indicating that you are no longer using LangChain.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!

you got it my friend, let's close it - it is not cosine similarity but a distance metric, which is very confusing btw
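
To illustrate the distinction drawn in this comment, here is a minimal sketch in plain Python. It assumes the common convention that cosine distance equals 1 minus cosine similarity; the function names and sample vectors are hypothetical and are not taken from LangChain or Redis.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Assumed convention: distance = 1 - similarity, so 0 = same direction, 2 = opposite."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical sample vectors, chosen so the arithmetic is exact.
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]
opposite = [-1.0, 0.0]

distances = [
    cosine_distance(query, v) for v in (same_direction, orthogonal, opposite)
]
print(distances)
```

Under this convention the most similar vector has the smallest distance, so sorting distances from lowest to highest already lists the best matches first.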

Thank you @FrancescoSaverioZuppichini for closing the issue! Your contribution is much appreciated.

System Info
python = 3.10
langchain = 0.0.222
Who can help?
No response
Information
Related Components
Reproduction
I'm not sure how to give you all the steps, but attached is a screenshot of the debugger at line 338 of vectorstores.redis. As you can see, the results go from the lowest to the highest score. You can see the vector_score on the right.
Expected behavior
I think the results should be returned from highest to lowest.
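
If a highest-first ordering is wanted, one possible approach is to convert each returned distance into a similarity on the client side and sort on that. The sketch below assumes results arrive as (document, distance) pairs and that distance is 1 minus cosine similarity; `to_similarity_ranking` and the sample data are hypothetical and not part of LangChain.

```python
def to_similarity_ranking(results):
    """Turn (document, cosine_distance) pairs into (document, similarity) pairs,
    sorted from most similar to least similar.

    Assumes distance = 1 - similarity; this helper is illustrative only.
    """
    scored = [(doc, 1.0 - dist) for doc, dist in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical results in arbitrary order.
raw = [("doc_a", 0.5), ("doc_b", 0.25), ("doc_c", 0.75)]
ranked = to_similarity_ranking(raw)
print(ranked)
```

The document order is the same as sorting the raw distances from lowest to highest; only the displayed number changes, which may be less surprising for readers who expect a similarity score.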