-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: COG-1523 add top_k in run_question_answering #625
Conversation
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
9573981 | Triggered | Generic Password | 17e46f1 | helm/docker-compose-helm.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
WalkthroughThe changes update several modules to support an optional Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant QA as run_question_answering
participant R as Retriever
U->>QA: Submit question (optionally with top_k)
QA->>R: Call retriever with system_prompt and top_k
R-->>QA: Return top_k document chunks
QA->>U: Provide answer based on retrieved context
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
cognee/eval_framework/answer_generation/run_question_answering_module.py
(3 hunks)cognee/modules/retrieval/completion_retriever.py
(1 hunks)cognee/modules/retrieval/graph_completion_retriever.py
(1 hunks)cognee/modules/retrieval/graph_summary_completion_retriever.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint (ubuntu-latest, 3.10.x)
- GitHub Check: docker-compose-test
🔇 Additional comments (7)
cognee/modules/retrieval/graph_completion_retriever.py (2)
18-18
: Parameter type modification looks goodThe change from
int
toOptional[int]
for thetop_k
parameter adds flexibility by allowing users to passNone
value.
23-23
: Proper fallback handling for None valuesGood implementation of the fallback logic to maintain the default behavior when
None
is passed.cognee/modules/retrieval/graph_summary_completion_retriever.py (1)
15-15
: Consistent parameter type updateThe change to
Optional[int]
aligns with the parent class implementation and maintains consistency across retrievers.cognee/modules/retrieval/completion_retriever.py (1)
16-16
: Parameter addition looks goodAdding the
top_k
parameter with a default value of 1 is consistent with the PR objectives and the proper fallback logic is implemented.Also applies to: 21-21
cognee/eval_framework/answer_generation/run_question_answering_module.py (3)
3-3
: Import added correctlyThe addition of
Optional
from thetyping
module is necessary for the new parameter type hint.
35-35
: Parameter addition aligns with PR objectivesThe addition of the
top_k
parameter with a default ofNone
is consistent with the PR objectives and allows flexibility for users.
51-53
: Passing top_k to retrieversThe
top_k
parameter is correctly passed to the retriever, which aligns with the PR objectives.
cognee/eval_framework/answer_generation/run_question_answering_module.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
cognee/modules/retrieval/completion_retriever.py (1)
31-42
: 💡 Verification agent🧩 Analysis chain
Consider updating
get_completion
method to handle multiple context items.The
get_completion
method doesn't seem to be updated to handle the case where multiple context items are returned fromget_context
whentop_k > 1
. This could lead to issues if the context type changes from a string to a list.You might need to update the
get_completion
method to handle the different context structure when multiple items are returned:
🏁 Script executed:
#!/bin/bash # Check if generate_completion can handle a list of contexts rg -A 5 -B 5 "generate_completion" --glob "*.py" | grep -B 3 -A 3 "context"Length of output: 4824
Action Required: Update Context Handling in
get_completion
The current implementation in
cognee/modules/retrieval/completion_retriever.py
always passes the value returned byget_context(query)
directly togenerate_completion
. However, whentop_k > 1
,get_context
may return multiple context items (i.e. a list) whilegenerate_completion
incognee/modules/retrieval/utils/completion.py
expects a string. This mismatch could lead to runtime issues if multiple context items are returned.
- Issue Location:
cognee/modules/retrieval/completion_retriever.py
(lines 31-42)- Recommendation:
- Detect if
context
is a list after callingawait self.get_context(query)
.- Aggregate or convert the list into a single string (e.g., using
" ".join(context)
) before passing it togenerate_completion
.- Alternatively, if separate completions are desired per context item, consider iterating over the list and handling the responses accordingly.
Please update the method to ensure compatibility with both single and multiple context items.
♻️ Duplicate comments (1)
cognee/modules/retrieval/completion_retriever.py (1)
26-29
:⚠️ Potential issueInconsistent handling of multiple results with top_k parameter.
The method now retrieves multiple chunks based on
self.top_k
, but still only returns the first chunk's text regardless of how many are retrieved. This doesn't fully implement the behavior expected from atop_k
parameter.Apply this fix to handle multiple chunks consistently:
async def get_context(self, query: str) -> Any: """Retrieves relevant document chunks as context.""" vector_engine = get_vector_engine() found_chunks = await vector_engine.search("DocumentChunk_text", query, limit=self.top_k) if len(found_chunks) == 0: raise NoRelevantDataFound - return found_chunks[0].payload["text"] + if self.top_k == 1: + return found_chunks[0].payload["text"] + else: + return [chunk.payload["text"] for chunk in found_chunks]
🧹 Nitpick comments (1)
cognee/modules/retrieval/completion_retriever.py (1)
21-21
: Consider removing redundant default value check.The
top_k
parameter already has a default value of 1 in the method signature. This assignment adds redundancy unless there's a specific reason to have this double-check.- self.top_k = top_k if top_k is not None else 1 + self.top_k = top_k
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/modules/retrieval/completion_retriever.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: Test on macos-15
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_networkx_metrics_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on macos-15
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: Test on macos-13
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_eval_framework_test / test
- GitHub Check: Test on macos-13
- GitHub Check: test
- GitHub Check: Test on macos-13
- GitHub Check: windows-latest
- GitHub Check: run_notebook_test / test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: Test on ubuntu-22.04
- GitHub Check: test
- GitHub Check: test
- GitHub Check: run_simple_example_test
- GitHub Check: docker-compose-test
- GitHub Check: test
- GitHub Check: Build Cognee Backend Docker App Image
🔇 Additional comments (1)
cognee/modules/retrieval/completion_retriever.py (1)
16-16
: Good addition of thetop_k
parameter.Adding the optional
top_k
parameter to theCompletionRetriever
class aligns well with the PR objectives of enhancing the question answering functionality.
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit