-
Notifications
You must be signed in to change notification settings - Fork 497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(wren-ai-service): fix text2sql #1230
Conversation
WalkthroughThis pull request introduces a new Changes
Sequence DiagramsequenceDiagram
participant Generator
participant CleaningFunction
participant Processor
Generator->>CleaningFunction: Generate raw result
CleaningFunction-->>CleaningFunction: Normalize whitespace
CleaningFunction-->>CleaningFunction: Remove special characters
CleaningFunction->>Processor: Return cleaned result
Processor->>Processor: Process cleaned result
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
wren-ai-service/src/core/engine.py (1)
30-43
: Add type hints and docstring for better maintainability.The function would benefit from:
- Type hints for the helper function
- A docstring explaining the purpose and behavior
- Edge case handling for None/empty strings
Consider this implementation:
def clean_generation_result(result: str) -> str: + """Clean and normalize SQL generation result. + + Args: + result: Raw generation result string + + Returns: + Cleaned and normalized string with consistent whitespace and removed markers + + Raises: + ValueError: If result is None or empty + """ + if not result: + raise ValueError("Generation result cannot be None or empty") + - def _normalize_whitespace(s: str) -> str: + def _normalize_whitespace(text: str) -> str: return re.sub(r"\s+", " ", s).strip() return ( _normalize_whitespace(result) .replace("\\n", " ") .replace("```sql", "") .replace("```json", "") .replace('"""', "") .replace("'''", "") .replace("```", "") .replace(";", "") )wren-ai-service/src/pipelines/generation/utils/sql.py (1)
33-33
: LGTM! Consider improving error message specificity.The integration of
clean_generation_result
is correct, but the error handling could be more specific.Consider this improvement:
- cleaned_generation_result = orjson.loads(clean_generation_result(replies[0])) + try: + cleaned_generation_result = orjson.loads(clean_generation_result(replies[0])) + except Exception as e: + logger.exception(f"Failed to parse generation result: {str(e)}") + return { + "results": { + "description": "Failed to process generation result", + "steps": [], + }, + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
wren-ai-service/src/core/engine.py
(1 hunks)wren-ai-service/src/pipelines/generation/sql_correction.py
(1 hunks)wren-ai-service/src/pipelines/generation/sql_expansion.py
(1 hunks)wren-ai-service/src/pipelines/generation/utils/sql.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
wren-ai-service/src/core/engine.py (1)
42-42
: Verify if semicolon removal is safe for all SQL dialects.The function removes semicolons from SQL statements. While this might be acceptable for some SQL dialects, verify that this doesn't affect query execution across all supported databases.
✅ Verification successful
Semicolon removal is safe across all supported SQL dialects
The codebase only processes single SQL statements and all supported databases (BigQuery, DuckDB, Postgres) handle single statements without semicolons. Similar standardization is used consistently throughout the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check if semicolons are required in the supported SQL dialects # Look for SQL execution patterns and dialect configurations # Search for SQL execution patterns echo "Searching for SQL execution patterns..." rg -A 5 "execute_sql|run_sql|query" # Search for SQL dialect configurations echo "Searching for SQL dialect configurations..." rg -l "dialect|sql_flavor|database_type"Length of output: 68108
Script:
#!/bin/bash # Search for SQL execution implementation and dialect handling echo "Searching for SQL execution implementation..." rg -A 5 "execute_sql.*\{" echo "Searching for dialect-specific handling..." rg -A 5 "dialect.*=|database_type.*=|sql_flavor.*=" echo "Searching for multi-statement query patterns..." rg -A 3 "split.*sql|multiple.*queries|;.*;"Length of output: 79453
wren-ai-service/src/pipelines/generation/sql_correction.py (1)
29-38
: LGTM! Consider adding error handling for JSON parsing.The prompt changes clearly specify the JSON format requirement. However, ensure that the post-processor can handle malformed JSON responses gracefully.
Let's verify the error handling in the post-processor:
wren-ai-service/src/pipelines/generation/sql_expansion.py (1)
30-36
: LGTM! Consistent format specification across modules.The changes maintain consistency with the SQL correction module by specifying the same JSON format requirement.
wren-ai-service/src/pipelines/generation/utils/sql.py (2)
123-125
: LGTM! Consistent application of cleaning logic.The cleaning function is correctly applied to both dictionary and string responses.
Also applies to: 130-132
344-350
: LGTM! Format specification aligns with other modules.The JSON format requirement in the system prompt maintains consistency across all SQL generation modules.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Technical Updates