fix: notebooks errors #565

Merged
merged 3 commits into from
Feb 19, 2025


@borisarzentar borisarzentar commented Feb 19, 2025

Description

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin

Summary by CodeRabbit

  • New Features

    • Automatically creates a blank graph when a file isn’t found, ensuring smoother operations.
    • Updated demonstration notebooks with dynamic configurations, including refined search operations and input prompts.
    • Introduced optional support for additional graph functionalities via an integrated dependency.
  • Refactor

    • Streamlined processing by eliminating duplicate steps and simplifying graph rendering workflows.
  • Chores

    • Updated environment configurations and upgraded the Python runtime for improved performance and consistency.


coderabbitai bot commented Feb 19, 2025

Walkthrough

The changes introduce an asynchronous method in the NetworkX adapter for initializing and saving an empty graph, refactoring error handling in load_graph_from_file. Notebook demos are updated to modify environment variable settings, adjust task configurations with a new detailed_extraction flag, simplify function calls, and replace answer evaluation logic with an asynchronous search operation. Additionally, one notebook now reads database configuration from defined constants and updates its Python version. Finally, a new optional dependency, graphiti-core, is added to the dependency management.

Changes

Files Change Summary
cognee/infrastructure/databases/graph/networkx/adapter.py Added async def create_empty_graph(file_path: str) to initialize and save a new MultiDiGraph; refactored load_graph_from_file to call it and removed redundant code.
notebooks/cognee_code_graph_demo.ipynb Modified environment variable inputs (added OPENAI_API_KEY, adjusted GRAPHISTRY_PASSWORD), removed some dependency graph tasks, introduced detailed_extraction flag, updated task configurations, simplified the render_graph call, and added a new asynchronous search operation.
notebooks/cognee_graphiti_demo.ipynb Removed hardcoded database configuration in favor of constants, reset execution counts and cleared outputs, and updated the Python version from 3.9.6 to 3.11.8.
pyproject.toml Added an optional dependency graphiti-core = {version = "^0.7.0", optional = true} and a new extra group graphiti including this dependency.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant A as NetworkXAdapter
    participant FS as File System

    C->>A: load_graph_from_file(file_path)
    alt File exists
        A->>FS: Read graph file
        FS-->>A: Return file content
    else File missing
        A->>A: create_empty_graph(file_path)
        A->>FS: Check or create directory
        A->>A: Instantiate MultiDiGraph
        A->>FS: Write graph to file
    end
    A-->>C: Return graph
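
The load-or-create flow in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `TinyGraphStore` is a hypothetical stand-in, a plain dict replaces `nx.MultiDiGraph`, and JSON is assumed as the on-disk format.

```python
import asyncio
import json
import os
import tempfile


class TinyGraphStore:
    """Hypothetical stand-in for the adapter; a dict replaces nx.MultiDiGraph."""

    def __init__(self):
        self.graph = None

    async def save_graph_to_file(self, file_path: str) -> None:
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(self.graph, f)

    async def create_empty_graph(self, file_path: str) -> None:
        # Start from an empty graph, make sure the directory exists, then persist it.
        self.graph = {"nodes": [], "edges": []}
        file_dir = os.path.dirname(file_path)
        if file_dir:  # dirname is "" for a bare filename; os.makedirs("") would raise
            os.makedirs(file_dir, exist_ok=True)
        await self.save_graph_to_file(file_path)

    async def load_graph_from_file(self, file_path: str) -> dict:
        if os.path.exists(file_path):
            with open(file_path, encoding="utf-8") as f:
                self.graph = json.load(f)
        else:
            await self.create_empty_graph(file_path)
        return self.graph


async def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nested", "graph.json")
        first = await TinyGraphStore().load_graph_from_file(path)   # missing: created
        existed_after = os.path.exists(path)
        second = await TinyGraphStore().load_graph_from_file(path)  # present: read back
    return first, existed_after, second


first, existed_after, second = asyncio.run(demo())
```

The `if file_dir:` guard is a choice made for this sketch so that a bare filename does not trigger `os.makedirs("")`; whether the real adapter needs it depends on how its file paths are built.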
sequenceDiagram
    participant N as Notebook
    participant S as SearchService

    N->>S: search(query_type=CODE, query_text="def create_graphrag_config")
    S-->>N: Return search results
    N->>N: Print results

Suggested reviewers

  • lxobr

Poem

In a burrow of code so bright,
I’ve hopped on changes day and night.
Graphs now spring with async flow,
Notebooks sing and gracefully grow.
I’m a rabbit of code, happy and free,
Celebrating revisions with a joyful “whee!”
🐰 Hop on, let’s review merrily!


@borisarzentar borisarzentar changed the title from "Fix/notebooks" to "fix: notebooks errors" Feb 19, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
notebooks/cognee_code_graph_demo.ipynb (3)

12-13: Add input validation for credentials.

Consider adding validation for the input values to ensure they are not empty or malformed before setting them as environment variables.

-os.environ["GRAPHISTRY_PASSWORD"] = input("Please enter your graphistry password")
-os.environ["OPENAI_API_KEY"] = input("Please enter your OpenAI API key")
+graphistry_password = input("Please enter your graphistry password").strip()
+openai_key = input("Please enter your OpenAI API key").strip()
+if not graphistry_password or not openai_key:
+    raise ValueError("Credentials cannot be empty")
+os.environ["GRAPHISTRY_PASSWORD"] = graphistry_password
+os.environ["OPENAI_API_KEY"] = openai_key
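
If the same check is wanted for more than two variables, the validation suggested above could be factored into a small helper. The sketch below is illustrative only: `set_required_env` and the `DEMO_*` variable names are hypothetical, and the values are passed in directly rather than read with `input()` so the snippet runs unattended.

```python
import os


def set_required_env(name: str, raw_value: str) -> str:
    """Strip the value, reject empty input, then export it to the environment."""
    value = raw_value.strip()
    if not value:
        raise ValueError(f"{name} cannot be empty")
    os.environ[name] = value
    return value


# A padded value is stripped and stored.
set_required_env("DEMO_API_KEY", "  abc123  ")

# A whitespace-only value is rejected before the environment is touched.
try:
    set_required_env("DEMO_PASSWORD", "   ")
    rejected = False
except ValueError:
    rejected = True
```

In a notebook the second argument would come from `input(...)`, or from the standard library's `getpass.getpass(...)` if the secret should not be echoed back.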

79-83: Document the purpose and impact of detailed_extraction.

The detailed_extraction flag significantly affects task behavior and batch sizes. Please add documentation explaining:

  • The purpose of this flag
  • The rationale behind different batch sizes (100 vs 500)
  • The impact on performance and resource usage
+# detailed_extraction: When True, performs a more thorough analysis with smaller batch sizes
+# to ensure accuracy. When False, uses larger batches for faster processing.
 detailed_extraction = True

 tasks = [
     Task(get_repo_file_dependencies, detailed_extraction=detailed_extraction),
     Task(add_data_points, task_config={"batch_size": 100 if detailed_extraction else 500}),
 ]
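
To make the batch-size trade-off concrete, here is a small sketch of how the two settings split the same workload. How `add_data_points` actually consumes `batch_size` is not shown in this thread, so `batches` and `pick_batch_size` are hypothetical helpers and the 1200-item workload is made up.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def pick_batch_size(detailed_extraction: bool) -> int:
    # Mirrors the expression used in the notebook's task_config.
    return 100 if detailed_extraction else 500


data_points = list(range(1200))
detailed_batches = list(batches(data_points, pick_batch_size(True)))
coarse_batches = list(batches(data_points, pick_batch_size(False)))
```

With these numbers the detailed setting produces 12 batches of 100, while the coarse setting produces 3 batches (500, 500, 200): more round trips in exchange for smaller units of work.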

126-130: Enhance search functionality with error handling and result formatting.

The current implementation could be improved by:

  1. Adding error handling for the search operation
  2. Formatting the results for better readability
  3. Making the search query configurable
 from cognee import search, SearchType

-results = await search(query_type=SearchType.CODE, query_text="def create_graphrag_config")
-
-print(results)
+async def perform_code_search(query: str):
+    try:
+        results = await search(query_type=SearchType.CODE, query_text=query)
+        if not results:
+            print("No results found")
+            return
+        print("\nSearch Results:")
+        print("-" * 50)
+        for idx, result in enumerate(results, 1):
+            print(f"\nResult {idx}:")
+            print(result)
+    except Exception as e:
+        print(f"Error during search: {e}")
+
+# Make the search query configurable
+query = "def create_graphrag_config"  # Could be taken as input
+await perform_code_search(query)
cognee/infrastructure/databases/graph/networkx/adapter.py (2)

246-254: Add docstring and type hints to create_empty_graph.

The new method would benefit from documentation and proper type hints.

-    async def create_empty_graph(self, file_path: str) -> None:
+    async def create_empty_graph(self, file_path: str) -> None:
+        """Create and save an empty MultiDiGraph.
+        
+        Args:
+            file_path (str): Path where the empty graph will be saved
+            
+        Raises:
+            OSError: If directory creation fails
+            IOError: If saving the graph fails
+        """
         self.graph = nx.MultiDiGraph()
 
         file_dir = os.path.dirname(file_path)
         if not os.path.exists(file_dir):
             os.makedirs(file_dir, exist_ok=True)
 
         await self.save_graph_to_file(file_path)

336-339: Improve error handling specificity.

The current error handling catches all exceptions. Consider catching specific exceptions for better error handling.

-        except Exception:
+        except (json.JSONDecodeError, IOError) as e:
             logger.error("Failed to load graph from file: %s", file_path)
+            logger.debug("Error details: %s", str(e))
 
             await self.create_empty_graph(file_path)
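
A standalone sketch of the narrower `except` clause, assuming the graph file is JSON (the suggestion's use of `json.JSONDecodeError` implies this, but the adapter's format is not shown here). `load_or_reset` is a hypothetical function, and a dict stands in for the graph. In Python 3, `IOError` is an alias of `OSError`, so the sketch names `OSError` directly.

```python
import json
import logging
import os
import tempfile

logger = logging.getLogger("graph_loader_sketch")


def load_or_reset(file_path: str) -> dict:
    """Return the parsed file, or an empty graph if it is missing or corrupt."""
    try:
        with open(file_path, encoding="utf-8") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError) as error:
        # Only parse failures and filesystem errors trigger the fallback;
        # anything else (for example a TypeError) still propagates.
        logger.error("Failed to load graph from file: %s", file_path)
        logger.debug("Error details: %s", error)
        return {"nodes": [], "edges": []}


with tempfile.TemporaryDirectory() as tmp:
    corrupt = os.path.join(tmp, "corrupt.json")
    with open(corrupt, "w", encoding="utf-8") as f:
        f.write("{not valid json")
    from_corrupt = load_or_reset(corrupt)
    from_missing = load_or_reset(os.path.join(tmp, "missing.json"))
```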
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 811e932 and 692d41a.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • cognee/infrastructure/databases/graph/networkx/adapter.py (2 hunks)
  • notebooks/cognee_code_graph_demo.ipynb (4 hunks)
  • notebooks/cognee_graphiti_demo.ipynb (6 hunks)
  • pyproject.toml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (25)
  • GitHub Check: test
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on macos-15
  • GitHub Check: Test on macos-13
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: Test on macos-13
  • GitHub Check: run_eval_framework_test / test
  • GitHub Check: run_multimedia_example_test / test
  • GitHub Check: test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_simple_example_test / test
  • GitHub Check: Test on ubuntu-22.04
  • GitHub Check: run_networkx_metrics_test / test
  • GitHub Check: test
  • GitHub Check: run_dynamic_steps_example_test / test
  • GitHub Check: test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: test
  • GitHub Check: windows-latest
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: run_notebook_test / test
  • GitHub Check: docker-compose-test
🔇 Additional comments (4)
notebooks/cognee_code_graph_demo.ipynb (1)

110-110:

❓ Verification inconclusive

Verify default behavior of render_graph.

The parameters include_nodes=True and include_labels=True were removed. Please verify that these are the default values in the render_graph implementation to ensure the behavior remains unchanged.


🏁 Script executed:

#!/bin/bash
# Description: Check the default parameter values in render_graph implementation
ast-grep --pattern $'def render_graph($$$) {
  $$$
}'

Length of output: 54


Action Required: Verify Default Behavior of render_graph

The recent changes removed the explicit parameters (include_nodes=True and include_labels=True) from the call, so we need to ensure that the render_graph implementation still defaults to these values. The initial verification command didn’t yield any useful results. Please run the commands below to extract the function signature and search for the default parameter settings:

#!/bin/bash
echo "Extracting render_graph function signature..."
rg -o "def render_graph\s*\([^)]*\)" -n .

echo "Searching for default 'include_nodes=True' setting..."
rg -n "include_nodes\s*=\s*True" .

echo "Searching for default 'include_labels=True' setting..."
rg -n "include_labels\s*=\s*True" .

Review the output to confirm that the defaults for include_nodes and include_labels are indeed set to True. If they aren’t, please update the implementation accordingly.
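
As an alternative to grepping, the defaults can be read from the function object itself with `inspect.signature`. The `render_graph` below is a hypothetical stand-in, because the real signature does not appear in this thread; in practice the imported function would be passed to `default_of` instead.

```python
import inspect


def render_graph(graph=None, include_nodes=True, include_labels=True):
    """Hypothetical stand-in; the real signature is not shown in this thread."""
    return graph


def default_of(func, parameter_name):
    """Return the declared default of one parameter of func."""
    parameter = inspect.signature(func).parameters[parameter_name]
    return parameter.default


defaults = {
    name: default_of(render_graph, name)
    for name in ("include_nodes", "include_labels")
}
```

A parameter without a default reports `inspect.Parameter.empty`, and a parameter that does not exist raises `KeyError`, so either outcome would signal that removing the explicit arguments changed the behavior.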

notebooks/cognee_graphiti_demo.ipynb (1)

226-226:

❓ Verification inconclusive

Verify Python version compatibility.

The Python version has been updated from 3.9.6 to 3.11.8. Please verify that this version is compatible with all project dependencies and requirements.


🌐 Web query:

What are the breaking changes between Python 3.9 and 3.11?

💡 Result:

Python 3.11 introduces several breaking changes compared to Python 3.9 that developers should consider when upgrading:

Enum Behavior Changes

  • String formatting with mixed Enums now returns member names instead of values in f-strings/format() when using str/int mixins:
    class Foo(str, Enum):
        BAR = "bar"

    # f"{Foo.BAR}" on Python ≤3.10: "bar"
    # f"{Foo.BAR}" on Python ≥3.11: "Foo.BAR"

    Fix: Use StrEnum/IntEnum instead of manual mixins[2]

System Path Handling

  • Bytes paths prohibited: sys.path no longer accepts byte strings for filesystem paths[3][5]

Deprecated Features Removed

1️⃣ Distutils components:

  • Removed bdist_msi command (deprecated since Python 3.9)[5]

2️⃣ Inspect module:

  • Removed deprecated functions:
    • getargspec() → Use signature()/getfullargspec()
    • formatargspec() → Use Signature objects[5]

Windows Specific Changes

  • Import behavior: Default _PTH file removed in Windows distributions to fix relative imports[6]

Other Notable Changes

  • Exception formatting: Tracebacks now show exact error locations using ^ markers[1][4]
  • Math functions: Combines improved precision for large arguments with stricter input validation[5]
  • String formatting: New 'z' specifier normalizes signed zeros (f"{num:z}")[3]

Upgrade recommendation: Test codebases thoroughly for Enum usage patterns and deprecated feature dependencies before migrating[2][5].



Python 3.11.8 Upgrade – Verify Dependency Compatibility

The project’s Python version has been updated from 3.9.6 to 3.11.8. Note that Python 3.11 brings several breaking changes compared to 3.9, including:

  • Enum Behavior: Mixed-base Enums now format differently—member names may be used instead of values.
  • Deprecated Features Removed: Functions like getargspec() and commands related to distutils (e.g., bdist_msi) have been removed.
  • System Path Handling: The use of byte strings in sys.path is prohibited.

Please review and test all project dependencies to ensure they are compatible with these changes. In particular, verify that any code or libraries relying on the old Enum behaviors, deprecated functions, or non-standard sys.path usage are updated.
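
Two of the listed changes can be probed directly on whichever interpreter the notebook kernel uses. This is a small diagnostic sketch rather than a full compatibility test, and `Color` is a made-up example class.

```python
import inspect
import sys
from enum import Enum


class Color(str, Enum):
    RED = "red"


# f-string formatting of a str-mixin Enum member differs across versions.
formatted = f"{Color.RED}"

# Reading .value explicitly gives the same result on every version.
explicit_value = Color.RED.value

# inspect.getargspec was removed in Python 3.11.
has_getargspec = hasattr(inspect, "getargspec")

is_311_or_newer = sys.version_info >= (3, 11)
```

Code that formats such members should use `.value` (or `enum.StrEnum`, available from 3.11) if it needs the raw string on both sides of the upgrade.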

pyproject.toml (2)

113-113: LGTM!

The graphiti extra group is properly defined and follows the same pattern as other extra groups in the file.


89-89:

✅ Verification successful

Verify graphiti-core version compatibility.

Please verify that version 0.7.0 of graphiti-core is the latest stable version and is compatible with the project's requirements.


🌐 Web query:

What is the latest stable version of graphiti-core?

💡 Result:

The latest stable version of graphiti-core is v0.7.0, released approximately five days ago[2][9]. This release introduced support for custom entity types using Pydantic BaseModels during data ingestion[2].

For context:

  • The graphiti-core Python library focuses on building temporally-aware knowledge graphs[9].
  • Previous versions include v0.6.x (released last week) and v0.5.x (released in early February 2025)[2][4].

Note: While PyPI listings show v0.5.3 as the latest indexed version[4], the project's GitHub repository confirms active development with newer releases[2][9]. Always verify versions directly from official sources when possible[2][9].



Graphiti-core version compatibility confirmed.

After verifying, version 0.7.0 is indeed the latest stable release according to the GitHub repository. The dependency declaration with ^0.7.0 in pyproject.toml correctly captures this version. Under Poetry's caret rules for 0.x versions it allows only compatible patch updates (>=0.7.0, <0.8.0). No further changes are required.
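
For reference, the range that a caret constraint such as `^0.7.0` selects can be worked out as below. This is a simplified sketch of Poetry's documented caret rule for full three-part versions only; `caret_bounds` and `satisfies` are hypothetical helpers, not part of Poetry's API.

```python
def caret_bounds(version: str):
    """Return (lower, upper) for ^version; upper is exclusive.

    The leftmost non-zero component is the one that gets bumped.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    lower = (major, minor, patch)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return lower, upper


def satisfies(candidate: str, constraint: str) -> bool:
    """Check whether a three-part candidate version falls inside ^constraint."""
    lower, upper = caret_bounds(constraint)
    parsed = tuple(int(part) for part in candidate.split("."))
    return lower <= parsed < upper
```

For `^0.7.0` this yields the half-open range from 0.7.0 up to, but not including, 0.8.0, so a later 0.8.x release of graphiti-core would not be picked up without editing pyproject.toml.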

@Vasilije1990 Vasilije1990 self-requested a review February 19, 2025 22:06
@Vasilije1990 Vasilije1990 merged commit 45f7c63 into dev Feb 19, 2025
36 checks passed
@Vasilije1990 Vasilije1990 deleted the fix/notebooks branch February 19, 2025 22:07

2 participants