Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ian/mastra slim #2972

Open
wants to merge 31 commits into
base: main
Choose a base branch
from
Open

Conversation

ianmacartney
Copy link

Aiming to make @libsql/client an opt-in dependency that isn't bundled by default, without losing the "default" storage or vector stores which are useful for getting started.

This involves two efforts:

  1. Ensure libsql is never imported except when used.
  2. Provide an in-memory substitution for storage and vector DB (memory).

The strategy for backwards compatibility is to use the dev env vars as a signal as to whether to use the files with libsql or do the in-memory version. Serverless deploys won't have the environment variables present (unless they set them intentionally).

Backwards compatibility story:

  • For in-memory storage, this shouldn't have an affect users didn't already have when restarting the server
  • For on-disc memory, we are checking for the old & new file locations and giving preference to those over in-memory.
  • Those using mastra dev will still default into an on-disk experience with a new env variable to mirror the storage url

Fixes #2968

Copy link

codesandbox bot commented Mar 15, 2025

Review or Edit in CodeSandbox

Open the branch in Web EditorVS CodeInsiders

Open Preview

Copy link

changeset-bot bot commented Mar 15, 2025

⚠️ No Changeset found

Latest commit: 09db6de

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link

vercel bot commented Mar 15, 2025

@ianmacartney is attempting to deploy a commit to the Mastra Team on Vercel.

A member of the Team first needs to authorize it.

Copy link

promptless bot commented Mar 15, 2025

📝 Documentation updates detected! You can review documentation updates here

@abhiaiyer91 abhiaiyer91 requested a review from wardpeet March 15, 2025 18:28
Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this Pull Request. This is not an easy feat to grasp, but you did a great job!

The things I would really want to keep are the following parts:

  • Betters async/await in proxy storage, instead of my setup function
  • vector proxy implementation

I've commented on the issue what my strategy is to get what we all want.

} else {
this.storage = new Promise((resolve, reject) => {
try {
import(['./', 'libsql'].join('')) // avoid automatic bundling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, this can quickly become a foot gun, where we ignore optimizing these files. If we ever introduce a side effect, our tooling will not capture it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the challenge I ran into is that esbuild will automatically analyze ./libsql and run into this error: Cannot find module '@libsql/linux-arm64-gnu'. I'm not sure the best way to do this. I was considering pushing the try/catch import down into ./libsql/index.ts for @libsql/client entirely, but the exported class depends on the base class.. any other ideas?

The problem ultimately is that if we do the regular string import, @libsql/client can't be an optional dependency and might not get stripped during bundling (though I'm not an expert at this)

`Found deprecated Memory vector db file ${oldDb} this db is now merged with the default ${newDb} file. Delete the old one to use the new one. You will need to migrate any data if that's important to you. For now the deprecated path will be used but in a future breaking change we will only use the new db file path.`,
);
const hasNewDb = existsSync(join(process.cwd(), newDb)) || existsSync(join(process.cwd(), '.mastra', newDb));
const connectionUrl =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, we should put these env pieces in mastra

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't imported or used from Mastra, and didn't previously exist - it seems to only be used from the @mastra/memory class. So I can put this environment variable there.

metadata: Record<string, unknown>;
};

export class InMemoryVector extends MastraVector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we not use the same idea for this as we do for regular storage and use libsql?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left more context in the package.json. tl;dr is the goal is to not depend on @libsql/client unless it's necessary

@@ -130,6 +129,14 @@
"xstate": "^5.19.2",
"zod": "^3.24.2"
},
"peerDependencies": {
"@libsql/client": "^0.14.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we need to do this. Adding these lines doesn't fail the install step when it cannot be installed, but it will still install it on almost every platform. I'm not sure why convex is different.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@libsql/client is a big dependency (30MB) that bloats the bundle size, which is particularly bad for serverless environments (it ends up exceeding the Convex bundle size limit). This package is really big and doesn't need to be installed unless you're using the Storage or Vector implementation that uses libsql (by comparison, pg is .5MB). No storage provider currently uses @libsql/client except the ones that interact with file-based databases.

Instead of making it optional overall, this PR could instead focus on:

  1. Making @libsql/client not the default when you're running in an environment where the env vars aren't set (not mastra dev, e.g. in a Convex serverless function).
  2. Ensure there's a way to deploy Mastra to Convex-like environments where it doesn't include @libsql/client. Two options for that:
    a. Server install: having it configurable to not install it server-side, as it pulls it in by default currently.
    b. Local bundling: Figuring out how to get @libsql/client to analyze correctly on my mac and let it tree-shake it out if it isn't used at import time.

(1) is solved with in-memory defaults outside of mastra dev
(2) is an open question, if we don't do the import obfuscation

@ianmacartney
Copy link
Author

The repo has the in-memory implementations (pending tests if it's the right direction), but still lacks the ability to build locally for a cloud deploy (due to @libsql/client depending on a package not available locally), or installing remotely (due to bundle size).

The hacks by which I've gotten it to work:

  1. dynamic import of ["@libsql", "client"].join("/") from within core/src/libsql/index.ts, which prevents bundling @libsql/client but is dangerous for when you do want to bundle it.
  2. overriding my demo project's package.json (that imports @mastra/core) to:
  "overrides": {
    "@libsql/client": "file:libsql-client-0.14.0-noop.tgz"
  },

where I made a "noop" implementation of @libsql/client with the correct types at the correct version.

Another route here would be to make browser-compatible targets for all entrypoints. That's a big undertaking, as it'd require replacing all dependencies relying on a node.js environment with browser-compatible equivalents, like subbing out pino in browsers. This would allow bundling for Convex locally with the browser target and deploying to the default v8 runtime

@ianmacartney
Copy link
Author

The test failures all seemed to be related to setting environment variables, which I'm assuming I'd need to set in my own GitHub environment. What's the best way to run them in your github org?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Bundle too big for serverless environments due to demo-only default storage and vectorDB
2 participants