Use Cloud Run to host AI agents. You can implement AI agents as Cloud Run services that perform tasks and provide information to users in a conversational manner. Cloud Run scales automatically without requiring you to provision resources, and bills only for actual usage. AI agents serve a variety of purposes, such as customer service, virtual assistants, and content generation.
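Deploying an agent works like deploying any other Cloud Run service. A hedged example, where the service name and region are placeholders for your own values:

```shell
# Deploy the agent's container from source in the current directory.
# "my-agent" and the region are placeholders; adjust for your project.
gcloud run deploy my-agent \
  --source . \
  --region us-central1 \
  --allow-unauthenticated
```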
You can use a Cloud Run service as a scalable API endpoint to process prompts from end users. Your service runs an AI orchestration framework, such as LangChain, LangGraph, or Genkit, which orchestrates calls to:
- AI models such as Gemini API, Vertex AI endpoints, or another GPU-enabled Cloud Run service.
- Vector databases like Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL with the pgvector extension.
- Other services or APIs.
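The request flow the list above describes can be sketched end to end. This is a minimal illustration, not a real implementation: `embed`, `search_vectors`, and `call_model` are hypothetical stand-ins for an embedding model, a pgvector nearest-neighbor query, and a Gemini or Vertex AI call.

```python
def embed(text: str) -> list[float]:
    # Stand-in: a real service would call an embedding model here.
    return [float(len(word)) for word in text.split()]

def search_vectors(query_vector: list[float], documents: list[str]) -> list[str]:
    # Stand-in for a pgvector nearest-neighbor query: rank stored
    # documents by distance between their embedding and the query's.
    def distance(doc: str) -> float:
        return abs(sum(embed(doc)) - sum(query_vector))
    return sorted(documents, key=distance)[:2]

def call_model(prompt: str) -> str:
    # Stand-in for a call to the Gemini API or a Vertex AI endpoint.
    return f"Answer based on: {prompt}"

def handle_prompt(user_prompt: str, documents: list[str]) -> str:
    # The orchestration loop: embed the prompt, retrieve context from
    # the vector store, then call the model with the augmented prompt.
    query_vector = embed(user_prompt)
    context = search_vectors(query_vector, documents)
    augmented = f"{user_prompt}\nContext: {'; '.join(context)}"
    return call_model(augmented)
```

In a real service, a framework such as LangChain or Genkit takes the place of this hand-rolled loop.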
You can stream the agent response back to the client using WebSockets.
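Whatever the transport, the server side of streaming usually forwards each chunk to the client as soon as the model produces it, rather than buffering the full response. A minimal sketch, where `send` is a hypothetical stand-in for a WebSocket send callback:

```python
from typing import Callable, Iterator

def generate_tokens(response: str) -> Iterator[str]:
    # Stand-in for a model's streaming output: yield one token at a time.
    for token in response.split():
        yield token + " "

def stream_to_client(response: str, send: Callable[[str], None]) -> None:
    # Forward each chunk as soon as it is available instead of
    # waiting for the complete response.
    for chunk in generate_tokens(response):
        send(chunk)
```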
For a more detailed architecture, see Infrastructure for a RAG-capable generative AI application using Vertex AI and AlloyDB for PostgreSQL.
Learn how to deploy Genkit to Cloud Run in the Genkit documentation.
Learn how to build and deploy a LangChain app to Cloud Run by working through a codelab or watching "Building generative AI apps on Google Cloud with LangChain".