
Infrastructure overview

This page provides an overview of the core architecture and infrastructure components that make up the Erato platform. Understanding these components is essential for successfully deploying and operating Erato in your environment.

Erato’s architecture is designed with extensibility at its core, particularly through Model Context Protocol (MCP) servers, enabling you to integrate external data sources, APIs, and tools to meet your organization’s specific needs.

The modular design also has significant operational impact: the number of running services is kept intentionally small for simpler operation and maintenance, while still allowing you to scale individual components independently and adapt the deployment to your infrastructure requirements. Some parts (e.g. the file processor) also have the option to involve external services as part of their processing steps, making them more extensible.

For detailed deployment instructions using Helm, see the Helm deployment guide.

Architecture Diagram (simplified)

The following diagram illustrates the simplified architecture of an Erato deployment:

Flow description:

  1. User accesses the application through their browser
  2. Authorization Proxy (oauth2-proxy) handles authentication and validates user sessions via OIDC
  3. Erato App serves the frontend and provides the backend API
  4. PostgreSQL Database stores all persistent data (conversations, documents, user settings, etc.)
  5. MCP Servers provide external integrations and tool capabilities (can be multiple instances for different services)

Extended Architecture (Example)

The following diagram shows a more concrete example of an Erato deployment with specific MCP servers and external integrations:

Key components in this example:

  1. Microsoft Entra ID provides identity and authentication services:

    • The authorization proxy uses it for OIDC authentication
    • The Erato application reads user and group information from the identity provider for access control and user management
  2. Built-in MCP Servers (Web Access, Web Search) are deployed as part of the Erato installation and provide web browsing capabilities:

    • Authentication: none/simple - These servers require no authentication or only simple API keys, as they don’t access sensitive organizational data
  3. Custom MCP Servers extend functionality with organization-specific integrations:

    • Intranet MCP Server connects to internal company intranet services
    • Authentication: forwarded - User authentication credentials are forwarded from Erato to the MCP server, allowing it to access the intranet service on behalf of the authenticated user
  4. External MCP Servers are provided by external SaaS vendors:

    • JIRA MCP Server integrates with external SaaS JIRA instance for project management
    • Authentication: oauth2 - The MCP server uses OAuth2 to authenticate with the JIRA service, typically using service account credentials or user-delegated tokens
  5. External services (Intranet, JIRA) are accessed through their respective MCP servers, maintaining a clean separation of concerns and proper authentication boundaries
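The three authentication modes above (none/simple, forwarded, oauth2) can be sketched as a small dispatch function. This is an illustration only: the function name, header names, and token parameters are assumptions, not Erato's actual configuration surface.

```python
def mcp_auth_headers(mode, *, api_key=None, user_token=None, oauth2_token=None):
    """Map an MCP server's auth mode to the HTTP headers sent with tool
    calls (illustrative sketch; names and formats are assumptions)."""
    if mode == "none":
        return {}
    if mode == "simple":
        return {"X-Api-Key": api_key}                       # built-in servers
    if mode == "forwarded":
        return {"Authorization": f"Bearer {user_token}"}    # user's own credentials
    if mode == "oauth2":
        return {"Authorization": f"Bearer {oauth2_token}"}  # service/delegated token
    raise ValueError(f"unknown auth mode: {mode}")
```

The important distinction is that "forwarded" acts on behalf of the authenticated user, while "oauth2" typically uses credentials negotiated between Erato and the external service.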

Message Lifecycle

This section explains how Erato processes chat messages from submission through LLM generation and final persistence.

High-Level Overview

The message lifecycle consists of five main phases:

Phase 1: Submission & Persistence

The request arrives via an SSE endpoint, and the user message is immediately persisted:

Key operations:

  • Chat is created if it doesn’t exist
  • User message is immediately saved to ensure it’s not lost
  • Frontend receives confirmation via SSE event
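The Phase 1 operations above can be sketched as follows. The store shape, function name, and event name are assumptions for illustration; the point is that the chat is created on demand and the user message is persisted before any generation begins.

```python
def handle_submission(store, chat_id, user_message):
    """Phase 1 sketch: ensure the chat exists, persist the user message
    immediately, and return the SSE confirmation event (names assumed)."""
    if chat_id not in store:
        store[chat_id] = []               # create chat if it doesn't exist
    store[chat_id].append(user_message)   # save immediately so it's not lost
    return {"event": "message_saved", "chat_id": chat_id}
```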

Phase 2: File Processing & Message Composition

Files are processed and the complete LLM input is assembled from multiple sources:

Key operations:

  • Files are processed in parallel with caching for performance
  • Message history is retrieved
  • Generation input is composed in specific order: system prompt → assistant prompt → history → files → current message
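The composition order above can be expressed as a short function. This is a sketch assuming a simple role/content message shape; the actual message types in Erato may differ.

```python
def compose_generation_input(system_prompt, assistant_prompt,
                             history, file_contents, current_message):
    """Assemble the LLM input in the documented order:
    system prompt -> assistant prompt -> history -> files -> current message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if assistant_prompt:
        messages.append({"role": "assistant", "content": assistant_prompt})
    messages.extend(history)                       # prior conversation turns
    for text in file_contents:                     # processed file contents
        messages.append({"role": "user", "content": text})
    messages.append({"role": "user", "content": current_message})
    return messages
```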

Phase 3: Turn Lifecycle Loop

The core generation loop handles multi-turn interactions with tool calls:

Key operations:

  • Each turn starts by executing any pending tool calls from the previous turn
  • Tool calls are executed via MCP servers with real-time SSE updates
  • LLM generation is streamed with text deltas sent to frontend
  • If LLM requests more tool calls, loop continues (up to 15 turns)
  • Langfuse tracks each turn separately with detailed metrics
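The turn loop above can be sketched as follows. The callback signatures and message shapes are assumptions; only the control flow (pending tool calls executed at the start of each turn, loop capped at 15 turns) comes from the description.

```python
MAX_TURNS = 15

def run_turn_loop(generate, execute_tool, initial_input):
    """Illustrative turn loop: execute tool calls pending from the previous
    turn, stream an LLM generation, and continue while the LLM requests
    more tools, up to MAX_TURNS."""
    messages = list(initial_input)
    pending_tool_calls = []
    for turn in range(1, MAX_TURNS + 1):
        for call in pending_tool_calls:            # tools from the previous turn
            messages.append(execute_tool(call))    # executed via MCP servers
        result = generate(messages)                # streamed to the frontend
        messages.append({"role": "assistant", "content": result["text"]})
        pending_tool_calls = result.get("tool_calls", [])
        if not pending_tool_calls:                 # LLM is done requesting tools
            return messages, turn
    return messages, MAX_TURNS                     # turn budget exhausted
```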

Phase 4: Finalization

The complete assistant message is saved with all metadata:

Key operations:

  • Assistant message is constructed with all text content and tool uses
  • Complete message is saved with generation metadata (model, tokens, turns)
  • Langfuse trace is updated with final output
  • Frontend receives completion event

Langfuse Integration

When enabled, Langfuse provides observability throughout the lifecycle:

Trace Creation:

  • Trace ID created at start of request
  • Associated with user_id and session_id (chat_id)

Generation Spans:

  • One span per turn
  • Captures input messages, output, token counts, and latency
  • Includes model and provider information

Tool Call Tracking:

  • Each tool call recorded as metadata
  • Tool names aggregated for trace-level statistics

Final Metrics:

  • Cumulative token usage across all turns
  • Total turn count
  • Complete request duration
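Aggregating per-turn spans into trace-level metrics might look like the sketch below. The field names are assumptions, not Langfuse's or Erato's actual schema; it only illustrates the cumulative-token and turn-count roll-up described above.

```python
def aggregate_trace_metrics(turns):
    """Roll per-turn span data up into trace-level totals (illustrative)."""
    return {
        "total_turns": len(turns),
        "input_tokens": sum(t["input_tokens"] for t in turns),     # cumulative usage
        "output_tokens": sum(t["output_tokens"] for t in turns),
        "tools_used": sorted({name for t in turns for name in t.get("tools", [])}),
    }
```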

Stream Buffering & Reconnection

Erato implements a robust streaming infrastructure that allows clients to reconnect to ongoing message generations without losing any events. This is particularly important for handling network interruptions or browser tab refreshes.

Architecture Overview

How It Works

1. Streaming Buffer Creation

When a message submission starts, the backend creates a StreamingTask associated with the chat ID:

  • Broadcast Channel: Uses Tokio’s broadcast channel with 1,000 event capacity for live streaming
  • Event History: Stores up to 10,000 events in memory for replay capability
  • Task Lifecycle: Managed per-chat, allowing only one active generation per chat

2. Event Flow

Every event generated during message processing follows this flow:

  1. Event is sent to the StreamingTask
  2. Event is stored in the history buffer (if under 10,000 events)
  3. Event is broadcast to all currently subscribed clients via the broadcast channel
  4. If no clients are subscribed, the event is still stored in history
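The buffer-plus-broadcast flow above can be sketched in plain Python. The real implementation uses Tokio's broadcast channel in the backend; this illustration only mirrors the two documented behaviours: events are kept in a capped history, and they are fanned out to whoever is currently subscribed (or stored even when nobody is).

```python
MAX_HISTORY = 10_000

class StreamingTask:
    """Per-chat streaming buffer sketch: each event is appended to a bounded
    history for replay and broadcast to all current subscribers."""
    def __init__(self):
        self.history = []        # replayable event history, capped at MAX_HISTORY
        self.subscribers = []    # live subscriber queues

    def send(self, event):
        if len(self.history) < MAX_HISTORY:
            self.history.append(event)   # stored even with no subscribers
        for queue in self.subscribers:
            queue.append(event)          # broadcast to live clients

    def subscribe(self):
        queue = []
        self.subscribers.append(queue)
        return queue
```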

3. Reconnection Process

Clients can reconnect to an ongoing generation using the /me/messages/resumestream endpoint:

POST /api/v1beta/me/messages/resumestream
{ "chat_id": "uuid-of-chat" }

The reconnection process:

  1. Verify Access: Checks that the user has permission to access the chat
  2. Retrieve Task: Looks up the active StreamingTask for the chat_id
  3. Get History: Retrieves all events stored in the event history
  4. Replay Events: Streams all historical events to the client first
  5. Subscribe to Live: After replay, subscribes to live events
  6. Seamless Transition: Client receives complete event stream without gaps
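The reconnection steps above can be sketched as follows. The class and function names are stand-ins for illustration; the real handler additionally has to guarantee no gap between the history snapshot and the live subscription.

```python
class ActiveTask:
    """Minimal stand-in for an active StreamingTask (illustrative)."""
    def __init__(self, history):
        self.history = history
        self._live = []
    def subscribe(self):
        return self._live

def resume_stream(task, user_has_access):
    """Sketch of resumestream: verify access, replay stored history,
    then subscribe to live events."""
    if not user_has_access:             # step 1: verify access to the chat
        raise PermissionError("no access to this chat")
    replayed = list(task.history)       # steps 3-4: replay historical events first
    live = task.subscribe()             # step 5: then attach to the live feed
    return replayed, live
```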

This architecture makes long-running generation tasks resilient to network interruptions and to user behaviour that would otherwise terminate a generation, such as closing a browser tab while generation is in progress.
