
Architecture

This page explains how Orqestra works internally — what each piece is, why it exists, and how they fit together. You don't need this to use Orqestra, but it helps if you're deploying it, debugging it, or building on top of it.


The big picture

When your service fires an event, four components work together to turn it into a delivered notification:

```mermaid
flowchart TD
    A[Your service<br/>HTTP POST /events/trigger] -->|validated + queued| B[Worker<br/>NestJS / TypeScript]
    B -->|stores pending log| C[(PostgreSQL)]
    B -->|publishes| D[Kafka<br/>tenant.event.received]
    D -->|consumed| B
    B -->|publishes| E[Kafka<br/>notification.dispatch]
    E -->|consumed| F[Gateway<br/>Go microservice]
    F -->|sends via| G[Email / SMS / Push providers]
    F -->|updates| C
    F -->|failures| H[Kafka<br/>notification.retry → dlq]
    H -->|consumed| F
    F -->|real-time| I[Centrifugo<br/>WebSocket server]
    I -->|WebSocket push| J[User's browser]
```

Component by component

Worker (NestJS / TypeScript)

Why it exists: The Worker is the brain of Orqestra. It handles all tenant-facing API calls — accepting events, managing templates and providers, authenticating admins, and translating raw events into structured dispatch payloads.

What it does:

  • Validates incoming events and API keys
  • Applies per-tenant rate limits and idempotency checks (stored in Redis)
  • Looks up the matching templates for each event type
  • Creates notification_logs entries so every event has a traceable record
  • Publishes structured dispatch payloads to Kafka

Why it's separate from the Gateway: The Worker acknowledges your HTTP call in milliseconds. It doesn't wait for the provider to respond. That separation means slow provider APIs never make your event trigger slow.
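From the caller's side, the flow above starts with a single HTTP request. As a sketch only: the endpoint path comes from the diagram, but the header names and payload fields below are assumptions, not Orqestra's documented API.

```typescript
// Hypothetical event payload shape (assumption, for illustration).
interface TriggerEvent {
  eventType: string;
  recipientId: string;
  payload: Record<string, unknown>;
}

// Builds the request the Worker would receive. The "x-api-key" and
// "idempotency-key" header names are assumptions.
export function buildTriggerRequest(
  baseUrl: string,
  apiKey: string,
  event: TriggerEvent,
  idempotencyKey?: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "x-api-key": apiKey,
  };
  if (idempotencyKey) headers["idempotency-key"] = idempotencyKey;
  return {
    url: `${baseUrl}/events/trigger`,
    init: { method: "POST", headers, body: JSON.stringify(event) },
  };
}

// Usage: const { url, init } = buildTriggerRequest(...); await fetch(url, init);
```

The Worker validates this request, writes the pending log, publishes to Kafka, and returns — everything after that happens asynchronously.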


Go Gateway

Why it exists: Dispatching to external providers (Resend, Twilio, etc.) involves network calls that can be slow, fail, or rate-limit. The Gateway handles all of that in isolation so failures don't cascade into the Worker.

What it does:

  • Consumes notification.dispatch messages from Kafka
  • Loads provider credentials from PostgreSQL (decrypted in memory)
  • Calls the correct provider adapter (Resend, SendGrid, Twilio, Centrifugo)
  • Handles retries with exponential back-off (via notification.retry topic)
  • Routes permanently failed notifications to the Dead Letter Queue (notification.dlq)
  • Updates notification_logs to SENT, FAILED, or RETRYING
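The exponential back-off in the list above can be sketched as a small delay function. The base delay and cap below are assumptions; the doc only states that retries back off exponentially.

```typescript
// Delay before the Nth retry. 1s base and 60s cap are assumed values;
// production systems often also add random jitter to avoid thundering herds.
export function retryDelayMs(
  attempt: number, // 1-based retry attempt
  baseMs = 1_000,
  capMs = 60_000,
): number {
  const exponential = baseMs * 2 ** (attempt - 1); // 1s, 2s, 4s, 8s, ...
  return Math.min(capMs, exponential);
}
```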

Why Go? Provider dispatch is I/O-bound and high-throughput. Go's lightweight goroutines handle many concurrent provider calls efficiently, without funneling them all through a single-threaded event loop the way Node.js would.


Kafka

Why it exists: Kafka is the async message bus between the Worker and the Gateway. It decouples event acceptance from event delivery — your service gets a fast acknowledgment, and delivery happens independently.

Topics:

| Topic | Publisher | Consumer | Purpose |
| --- | --- | --- | --- |
| tenant.event.received | Worker (events endpoint) | Worker (notification handler) | Raw events from tenants |
| notification.dispatch | Worker | Gateway | Structured payloads ready for dispatch |
| notification.retry | Gateway | Gateway | Failed dispatches to retry |
| notification.dlq | Gateway | Worker (DLQ handler) | Permanently failed notifications |

Why this matters: If the Gateway is temporarily unavailable (restart, deploy), Kafka holds the messages. Events don't get lost — they queue up and are processed when the Gateway comes back.
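The contract between Worker and Gateway rides on the notification.dispatch topic. As a hedged sketch, a message on that topic might carry fields like these — the field names are assumptions, not Orqestra's actual schema:

```typescript
// Hypothetical notification.dispatch message shape (assumed fields).
interface DispatchMessage {
  tenantId: string;
  channel: "EMAIL" | "SMS" | "PUSH";
  templateId: string;
  recipient: string;
  variables: Record<string, string>;
}

// Runtime guard the consumer could apply before trusting a Kafka message.
export function isDispatchMessage(v: unknown): v is DispatchMessage {
  if (typeof v !== "object" || v === null) return false;
  const m = v as Record<string, unknown>;
  return (
    typeof m.tenantId === "string" &&
    ["EMAIL", "SMS", "PUSH"].includes(m.channel as string) &&
    typeof m.templateId === "string" &&
    typeof m.recipient === "string" &&
    typeof m.variables === "object" &&
    m.variables !== null
  );
}
```

Validating at the consumer boundary matters because Kafka itself enforces no schema: a bad producer deploy would otherwise propagate malformed payloads straight into provider calls.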


PostgreSQL

Why it exists: PostgreSQL is the persistent store for everything that needs to survive a restart or be queried later.

What it stores:

  • Tenant records (API keys, rate limits, delivery channels, provider config references)
  • Templates (content, event type mappings, version history)
  • Provider configs (encrypted API keys)
  • Notification logs (every dispatch attempt with status and error details)
  • Failed notifications (DLQ entries with full payload for replay)
  • Audit logs (admin actions, session events)

Row-Level Security (RLS): Every query the Worker runs executes with the current tenant ID set as a PostgreSQL session variable. RLS policies reject any query that tries to read another tenant's rows, so even an application-level bug in the Worker can't return cross-tenant data: the database itself enforces the boundary.


Redis

Why it exists: Redis handles the real-time, short-lived state that doesn't belong in PostgreSQL.

What it stores:

  • Rate limit counters (per tenant, per minute)
  • Idempotency keys (24-hour deduplication window)
  • Sandbox session records (session ID, IP binding, TTL)
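The 24-hour idempotency window in the list above reduces to one atomic "set if absent with TTL" operation. A sketch, with the key prefix and the minimal store interface as assumptions — against real Redis (e.g. with ioredis) the equivalent call is `redis.set(key, "1", "EX", ttl, "NX")`:

```typescript
// Minimal key-value interface standing in for Redis (assumption).
interface KvStore {
  // Returns true only if the key was newly set (first time seen).
  setIfAbsent(key: string, ttlSeconds: number): Promise<boolean>;
}

const DAY_SECONDS = 24 * 60 * 60;

// True on the first delivery of this key; false for any duplicate
// within the 24-hour window. Key layout "idem:<tenant>:<key>" is assumed.
export async function isFirstDelivery(
  store: KvStore,
  tenantId: string,
  idempotencyKey: string,
): Promise<boolean> {
  return store.setIfAbsent(`idem:${tenantId}:${idempotencyKey}`, DAY_SECONDS);
}

// In-memory stub for local testing (does not implement expiry).
export function memoryStore(): KvStore {
  const seen = new Set<string>();
  return {
    async setIfAbsent(key: string): Promise<boolean> {
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    },
  };
}
```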

Centrifugo

Why it exists: Real-time in-app notifications require a persistent WebSocket connection. Centrifugo is a purpose-built WebSocket server; Orqestra delegates real-time push to it rather than building WebSocket handling into the Worker.

How it works: When the Gateway dispatches a PUSH channel notification, it publishes to a Centrifugo channel. Your frontend subscribes to that channel via the Centrifugo JS client. The message arrives in the user's browser within milliseconds.
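On the frontend side, that subscription is a few lines with the centrifuge-js client. The channel naming scheme below (one private channel per user) is an assumption for illustration; Orqestra's actual channel layout may differ.

```typescript
// Hypothetical channel naming: one private channel per tenant/user pair.
export function userChannel(tenantId: string, userId: string): string {
  return `notifications:${tenantId}#${userId}`;
}

// In the browser, with the Centrifugo JS client (centrifuge-js):
//
//   const centrifuge = new Centrifuge("wss://example.com/connection/websocket", { token });
//   const sub = centrifuge.newSubscription(userChannel(tenantId, userId));
//   sub.on("publication", (ctx) => renderNotification(ctx.data)); // hypothetical handler
//   sub.subscribe();
//   centrifuge.connect();
```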


Admin UI (Next.js)

Why it exists: Orqestra is meant to be operated by product teams, not just engineers. The Admin UI lets non-technical users create templates, configure providers, monitor logs, and replay failed notifications without touching the API directly.


Advanced: Security model

Tenant isolation

Every row in the tenant-facing tables (templates, notification_logs, provider_configs, etc.) has a tenant_id column. PostgreSQL RLS policies enforce that queries only return rows matching the current session's tenant ID. The Worker sets this via:

```sql
SET LOCAL app.current_tenant_id = '<uuid>';
```

This happens inside a Prisma transaction wrapper (DbContextService.withActorContext). All queries within that transaction automatically filter by the tenant ID.
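A hedged sketch of what such a wrapper looks like. The Tx interface stands in for a Prisma transaction client; set_config(name, value, true) is PostgreSQL's parameterizable equivalent of SET LOCAL, scoping the variable to the current transaction. Names other than those quoted above are assumptions.

```typescript
// Stand-in for a Prisma transaction client (assumed minimal surface).
interface Tx {
  $executeRawUnsafe(sql: string, ...params: unknown[]): Promise<unknown>;
}

interface Db {
  $transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

// Runs fn inside a transaction with the tenant ID pinned for RLS.
export async function withTenantContext<T>(
  db: Db,
  tenantId: string,
  fn: (tx: Tx) => Promise<T>,
): Promise<T> {
  return db.$transaction(async (tx) => {
    // Parameterized so the tenant ID is never interpolated into SQL;
    // the third argument "true" makes the setting transaction-local.
    await tx.$executeRawUnsafe(
      "SELECT set_config('app.current_tenant_id', $1, true)",
      tenantId,
    );
    return fn(tx); // every query in fn now runs under the RLS policies
  });
}
```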

Encryption

Provider API keys are encrypted with AES-256-GCM before storage and decrypted in memory only when the Gateway needs to dispatch. The encryption key is never stored in the database.
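A minimal sketch of that scheme with Node's built-in crypto. The on-disk layout (iv | authTag | ciphertext) and the 96-bit nonce are conventional choices assumed here; Orqestra's actual serialization may differ.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts a provider API key with AES-256-GCM. Output layout (assumed):
// 12-byte IV | 16-byte auth tag | ciphertext.
export function encryptSecret(plaintext: string, key: Buffer): Buffer {
  const iv = randomBytes(12); // 96-bit nonce, standard for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Decrypts in memory; GCM's auth tag makes tampering detectable
// (final() throws if the tag does not verify).
export function decryptSecret(blob: Buffer, key: Buffer): string {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ciphertext = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Keeping the encryption key outside the database means a leaked database dump alone is not enough to recover provider credentials.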

Authentication

Tenant admins authenticate with a username and password. On success, a short-lived JWT is issued. Sandbox sessions are additionally IP-bound (and optionally device-bound) and expire after 2 hours.
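The sandbox checks above (2-hour TTL, IP binding, optional device binding) reduce to a small predicate. The field names here are assumptions for illustration:

```typescript
// Hypothetical sandbox session record (field names assumed).
interface SandboxSession {
  createdAtMs: number;
  boundIp: string;
  boundDeviceId?: string; // present only when device binding is enabled
}

const SESSION_TTL_MS = 2 * 60 * 60 * 1000; // 2 hours, per the doc

// Valid only if unexpired, from the bound IP, and (when device-bound)
// from the bound device.
export function isSessionValid(
  s: SandboxSession,
  nowMs: number,
  ip: string,
  deviceId?: string,
): boolean {
  if (nowMs - s.createdAtMs > SESSION_TTL_MS) return false;
  if (ip !== s.boundIp) return false;
  if (s.boundDeviceId && s.boundDeviceId !== deviceId) return false;
  return true;
}
```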