OpenAI API Integration with Cursor: Real Apps, Not Demos

Wire chatbots, agents, and automations with auth, RAG, rate limits, and cost controls — production patterns for small teams.

Published May 21, 2026 · 11 min read

[Diagram: Browser (your UI) → Your API (auth · limits · logs, built with Cursor) → OpenAI (chat / embed) and Postgres (pgvector RAG). Never expose API keys in the browser.]
Production OpenAI apps route all model calls through your server — never the browser.

Introduction

Tutorial repos often call OpenAI from the client, which exposes the key to anyone who opens the browser's dev tools. Production apps do not. You need server-side routes, authentication, rate limits, logging, and cost caps: patterns Cursor can scaffold in an afternoon if you specify the architecture up front.

This guide walks through integrating the OpenAI API into real apps (chat, embeddings, RAG, and agents) that are deployed on DigitalOcean rather than run as localhost demos.

Demo vs production

[Diagram: Demo vs production. Key in frontend vs key in server env; no auth vs session + RBAC; unlimited calls vs rate limits + quotas; no logging vs token + cost logs. Ship production patterns from day one: refactors are expensive under load.]
The gap between a hackathon demo and a shippable OpenAI integration.
Concern  | Demo shortcut     | Production pattern
API key  | NEXT_PUBLIC_* env | Server-only OPENAI_API_KEY
Auth     | None              | Session or JWT on /api routes
Abuse    | Unlimited         | Rate limit per user + IP
Cost     | Ignored           | Token logging + daily budget cap
Models   | Hard-coded        | Env-driven model router
Patterns to implement before you share a public URL.
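
As one way to picture the "rate limit per user + IP" row, here is a minimal fixed-window limiter. It is a sketch under stated assumptions, not a recommended library: the names `FixedWindowLimiter` and `allowRequest` are hypothetical, and the counters live in process memory, which only works while the API runs as a single process.

```typescript
// In-memory fixed-window limiter. Assumption: one API process; with several
// instances the counters would have to live in a shared store instead.
interface WindowState {
  startedAt: number; // ms timestamp when the current window opened
  count: number; // requests counted in the current window
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the caller identified by `key` may proceed.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (current === undefined || now - current.startedAt >= this.windowMs) {
      this.windows.set(key, { startedAt: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// A request must pass both limits. Both counters are always updated, so a
// blocked user still counts against the IP they are calling from.
function allowRequest(
  perUser: FixedWindowLimiter,
  perIp: FixedWindowLimiter,
  userId: string,
  ip: string,
  now: number = Date.now(),
): boolean {
  const userOk = perUser.allow(`user:${userId}`, now);
  const ipOk = perIp.allow(`ip:${ip}`, now);
  return userOk && ipOk;
}
```

A route handler would call `allowRequest` right after authentication and answer with HTTP 429 when it returns false. A fixed window is the simplest scheme to reason about; it allows short bursts at window boundaries, which is usually acceptable for a first release.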

Request lifecycle

[Diagram: Request (POST /api/chat) → Auth (session / JWT) → Rate limit (per user/IP) → RAG retrieve (pgvector) → OpenAI call (server-side) → Log + respond (tokens + cost). Every step runs on your server; keys stay in env vars. Cursor scaffolds routes, Zod validation, and OpenAI SDK wrappers in /services.]
Every chat request passes auth, limits, optional RAG, then OpenAI — with logging on the way out.

Put OpenAI calls in /services/openai or your API layer — not in React components. Validate input with Zod, truncate prompts server-side, and return structured errors the UI can display without exposing stack traces.
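
The validation, truncation, and structured-error steps can be sketched as follows. To keep the example dependency-free it uses hand-written checks where a real route would likely use a Zod schema. `parseChatInput`, `toPublicError`, the error codes, and the 4000-character limit are all hypothetical choices made for illustration.

```typescript
// Error payload that is safe to show in the UI: no stack trace, no upstream detail.
interface PublicError {
  status: number;
  code: string;
  message: string;
}

type ChatInputResult =
  | { ok: true; message: string }
  | ({ ok: false } & PublicError);

const MAX_PROMPT_CHARS = 4000; // hypothetical limit; tune to your model and budget

// Hand-written stand-in for a schema check on the request body.
function parseChatInput(body: unknown): ChatInputResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, code: "invalid_body", message: "Expected a JSON object." };
  }
  const message = (body as Record<string, unknown>).message;
  if (typeof message !== "string" || message.trim().length === 0) {
    return { ok: false, status: 400, code: "invalid_message", message: "Field 'message' must be a non-empty string." };
  }
  // Truncate on the server so a modified client cannot send oversized prompts.
  return { ok: true, message: message.trim().slice(0, MAX_PROMPT_CHARS) };
}

// Map any thrown error to a generic payload; log the original server-side only.
function toPublicError(err: unknown): PublicError {
  const timedOut = err instanceof Error && err.name === "AbortError";
  return timedOut
    ? { status: 504, code: "timeout", message: "The request timed out. Please retry." }
    : { status: 502, code: "upstream_error", message: "The assistant is unavailable. Please retry." };
}
```

The UI can switch on `code` for display logic while the full error, including the stack trace, goes to server logs.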

RAG with pgvector

Retrieval-augmented generation keeps answers grounded in your docs. Chunk markdown, embed with text-embedding-3-small, store vectors in Postgres with pgvector, and retrieve top-k chunks before each chat completion.

[Diagram: Employee, Your app, Vector DB, LLM API. Request: 1. Question; 2. Embed + search chunks; 3. Context + prompt → LLM. Response: 4. Answer text → app; 5. Show reply (+ handoff if low confidence).]
Support bot RAG flow — ingest, embed, retrieve, generate.
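
The retrieve-then-generate step can be sketched in memory. This is an illustration only: in the setup described above the similarity search would run inside Postgres through pgvector rather than in application code, and the embeddings would come from the embedding model. `Chunk`, `topK`, and `buildPrompt` are hypothetical names, and the prompt wording is an assumption.

```typescript
interface Chunk {
  id: string;
  text: string;
  embedding: number[]; // produced at ingest time by the embedding model
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Assemble the retrieved chunks and the question into one prompt string.
function buildPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer using only the sources below.\n\n${sources}\n\nQuestion: ${question}`;
}
```

Numbering the sources in the prompt makes it possible to ask the model to cite them, which helps when deciding whether a reply is confident enough to show or should be handed off.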

Cost and model routing

Route simple tasks to smaller models and reserve GPT-4-class models for complex reasoning. Cache answers to identical FAQ questions so repeats cost nothing. Log prompt and completion tokens per user for billing and debugging.

Use case       | Model tier      | Cost note
Classification | Small / mini    | Sub-cent per call
Support bot    | Mid + RAG       | Cache repeated questions
Agent planning | Large           | Cap steps + timeout
Embeddings     | Embedding model | Batch on ingest, not per chat
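
A rough sketch of an env-driven model router combined with token-based cost tracking and a daily cap is shown below. Everything named here is an assumption made for illustration: the env variable names (`MODEL_SMALL`, `MODEL_MID`, `MODEL_LARGE`), the default model names, and the `Pricing` figures you pass in, which should come from your provider's current price list.

```typescript
type Task = "classification" | "support" | "agent";
type Env = Record<string, string | undefined>;

// Defaults are illustrative assumptions; set real model names through env.
function pickModel(task: Task, env: Env): string {
  switch (task) {
    case "classification":
      return env.MODEL_SMALL ?? "gpt-4o-mini";
    case "support":
      return env.MODEL_MID ?? "gpt-4o-mini";
    case "agent":
      return env.MODEL_LARGE ?? "gpt-4o";
  }
}

interface Pricing {
  inputPerMillionUsd: number; // price per 1M prompt tokens
  outputPerMillionUsd: number; // price per 1M completion tokens
}

function costUsd(promptTokens: number, completionTokens: number, p: Pricing): number {
  return (promptTokens * p.inputPerMillionUsd + completionTokens * p.outputPerMillionUsd) / 1e6;
}

// Tracks spend per calendar day and refuses new calls once the cap is hit.
class DailyBudget {
  private day = "";
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Check before calling the model. `day` is a date string such as "2026-05-21".
  hasBudget(day: string): boolean {
    this.rollOver(day);
    return this.spentUsd < this.capUsd;
  }

  // Record the actual cost after the call returns its token counts.
  record(day: string, cost: number): void {
    this.rollOver(day);
    this.spentUsd += cost;
  }

  private rollOver(day: string): void {
    if (day !== this.day) {
      this.day = day;
      this.spentUsd = 0;
    }
  }
}
```

Because the cost of a call is only known after it returns, the cap here can be overshot by one request. Pairing it with a max-token limit per request keeps that overshoot bounded.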

Deploy on DigitalOcean

Run the API and Postgres on one Droplet for MVPs; move to a managed database when backups matter. Set OPENAI_API_KEY in the Docker environment or in DO App Platform secrets, never in git. See Full-stack deployment on DigitalOcean for the full pipeline.

Common mistakes

  • Streaming responses without abort controllers — runaway token burn
  • No max_tokens limit on user-facing chat
  • Pasting user content into system prompts without sanitization
  • Skipping idempotency keys on webhook-triggered agent runs
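
For the last item, a minimal idempotency guard might look like the sketch below. It assumes the webhook sender supplies a unique event ID that can serve as the key, and it stores keys in process memory with a time-to-live; a multi-instance deployment would need a shared store. `IdempotencyStore` and `handleWebhook` are hypothetical names.

```typescript
class IdempotencyStore {
  private readonly seen = new Map<string, number>(); // key -> expiry (ms timestamp)
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time a key is claimed within the TTL, false for repeats.
  claim(key: string, now: number = Date.now()): boolean {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > now) return false;
    this.seen.set(key, now + this.ttlMs);
    return true;
  }
}

// Start the agent run only for the first delivery of a given event.
function handleWebhook(
  store: IdempotencyStore,
  eventId: string,
  runAgent: () => void,
  now: number = Date.now(),
): "started" | "duplicate" {
  if (!store.claim(eventId, now)) return "duplicate";
  runAgent();
  return "started";
}
```

Webhook senders commonly retry on timeouts, so without a guard like this a slow agent run can be triggered several times for one event, each paying for its own tokens.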

FAQ

Can Cursor generate the whole integration?

Yes — routes, services, Zod schemas, and pgvector migrations. You still review auth boundaries, rate limits, and env handling before production.

OpenAI or Anthropic?

Abstract behind one interface in your service layer. Pick OpenAI for v1 if you need embeddings + chat in one vendor; swap via env later.
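
One possible shape for that interface is sketched below. The provider classes are stubs: their bodies are placeholders and deliberately make no SDK calls. The `LLM_PROVIDER` variable and all type names are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// The only surface the rest of the app is allowed to depend on.
interface ChatProvider {
  readonly name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Stub: replace the body with a call to the OpenAI SDK in your service layer.
class OpenAIProvider implements ChatProvider {
  readonly name = "openai";
  async complete(messages: ChatMessage[]): Promise<string> {
    throw new Error(`OpenAI call not implemented (${messages.length} messages)`);
  }
}

// Stub: replace the body with a call to the Anthropic SDK.
class AnthropicProvider implements ChatProvider {
  readonly name = "anthropic";
  async complete(messages: ChatMessage[]): Promise<string> {
    throw new Error(`Anthropic call not implemented (${messages.length} messages)`);
  }
}

// Choose the vendor from env so a swap needs no code change in callers.
function selectProvider(env: Record<string, string | undefined>): ChatProvider {
  const choice = env.LLM_PROVIDER ?? "openai";
  switch (choice) {
    case "openai":
      return new OpenAIProvider();
    case "anthropic":
      return new AnthropicProvider();
    default:
      throw new Error(`Unknown LLM_PROVIDER: ${choice}`);
  }
}
```

Failing loudly on an unknown provider name catches a mistyped env value at startup instead of at the first chat request.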

How do I estimate monthly API cost?

Log token usage for 50 test sessions and compute the average cost per session. Multiply that by sessions per user per day, expected daily users, and 30 days, then add a 30% buffer. Set hard daily caps in code until you trust the math.
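
The arithmetic can be written down as a small function. This is a sketch: `estimateMonthlyCostUsd` and `dailyCapUsd` are hypothetical helpers, and the figures in the usage note are made-up inputs, not measurements.

```typescript
interface CostInputs {
  sessionCostsUsd: number[]; // measured cost of each logged test session
  sessionsPerUserPerDay: number;
  dailyUsers: number;
  daysPerMonth?: number; // defaults to 30
  buffer?: number; // defaults to 0.3, i.e. a 30% safety margin
}

// monthly = average session cost x sessions/user/day x users x days x (1 + buffer)
function estimateMonthlyCostUsd(input: CostInputs): number {
  if (input.sessionCostsUsd.length === 0) {
    throw new Error("Log at least one test session first");
  }
  const total = input.sessionCostsUsd.reduce((sum, c) => sum + c, 0);
  const averageSessionCost = total / input.sessionCostsUsd.length;
  const days = input.daysPerMonth ?? 30;
  const buffer = input.buffer ?? 0.3;
  return (
    averageSessionCost *
    input.sessionsPerUserPerDay *
    input.dailyUsers *
    days *
    (1 + buffer)
  );
}

// Spread the monthly estimate evenly to get a starting point for the daily cap.
function dailyCapUsd(monthlyEstimateUsd: number, daysPerMonth: number = 30): number {
  return monthlyEstimateUsd / daysPerMonth;
}
```

With an assumed average of $0.02 per session, 2 sessions per user per day, and 100 daily users, the estimate is 0.02 × 2 × 100 × 30 × 1.3, or roughly $156 per month, which suggests a daily cap of about $5.20.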

Next steps

Scaffold one authenticated POST /api/chat route with rate limiting, deploy to a Droplet, and load-test with 100 parallel requests. Read Build AI business tools with Cursor for product patterns beyond the API layer.