AI SaaS starter for Next.js, chat, RAG, credits, Stripe
Most 'AI SaaS starter' templates are a chat UI plus a `fetch()` to OpenAI. That is a useful demo and a poor product foundation. A real AI SaaS needs retrieval (so answers are grounded in customer data), metering (so token costs become a billing surface instead of a financial surprise), billing (so the metering connects to revenue), and auth (so customer data is actually scoped). SaaSForge AI ships all four together.
Chat, streaming with Claude and OpenAI
The chat UI streams responses via the Vercel AI SDK, with provider switching between Claude and OpenAI as a config flag. Conversation history persists per user in Supabase Postgres. Edit and regenerate flows are wired so users can iterate on prompts without losing the thread.
The provider abstraction matters because LLM pricing and quality change month to month. A boilerplate that hardcodes one provider is a refactor waiting to happen; SaaSForge AI's provider layer is structured so adding Mistral, Bedrock, or a self-hosted model is an SDK call, not a rewrite.
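As a hedged sketch of what that provider layer can look like: a single config flag resolves to a provider/model descriptor, and the rest of the chat path never mentions a vendor. The function name, flag values, and model ids below are illustrative assumptions, not the template's actual API; in the real template this is the point where the Vercel AI SDK's provider packages (e.g. `@ai-sdk/anthropic`, `@ai-sdk/openai`) would be invoked.

```typescript
// Hypothetical provider switch: one config flag, one resolution point.
type Provider = "anthropic" | "openai";

interface ModelConfig {
  provider: Provider;
  model: string;
}

// Model ids here are placeholders for whatever the rate table supports.
function resolveModel(flag: string): ModelConfig {
  switch (flag) {
    case "claude":
      return { provider: "anthropic", model: "claude-sonnet" };
    case "openai":
      return { provider: "openai", model: "gpt-4o" };
    default:
      throw new Error(`Unknown provider flag: ${flag}`);
  }
}
```

Adding a new provider means adding a `case`, not touching the chat surface.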
RAG, upload, chunk, embed, retrieve
Users upload PDFs, text, or Markdown. The pipeline extracts text, chunks it into ~500-1000 token segments with overlap, embeds each chunk via the configured embedding model, and stores the vectors in a `vector` column via pgvector. Each chat turn embeds the question and retrieves top-k chunks scoped to the user's workspace.
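The chunking step can be sketched as follows. This is an illustrative implementation, not the template's actual chunker: it approximates tokens with a words-per-token heuristic (real pipelines typically use the embedding model's tokenizer) and backs up by the overlap amount between segments so adjacent chunks share context.

```typescript
// Hypothetical chunker: sizes are in approximate tokens.
// Heuristic assumption: one word ≈ 1.3 tokens on average.
function chunkText(text: string, maxTokens = 800, overlapTokens = 100): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const wordsPerChunk = Math.floor(maxTokens / 1.3);
  const overlapWords = Math.floor(overlapTokens / 1.3);
  const chunks: string[] = [];
  let start = 0;
  while (start < words.length) {
    const end = Math.min(start + wordsPerChunk, words.length);
    chunks.push(words.slice(start, end).join(" "));
    if (end === words.length) break;
    // Back up so the next chunk repeats the tail of this one.
    start = end - overlapWords;
  }
  return chunks;
}
```

Each returned chunk would then be embedded and written to the pgvector column alongside its workspace id.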
Workspace scoping is enforced via Postgres Row Level Security, so a careless retrieval query cannot surface another tenant's chunks into an LLM context. That is the worst-case privacy failure for an AI product, and the boilerplate closes the door on it at the database layer.
Credits, metering tied to real token cost
Each user action consumes tokens (embedding the question, the LLM input prompt, the LLM output). The token usage converts to credits at a configurable rate that covers the underlying API cost plus margin. The credit balance lives in Postgres; debits are idempotent (a retried client call does not double-charge).
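A minimal sketch of what a `computeCreditCost` along these lines could look like, assuming a per-model rate table and a margin multiplier. The rates and margin below are made-up illustration values; the point of the template's design is that this table is configurable, not hardcoded.

```typescript
// Hypothetical credit pricing: per-model rates in credits per 1K tokens,
// times a margin multiplier that covers API cost plus margin.
interface CostInput {
  inputTokens: number;
  outputTokens: number;
  model: string;
}

const RATES: Record<string, { inPer1K: number; outPer1K: number }> = {
  "claude-sonnet": { inPer1K: 3, outPer1K: 15 },
  "gpt-4o": { inPer1K: 2.5, outPer1K: 10 },
};

const MARGIN = 1.3; // illustrative markup over raw API cost

function computeCreditCost({ inputTokens, outputTokens, model }: CostInput): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate configured for model: ${model}`);
  const raw =
    (inputTokens / 1000) * rate.inPer1K + (outputTokens / 1000) * rate.outPer1K;
  // Round up so fractional usage never debits zero credits.
  return Math.ceil(raw * MARGIN);
}
```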
Concurrent debits are guarded with row-level locking so two simultaneous chat turns cannot both succeed when only one credit remains. That costs a small amount of latency on the chat path and prevents reconciliation drift, which is the bigger problem.
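The debit semantics can be illustrated with an in-memory sketch. This is an illustration of the invariants only, not the template's implementation: the real version runs inside a Postgres transaction with a row-level lock on the balance row (e.g. `SELECT ... FOR UPDATE`) so concurrent debits serialize, and the idempotency keys live in a table, not a `Set`.

```typescript
// In-memory illustration of idempotent, balance-guarded debits.
interface DebitRequest {
  workspaceId: string;
  amount: number;
  reason: string;
  idempotencyKey: string;
}

class CreditLedger {
  private balances = new Map<string, number>();
  private seenKeys = new Set<string>();

  credit(workspaceId: string, amount: number): void {
    this.balances.set(workspaceId, (this.balances.get(workspaceId) ?? 0) + amount);
  }

  balance(workspaceId: string): number {
    return this.balances.get(workspaceId) ?? 0;
  }

  // Returns true if applied, false if this key was already debited (a retry).
  debit({ workspaceId, amount, idempotencyKey }: DebitRequest): boolean {
    if (this.seenKeys.has(idempotencyKey)) return false; // retried call: no-op
    const current = this.balances.get(workspaceId) ?? 0;
    if (current < amount) throw new Error("Insufficient credits");
    this.seenKeys.add(idempotencyKey);
    this.balances.set(workspaceId, current - amount);
    return true;
  }
}
```

On the chat path, the call site looks like this: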
await debitCredits({
  workspaceId,
  amount: computeCreditCost({ inputTokens, outputTokens, model }),
  reason: "chat",
  idempotencyKey: `chat:${turnId}`,
});

Billing, Stripe subscriptions that reset credits
Stripe Checkout handles new subscriptions; the Customer Portal handles plan changes, payment methods, and cancellations. The `invoice.payment_succeeded` webhook resets the credit balance to the plan's allotment each cycle. Idempotency tracking on webhook events prevents Stripe retries from double-crediting customers.
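The reset-once-per-invoice guard can be sketched as a small pure function. The event shape and names below are simplified assumptions for illustration (a real handler verifies the Stripe webhook signature and pulls these fields out of the event payload); the invariant it shows is the one described above: a replayed event id must not credit the customer twice.

```typescript
// Hypothetical webhook guard: record processed Stripe event ids and reset
// the credit balance exactly once per invoice.
interface InvoicePaidEvent {
  id: string;          // Stripe event id (stable across retry deliveries)
  workspaceId: string;
  planCredits: number; // the plan's per-cycle credit allotment
}

function handleInvoicePaid(
  event: InvoicePaidEvent,
  processed: Set<string>,
  balances: Map<string, number>,
): boolean {
  if (processed.has(event.id)) return false; // Stripe retry: already handled
  processed.add(event.id);
  // Reset (not increment) to the plan allotment each billing cycle.
  balances.set(event.workspaceId, event.planCredits);
  return true;
}
```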
Plan changes (upgrade, downgrade, cancel) each have well-defined credit semantics, documented in the webhook handler so it is clear which behavior you are extending and which you should leave alone. The Customer Portal covers the high-stakes UI parts (payment method storage, invoice access) so the boilerplate keeps custom billing UI small.
What you would still customize per product
The boilerplate is opinionated where the patterns are universal (auth, isolation, metering) and stays out of the way where the product diverges (system prompts, tool calls, custom workflows, eval loops). The folder boundaries are structured so adding a tool-calling agent or a custom retrieval scorer does not require rewriting the chat surface.
The setup validation dashboard runs after clone and confirms env, database, and API key wiring. First deploy day is meant to be boring; the interesting work is the product layer you build on top.
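The env portion of that validation amounts to a required-keys check. As a minimal sketch: the variable names below are plausible for this stack (Supabase, Stripe, an LLM key) but are assumptions, not the template's actual manifest.

```typescript
// Hypothetical post-clone env check: report which required vars are unset.
const REQUIRED_ENV = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "ANTHROPIC_API_KEY",
];

function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => {
    const value = env[key];
    return !value || value.trim() === "";
  });
}
```

In a Next.js app this would run against `process.env` and surface the result in the dashboard instead of failing silently at the first API call.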