
Feature deep-dive · SaaSForge AI

AI SaaS starter for Next.js, chat, RAG, credits, Stripe

Most 'AI SaaS starter' templates are a chat UI plus a `fetch()` to OpenAI. That is a useful demo and a poor product foundation. A real AI SaaS needs retrieval (so answers are grounded in customer data), metering (so token costs become a billing surface instead of a financial surprise), billing (so the metering connects to revenue), and auth (so customer data is actually scoped). SaaSForge AI ships all four together.

Chat, streaming with Claude and OpenAI

The chat UI streams responses via the Vercel AI SDK, with provider switching between Claude and OpenAI as a config flag. Conversation history persists per user in Supabase Postgres. Edit and regenerate flows are wired so users can iterate on prompts without losing the thread.

The provider abstraction matters because LLM pricing and quality change month to month. A boilerplate that hardcodes one provider is a refactor waiting to happen; SaaSForge AI's provider layer is structured so adding Mistral, Bedrock, or a self-hosted model is an SDK call, not a rewrite.
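
As a sketch of what the switch looks like, assuming an AI SDK v4-style `streamText` route handler and an `LLM_PROVIDER` env flag (the flag name, model ids, and file layout here are illustrative, not the boilerplate's exact code):

```ts
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { openai } from "@ai-sdk/openai";

// Illustrative config flag; the boilerplate's actual setting may differ.
const model =
  process.env.LLM_PROVIDER === "openai"
    ? openai("gpt-4o")
    : anthropic("claude-3-5-sonnet-latest");

export async function POST(req: Request) {
  const { messages } = await req.json();
  // Same call site regardless of provider; only `model` changes.
  const result = streamText({ model, messages });
  return result.toDataStreamResponse(); // streams tokens to the chat UI
}
```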

RAG, upload, chunk, embed, retrieve

Users upload PDFs, text, or Markdown. The pipeline extracts text, chunks it into ~500-1000 token segments with overlap, embeds each chunk via the configured embedding model, and stores the vectors in a `vector` column via pgvector. Each chat turn embeds the question and retrieves top-k chunks scoped to the user's workspace.
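
A hedged sketch of the retrieval half, assuming the AI SDK's `embed` helper and a `match_chunks` Postgres function in the style of Supabase's pgvector examples (the function name and its parameters are illustrative):

```ts
import { embed } from "ai";
import { openai } from "@ai-sdk/openai";
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

export async function retrieveContext(question: string, workspaceId: string) {
  // Embed the question with the configured embedding model.
  const { embedding } = await embed({
    model: openai.embedding("text-embedding-3-small"),
    value: question,
  });

  // match_chunks is a hypothetical SQL function: ORDER BY
  // embedding <=> query_embedding LIMIT k, filtered by workspace.
  const { data, error } = await supabase.rpc("match_chunks", {
    query_embedding: embedding,
    workspace_id: workspaceId,
    match_count: 5,
  });
  if (error) throw error;
  return data; // top-k chunks to splice into the prompt
}
```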

Workspace scoping is enforced via Postgres Row Level Security, so a careless retrieval query cannot surface another tenant's chunks into an LLM context. That is the worst-case privacy failure for an AI product, and the boilerplate closes the door on it at the database layer.
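
A minimal policy sketch, with hypothetical `chunks` and `workspace_members` table names; `auth.uid()` is Supabase's helper for the calling user:

```ts
// Migration sketch (SQL inlined for readability; run it with your
// migration tool). Table and column names are illustrative.
export const chunkIsolationMigration = /* sql */ `
  ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;

  -- A SELECT on chunks only sees rows in workspaces the caller belongs
  -- to, no matter how carelessly the retrieval query is written.
  CREATE POLICY chunks_workspace_isolation ON chunks FOR SELECT
    USING (
      workspace_id IN (
        SELECT workspace_id FROM workspace_members
        WHERE user_id = auth.uid()
      )
    );
`;
```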

Credits, metering tied to real token cost

Each user action consumes tokens (embedding the question, the LLM input prompt, the LLM output). The token usage converts to credits at a configurable rate that covers the underlying API cost plus margin. The credit balance lives in Postgres; debits are idempotent (a retried client call does not double-charge).
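
A hedged sketch of the conversion, with made-up rates (the real numbers live in config and should track provider pricing plus your margin):

```ts
// Illustrative per-model rates in credits per 1K tokens.
const RATES: Record<string, { inputPer1K: number; outputPer1K: number }> = {
  "claude-3-5-sonnet-latest": { inputPer1K: 3, outputPer1K: 15 },
  "gpt-4o": { inputPer1K: 2.5, outputPer1K: 10 },
};

export function computeCreditCost(args: {
  inputTokens: number;
  outputTokens: number;
  model: string;
}): number {
  const rate = RATES[args.model];
  if (!rate) throw new Error(`no credit rate configured for ${args.model}`);
  const raw =
    (args.inputTokens / 1000) * rate.inputPer1K +
    (args.outputTokens / 1000) * rate.outputPer1K;
  return Math.ceil(raw); // round up so fractional turns still debit
}
```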

Concurrent debits are guarded with row-level locking so two simultaneous chat turns cannot both succeed when only one credit remains. That costs a small amount of latency on the chat path and prevents reconciliation drift, which is the bigger problem.

Idempotent credit debit after a chat turn
```ts
await debitCredits({
  workspaceId,
  amount: computeCreditCost({ inputTokens, outputTokens, model }),
  reason: "chat",
  idempotencyKey: `chat:${turnId}`,
});
```
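
Inside `debitCredits`, the locking and idempotency could be arranged roughly like this: a sketch only, with hypothetical table names (`credit_balances`, `credit_ledger`), assuming node-postgres:

```ts
import { pool } from "./db"; // hypothetical pg Pool

export async function debitCredits(args: {
  workspaceId: string;
  amount: number;
  reason: string;
  idempotencyKey: string;
}) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");

    // Idempotency: a retried call with the same key is a no-op.
    const seen = await client.query(
      "SELECT 1 FROM credit_ledger WHERE idempotency_key = $1",
      [args.idempotencyKey]
    );
    if (seen.rowCount) {
      await client.query("COMMIT");
      return;
    }

    // Row lock: two concurrent turns serialize here, so both cannot
    // succeed when only one credit remains.
    const { rows } = await client.query(
      "SELECT balance FROM credit_balances WHERE workspace_id = $1 FOR UPDATE",
      [args.workspaceId]
    );
    if (!rows.length || rows[0].balance < args.amount) {
      throw new Error("insufficient credits");
    }

    await client.query(
      "UPDATE credit_balances SET balance = balance - $2 WHERE workspace_id = $1",
      [args.workspaceId, args.amount]
    );
    await client.query(
      "INSERT INTO credit_ledger (workspace_id, amount, reason, idempotency_key) VALUES ($1, -$2, $3, $4)",
      [args.workspaceId, args.amount, args.reason, args.idempotencyKey]
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```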

Billing, Stripe subscriptions that reset credits

Stripe Checkout handles new subscriptions; the Customer Portal handles plan changes, payment methods, and cancellations. The `invoice.payment_succeeded` webhook resets the credit balance to the plan's allotment each cycle. Idempotency tracking on webhook events prevents Stripe retries from double-crediting customers.
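
A hedged sketch of that handler, assuming the Node Stripe SDK and a `processed_events` table behind the idempotency helpers (`alreadyProcessed`, `markProcessed`, and `resetCreditsForCustomer` are illustrative names):

```ts
import Stripe from "stripe";
import {
  alreadyProcessed,
  markProcessed,
  resetCreditsForCustomer,
} from "./billing"; // hypothetical helpers

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const sig = req.headers.get("stripe-signature")!;
  const payload = await req.text();

  // Verify the event actually came from Stripe.
  const event = stripe.webhooks.constructEvent(
    payload,
    sig,
    process.env.STRIPE_WEBHOOK_SECRET!
  );

  // Stripe retries webhooks; record event ids and skip ones already
  // handled so a retry cannot double-credit.
  if (await alreadyProcessed(event.id)) {
    return new Response("ok", { status: 200 });
  }

  if (event.type === "invoice.payment_succeeded") {
    const invoice = event.data.object as Stripe.Invoice;
    // Reset the balance to the plan's allotment for the new cycle.
    await resetCreditsForCustomer(invoice.customer as string);
  }

  await markProcessed(event.id);
  return new Response("ok", { status: 200 });
}
```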

Plan changes (upgrade, downgrade, cancel) each have well-defined credit semantics, documented in the webhook handler so you know what to extend and what to leave alone. The Customer Portal covers the high-stakes UI (payment method storage, invoice access) so the boilerplate keeps custom billing UI small.

What you would still customize per product

The boilerplate is opinionated where the patterns are universal (auth, isolation, metering) and stays out of the way where the product diverges (system prompts, tool calls, custom workflows, eval loops). The folder boundaries are structured so adding a tool-calling agent or a custom retrieval scorer does not require rewriting the chat surface.

The setup validation dashboard runs after clone and confirms env, database, and API key wiring. First deploy day is meant to be boring; the interesting work is the product layer you build on top.

Frequently asked

Do I have to use both Claude and OpenAI?
No. The provider layer supports either one alone or both with switching. Many buyers start on one provider and add the second when they want A/B quality testing or vendor redundancy. The Vercel AI SDK abstraction means a third provider (Mistral, Bedrock, a self-hosted model) is an SDK call away.
Why pgvector instead of Pinecone or Weaviate?
pgvector lives in the same Postgres as your application data, so retrieval can join against domain tables (workspace, user, document) without cross-system queries. For B2B-shaped AI products where workspace isolation matters, that is a much simpler story than running a separate vector database. The trade-off is at very large scale (10M+ chunks per workspace); at that point you can swap the retrieval layer without rewriting the rest.
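
For example, a vector search can filter on a document-status column in the same statement; a sketch with hypothetical table names, assuming node-postgres (`<=>` is pgvector's cosine-distance operator):

```ts
import { pool } from "./db"; // hypothetical pg Pool

// One round trip: vector similarity plus a domain-table filter.
export async function searchReadyChunks(
  queryEmbedding: number[],
  workspaceId: string
) {
  const { rows } = await pool.query(
    `SELECT c.content, c.embedding <=> $1 AS distance
       FROM chunks c
       JOIN documents d ON d.id = c.document_id
      WHERE c.workspace_id = $2 AND d.status = 'ready'
      ORDER BY c.embedding <=> $1
      LIMIT 5`,
    // pgvector accepts the '[1,2,3]' literal JSON.stringify produces.
    [JSON.stringify(queryEmbedding), workspaceId]
  );
  return rows;
}
```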
Can I sell one-time credit top-ups in addition to subscriptions?
Yes. The credit balance is just a number; subscription resets and one-time top-ups both increment it. The boilerplate ships the subscription path by default; adding a one-time top-up is another Stripe Checkout endpoint hitting the same `addCredits` primitive.
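
The top-up endpoint could be as small as this sketch (price id, metadata shape, and URLs are illustrative; fulfillment would happen in a `checkout.session.completed` webhook calling `addCredits`):

```ts
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Hypothetical one-time credit pack purchase.
export async function createTopUpSession(workspaceId: string) {
  return stripe.checkout.sessions.create({
    mode: "payment", // one-time charge, not a subscription
    line_items: [{ price: process.env.CREDIT_PACK_PRICE_ID!, quantity: 1 }],
    metadata: { workspaceId, credits: "1000" }, // read back in the webhook
    success_url: `${process.env.APP_URL}/billing?topup=success`,
    cancel_url: `${process.env.APP_URL}/billing`,
  });
}
```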
How is hallucination handled?
RAG reduces hallucinations by grounding answers in retrieved chunks but does not eliminate them. The boilerplate's mitigations: instruct the model to cite chunks explicitly, refuse to answer when retrieval scores are below a similarity threshold, and surface citations in the UI so users can verify. Eval loops are the next layer up; that is product work you would add on top.
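
The similarity gate is roughly this shape (a sketch; the threshold and chunk type are illustrative and should be tuned per embedding model):

```ts
const MIN_SIMILARITY = 0.75; // illustrative; tune per embedding model

// Refuse instead of letting the model improvise when nothing retrieved
// is actually similar to the question.
export function gateRetrieval(
  chunks: { content: string; similarity: number }[]
) {
  if (chunks.length === 0 || chunks[0].similarity < MIN_SIMILARITY) {
    return { refuse: true as const };
  }
  return { refuse: false as const, context: chunks };
}
```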
What is the upgrade path if I outgrow the template?
The boundaries are designed so you can swap any one layer without rewriting the others: change the LLM provider (Vercel AI SDK), swap pgvector for a dedicated vector DB (retrieval layer is one module), move billing to a different provider (Stripe abstraction is one set of files). Outgrowing the template usually looks like adding modules, not replacing them.
Ships in SaaSForge AI

See SaaSForge AI. Skip the deliberation.

Full source code. Lifetime updates. Polar Merchant-of-Record checkout. Private GitHub repo on purchase.