How I Added an AI Chat to My Portfolio Website
Most developer portfolios are glorified PDFs. I wanted mine to actually demonstrate that I build AI stuff — not just list it on a resume. So I added a chat that lets visitors ask questions about my experience, and it answers using my CV as context.
Here's exactly how I built it, what it costs (spoiler: almost nothing), and the decisions I made along the way.
Tech Stack
- TanStack Start — Full-stack React framework with file-based routing and server functions
- @tanstack/ai + @tanstack/ai-react — Streaming chat primitives (SSE-based)
- @tanstack/ai-openai — OpenAI adapter for TanStack AI
- OpenAI gpt-4o-mini — Fast, cheap, good enough for Q&A
- Tailwind CSS — Terminal-style UI with green-on-black aesthetic
- Vercel — Free tier hosting
Why TanStack Start? I use it daily at MongoDB for the Atlas Pricing Calculator. It's fast, the DX is great, and the AI package gives you streaming chat out of the box with zero boilerplate.
Why gpt-4o-mini? It's ~10x cheaper than gpt-4o and perfectly capable for structured Q&A over a small context. There's no reason to use a bigger model here.
The Server Route
TanStack Start uses file-based routing. A file at src/routes/api/chat.ts becomes an /api/chat endpoint automatically, and the POST handler it exports handles chat requests. Here's the route (the rate-limiting helpers it calls are covered in the Rate Limiting section below):
// src/routes/api/chat.ts
import { chat, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiText } from '@tanstack/ai-openai'
import { createFileRoute } from '@tanstack/react-router'
const CV_CONTEXT = `You are an AI assistant for Yeray Díaz Rodríguez's portfolio website...`
export const Route = createFileRoute('/api/chat')({
  server: {
    handlers: {
      POST: async ({ request }) => {
        // Fail fast with a clear error instead of a cryptic SDK crash
        if (!process.env.OPENAI_API_KEY) {
          return new Response(
            JSON.stringify({ error: 'OPENAI_API_KEY not configured' }),
            { status: 500, headers: { 'Content-Type': 'application/json' } },
          )
        }

        // getRateLimitKey and isRateLimited are defined in the Rate Limiting section below
        const rateLimitKey = getRateLimitKey(request)
        const { limited, remaining } = isRateLimited(rateLimitKey)
        if (limited) {
          return new Response(
            JSON.stringify({ error: 'Too many requests. Please try again later.' }),
            { status: 429, headers: { 'Content-Type': 'application/json' } },
          )
        }

        const { messages, conversationId } = await request.json()

        // Prepend the CV context server-side so it never leaves the server
        const stream = chat({
          adapter: openaiText('gpt-4o-mini'),
          messages: [
            { role: 'system', content: CV_CONTEXT },
            ...messages,
          ],
          conversationId,
        })

        return toServerSentEventsResponse(stream, {
          headers: { 'X-RateLimit-Remaining': String(remaining) },
        })
      },
    },
  },
})
A few things to note:
- chat() + openaiText() — TanStack AI abstracts the OpenAI SDK. You pass an adapter and messages, get a stream back.
- toServerSentEventsResponse() — Converts the stream into an SSE response. The client handles parsing automatically.
- System prompt is prepended server-side — The client only sends user/assistant messages. The CV context never leaves the server.
- OPENAI_API_KEY check — Fail fast with a clear error instead of a cryptic OpenAI SDK crash.
The System Prompt
This is where the magic happens. Instead of RAG (retrieval-augmented generation), I just dump my entire CV into the system prompt:
const CV_CONTEXT = `You are an AI assistant for Yeray Díaz Rodríguez's portfolio website.
Answer questions about his experience, skills, and projects based on the following CV:
# Yeray Díaz Rodríguez
AI Engineer / Senior Engineer · Madrid, Spain (Remote)
## Summary
Product engineer with 10+ years of full-stack experience, evolving into AI engineering.
Currently at MongoDB, where I own a public-facing pricing tool and built a working MCP
server that enables AI agents to query and recommend Atlas pricing...
[... full CV ...]
---
Rules:
- Be concise and helpful
- Answer in the same language the user asks in (Spanish or English)
`
Why not RAG?
RAG (splitting your content into chunks, embedding them, storing in a vector database, retrieving relevant chunks at query time) makes sense when you have a lot of content — documentation, knowledge bases, hundreds of pages.
A CV is ~1,500 tokens. gpt-4o-mini supports 128K tokens. My entire CV fits in the system prompt with room to spare. Adding a vector database for this would be like using Kubernetes to deploy a static HTML page.
Rule of thumb: If your context fits in the system prompt, skip RAG. You'll ship faster, have fewer moving parts, and the model gets full context on every query (which often produces better answers than retrieved chunks).
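If you want a quick sanity check before reaching for retrieval, a rough estimate is enough. This snippet is just an illustration using the common ~4 characters per token heuristic, not a real tokenizer:

```ts
// Rough token estimate: ~4 characters per token for English prose.
// Good enough to decide "system prompt vs. RAG"; run a real tokenizer
// (e.g. tiktoken) if you need exact counts.
const approxTokens = (text: string) => Math.ceil(text.length / 4)

const cvTokens = approxTokens(CV_CONTEXT) // ~1,500 for my CV
const CONTEXT_WINDOW = 128_000            // gpt-4o-mini

console.log(`CV uses ~${cvTokens} of ${CONTEXT_WINDOW} available tokens`)
```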
The Chat UI
The chat component uses @tanstack/ai-react's useChat hook, which handles message state, streaming, and SSE parsing:
import { type FormEvent, useState } from 'react'
import { useChat, fetchServerSentEvents } from '@tanstack/ai-react'

export function Chat() {
  const [input, setInput] = useState('')
  const { messages, sendMessage, isLoading } = useChat({
    connection: fetchServerSentEvents('/api/chat'),
  })

  // Send the typed prompt through the hook (assuming sendMessage accepts
  // the raw text) and clear the input
  const handleSubmit = (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault()
    if (!input.trim() || isLoading) return
    sendMessage(input)
    setInput('')
  }

  return (
    <section className="py-8" id="chat">
      <div className="border border-[#00ff41]/30 bg-black/80 backdrop-blur glow-border">
        {/* Terminal-style header with traffic lights */}
        <div className="flex items-center gap-2 px-4 py-2 border-b border-gray-800">
          <div className="w-3 h-3 rounded-full bg-red-500/80" />
          <div className="w-3 h-3 rounded-full bg-yellow-500/80" />
          <div className="w-3 h-3 rounded-full bg-green-500/80" />
          <span className="text-gray-500 text-xs ml-2">yeray-ai</span>
        </div>

        {/* Messages + input */}
        <div className="h-80 overflow-y-auto p-4 space-y-4">
          {messages.map((message) => (
            <div key={message.id}>
              <span className={message.role === 'assistant'
                ? 'text-[#00ff41]' : 'text-blue-400'}>
                {message.role === 'assistant' ? 'yeray-ai $' : 'you $'}
              </span>
              <div className="text-gray-300">{/* ... */}</div>
            </div>
          ))}
        </div>

        <form onSubmit={handleSubmit}>
          <span className="text-[#00ff41]">&gt;</span>
          <input
            value={input}
            onChange={(e) => setInput(e.target.value)}
            placeholder="Ask about Yeray's experience..."
          />
        </form>
      </div>
    </section>
  )
}
Key decisions:
- useChat + fetchServerSentEvents — One line connects the UI to the streaming API
- Suggestion chips — Clickable prompts dramatically increase engagement (see the sketch below)
- Terminal aesthetic — Traffic lights, $ prefixes, pulsing cursor
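The suggestion chips aren't in the snippet above. Here's a rough sketch of how they can plug into the same hook; the chip texts are placeholders, and I'm assuming sendMessage accepts the prompt string directly:

```tsx
// Hypothetical suggestion chips: clicking one sends a canned prompt
// through the same sendMessage the input form uses.
const SUGGESTIONS = [
  'What does Yeray do at MongoDB?',
  'What AI projects has he built?',
  '¿Habla español?',
]

function SuggestionChips({ onPick }: { onPick: (prompt: string) => void }) {
  return (
    <div className="flex flex-wrap gap-2 px-4 pb-3">
      {SUGGESTIONS.map((prompt) => (
        <button
          key={prompt}
          type="button"
          onClick={() => onPick(prompt)}
          className="text-xs border border-[#00ff41]/30 text-[#00ff41] px-2 py-1 hover:bg-[#00ff41]/10"
        >
          {prompt}
        </button>
      ))}
    </div>
  )
}

// Inside <Chat />: <SuggestionChips onPick={(prompt) => sendMessage(prompt)} />
```

Clicking a chip goes through exactly the same code path as typing, so there's no extra state to manage.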
Rate Limiting
Without rate limiting, anyone can spam your endpoint and run up your OpenAI bill. Three layers of protection: a per-IP request limit, a cap on messages per conversation, and a cap on tokens per response.
const RATE_LIMIT_WINDOW_MS = 60 * 60 * 1000 // 1 hour
const MAX_REQUESTS_PER_WINDOW = 15
const MAX_MESSAGES_PER_CONVERSATION = 20
const MAX_TOKENS_PER_RESPONSE = 500
const rateLimitMap = new Map<string, { count: number; resetAt: number }>()
function isRateLimited(key: string): { limited: boolean; remaining: number } {
  const now = Date.now()
  const entry = rateLimitMap.get(key)

  if (!entry || now > entry.resetAt) {
    rateLimitMap.set(key, { count: 1, resetAt: now + RATE_LIMIT_WINDOW_MS })
    return { limited: false, remaining: MAX_REQUESTS_PER_WINDOW - 1 }
  }

  entry.count++
  if (entry.count > MAX_REQUESTS_PER_WINDOW) {
    return { limited: true, remaining: 0 }
  }

  return { limited: false, remaining: MAX_REQUESTS_PER_WINDOW - entry.count }
}
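The getRateLimitKey helper the route calls isn't shown above. A minimal sketch, assuming the client IP arrives in the x-forwarded-for header (which Vercel sets), plus a hypothetical helper for the second layer, the per-conversation message cap. The third layer, MAX_TOKENS_PER_RESPONSE, is a cap on output length applied to the model call itself.

```ts
// Derive a rate-limit key from the caller's IP. Behind Vercel the original
// IP is the first entry in x-forwarded-for; fall back to a shared bucket.
function getRateLimitKey(request: Request): string {
  const forwarded = request.headers.get('x-forwarded-for')
  return forwarded?.split(',')[0]?.trim() || 'anonymous'
}

// Layer 2 (hypothetical helper): reject conversations that grow too long,
// called from the POST handler with the parsed messages array.
function checkConversationLength(messages: unknown[]): Response | null {
  if (messages.length <= MAX_MESSAGES_PER_CONVERSATION) return null
  return new Response(
    JSON.stringify({ error: 'Conversation too long. Please start a new one.' }),
    { status: 400, headers: { 'Content-Type': 'application/json' } },
  )
}
```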
Cost analysis
- Input: $0.15 / 1M tokens
- Output: $0.60 / 1M tokens
- Per conversation (~5 messages): ~$0.001
That's one tenth of a cent per conversation. Even 1,000 conversations/month costs ~$1.
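That figure is easy to sanity-check. The token counts below are rough assumptions (the CV context is resent on every request and responses are capped at 500 tokens), not measured usage:

```ts
// Back-of-the-envelope cost for one conversation (~3 assistant replies)
const INPUT_PRICE = 0.15 / 1_000_000  // $ per input token (gpt-4o-mini)
const OUTPUT_PRICE = 0.60 / 1_000_000 // $ per output token

const inputTokens = 3 * 1_800  // CV context + growing history, per request
const outputTokens = 3 * 300   // well under the 500-token response cap

const costPerConversation = inputTokens * INPUT_PRICE + outputTokens * OUTPUT_PRICE
// ≈ $0.0008 + $0.0005 ≈ $0.0013, i.e. roughly $1.30 for 1,000 conversations
```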
Why in-memory and not Redis?
Adding Redis means another service to manage, another thing to pay for. The in-memory Map resets on cold start (and isn't shared across serverless instances) — slightly permissive, but fine for a portfolio site. Perfect is the enemy of shipped.
Deployment
Vercel free tier. One environment variable (OPENAI_API_KEY), connect GitHub, done.
| Item | Cost |
|---|---|
| Vercel hosting | $0/mo |
| OpenAI API (gpt-4o-mini) | $0–1/mo |
| Total | $0–1/mo |
Results & Learnings
What worked
- Engagement is way up. People interact with the chat instead of scanning and leaving.
- Multilingual for free. "Answer in the user's language" in the system prompt just works.
- TanStack AI is incredibly underrated. Four functions and you have a full streaming chat.
What surprised me
- gpt-4o-mini is more than good enough for structured Q&A over a known context.
- The system prompt approach beats RAG when your context is small.
- Rate limiting is table stakes. Almost shipped without it. Best 30 minutes I spent.
What I'd add next
- Analytics — Track what questions people ask most
- Conversation persistence — LocalStorage to survive page reloads
- Voice input — Whisper API is cheap
Built by Yeray Díaz Rodríguez. If you build something similar, I'd love to see it — reach out.