Five Agents, One Chatbot, Zero Cloud
It is Friday morning. I am sitting in my apartment watching a chatbot answer questions about Techopolis in real time, and it actually knows what it is talking about.
That is not a small thing.
Most help chatbots on the web are useless. You ask a question and get a canned response that has nothing to do with what you asked. Or you get routed to a form. Or you get told to check the FAQ.
I wanted something that actually knows our apps, our services, our courses, and our community. Something that gives real answers with real data and keeps the conversation going.
So I built one. From scratch. In Swift.
Vapor. Swift actors. Apple Foundation Models. No cloud APIs. No per-token fees. Just a Mac and Swift from top to bottom.
This is still a work in progress. But it works.
Getting Foundation Models on a Server
The first problem was the model. Apple Foundation Models run on-device. They are fast, private, and powerful. But they are tied to Apple hardware running macOS 26 or later. You cannot deploy them to a Linux server. You cannot call them from a browser. They live on the device.
We solved this by building the Perspective Intelligence Server. It is a macOS menu bar app that takes Apple’s on-device Foundation Models and exposes them as a local HTTP API. You launch it on a Mac, and it gives you an OpenAI-compatible and Ollama-compatible API endpoint running on localhost.
The model is called apple.local.
That is the key piece. The Perspective Intelligence Server turns any Mac with Apple Silicon into a Foundation Models server. No cloud. No third-party model. Apple’s own on-device models, served over HTTP, available to anything that can make an API call.
The server supports streaming with server-sent events and newline-delimited JSON. Chat completions, text completions, and tool calling all work. Xcode 26 Intelligence Mode, Cursor, Continue.dev, and anything that speaks the OpenAI or Ollama API format can connect to it. The default port is 11434.
The project is open source at github.com/Techopolis/Perspective-Intelligence-Server.
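To make the shape of this concrete, here is a minimal sketch of a non-streaming chat request to the server, assuming it follows the Ollama chat API (`POST /api/chat` on port 11434). The request and response field names come from the Ollama format; the exact response shape the server returns may differ.

```swift
import Foundation

struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    let stream: Bool
}

struct ChatResponse: Codable {
    let message: ChatMessage
}

// Ask the local model one question and wait for the full answer.
func askAppleLocal(_ prompt: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/chat")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: "apple.local",
                    messages: [ChatMessage(role: "user", content: prompt)],
                    stream: false)
    )
    let (data, _) = try await URLSession.shared.data(for: request)
    return try JSONDecoder().decode(ChatResponse.self, from: data).message.content
}
```

Anything that can build this request can talk to the model. That is the whole point.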
For the chatbot, Michael runs the Perspective Intelligence Server on his Mac. The Vapor application connects to it over the local network.
When someone asks the chatbot a question, the request goes from the browser to the Vapor container. From the container to Michael’s Mac. Through the Perspective Intelligence Server to Apple Foundation Models. The response streams back the same way.
Foundation Models on-device, serving a web chatbot. That is what we built.
The JSON Knowledge Base
Before I wrote a single line of server code, I sat down and built a JSON dataset. Every piece of information the chatbot would need to answer questions about Techopolis, structured and organized in one file.
The dataset covers the company, the apps, the services, the courses, and the social channels.
For each app, I documented everything. Name, description, price, subscription details, platforms, rating, App Store URL, free features, premium features. For services, what we offer, what is included, and how to book. For courses, titles, descriptions, prices, and enrollment status.
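In Swift terms, the dataset maps naturally onto Codable types. The field names below are illustrative, inferred from the categories above, not the actual schema of the file:

```swift
import Foundation

// Hypothetical Codable mirror of the knowledge-base JSON.
struct KnowledgeBase: Codable {
    let company: Company
    let apps: [App]
    let services: [Service]
    let courses: [Course]
    let social: [SocialChannel]
}

struct Company: Codable {
    let name: String
    let description: String
}

struct App: Codable {
    let name: String
    let description: String
    let price: String
    let subscription: String?      // nil for one-time-purchase apps
    let platforms: [String]
    let rating: Double
    let appStoreURL: String
    let freeFeatures: [String]
    let premiumFeatures: [String]
}

struct Service: Codable {
    let name: String
    let included: [String]
    let howToBook: String
}

struct Course: Codable {
    let title: String
    let description: String
    let price: String
    let enrollmentOpen: Bool
}

struct SocialChannel: Codable {
    let name: String
    let url: String
}
```

Decode it once at startup and every agent prompt is built from the same in-memory source of truth.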
That dataset is the single source of truth. When I write the system prompts for each agent, the data comes from that file. The Apps Specialist's prompt contains the app details from the dataset. The Courses Specialist's prompt contains the course details. Everything traces back to one place.
This matters because the chatbot is only as good as the data behind it. A language model can sound confident about anything. But if the data in the prompt is wrong, the confident answer is wrong too. By building the dataset first and treating it as the authority, I know exactly what each agent knows and I can update it in one place.
I built this dataset with Claude Code. The model does not invent knowledge. It uses what I give it.
Why Vapor
I write Swift every day. It is the language behind every Techopolis app. When I needed a web backend for this chatbot, I did not want to context switch to Python or Node. I wanted to stay in Swift.
Vapor gave me that. It is a server-side Swift framework with native WebSocket support, async and await throughout, Codable for all JSON handling, and Leaf for HTML templates. If you know Swift, Vapor does not feel foreign. The patterns transfer directly.
The chatbot WebSocket route is registered in the routes file. When a client connects to the help endpoint, Vapor creates a new handler for that connection. The handler is a Swift actor. More on that in a moment.
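The registration looks roughly like this. The route name "help" comes from the text above; the handler wiring is illustrative:

```swift
import Vapor

func routes(_ app: Application) throws {
    // Each WebSocket connection to /help gets its own handler actor,
    // so every visitor has isolated conversation state.
    app.webSocket("help") { req, ws in
        let handler = TechopolisHelpHandler(ws: ws)
        ws.onText { _, text in
            Task { try? await handler.handle(text) }
        }
    }
}
```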
Five Agents
The chatbot is not one model with one massive prompt trying to know everything. It is five specialized agents, each with curated knowledge about a specific part of Techopolis.
The Apps Specialist knows Perspective Intelligence, Perspective Notify, Perspective Meetings, Beyond The Gallery, and Current City. It knows the prices. The ratings. The platforms. The features. The App Store URLs.
The Services Specialist handles questions about native app development, accessibility testing and auditing, and user experience testing.
The Courses Specialist covers SwiftUI Basics, Practical Python, the upcoming Apple App Development course, Accessing the Mac Terminal Accessibly, and the All Access subscription.
The Community Specialist handles Discord, Mastodon, GitHub, email, and general contact information.
And the Manager is the fallback. It has broad knowledge across all of Techopolis. If a question does not clearly belong to a specialist, the Manager handles it.
Five agents. Five knowledge bases. One system.
Keyword Routing
When a message comes in, the system needs to decide which agent responds. I could have sent the message to the language model first, asked it to classify the intent, and then routed to the right agent. That would work. It would also add latency and burn an extra inference call for every single message.
Instead, the system checks the message against keyword lists. Each specialist has a list of words and phrases associated with its domain. The specialist with the most keyword matches wins. If nothing matches, the Manager picks it up.
Someone asks about Perspective Intelligence. The word Perspective is in the message. Apps Specialist. Someone asks about SwiftUI Basics. The words are right there. Courses Specialist. Someone asks how to contact us. The word contact triggers the Community Specialist.
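The scoring logic fits in a few lines. The keyword lists here are illustrative fragments, not the production lists:

```swift
enum Agent: String, CaseIterable {
    case apps, services, courses, community, manager

    // Each specialist owns a vocabulary; the Manager owns none.
    var keywords: [String] {
        switch self {
        case .apps:      return ["perspective", "app store", "gallery", "current city"]
        case .services:  return ["accessibility", "audit", "development", "user experience"]
        case .courses:   return ["course", "swiftui", "python", "all access"]
        case .community: return ["discord", "mastodon", "github", "contact", "email"]
        case .manager:   return []
        }
    }
}

func route(_ message: String) -> Agent {
    let lowered = message.lowercased()
    // Count keyword hits per specialist; the highest score wins.
    let scored = Agent.allCases
        .map { agent in (agent, agent.keywords.filter { lowered.contains($0) }.count) }
        .filter { $0.1 > 0 }
    // No matches at all: fall back to the Manager.
    return scored.max(by: { $0.1 < $1.1 })?.0 ?? .manager
}
```

No inference call. No latency. Just string matching.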
I was skeptical that this would work well enough. It does. The vocabulary around our products is specific enough that keyword matching handles the vast majority of questions correctly. And it is instant.
Simple is not always wrong. For a known domain with predictable vocabulary, simple is exactly right.
The Actor
Every WebSocket connection creates its own instance of TechopolisHelpHandler. That handler is a Swift actor.
Actors are Swift’s solution to shared mutable state in concurrent code. The compiler enforces that an actor’s properties can only be accessed from within the actor itself. Concurrent access is serialized automatically. There are no data races because the language makes data races impossible.
The chatbot needs mutable state. Each connection maintains a conversation history of up to twenty messages so the model has context for follow-up questions. If you ask about our apps and then ask what they cost, the model needs the history to know what you are referring to.
Multiple people can be using the chatbot at the same time. Multiple WebSocket connections. Multiple actors. Each one with its own isolated conversation history. No locking. No dispatch queues. No semaphores. The compiler handles it.
When a message arrives, the actor decodes the JSON and routes it to the right agent. It appends the message to the history. It builds the full message array with the system prompt and conversation so far. Then it streams the response back chunk by chunk, extracts follow-up suggestions, trims the history, and sends the done message.
All async. All inside the actor. All safe.
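Condensed, the actor looks something like this. The type name comes from the article; the property names and the streamChat helper are assumptions standing in for the real client call:

```swift
import Vapor

struct ChatMessage: Codable {
    let role: String
    let content: String
}

actor TechopolisHelpHandler {
    private let ws: WebSocket
    private var history: [ChatMessage] = []
    private let maxHistory = 20   // the twenty-message context window

    init(ws: WebSocket) { self.ws = ws }

    func handle(_ text: String) async throws {
        history.append(ChatMessage(role: "user", content: text))
        var reply = ""
        // streamChat stands in for the FoundationModelsClient call,
        // which yields response chunks over time.
        for try await chunk in streamChat(messages: history) {
            reply += chunk
            try await ws.send(chunk)   // forward each chunk immediately
        }
        history.append(ChatMessage(role: "assistant", content: reply))
        // Trim so context stays bounded to the most recent messages.
        if history.count > maxHistory {
            history.removeFirst(history.count - maxHistory)
        }
    }
}

// Placeholder so the sketch compiles; the real client makes a
// streaming HTTP request to the model server.
func streamChat(messages: [ChatMessage]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { $0.finish() }
}
```

Notice what is missing: locks, queues, semaphores. The actor boundary is the whole synchronization story.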
Streaming
The language model generates text token by token. Instead of waiting for the entire response, each chunk gets sent over the WebSocket as it arrives.
The FoundationModelsClient exposes a streaming method that returns an AsyncThrowingStream. That is a Swift concurrency primitive that produces values over time. The actor iterates over the stream with a for-try-await loop and sends each chunk to the client immediately.
Text appears word by word. The bot starts responding the instant it has something to say.
That matters. A chatbot that makes you wait feels broken. A chatbot that starts talking immediately feels alive. A chatbot that streams feels like a conversation.
The FoundationModelsClient talks to the Perspective Intelligence Server using the Ollama-compatible API. It sends a POST request to the chat endpoint with streaming enabled.
The response comes back as newline-delimited JSON. Each line contains the next piece of generated text. The client buffers bytes, splits on newlines, decodes each line, and yields events through the stream.
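A sketch of that decoding, assuming each line carries an Ollama-style chunk with a message.content field. The production client buffers bytes and splits on newlines by hand; URLSession's built-in line sequence is a concise equivalent:

```swift
import Foundation

struct StreamChunk: Codable {
    struct Message: Codable { let content: String }
    let message: Message
    let done: Bool
}

func streamNDJSON(request: URLRequest) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                let (bytes, _) = try await URLSession.shared.bytes(for: request)
                // Each line of the body is one JSON object in the stream.
                for try await line in bytes.lines {
                    let chunk = try JSONDecoder().decode(StreamChunk.self,
                                                         from: Data(line.utf8))
                    continuation.yield(chunk.message.content)
                    if chunk.done { break }
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
```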
The same client works with Ollama too. In the Docker deployment, we can point it at Ollama running Llama 3.2 as an alternative. The API is compatible. The interface is the same. Swap the URL and the model name in the environment variables and everything keeps working.
Follow-Up Suggestions
After every response, the chatbot suggests two follow-up questions as clickable buttons. This keeps conversations moving and helps people discover things they might not have thought to ask about.
Each agent’s system prompt instructs the model to generate follow-up suggestions using a marker format at the end of its response. The backend parses these markers with a regex, strips them from the displayed text, and sends them as structured JSON.
If the model does not produce any markers, the system falls back to sensible defaults for each agent. The Apps Specialist defaults to questions about Perspective Intelligence. The Courses Specialist defaults to questions about free courses and the All Access subscription. Every agent has its own fallback set.
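The parsing step might look like this. The `[[FOLLOWUP: ...]]` syntax here is hypothetical; the real marker format is whatever the system prompts specify:

```swift
import Foundation

let markerPattern = /\[\[FOLLOWUP:\s*(.+?)\]\]/

func extractFollowUps(from response: String,
                      fallbacks: [String]) -> (text: String, suggestions: [String]) {
    // Pull every marker the model emitted at the end of its response.
    let suggestions = response.matches(of: markerPattern).map { String($0.1) }
    // Strip the markers from the text shown to the visitor.
    let cleaned = response.replacing(markerPattern, with: "")
        .trimmingCharacters(in: .whitespacesAndNewlines)
    // Fall back to the agent's defaults when the model produced none.
    return (cleaned, suggestions.isEmpty ? fallbacks : suggestions)
}
```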
The Frontend
The frontend is a single JavaScript class. No React. No Vue. No build step. Vanilla JavaScript.
A help chatbot does not need a framework. It needs a WebSocket connection, a message list, a text input, and DOM management. The class handles the connection lifecycle, message rendering, streaming updates, follow-up buttons, and the modal dialog.
Bot responses are rendered as markdown using marked.js. When someone asks about an app, the response includes a direct link to the App Store.
The chatbot markup lives in the base Leaf template, which means it appears on every page of the site. A floating button in the corner opens a dialog with the conversation.
Running It
The chatbot runs in Docker. The Vapor application is a Linux container. The Perspective Intelligence Server runs on Michael’s Mac. Docker Compose manages the Vapor container and a PostgreSQL database.
The Vapor container connects to the Perspective Intelligence Server over the local network. In development, the default points to Michael’s Mac. In Docker, it connects through host.docker.internal if the server runs on the same machine.
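The URL and model name live in environment variables, so the same build runs everywhere. A sketch with assumed variable names (the actual keys in the project may differ):

```swift
import Vapor

struct ModelConfig {
    let baseURL: String
    let model: String

    static func load() -> ModelConfig {
        ModelConfig(
            // Default: a Perspective Intelligence Server on the local
            // network. Docker overrides this with host.docker.internal;
            // pointing it at Ollama only changes these two values.
            baseURL: Environment.get("MODEL_BASE_URL") ?? "http://localhost:11434",
            model: Environment.get("MODEL_NAME") ?? "apple.local"
        )
    }
}
```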
A visitor sends a message. It hits the Vapor container. The container calls the Perspective Intelligence Server on the Mac. Apple Foundation Models generate the response on Apple Silicon. The chunks stream back through the container to the browser.
No cloud. No API bill. No rate limits. Foundation Models running on a Mac, serving a chatbot on the web.
What I Learned
Building this taught me that Swift on the server is genuinely productive for someone who already thinks in Swift.
Actors work the same on the server as they do on iOS. AsyncThrowingStream integrates cleanly with WebSocket handlers. Codable makes JSON handling effortless. The type system catches mistakes at compile time that would be runtime errors in other languages.
If you are an iOS developer who has never tried Vapor, the gap is smaller than you think. The patterns transfer. The concurrency model transfers. The language is the same.
I also learned that multi-agent systems do not have to be complicated. Five agents with curated prompts and keyword routing is enough to build something genuinely useful. Not every problem needs a vector database or a retrieval pipeline. Sometimes the right answer is a keyword list and a well-structured JSON dataset.
Still Building
This is a work in progress. The routing will improve. More agents will join the system. The knowledge base will grow. The goal is a full support system that knows everything about Techopolis and can help anyone who visits the site.
We are not there yet. But the foundation is solid.
That is enough for now. More to come.

