Knowledge Base · Context that travels with every agent
Give Every AI Agent Your Company's Knowledge
Upload your docs. Connect Notion, Google Drive, and Confluence. Every AI agent in Arahi gains instant, grounded access to your company's real context — product specs, SOPs, policies, customer history — no copy-paste, no prompt stuffing.
The layer
What Is an AI Agent Knowledge Base?
A shared memory layer that turns your scattered company docs into on-demand context for every AI agent you run. Upload once, index automatically, and let every agent retrieve exactly what it needs — grounded in your data, not a model's best guess.
Upload or connect anything
Drop in PDFs, Word docs, spreadsheets, and slide decks — or connect Notion, Drive, Confluence, Slack, and websites directly. One knowledge base, every format, no manual glue code.
Auto-chunked & vectorized
Arahi chunks long documents intelligently, generates embeddings, and stores them in a managed vector index. No pipeline to maintain, no embedding model to pick, no reranker to tune.
Every agent, one source of truth
Any AI agent you build — chat, workflow, autonomous — can query the knowledge base on demand. Swap tools, add new agents, rebuild workflows: the knowledge layer stays put.
Permissions you can trust
Source-level permissions are preserved on retrieval. If a teammate can't see a doc in Drive, the agent can't surface it for them. No data leaks, no shadow access, full audit trail per query.
Sources
Plug in the places your knowledge already lives
No one wants to move their documentation into yet another tool. Arahi meets your knowledge where it is — files, SaaS, the open web, and custom systems — and makes it instantly retrievable by every agent.
PDFs & Docs
Drag in PDFs, Word docs, spreadsheets, and slide decks. Arahi parses layout, tables, and headings to preserve meaning during chunking.
Notion
Connect a workspace and sync selected databases or pages. Updates sync automatically so the agent never reads a stale spec.
Google Drive
Select folders or shared drives and let Arahi index Docs, Sheets, and Slides continuously — respecting each file's sharing settings.
Confluence
Pull in spaces, pages, and attachments from Confluence Cloud so your agents answer with the same runbooks your engineers read.
Slack
Index selected public channels to capture tribal knowledge — product decisions, customer escalations, and threads your wiki never captured.
Websites
Point Arahi at a sitemap or URL set and it will crawl, extract clean text, and re-index on a schedule you control.
Custom API
Bring your own source. Push documents, FAQs, or records into the knowledge base via a simple REST endpoint — no pipeline code required.
Databases
Connect Postgres, MySQL, or BigQuery and expose selected tables or views as structured knowledge agents can query in natural language.
How it works
From scattered docs to grounded agents in four steps
The knowledge base is the piece most teams skip — and pay for later with hallucinated answers and copy-pasted prompts. Arahi makes it the easy default.
Connect a source
Pick a source type — files, Notion, Drive, Confluence, Slack, website, database, or custom API. OAuth in with a click or drag documents in directly. No pipeline to wire up.
Auto-index & chunk
Arahi parses structure, splits long documents into semantically coherent chunks, and writes embeddings into a managed vector store. You don't pick a model or manage an index.
Agents query on demand
Any agent you build — chat, workflow, autonomous — retrieves the most relevant passages at runtime, cites its sources, and grounds every answer in your actual documentation.
Keep it fresh
Connected sources re-sync automatically when the underlying doc changes. Website crawls and API pushes run on a schedule. Stale pages get flagged so knowledge never rots.
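To make the indexing and retrieval steps above concrete, here is a minimal sketch of the general chunk-then-retrieve pattern. It is not Arahi's implementation: the function names are hypothetical, and a plain word-count vector stands in for a real embedding model so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def chunk_paragraphs(text: str, max_chars: int = 200) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

With a three-paragraph policy document and a small `max_chars`, each paragraph becomes its own chunk, and a question about shipping retrieves the shipping paragraph rather than the whole document. A production system would swap `embed` for a learned embedding model and the sorted scan for a vector index.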
Frequently asked questions
Almost anything text-based. On the file side, Arahi ingests PDFs, Word documents, Markdown files, plain text, CSVs, Excel spreadsheets, PowerPoint and Google Slides exports, HTML, and JSON. On the SaaS side, you can connect Notion workspaces, Google Drive folders, Confluence spaces, selected Slack channels, and any website via sitemap crawl. For structured data, you can point Arahi at Postgres, MySQL, or BigQuery tables and views so agents can query records in natural language. Need something custom? Push documents into the knowledge base via our REST API, which is useful for product catalogs, support ticket exports, or internal systems without a native connector. Arahi handles parsing, chunking, and embedding automatically regardless of format, so you don't have to normalize anything upfront. Files of up to 100 MB each are supported on paid plans, and there is no hard cap on the number of documents in a single knowledge base.
Permissions are preserved end-to-end. When you connect a source like Google Drive, Notion, or Confluence, Arahi syncs each document's native ACL alongside its content. At query time, the agent only retrieves passages that the requesting user is authorized to see in the original system. If a teammate loses access to a Drive folder, those chunks stop appearing in their agent responses on the next sync — automatically. For uploaded files and custom API content, you control visibility with team-level and collection-level scopes: you can restrict a knowledge base to a specific team, role, or individual user. Every retrieval is logged with the user identity, the query, the returned sources, and the timestamp, so you always have an audit trail. This means your AI agents never become a backdoor around your existing document-level security model.
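The idea of filtering by permission before retrieval, then logging every query, can be sketched as follows. This is an illustration under stated assumptions rather than Arahi's API: `Chunk`, `AuditEntry`, and `PermissionedIndex` are hypothetical names, the ACL is modeled as a set of user IDs stored next to each chunk, and simple keyword overlap stands in for vector search.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Chunk:
    source: str                     # document the passage came from
    text: str                       # passage content
    allowed_users: frozenset[str]   # ACL synced from the source system (assumption)


@dataclass
class AuditEntry:
    user: str
    query: str
    sources: list[str]
    timestamp: datetime


@dataclass
class PermissionedIndex:
    chunks: list[Chunk]
    audit_log: list[AuditEntry] = field(default_factory=list)

    def search(self, user: str, query: str) -> list[Chunk]:
        """Return matching chunks the user may see, and record the retrieval."""
        terms = set(query.lower().split())
        # Apply the ACL first so unauthorized text never reaches ranking.
        visible = [c for c in self.chunks if user in c.allowed_users]
        hits = [c for c in visible if terms & set(c.text.lower().split())]
        self.audit_log.append(
            AuditEntry(user, query, [c.source for c in hits], datetime.now(timezone.utc))
        )
        return hits
```

Filtering before matching, rather than after, means a restricted passage cannot influence the result set at all, and the log entry records only the sources that were actually returned.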
No. Your documents, embeddings, and query logs are never used to train foundation models, whether ours or any third-party provider's. Arahi uses the enterprise API tiers of our model providers, which contractually prohibit training on customer data and enforce zero data retention. Your content is stored in an isolated vector index scoped to your workspace, encrypted at rest with AES-256 and in transit with TLS 1.3. You can delete any source, collection, or the entire knowledge base at any time, and deletions propagate to the vector store immediately. For teams with strict compliance requirements, we offer data residency controls and a BAA on eligible plans. Our SOC 2 Type II audit is in progress, with full documentation available under NDA. The short version: your knowledge stays your knowledge.
Freshness depends on the source, and you control the cadence. Native SaaS connectors (Notion, Google Drive, Confluence, Slack) use change-data-capture where the source supports it — updates typically propagate within minutes of an edit in the source of truth. Website crawls run on schedules you set, from hourly to weekly. Database connections query live at retrieval time, so records are always current. Uploaded files re-index immediately when you replace them. Arahi also tracks staleness metadata: if a document hasn't been updated in 180 days or references have drifted, the admin dashboard flags it for review so institutional knowledge doesn't rot quietly. You can also force an immediate re-sync on any source from the UI or via API, which is handy right after a big product update or policy change.
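The 180-day staleness rule mentioned above reduces to a date comparison. The sketch below shows one way to express it. The function name and the input shape (a mapping from document name to last-updated timestamp) are assumptions for illustration, and it does not cover the "references have drifted" check.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=180)  # threshold quoted in the answer above


def flag_stale(last_updated: dict[str, datetime], now: datetime) -> list[str]:
    """Return the names of documents not updated within STALE_AFTER, sorted."""
    return sorted(
        name for name, updated_at in last_updated.items()
        if now - updated_at > STALE_AFTER
    )
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.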
Yes — that's the whole design. A single knowledge base is a first-class, reusable asset in Arahi. Your support chat agent, your sales research workflow, your internal Q&A bot, and your autonomous onboarding agent can all query the same underlying collections. You can also scope things more finely: create a company-wide "general" knowledge base plus team-specific ones (e.g., product, legal, finance) and let each agent subscribe to the collections it actually needs. This avoids the classic mistake of re-uploading the same SOPs into five different agent configs and then forgetting to update one when the policy changes. Change a document in the source, and every agent subscribed to that collection gets the new version on the next query. One source of truth, many agents, no drift.
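The subscription model described above can be pictured as several agents holding a reference to one shared store instead of their own copies. The following is a minimal sketch of that idea. `KnowledgeBase`, `Agent`, `upsert`, and `lookup` are hypothetical names chosen for illustration and do not reflect Arahi's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeBase:
    # collection name -> {document id -> current text}
    collections: dict[str, dict[str, str]] = field(default_factory=dict)

    def upsert(self, collection: str, doc_id: str, text: str) -> None:
        """Create or replace a document in a collection."""
        self.collections.setdefault(collection, {})[doc_id] = text


@dataclass
class Agent:
    name: str
    kb: KnowledgeBase               # shared reference, not a private copy
    subscriptions: tuple[str, ...]  # collections this agent may read

    def lookup(self, doc_id: str) -> Optional[str]:
        """Return the document from the first subscribed collection that has it."""
        for collection in self.subscriptions:
            doc = self.kb.collections.get(collection, {}).get(doc_id)
            if doc is not None:
                return doc
        return None
```

Because every agent reads from the same store at query time, updating a document once is enough: the next lookup from any subscribed agent returns the new version, and agents without a subscription to that collection never see it.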
Knowledge base and Memory solve different problems, and most teams use both. A knowledge base is a shared, governed library of company-wide context (product docs, SOPs, policies, historical tickets) that every agent can retrieve from on demand. It's explicit, curated, and intended to be authoritative. Memory is personalized, per-agent, and accumulates from interactions over time: what a user prefers, how they phrase things, which accounts they care about, what they worked on last week. Knowledge base is "what does the company know"; Memory is "what has this agent learned about its user and its work." The two are complementary: Memory helps an agent remember that you always CC your manager on contract drafts, while the knowledge base makes sure the contract clauses it pulls are the current, legally approved ones. Most production Arahi deployments pair them from day one.
Stop pasting context into every prompt.
Connect your first source in under two minutes. Every agent you build from there starts smarter — grounded in your docs, your data, your company's real way of working.

