Skip to main content

What is a Knowledge Base?

A knowledge base is the collection of information your knowledge agent can search and reference when answering questions or making decisions. Think of it as your agent’s memory - the more relevant knowledge you add, the more helpful and accurate your agent becomes. Unlike traditional chatbots that only know what they were trained on, knowledge agents use Retrieval Augmented Generation (RAG) to dynamically search your knowledge and incorporate it into responses.

How Knowledge Retrieval Works (RAG Simplified)

Here’s what happens when someone asks your knowledge agent a question:
User asks: "What's our refund policy?"

Agent converts question to a search query

Searches knowledge base for relevant content

Finds: "Refund Policy.pdf" - page 2, section 3

AI reads the relevant section

Generates answer using that specific information

Responds: "According to our refund policy, customers can..."
Key point: Your agent doesn’t memorize everything - it searches and retrieves relevant pieces on-demand. This means:
  • You can add lots of knowledge without “retraining”
  • Answers come from your actual documents
  • You can update knowledge anytime
  • Agent cites sources (useful for verification)

Supported Knowledge Sources

You can train your knowledge agent with six types of content:
Source TypeWhat It’s ForProcessing Time
FilesPDFs, Word docs, text files10-30 seconds
URLsWeb pages, articles, documentation15-45 seconds
YouTubeVideo transcripts20-60 seconds
Google DocsWorkspace documents10-30 seconds
Google SheetsSpreadsheet data10-30 seconds
Twitter/XTweets and threads15-45 seconds
LinkedInProfiles and posts20-60 seconds
All sources are automatically:
  • Chunked into searchable segments
  • Embedded as vectors for semantic search
  • Stored in your agent’s vector database
  • Instantly available for retrieval

Adding Knowledge: Step-by-Step

Files (PDF, DOCX, TXT)

Best for: Documentation, reports, guides, research papers How to add:
  1. Navigate to your knowledge agent builder
  2. Click the “Training” tab
  3. Click the “Files” sub-tab
  4. Click “Upload” or drag and drop files
  5. Wait for processing (progress bar shows status)
  6. File appears in the list when ready
Supported formats:
  • PDF (.pdf)
  • Microsoft Word (.doc, .docx)
  • Plain text (.txt)
  • Markdown (.md)
Tips:
  • PDFs work best when they contain actual text (not scanned images)
  • Remove unnecessary pages to improve relevance
  • File names help the agent understand context - use descriptive names
File size limit: 25MB per file. For larger documents, consider splitting them or using a URL if the content is available online.

Web URLs

Best for: Websites, blog posts, online documentation, public articles How to add:
  1. Go to the “Training” tab
  2. Click the “URLs” sub-tab
  3. Paste the full URL (starting with https://)
  4. Click “Add URL”
  5. Content is scraped and processed automatically
What gets extracted:
  • Main text content from the page
  • Headings and structure
  • Some metadata (title, author if available)
  • Not extracted: Images, videos, interactive elements
Special handling: Google Docs:
  • Paste the sharing link (make sure it’s accessible via link)
  • Agent automatically exports to readable format
  • Formatting is preserved
Google Sheets:
  • Paste the sharing link
  • Data is exported and indexed
  • Useful for product catalogs, pricing, data tables
Pro tip: For documentation sites with many pages, add the most important/overview pages. You don’t need every single page - the agent will direct users based on what you’ve added.

YouTube Videos

Best for: Tutorials, presentations, interviews, educational content How to add:
  1. Go to the “Training” tab
  2. Click the “YouTube Videos” sub-tab
  3. Paste the YouTube video URL
  4. Click “Add Video”
  5. Agent automatically extracts the transcript
What gets indexed:
  • Full transcript of spoken words
  • Video title and description
  • Channel information
  • Key moments/chapters (if available)
Important notes:
  • Video must have captions/subtitles (auto-generated works)
  • Videos without transcripts cannot be processed
  • Transcript language is auto-detected
Use cases:
  • Index your tutorial videos so agent can answer “how-to” questions
  • Add conference talks or presentations
  • Include product demos or walkthroughs
  • Reference expert interviews or talks

Twitter/X Posts

Best for: Twitter threads, announcements, thought leadership content How to add:
  1. Go to the “Training” tab
  2. Click the “Twitter” sub-tab
  3. Enter a Twitter username (without @) or paste a tweet URL
  4. Click “Add”
What gets indexed:
  • Tweet text content
  • Thread structure (if it’s a thread)
  • Author information
  • Timestamps
Use cases:
  • Add your own tweets to train agent on your thinking
  • Include industry expert threads
  • Reference announcement tweets
  • Capture Twitter-based discussions

LinkedIn Content

Best for: Professional profiles, thought leadership posts, company updates How to add:
  1. Go to the “Training” tab
  2. Click the “LinkedIn” sub-tab
  3. Enter a LinkedIn profile URL or post URL
  4. Click “Add”
What gets indexed:
  • Profile headline and about section
  • Recent posts and articles
  • Experience and background (for profiles)
  • Post content and engagement
Use cases:
  • Add your LinkedIn profile to train agent on your expertise
  • Include company LinkedIn posts
  • Reference industry leader profiles
  • Capture professional insights and articles

Managing Your Knowledge Base

Viewing Your Knowledge

In the Training tab, you’ll see all your knowledge sources listed with:
  • Source name/title
  • Type (file, URL, video, etc.)
  • Upload date
  • Processing status
  • File size or length

Refreshing Content

For URLs, YouTube videos, and social media sources, you can refresh the content to get updates:
  1. Find the source in the list
  2. Click the refresh icon next to it
  3. Agent re-fetches and re-processes the content
  4. Updated content replaces the old version
When to refresh:
  • Documentation has been updated
  • YouTube video captions were improved
  • Twitter thread was extended
  • Website content changed
Files can’t be refreshed - you’ll need to delete and re-upload if you have a newer version.

Deleting Knowledge

To remove a knowledge source:
  1. Find it in the Training tab list
  2. Click the delete icon (trash can)
  3. Confirm deletion
  4. Source is removed immediately from knowledge base
Important: Deleting knowledge affects all future conversations. Past conversations won’t change, but new chats won’t have access to that information anymore.

Organizing Your Knowledge

While there’s no folder structure, you can organize by:
  • Using clear, descriptive file names
  • Adding related content in batches
  • Keeping a separate document tracking what you’ve added
  • Deleting outdated content regularly

Knowledge Base Best Practices

Quality Over Quantity

Don’t do this:
  • Upload hundreds of barely relevant documents
  • Add your entire website including footer text and navigation
  • Include duplicate or very similar content
  • Add content “just in case”
Do this instead:
  • Curate high-quality, relevant sources
  • Include core documentation and key resources
  • Remove or don’t include boilerplate/duplicate content
  • Think “What do users actually need to know?”
Why: Too much irrelevant content can actually hurt performance. The agent might retrieve less relevant chunks if there’s too much noise.

Keep Content Fresh

  • Review quarterly: Check if knowledge is still accurate
  • Update when things change: New product features, policy changes, etc.
  • Remove outdated info: Delete deprecated content
  • Refresh URLs: Re-fetch content from living documents

Structure Matters

Good knowledge sources:
  • Well-organized with clear headings
  • Use bullet points and lists
  • Have logical flow
  • Include examples and specifics
Poor knowledge sources:
  • Wall of text with no structure
  • Overly vague or general
  • Lots of irrelevant tangents
  • Poorly formatted (weird spacing, encoding issues)

Match Your Use Case

For Q&A agents:
  • Add FAQs, help docs, policies
  • Include troubleshooting guides
  • Add product documentation
For research agents:
  • Add research papers and reports
  • Include industry analysis
  • Add expert content and thought leadership
For task-oriented agents:
  • Add process documentation
  • Include how-to guides
  • Add standard operating procedures

Test Your Knowledge

After adding knowledge, test if the agent can retrieve it:
  1. Ask direct questions from the content
  2. Ask questions that require combining multiple sources
  3. Try edge cases or less obvious questions
  4. Check if sources are cited correctly
If retrieval isn’t working:
  • Question may not match terminology in knowledge
  • Content may be too scattered or vague
  • May need more (or different) context
  • Try rephrasing the question

Troubleshooting Knowledge Issues

Symptoms: Agent gives generic answers instead of using uploaded contentPossible causes:
  1. Knowledge still processing (check for status indicator)
  2. Question doesn’t semantically match content
  3. System prompt doesn’t encourage knowledge use
  4. Content is too vague or poorly structured
Solutions:
  • Wait for all files to finish processing
  • Ask questions more directly related to your content
  • Update system prompt: “Always search your knowledge base first”
  • Restructure content with clear headings and sections
  • Try asking: “What do you know about [topic from your knowledge]?”
Symptoms: Agent cites sources but they’re not relevant to the questionPossible causes:
  1. Knowledge base has too much content
  2. Multiple sources with similar but different info
  3. Content lacks clear topic markers
  4. Semantic search matching wrong chunks
Solutions:
  • Remove less relevant sources
  • Add more specific/targeted knowledge
  • Use clearer headings in source documents
  • Be more specific in questions
  • Consider splitting large documents into focused pieces
Symptoms: File upload never completes or shows errorPossible causes:
  1. File too large (>25MB limit)
  2. File format not supported
  3. File is corrupted or password-protected
  4. Network connection issue
Solutions:
  • Check file size (compress or split if too large)
  • Convert to supported format (PDF, DOCX, TXT)
  • Remove password protection
  • Try uploading again with stable connection
  • For large documents, try URL if available online
Symptoms: Error when adding YouTube videoPossible causes:
  1. Video doesn’t have captions/transcripts
  2. Video is private or age-restricted
  3. Captions are disabled by creator
  4. Invalid YouTube URL
Solutions:
  • Check if video has captions (watch on YouTube first)
  • Use public, unrestricted videos
  • Ensure URL is correct YouTube format
  • Try a different video if captions unavailable
Symptoms: Can’t add Google Workspace contentPossible causes:
  1. Sharing settings not set to “Anyone with the link”
  2. Document is private
  3. Requires authentication to access
  4. Invalid share link
Solutions:
  • Change sharing to “Anyone with the link can view”
  • Copy the full sharing URL (should have /edit or /view)
  • Make sure document isn’t restricted to your organization
  • Test link in incognito browser to verify public access
Answer: As a builder, when you test your knowledge agent, you can see knowledge retrieval:
  • Look for [file search] indicator in responses
  • Agent may cite sources in its answer
  • Some responses show which documents were referenced
For end users, citations depend on how you’ve prompted the agent. You can encourage citations in system instructions: “Always cite which document you used.”

Knowledge Base Limits

Per agent:
  • No hard limit on number of sources
  • Recommended: 50-100 high-quality sources for best performance
  • Each file limited to 25MB
Processing:
  • Files process individually (can upload multiple at once)
  • URLs are processed on-demand
  • Large knowledge bases may have slightly slower retrieval
Storage:
  • Knowledge is stored in vector database
  • Counts toward your plan’s storage limits
  • Deleted knowledge is removed from storage

Advanced Tips

Create a “master FAQ” documentInstead of uploading 20 separate PDFs, create one well-structured FAQ document with all common questions. Use clear headings like ”## Pricing Questions” and ”## Feature Questions”. This helps retrieval accuracy.
Use knowledge categoriesName your files descriptively and consider prefixes:
  • “[POLICY] Refund Policy.pdf”
  • “[GUIDE] Getting Started Guide.pdf”
  • “[FAQ] Common Questions.pdf”
This helps both you and the agent understand context.
Test with “knowledge audit” questionsAfter adding knowledge, ask: “What do you know about [topic]?” or “What information do you have about [subject]?” This shows you what the agent can access.
Combine sources for depthFor comprehensive topics, add multiple source types:
  • Documentation (files)
  • Tutorial video (YouTube)
  • FAQ page (URL)
  • Expert thread (Twitter)
This gives the agent multiple perspectives and formats.
Keep a knowledge changelogTrack what you’ve added and when. This helps you:
  • Remember what’s in the knowledge base
  • Know when content was last updated
  • Identify gaps in coverage
  • Plan future additions

Next Steps

Now that you understand the knowledge base system:
Remember: Your knowledge base is living and evolving. Start with core content, test with real questions, and continuously refine based on what works. Quality, relevance, and organization matter more than quantity.