File
Learn how to upload and manage files to power your AI agents' knowledge.
Upload files to EbbotGPT Knowledge
The File source type lets you upload files such as CSV, DOCX and PDF files for your AI agents to use as knowledge.

Settings for source type: File
Note: these settings are not available for all file types. See the section for the file type you are using to learn which settings are available and how to use them.
In the settings you can specify which fields to use and which field serves as the ID column. You can also define advanced settings such as the delimiter (separator) and quoteChar.
Format
Select which file type you are uploading. Advanced options are only available for CSV type files.
Id Field
Please note that uploaded files need to have an ID-field in order to create documents. It can be any field in your file provided it has unique values per source.
Searchable Fields
These are the fields the embedder searches to find content for EbbotGPT to generate answers from. When an end user asks a question, this is the information EbbotGPT has available. Add every field name from your file that you want to include in the documents created from this source.
Stringified Fields
If specified, only the Stringified fields will be used to generate responses and be visible in the "Show sources" panel in the widget. Searchable fields will still be used for the embedder to find documents, but responses will only be generated from the fields specified here. If not specified, all fields specified in searchable fields will be included.
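To make the relationship between the three field settings concrete, here is a minimal sketch of how one record could be split into an ID, the text the embedder searches, and the text used for answers. It is an illustration under assumptions, not Ebbot's implementation: the function `build_document`, the keys `search_text` and `answer_text`, and the sample column names are all hypothetical.

```python
from typing import Dict, List, Optional


def build_document(
    row: Dict[str, str],
    id_field: str,
    searchable_fields: List[str],
    stringified_fields: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: turn one record into a document-like dict."""
    if not row.get(id_field):
        raise ValueError(f"Row has no value for ID field {id_field!r}")
    # When no stringified fields are given, fall back to the searchable fields.
    answer_fields = stringified_fields or searchable_fields
    return {
        "id": row[id_field],
        # Text the embedder would search.
        "search_text": "\n".join(row[f] for f in searchable_fields if f in row),
        # Text used for responses and shown as a source.
        "answer_text": "\n".join(row[f] for f in answer_fields if f in row),
    }


row = {
    "sku": "A-100",
    "title": "Return policy",
    "keywords": "refund, send back",
    "body": "Returns are accepted within 30 days.",
}
doc = build_document(row, "sku", ["title", "keywords", "body"], ["title", "body"])
print(doc["answer_text"])
```

In this sketch the `keywords` column helps a document get found but never appears in the answer text, which mirrors the behavior described for Stringified Fields.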
Settings for source type: CSV
Note: these settings are only available when handling data in a CSV file. These settings enable you to specify the following:
Delimiter
The delimiter is a character used to separate individual fields within a record in your file. Commonly used delimiters include commas (,), semicolons (;) and tabs (\t). Choose the delimiter that matches the format of the file you are uploading to ensure that the data is parsed correctly into distinct fields. Comma (,) is used by default.
QuoteChar
The quote character (quoteChar) is used to encapsulate fields that contain delimiters or special characters, ensuring they are treated as single units. Double quotes (") are commonly used and set as default, but you can specify a different character if your data requires it. Enclosing complex fields in quotes prevents parsing errors during the upload process.
Encoding
Encoding determines how characters in your file are interpreted and stored. UTF-8 is the default encoding, supporting a wide range of characters from different languages. If your file uses a different character set, adjust the encoding setting accordingly to preserve the integrity of the data during conversion.
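If you are unsure which values to enter, it can help to test them locally before uploading. The sketch below uses Python's standard `csv` module to show how the delimiter, quote character and encoding interact. The sample data and column names are made up, and this only approximates how a CSV parser behaves in general; it is not Ebbot's own parser.

```python
import csv
import io

# A semicolon-separated file, stored as UTF-8 bytes, with one quoted field
# that itself contains the delimiter.
raw_bytes = (
    "id;question;answer\n"
    '1;Opening hours;"Mon-Fri; 9-17"\n'
    "2;Språk;Svenska\n"
).encode("utf-8")

# Encoding: decode the bytes with the character set the file was saved in.
text = raw_bytes.decode("utf-8")

# Delimiter and quote character matching the file.
reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";", quotechar='"')
rows = list(reader)
print(rows[0]["answer"])

# With the default comma delimiter, the header collapses into a single field.
wrong_header = next(csv.reader(io.StringIO(text, newline="")))
print(wrong_header)
```

The quoted field keeps its embedded semicolon as part of one value, while parsing the same file with the wrong delimiter yields a single unusable column.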
Settings for source type: JSON/NDJSON
In the settings you are able to specify which fields to be used and which field is to be used as an ID column.
Id Field
Please note that uploaded files need to have an ID-field in order to create documents. It can be any field in your file provided it has unique values per source.
Searchable fields
These are the fields the embedder searches to find content for the AI agent to generate answers from. When an end user asks a question, this is the information the AI agent has available. Add every field name from your file that you want to include in the documents created from this source.
Stringified fields
If specified, only the Stringified fields will be used to generate responses and be visible in the "Show sources" panel in the widget. Searchable fields will still be used for the embedder to find documents, but responses will only be generated from the fields specified here. If not specified, all fields specified in searchable fields will be included.
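Because every record needs a unique ID value, it can be worth checking an NDJSON file before uploading it. The following is a minimal local pre-check, assuming one JSON object per line and an ID field named `id`; the function `find_id_problems` and the sample records are hypothetical and not part of the product.

```python
import json
from collections import Counter
from typing import List


def find_id_problems(ndjson_text: str, id_field: str) -> List[str]:
    """Report lines whose ID value is missing or already used."""
    problems: List[str] = []
    seen: Counter = Counter()
    for line_number, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        value = record.get(id_field)
        if value is None or value == "":
            problems.append(f"line {line_number}: missing {id_field!r}")
            continue
        seen[value] += 1
        if seen[value] > 1:
            problems.append(
                f"line {line_number}: duplicate {id_field!r} value {value!r}"
            )
    return problems


sample = (
    '{"id": "a1", "title": "Shipping"}\n'
    '{"id": "a2", "title": "Returns"}\n'
    '{"id": "a1", "title": "Shipping (copy)"}\n'
    '{"title": "No ID"}\n'
)
problems = find_id_problems(sample, "id")
for problem in problems:
    print(problem)
```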
Settings for source type: PDF
There are no settings for the PDF scraper.
Our PDF scraper utilizes a sophisticated four-step pipeline designed to handle everything from standard digital documents to "broken" or complex files.
The purpose of this four-tiered architecture is to maximize data recovery and accuracy across a wide variety of document formats. By combining an advanced primary engine with automated repair and reliable fallback layers, the system ensures that even the most difficult files are successfully converted into structured, high-quality text.
The extraction workflow:
Primary extraction: Our lead engine for high-fidelity text and structural data.
Auto-repair & retry: If digital font errors are detected, the system automatically repairs the file and restarts the primary extraction.
High-reliability backup: If quality remains low, the system switches to a secondary reader to extract text while "borrowing" the original headings and structure.
Final recovery: A last-resort attempt for the most difficult files to ensure no document is left unread, while still maintaining basic organization.
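The four steps above follow a common fallback pattern, which can be sketched roughly as below. Everything here is an assumption made for illustration: the `FontError` exception, the extractor callables, the `good_enough` quality check and the stage names are hypothetical stand-ins, since the actual engines and quality criteria are not described on this page.

```python
from typing import Callable, Optional, Tuple

Extractor = Callable[[bytes], Optional[str]]


class FontError(Exception):
    """Hypothetical signal that the primary engine hit broken font data."""


def extract_with_fallback(
    pdf: bytes,
    primary: Extractor,
    repair: Callable[[bytes], bytes],
    backup: Extractor,
    last_resort: Extractor,
    good_enough: Callable[[Optional[str]], bool],
) -> Tuple[str, Optional[str]]:
    """Return (stage name, text) from the first stage that yields usable text."""
    stage, text = "primary", None
    try:
        text = primary(pdf)  # step 1: primary extraction
    except FontError:
        stage = "repair+retry"
        try:
            text = primary(repair(pdf))  # step 2: repair, then retry step 1
        except FontError:
            text = None
    if good_enough(text):
        return stage, text
    text = backup(pdf)  # step 3: secondary reader
    if good_enough(text):
        return "backup", text
    return "last_resort", last_resort(pdf)  # step 4: final recovery


# Stub engines standing in for real PDF readers.
def stub_primary(pdf: bytes) -> str:
    if pdf.startswith(b"BROKEN"):
        raise FontError("unmapped glyphs")
    return pdf.decode("ascii")


def stub_repair(pdf: bytes) -> bytes:
    return pdf[len(b"BROKEN"):] if pdf.startswith(b"BROKEN") else pdf


def stub_backup(pdf: bytes) -> str:
    return "backup text"


def stub_last_resort(pdf: bytes) -> str:
    return "raw text"


def long_enough(text: Optional[str]) -> bool:
    return bool(text) and len(text) >= 5


result = extract_with_fallback(
    b"BROKEN# Title\nBody", stub_primary, stub_repair,
    stub_backup, stub_last_resort, long_enough,
)
print(result[0])
```

Each later stage runs only when the earlier one fails the quality check, so well-formed files never pay the cost of the fallbacks.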
Settings for source type: DOCX
Use the docx file option to convert Word documents (.docx) into EbbotGPT Knowledge. You can upload a Word document directly or provide a URL to your file through the interface.

Conversion Process
The system uses Pandoc to convert the Word document into a Markdown file, which is then split into documents and added to EbbotGPT Knowledge. This makes the content searchable and usable for generating responses, and it can be displayed as a source in the chat widget whenever the AI agent uses it in a response.
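To preview roughly what a DOCX-to-Markdown conversion produces, you can run Pandoc yourself. The sketch below builds a standard Pandoc command line and runs it only if Pandoc is installed and the input file exists. The file names are placeholders, and the exact options Ebbot passes to Pandoc are not documented here, so treat the flags as an assumption.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(docx_path: str, markdown_path: str) -> List[str]:
    """Build a Pandoc invocation that converts a .docx file to Markdown."""
    return [
        "pandoc", docx_path,
        "-f", "docx",        # input format
        "-t", "markdown",    # output format (Pandoc's Markdown)
        "-o", markdown_path,
    ]


command = pandoc_command("handbook.docx", "handbook.md")
print(" ".join(command))

# Only run the conversion when Pandoc and the placeholder input are present.
if shutil.which("pandoc") and Path("handbook.docx").exists():
    subprocess.run(command, check=True)
```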
Settings for source type: Markdown (MD)
Ebbot handles Markdown files automatically without any configuration. Just upload your file, and the system will manage the import process for you.
To ensure that your AI agent provides the most accurate answers, we don't just "read" your files; we organize them. We use Intelligent Heading Alignment to break down your documents.
Instead of cutting text at a specific word count, which often splits a sentence or a paragraph in half, we split documents at logical headers (like headings or section names). This ensures the AI agent always captures the full context and never misses important information.
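The idea of splitting at headings rather than at a word count can be sketched in a few lines. This is a simplified illustration, not the Intelligent Heading Alignment implementation: the function `split_at_headings` is hypothetical, it only recognizes `#`-style headings, and it does not account for `#` characters inside fenced code blocks.

```python
import re
from typing import Dict, List

# Matches Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_at_headings(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into sections, starting a new section at each heading."""
    sections: List[Dict[str, str]] = []
    heading = ""
    body_lines: List[str] = []

    def flush() -> None:
        body = "\n".join(body_lines).strip()
        if heading or body:
            sections.append({"heading": heading, "body": body})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            heading = match.group(1).strip()
            body_lines = []
        else:
            body_lines.append(line)
    flush()
    return sections


doc = (
    "Intro text.\n\n"
    "# Shipping\nWe ship worldwide.\nDelivery takes 3-5 days.\n\n"
    "## Returns\nReturns are free.\n"
)
sections = split_at_headings(doc)
for section in sections:
    print(repr(section["heading"]), "->", repr(section["body"]))
```

Each resulting section keeps its sentences and paragraphs intact, which is the property the heading-based approach is meant to preserve.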

