# File

## Upload files to EbbotGPT Knowledge

The File source type lets you upload CSV, JSON/NDJSON, PDF, DOCX and Markdown files for your AI agents to use as knowledge.

<figure><img src="https://2117387010-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F3rWESGvwA3vHJ3zNiAG1%2Fuploads%2FzCClPluWHQ4zkea8bFFm%2Fimage.png?alt=media&#x26;token=0efa20ca-6b8c-4061-ab48-e644029190d4" alt=""><figcaption></figcaption></figure>

## Settings for source type: File

Note: not all of these settings are available for every file type. See the section for your file type below to learn which settings are available and how to use them.

In the settings you can specify which fields to use and which field to use as the ID field. For CSV files you can also define advanced settings such as Delimiter, QuoteChar and Encoding.

**Format**

Select the type of file you are uploading. Advanced options are only available for CSV files.

**Id Field**

Uploaded files must have an ID field in order to create documents. Any field in your file can be the ID field, provided its values are unique within the source.
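Before uploading, it can help to confirm that the column you plan to use as the ID field really is unique. The sketch below uses Python's standard `csv` module; the function name `find_duplicate_ids` and the sample data are hypothetical and only illustrate the uniqueness rule, not how Ebbot validates files internally.

```python
import csv
import io


def find_duplicate_ids(csv_text: str, id_field: str, delimiter: str = ",") -> set[str]:
    """Return every value that appears more than once in the chosen ID column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # fieldnames is read from the header row; None means the file was empty
    if reader.fieldnames is None or id_field not in reader.fieldnames:
        raise ValueError(f"ID field {id_field!r} not found in header")
    seen: set[str] = set()
    duplicates: set[str] = set()
    for row in reader:
        value = row[id_field]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


sample = "id,question,answer\n1,Opening hours?,9-17\n2,Address?,Main St 1\n2,Phone?,555-0100\n"
print(find_duplicate_ids(sample, "id"))  # {'2'}
```

An empty result means the column can serve as the ID field; here `id` contains the value `2` twice, so another column (or a corrected file) would be needed.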

**Searchable Fields**

These are the fields the embedder searches to find content for EbbotGPT to generate answers from. When an end user asks a question, this is the information EbbotGPT has available. Add the name of every field in your file that you want to include in the documents created from this source.

**Stringified Fields**

If specified, only the Stringified fields are used to generate responses and are shown in the "Show sources" panel in the widget. The embedder still uses the Searchable fields to find documents, but responses are generated only from the fields listed here. If no Stringified fields are specified, all Searchable fields are included.
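The relationship between Searchable and Stringified fields can be modelled as two views of the same record. The sketch below is a hypothetical model, not Ebbot's implementation: `build_document` and the `"field: value"` rendering are assumptions made for illustration. It shows that the search text always comes from the Searchable fields, while the answer text falls back to them when no Stringified fields are given.

```python
from typing import Optional


def build_document(
    row: dict[str, str],
    searchable: list[str],
    stringified: Optional[list[str]] = None,
) -> dict[str, str]:
    """Hypothetical model: split one record into search text and answer text."""

    def render(fields: list[str]) -> str:
        # Assumed "field: value" rendering, one field per line
        return "\n".join(f"{name}: {row[name]}" for name in fields)

    # The embedder matches end-user questions against the Searchable fields
    search_text = render(searchable)
    # Responses and "Show sources" use Stringified fields, else all Searchable fields
    answer_text = render(stringified if stringified else searchable)
    return {"search_text": search_text, "answer_text": answer_text}


row = {"id": "42", "question": "When are you open?", "answer": "Weekdays 9-17."}
doc = build_document(row, searchable=["question", "answer"], stringified=["answer"])
print(doc["answer_text"])  # answer: Weekdays 9-17.
```

In this example the `question` field helps the embedder find the record, but only the `answer` field would be used for the response.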

### Settings for source type: CSV

Note: these settings are only available for CSV files. They let you specify the following:

**Delimiter**

The delimiter is the character that separates individual fields within a record in your file. Common delimiters include commas (`,`), semicolons (`;`) and tabs (`\t`). Choose the delimiter that matches the format of the file you are uploading so that the data is parsed correctly into distinct fields. Comma (`,`) is the default.
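To see why the delimiter setting matters, the sketch below parses the same semicolon-separated data with Python's standard `csv` module, once with the matching delimiter and once with the default comma. The sample data is made up; a tab-separated file would use `delimiter="\t"` in the same way.

```python
import csv
import io

# Semicolon-separated data, common where the comma is the decimal separator
semicolon_file = "id;name;price\n1;Coffee;3,50\n2;Tea;2,75\n"

# Matching delimiter: each record splits into the expected fields
rows = list(csv.DictReader(io.StringIO(semicolon_file), delimiter=";"))
print(rows[0])  # {'id': '1', 'name': 'Coffee', 'price': '3,50'}

# Default comma delimiter: the same record splits in the wrong place
wrong = list(csv.reader(io.StringIO(semicolon_file)))
print(wrong[1])  # ['1;Coffee;3', '50']
```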

**QuoteChar**

The quote character (quoteChar) is used to encapsulate fields that contain delimiters or special characters, ensuring they are treated as single units. Double quotes (`"`) are commonly used and set as default, but you can specify a different character if your data requires it. Enclosing complex fields in quotes prevents parsing errors during the upload process.
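The sketch below, again using Python's standard `csv` module with made-up data, shows a field that contains the delimiter being kept intact by the quote character, and what happens when the configured quote character does not match the file.

```python
import csv
import io

# A field containing the delimiter is wrapped in double quotes (the default)
data = 'id,answer\n1,"Open Monday, Tuesday and Friday"\n'
rows = list(csv.reader(io.StringIO(data), quotechar='"'))
print(rows[1])  # ['1', 'Open Monday, Tuesday and Friday']

# The same data quoted with single quotes needs quotechar="'"
data_single = "id,answer\n1,'Open Monday, Tuesday and Friday'\n"
rows_single = list(csv.reader(io.StringIO(data_single), quotechar="'"))
print(rows_single[1])  # ['1', 'Open Monday, Tuesday and Friday']

# With a mismatched quote character, the comma inside the field splits it
mismatched = list(csv.reader(io.StringIO(data_single), quotechar='"'))
print(mismatched[1])  # ['1', "'Open Monday", " Tuesday and Friday'"]
```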

**Encoding**

Encoding determines how characters in your file are interpreted and stored. UTF-8 is the default encoding, supporting a wide range of characters from different languages. If your file uses a different character set, adjust the encoding setting accordingly to preserve the integrity of the data during conversion.
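As an illustration of an encoding mismatch, the sketch below simulates a file saved in Latin-1 (ISO-8859-1), a legacy encoding that some spreadsheet tools still produce. Reading it as UTF-8 fails on the first non-ASCII character, while decoding with the correct encoding recovers the text. The helper `to_utf8` is hypothetical; it shows one way to convert a file to UTF-8 before uploading it with the default setting.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes using the file's real encoding, then re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


text = "id,city\n1,Malmö\n2,Göteborg\n"
legacy_bytes = text.encode("latin-1")  # simulates a file saved as Latin-1

try:
    legacy_bytes.decode("utf-8")  # "ö" is byte 0xF6, which is not valid UTF-8 here
except UnicodeDecodeError as err:
    print("Not valid UTF-8:", err.reason)  # Not valid UTF-8: invalid start byte

utf8_bytes = to_utf8(legacy_bytes, "latin-1")
print(utf8_bytes.decode("utf-8").splitlines()[1])  # 1,Malmö
```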

### Settings for source type: JSON/NDJSON

In the settings you can specify which fields to use and which field to use as the ID field.
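JSON and NDJSON carry the same kind of records in different layouts: NDJSON puts one JSON object on each line. The sketch below assumes, as an illustration only, that a JSON upload is a top-level array of objects; the function `load_records` and the sample data are hypothetical. Either way, each object's keys are the field names you would enter as the ID field and Searchable fields.

```python
import json


def load_records(raw: str, fmt: str) -> list[dict]:
    """Parse a JSON array or NDJSON (one JSON object per line) into records."""
    if fmt == "json":
        data = json.loads(raw)
        # Assumption: the JSON file holds a top-level array of objects
        if not isinstance(data, list):
            raise ValueError("expected a top-level JSON array of objects")
        return data
    if fmt == "ndjson":
        # Each non-empty line is an independent JSON object
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    raise ValueError(f"unknown format: {fmt!r}")


json_file = '[{"sku": "A1", "title": "Kettle"}, {"sku": "B2", "title": "Toaster"}]'
ndjson_file = '{"sku": "A1", "title": "Kettle"}\n{"sku": "B2", "title": "Toaster"}\n'

records = load_records(ndjson_file, "ndjson")
ids = [record["sku"] for record in records]
print(len(ids) == len(set(ids)))  # True: "sku" is unique, so it can be the ID field
```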

**Id Field**

Uploaded files must have an ID field in order to create documents. Any field in your file can be the ID field, provided its values are unique within the source.

**Searchable fields**

These are the fields the embedder searches to find content for the AI agent to generate answers from. When an end user asks a question, this is the information the AI agent has available. Add the name of every field in your file that you want to include in the documents created from this source.

**Stringified fields**

If specified, only the Stringified fields are used to generate responses and are shown in the "Show sources" panel in the widget. The embedder still uses the Searchable fields to find documents, but responses are generated only from the fields listed here. If no Stringified fields are specified, all Searchable fields are included.

### Settings for source type: PDF

The PDF scraper has no configurable settings.

Our PDF scraper uses a four-step pipeline designed to handle everything from standard digital documents to "broken" or complex files.

The purpose of this four-tiered architecture is to maximize data recovery and accuracy across a wide variety of document formats. By combining an advanced primary engine with automated repair and reliable fallback layers, the system ensures that even the most difficult files are successfully converted into structured, high-quality text.

The extraction workflow:

1. **Primary extraction**: Our lead engine extracts high-fidelity text and structural data.
2. **Auto-repair & retry**: If digital font errors are detected, the system automatically repairs the file and restarts the primary extraction.
3. **High-reliability backup**: If quality remains low, the system switches to a secondary reader to extract text while "borrowing" the original headings and structure.
4. **Final recovery**: A last-resort attempt for the most difficult files to ensure no document is left unread, while still maintaining basic organization.
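The four steps above form a fallback chain. The sketch below models that control flow only; the extractor functions, the `Extraction` type, the `quality` score and the `0.8` threshold are all hypothetical stand-ins, since the actual engines and quality checks are not documented here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extraction:
    text: str
    headings: list[str]
    quality: float  # hypothetical 0.0-1.0 quality score
    font_errors: bool = False


Extractor = Callable[[bytes], Extraction]


def extract_pdf(pdf: bytes, primary: Extractor, repair: Callable[[bytes], bytes],
                backup: Extractor, last_resort: Extractor,
                min_quality: float = 0.8) -> Extraction:
    # 1. Primary extraction
    result = primary(pdf)
    # 2. Auto-repair & retry when font errors are detected
    if result.font_errors:
        result = primary(repair(pdf))
    if result.quality >= min_quality:
        return result
    # 3. Backup reader: take its text but borrow the primary pass's headings
    fallback = backup(pdf)
    combined = Extraction(text=fallback.text,
                          headings=result.headings or fallback.headings,
                          quality=fallback.quality)
    if combined.quality >= min_quality:
        return combined
    # 4. Final recovery for files that still fail
    return last_resort(pdf)


# Stub stages standing in for real engines
def primary(pdf: bytes) -> Extraction:
    return Extraction(text="garbled", headings=["Returns", "Warranty"], quality=0.3)

def backup(pdf: bytes) -> Extraction:
    return Extraction(text="Clean text", headings=[], quality=0.9)

def repair(pdf: bytes) -> bytes:
    return pdf

def last_resort(pdf: bytes) -> Extraction:
    return Extraction(text="Recovered text", headings=[], quality=0.5)


out = extract_pdf(b"%PDF-1.7 stub", primary, repair, backup, last_resort)
print(out.text, out.headings)  # Clean text ['Returns', 'Warranty']
```

Passing the stages in as functions keeps the fallback order in one place and makes each stage easy to swap or test on its own.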

### Settings for source type: DOCX

Use the DOCX file option to convert Word documents (.docx) into EbbotGPT Knowledge. You can upload a Word document directly or provide a URL to your file through the interface.

<figure><img src="https://2117387010-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F3rWESGvwA3vHJ3zNiAG1%2Fuploads%2FQL8Vr7FLgOXoVTRBRDx0%2Fimage.png?alt=media&#x26;token=dfcf84e1-6884-4916-8d44-409bcc4e0732" alt=""><figcaption></figcaption></figure>

**Conversion Process**

The system uses [Pandoc](https://pandoc.org/) to convert the Word document into a Markdown file, which is then split into documents and added to EbbotGPT Knowledge. This makes the content searchable and usable for generating responses, and lets the chat widget display it as a source whenever the AI agent uses it in a response.
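If you have Pandoc installed locally, you can preview roughly what a Word document looks like as Markdown before uploading it. The sketch below calls the Pandoc command line with its standard `--from`, `--to` and `--output` options; the exact options EbbotGPT uses are not documented here, so treat the output as an approximation. The helper names `pandoc_command` and `convert_docx` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(docx_path: Path, md_path: Path) -> list[str]:
    """Build a Pandoc command that converts a .docx file to Markdown."""
    return ["pandoc", "--from", "docx", "--to", "markdown",
            "--output", str(md_path), str(docx_path)]


def convert_docx(docx_path: Path, md_path: Path) -> None:
    """Run the conversion, failing clearly if Pandoc is not available."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(docx_path, md_path), check=True)


print(" ".join(pandoc_command(Path("guide.docx"), Path("guide.md"))))
# pandoc --from docx --to markdown --output guide.md guide.docx
```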

### Settings for source type: Markdown (MD)

Ebbot handles Markdown files automatically without any configuration. Just upload your file, and the system will manage the import process for you.

To ensure that your AI agent provides the most accurate answers, we don't just "read" your files; we organize them. We use Intelligent Heading Alignment to break down your documents.

Instead of cutting text at a fixed word count, which often splits a sentence or paragraph in half, we split documents at logical boundaries such as headings and section names. This keeps each section's content together, so the AI agent always captures the full context and never misses important information.
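The idea of splitting at headings rather than at a word count can be sketched in a few lines. This is a simplified illustration, not the actual Intelligent Heading Alignment implementation: `split_at_headings` is a hypothetical helper that only recognizes ATX headings (`#` to `######`) and skips lines inside fenced code blocks so that a `#` comment in code is not mistaken for a heading.

```python
import re

# A line starting with 1-6 "#" characters followed by whitespace and text
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_at_headings(markdown: str) -> list[str]:
    """Split a Markdown document into chunks that each start at a heading."""
    chunks: list[list[str]] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a code block
        # Start a new chunk at each heading outside code blocks
        if not in_fence and HEADING.match(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    texts = ("\n".join(chunk).strip() for chunk in chunks)
    return [text for text in texts if text]


doc = ("# Returns\nYou can return items within 30 days.\n\n"
       "## Refunds\nRefunds are issued in 5 days.\n")
for chunk in split_at_headings(doc):
    print(repr(chunk))
# '# Returns\nYou can return items within 30 days.'
# '## Refunds\nRefunds are issued in 5 days.'
```

Each chunk keeps its heading together with the text beneath it, so a retrieved chunk carries its own context.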

