Upload, delete, and export documents
Upload additional documents
To upload additional documents to the corpus:
- In the Documents tab, list view, select Upload documents from the toolbar to the right.
- Select Add files, then select the files. You can upload individual files from a folder or a set of zipped files.
- In the Settings tab:
- Switch on Pdf document view to process documents with the expert.ai Extract technology and view PDF files in their original format while you work on them. If it is switched off, all documents are converted with the Apache Tika toolkit and used as plain text files.
- Select or deselect Enable or disable table and title detection to turn table and title detection on or off.
- Select or deselect Enable or disable OCR extraction to turn OCR extraction on or off.
- Switch off Autodetect language to disable automatic language detection, then manually select the preferred language from the Select language drop-down list.
- Switch off Autodetect encoding to disable automatic character encoding detection, then manually select the preferred encoding from the Select encoding drop-down list.
- In the Documents tab:
- Review the list of documents and folders to upload. To remove an item, select the X button to the right of its name.
- Select Upload.
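If you choose to upload a set of zipped files, you can build the archive with a short script. The following is a minimal Python sketch that uses only the standard library. The function name `zip_documents` is hypothetical, and the sketch assumes the upload accepts a standard ZIP archive whose files may sit in subfolders; check this against your own uploads.

```python
import zipfile
from pathlib import Path


def zip_documents(source_dir: str, archive_path: str) -> list[str]:
    """Pack every regular file under source_dir into a ZIP archive.

    Hypothetical helper: assumes the upload accepts a standard ZIP file
    whose members may sit in subfolders. Returns the member names.
    """
    source = Path(source_dir)
    archive = Path(archive_path).resolve()
    members = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sort for a stable, predictable member order.
        for path in sorted(source.rglob("*")):
            # Skip folders (they are implied by member paths) and the
            # archive itself, in case it is written inside source_dir.
            if not path.is_file() or path.resolve() == archive:
                continue
            # Store paths relative to source_dir, with forward slashes.
            arcname = path.relative_to(source).as_posix()
            zf.write(path, arcname)
            members.append(arcname)
    return members
```

For example, `zip_documents("my_documents", "corpus_upload.zip")` packs the contents of a folder named `my_documents` into `corpus_upload.zip`, which you can then pick with Add files. Keeping relative paths preserves any subfolder structure inside the archive.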
Supported formats and limits
Supported document formats are those handled by the Apache Tika toolkit. Documents are automatically converted to plain text during upload.
Documents are ignored if:
- They are empty.
- They mainly consist of nonsense words.
- Their language is unrecognized or not supported (when automatic language detection is on).
- They exceed the following size limits:
- 50MB for
- 50MB for
- 50MB for other file types.
- 50MB for
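Before uploading, you may want to spot files that would be ignored. The following Python sketch checks only the conditions that can be tested locally: zero-byte files and files above a size limit. It assumes that 50MB means 50 * 1024 * 1024 bytes and applies that limit to every file type; the names `precheck` and `find_problem_files` are hypothetical. It cannot detect nonsense content or unsupported languages, and a file that is not zero bytes may still be ignored as empty if no text can be extracted from it.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes and applied to every
# file type; the platform may count megabytes differently.
MAX_BYTES = 50 * 1024 * 1024


def precheck(path: Path, max_bytes: int = MAX_BYTES) -> Optional[str]:
    """Return why a file would likely be ignored, or None if it passes."""
    size = path.stat().st_size
    if size == 0:
        # Only catches zero-byte files; a non-empty file can still yield no text.
        return "empty"
    if size > max_bytes:
        return "too large"
    return None


def find_problem_files(folder: str, max_bytes: int = MAX_BYTES) -> Dict[str, str]:
    """Map each flagged file (path relative to folder) to the reason."""
    root = Path(folder)
    problems = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            reason = precheck(path, max_bytes)
            if reason is not None:
                problems[path.relative_to(root).as_posix()] = reason
    return problems
```

For example, `find_problem_files("my_documents")` returns a dictionary such as `{"notes/blank.txt": "empty"}` for a hypothetical folder, listing only the files that fail one of the two checks.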
Delete a document in the Documents tab, list view
To delete a document in the list view of the Documents tab, select the document, then select Delete document on the toolbar.
Delete a document in the Documents tab, detail view
To delete a document in the detail view of the Documents tab, select the ellipsis above the document, then select Delete document.
After you upload or delete documents, a prompt in the lower right corner asks you to reload the corpus so that the document count is updated.
To export the corpus in
- In the Documents tab, list view, select Export from the toolbar to the right.
- In the Export documents window, Export tab, enter the filename in Filename or confirm the suggested one.
- Documents are pre-processed during upload, which affects their spacing: for example, a sequence of empty lines is compressed into one. To export documents with their original spacing, check Include original text spaced documents.
Under Export filter, choose which documents to export:
- Confirm All documents (selected by default) to export the whole corpus.
- Select Current list of filtered documents to export only the documents in a previously filtered list.
- In the Download tab or in the notification in the lower right corner, select Download.
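To see how pre-processing can change spacing, the following Python sketch approximates the behavior described for the Include original text spaced documents option by compressing each run of two or more empty lines into a single empty line. It is an illustration only: the platform's actual pre-processing is not documented here and may differ. Treating lines that contain only spaces or tabs as empty is an assumption, and the function name `collapse_blank_lines` is hypothetical.

```python
import re

# A newline followed by two or more lines that are empty or contain only
# spaces/tabs (treating those as empty is an assumption of this sketch).
_BLANK_RUN = re.compile(r"\n(?:[ \t]*\n){2,}")


def collapse_blank_lines(text: str) -> str:
    """Compress each run of consecutive empty lines into one empty line."""
    return _BLANK_RUN.sub("\n\n", text)


print(repr(collapse_blank_lines("Title\n\n\n\nFirst paragraph.")))
# 'Title\n\nFirst paragraph.'
```

A single empty line between paragraphs is left unchanged, so only the extra blank lines are lost; that difference is what exporting with original spacing preserves.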
Copy a document to the clipboard
To copy a document's text to the clipboard, select the ellipsis above the document in the detail view, then select Copy to clipboard.