Upload and download documents
Upload documents
To upload additional documents to a library:
- In the documents list view of the Documents tab, select Upload documents.
- Select Add files.
- Select the files to upload.
- In the Settings tab:
  - Switch on PDF document view to process PDF files with the expert.ai Extract technology and work on them in their original format. If this option is switched off, all documents are converted with the Apache Tika toolkit and used as plain text files.
  - Select or deselect Enable or disable table and title detection to turn table and title detection on or off.
  - Select or deselect Enable or disable OCR extraction to turn OCR extraction on or off.
  - Switch off Autodetect language to disable automatic language detection, then select the preferred language from the Select language drop-down list.
  - Switch off Autodetect encoding to disable automatic character encoding detection, then select the preferred encoding from the Select encoding drop-down list.
  - Select Save as corpus to save your documents as a corpus and type a name for it.
- In the Documents tab:
  - Review the list of documents and folders to upload. To remove an item, select the X button to the right of its name.
- Select Upload.
Note
To update the number of documents, select Reload in the notification in the lower right corner.
Or:
- Upload documents from the library dashboard
Supported formats and limits
Supported document formats are those managed by the Apache Tika toolkit. Documents are automatically converted to plain text files during upload.
Documents are ignored if:
- They are empty.
- They mainly consist of nonsense words.
- Their language is unrecognized or not supported (when automatic language detection is on).
- They exceed the following size limits:
  - 200 MB for .zip files.
  - 1 MB for .txt files.
  - 100 MB for other file types.
At most, the first 50 KB of text is considered for each document.
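The size limits above can be checked locally before uploading. The following is a minimal Python sketch, not part of the product: the function names (`size_limit_for`, `upload_problems`, `check_folder`, `considered_text`) are hypothetical, and it assumes that 1 MB means 1024 × 1024 bytes and that the 50 KB cap is measured on UTF-8 encoded text, neither of which is stated in this page. It covers only the empty-file and size checks, not the language or nonsense-word checks.

```python
from pathlib import Path

# Assumption: the page does not say whether MB means 10^6 or 2^20 bytes.
MB = 1024 * 1024

# Size limits from the list above, keyed by lowercase file extension.
SIZE_LIMITS = {".zip": 200 * MB, ".txt": 1 * MB}
DEFAULT_LIMIT = 100 * MB  # all other file types


def size_limit_for(filename: str) -> int:
    """Return the documented size limit in bytes for a file name."""
    return SIZE_LIMITS.get(Path(filename).suffix.lower(), DEFAULT_LIMIT)


def upload_problems(filename: str, size_in_bytes: int) -> list[str]:
    """List the size-related reasons a file would likely be ignored."""
    problems = []
    if size_in_bytes == 0:
        problems.append("empty file")
    limit = size_limit_for(filename)
    if size_in_bytes > limit:
        problems.append(f"exceeds {limit // MB} MB limit")
    return problems


def check_folder(folder: str) -> dict[str, list[str]]:
    """Map each problematic file under a folder to its list of problems."""
    report = {}
    for path in Path(folder).rglob("*"):
        if path.is_file():
            problems = upload_problems(path.name, path.stat().st_size)
            if problems:
                report[str(path)] = problems
    return report


def considered_text(text: str, limit: int = 50 * 1024) -> str:
    """Approximate the portion of a document that is considered.

    Assumption: the 50 KB cap applies to UTF-8 bytes. A character cut in
    half by the truncation is dropped.
    """
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")
```

Calling `check_folder` on a local folder returns only the files that would probably be skipped, so they can be split or removed before upload.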
Download documents
- In the list view of the Documents tab, select Export in the toolbar on the right.
- In the Export panel of the Export documents window, enter a file name or keep the suggested one, then confirm the Extension.
- Documents are pre-processed when uploaded, which affects spacing. For example, a sequence of empty lines is compressed to one. To export documents with their original spacing, select Include original text spaced documents.
- Select the Export filter to choose the document set to export:
  - Confirm All documents (the default) to download the whole corpus.
  - Or select Filtered documents to download a filtered list of documents according to the following available filters:
    - Documents with annotations.
    - Documents with extractions.
    - Current list of filtered documents.
- Select Export.
- In the Download tab, or in the notification in the lower right corner, select Download to download the document set you want.
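To see why an exported document can differ from the original file when Include original text spaced documents is not selected, the spacing example given in the steps above (a sequence of empty lines compressed to one) can be reproduced with a short Python sketch. The function name `compress_empty_lines` is hypothetical, and this illustrates only that single documented example; the platform's actual pre-processing is not described here and may do more.

```python
import re


def compress_empty_lines(text: str) -> str:
    """Collapse every run of two or more empty lines into one empty line.

    Lines that contain only spaces or tabs are treated as empty
    (an assumption made for this illustration).
    """
    return re.sub(r"\n(?:[ \t]*\n){2,}", "\n\n", text)


original = "Title\n\n\n\nFirst paragraph.\n\nSecond paragraph."
print(compress_empty_lines(original))
```

In this sketch the three empty lines after the title become one, while the single empty line between the two paragraphs is left unchanged.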