Upload, delete, and export documents

Upload additional documents

To upload additional documents to the corpus:

  1. In the Documents tab, list view, select Upload documents from the toolbar on the right.

    • Optionally, select Show advanced settings:

      • To disable automatic language detection, turn off Autodetect language and choose the language from the Select language drop-down list.
      • To disable automatic character encoding detection, turn off Autodetect encoding and choose the encoding from the Select encoding drop-down list.
      • When done, select Hide advanced settings.
  2. Select Add files.
  3. Select the files to upload. They are displayed in a list; to remove a file from the list, select the X button to the right of its name.
  4. Select Upload.
  5. In the notification in the lower right corner, select Reload corpus.

Supported formats and limits

Supported document formats are those managed by the Apache Tika toolkit. Documents are automatically converted to plain text files during upload.

Documents are ignored if:

  • They are empty.
  • They mainly consist of nonsense words.
  • Their language is unrecognized or not supported (applies only when automatic language detection is on).
  • They exceed the following size limits:

    • 50 MB for .zip files.
    • 50 MB for .txt files.
    • 50 MB for other file types.
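
The size and emptiness rules above can be checked locally before uploading. The following is a minimal sketch, not part of the product: the names skip_reason and MAX_BYTES are hypothetical, and it assumes that "50 MB" means 50 × 1024 × 1024 bytes, which the documentation does not specify. It does not reproduce the nonsense-word or language checks, which are performed by the platform.

```python
from pathlib import Path
from typing import Optional

# Assumption: "50 MB" is interpreted as 50 * 1024 * 1024 bytes.
# The same limit is documented for .zip, .txt, and all other file types.
MAX_BYTES = 50 * 1024 * 1024


def skip_reason(size_bytes: int) -> Optional[str]:
    """Return why a file of this size would likely be ignored, or None."""
    if size_bytes == 0:
        return "empty"
    if size_bytes > MAX_BYTES:
        return "too large"
    return None


def check_file(path: str) -> Optional[str]:
    """Apply skip_reason to a file on disk (hypothetical helper)."""
    return skip_reason(Path(path).stat().st_size)
```

For example, calling check_file on each file in a folder before step "Select Add files" gives an early list of files that are empty or over the limit.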

Delete documents

Delete a document in the Documents tab, list view

To delete a document in the list view of the Documents tab, select the document, then select Delete document on the toolbar. Only one document can be selected at a time: multiple selection with Ctrl-click or Shift-click is not supported.

Delete a document in the Documents tab, detail view

To delete a document in the detail view of the Documents tab, select the ellipsis above the document, then select Delete document.

Note

After you upload or delete documents, a notification in the lower right corner asks you to reload the corpus so that the number of documents is updated.

Export documents

To export the corpus in .zip format:

  1. In the Documents tab, list view, select Export from the toolbar on the right.
  2. In the Export documents window, Export tab, enter the filename in Filename or confirm the suggested one.
  3. Documents are pre-processed when uploaded, which affects their spacing. For example, a sequence of empty lines is compressed to one. To export documents with their original spacing, check Include original text spaced documents.
  4. Use the Export filter to choose which documents to export:

    • Confirm All documents (the default) to download the whole corpus.

    Or:

    • Select Current list of filtered documents to download only the documents in a previously filtered list.
  5. Select Export.
  6. In the Download tab or in the notification in the lower right corner, select Download.
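
To illustrate the spacing change mentioned in step 3, here is a small sketch of how a sequence of empty lines can be compressed to one. It is only an illustration of the described effect: the function name collapse_blank_lines is hypothetical, and the platform's actual pre-processing may differ in details not covered here.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Compress each run of two or more empty lines into a single empty line."""
    # A line break followed by at least two lines that contain only
    # spaces or tabs is replaced by one line break plus one empty line.
    return re.sub(r"\n(?:[ \t]*\n){2,}", "\n\n", text)


original = "First paragraph.\n\n\n\nSecond paragraph."
print(collapse_blank_lines(original))
```

A document exported without Include original text spaced documents would resemble the collapsed form; with the option checked, the export keeps the spacing of the uploaded file.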

Copy a document to the clipboard

To copy the document text to the clipboard, in detail view, select the ellipsis above the document, then select Copy to clipboard.