Skip to content

URL Converter

Description

The URL Converter processor gets plain text from a Web page.

Block properties

URL Converter workflow blocks have the following properties:

  • Common block properties:

    • The unique block ID.
    • Block name: the block name, it can be edited.
    • Description: the description of the processor (read only).
  • Type Specific tab:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
  • Deployment tab:

    • Replicas: number of required instances (3 maximum)
    • Memory: required memory
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPUs)

Input

Read the NL Flow API manual for the description of the JSON object to submit to a URL Converter block.

Output and output-input mapping

The output of a URL Converter block is a JSON object with the following structure:

{
    content: extracted text,
    description: value of the description <meta> tag,
    domain: domain name,
    image: URL of a descriptive image found in the page
    keywords: value of the keywords <meta> tag,
    language: ISO 639-1 code of the page language,
    title: page title,
    url: page URL
}

For example:

{
    content:"Saturday, January 29, 2022 - The incumbent President of Italy Sergio Mattarella was re-elected for a second seven-year term yesterday in the eighth round of voting for a potential successor.
    Aged 80, Mattarella repeatedly expressed his desire to leave the position, including renting an apartment in Rome in anticipation of a move from the presidential Quirinal Palace (Quirinale). However, he relented after key figures, including Prime Minister Mario Draghi , urged him to stay on for the "stability" of the Republic. His first term was set to expire on February 3.
    Parliamentarians who went to Quirinale to ask him to remain quoted Mattarella as saying "I had other plans, but if needed, I am at your disposition". Seven rounds of fruitless voting to determine a successor involved an electoral college of 1009 "grand electors". They comprise 321 Senators , 630 Members of the Chamber of Deputies (MPs) and 58 regional delegates.",
    description:"",
    domain:"en.wikinews.org",
    image:"https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Italy_%28orthographic_projection%29.svg/1200px-Italy_%28orthographic_projection%29.svg.png",
    keywords:"",
    language:"en",
    title:"Italian President Sergio Mattarella re-elected for second term, ending successor row",
    url:"https://en.wikinews.org/wiki/Italian_President_Sergio_Mattarella_re-elected_for_second_term,_ending_successor_row"
}

Typically, in the workflow, the URL Converter block is followed by a model block or more model blocks in parallel.
In these cases, inside the model block's configuration, the content property of the URL Converter block's output must be mapped to the text input property.