Skip to content

URL Converter processor

Description

The URL Converter processor gets plain text from a Web page.

Input

The processor requires the input JSON to contain this top level key:

"url": "url"

where:

  • url (string): URL of the Web page to get text from.

At runtime NL Flow will use an Internet connection to download the page.

Block properties

Block properties can be set by editing the block.
URL Converter workflow blocks have the following properties:

  • Basic properties:

    • Block name, it can be edited
    • Component version (read only)
    • Block ID (read only)
  • Deployment:

    • Replicas: number of required instances.
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
  • Type Specific:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
  • Input

    These properties correspond to the top level key of the input JSON.
    They need to be set only when input mapping is necessary.

  • Output: read-only, this property is a navigable description of the structure of the output array.

Output and output-input mapping

The output of a URL Converter block is a JSON object with the following structure:

{
    content: extracted text,
    description: value of the description <meta> tag,
    domain: domain name,
    image: URL of a descriptive image found in the page
    keywords: value of the keywords <meta> tag,
    language: ISO 639-1 code of the page language,
    title: page title,
    url: page URL
}

For example:

{
    content:"Saturday, January 29, 2022 - The incumbent President of Italy Sergio Mattarella was re-elected for a second seven-year term yesterday in the eighth round of voting for a potential successor.
    Aged 80, Mattarella repeatedly expressed his desire to leave the position, including renting an apartment in Rome in anticipation of a move from the presidential Quirinal Palace (Quirinale). However, he relented after key figures, including Prime Minister Mario Draghi , urged him to stay on for the "stability" of the Republic. His first term was set to expire on February 3.
    Parliamentarians who went to Quirinale to ask him to remain quoted Mattarella as saying "I had other plans, but if needed, I am at your disposition". Seven rounds of fruitless voting to determine a successor involved an electoral college of 1009 "grand electors". They comprise 321 Senators , 630 Members of the Chamber of Deputies (MPs) and 58 regional delegates.",
    description:"",
    domain:"en.wikinews.org",
    image:"https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Italy_%28orthographic_projection%29.svg/1200px-Italy_%28orthographic_projection%29.svg.png",
    keywords:"",
    language:"en",
    title:"Italian President Sergio Mattarella re-elected for second term, ending successor row",
    url:"https://en.wikinews.org/wiki/Italian_President_Sergio_Mattarella_re-elected_for_second_term,_ending_successor_row"
}

Typically, in the workflow, the URL Converter block is followed by a model block or more model blocks in parallel.
In these cases, inside the model block's configuration, the content property of the URL Converter block's output must be mapped to the text input property.