URL Converter processor

Description

The URL Converter processor gets plain text from a Web page.
When the workflow runs, the runtime in which it is published tries to download the page over an Internet connection.

Input

A URL Converter block has this input variable:

  • url (string): URL of the Web page to get text from

Block properties

Block properties can be set by editing the block.
URL Converter workflow blocks have the following properties:

  • Basic properties:

    • Block name (editable)
    • Component version (read only)
    • Block ID (read only)
  • Deployment:

    • Replicas: number of required instances.
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
  • Type Specific:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
  • Input: this property corresponds to the input variable.
    The input property is read-only, and therefore only descriptive of the expected input, when the block is the first in a flow and the workflow's input has not been explicitly described. In that case the workflow's input JSON must contain a key whose name and type match those of the input variable.
    Otherwise, it is editable and must be set.

  • Output: read-only, this property is a navigable description of the structure of the output JSON.
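When the block is the first in a flow and the workflow's input is not explicitly described, the input JSON needs a url key holding a string. The following is a minimal Python sketch of building and checking such a payload before sending it to a published workflow. The helper name check_workflow_input and the example.com address are hypothetical and are not part of the platform.

```python
import json


def check_workflow_input(payload: str) -> dict:
    """Parse a workflow input JSON string and check that it has a string `url` key.

    Hypothetical client-side helper: it only mirrors the documented rule that
    the input JSON must contain a key matching the input variable's name and type.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("workflow input must be a JSON object")
    if "url" not in data:
        raise ValueError("workflow input is missing the 'url' key")
    if not isinstance(data["url"], str):
        raise ValueError("'url' must be a string")
    return data


# Placeholder address used only for illustration.
workflow_input = json.dumps({"url": "https://example.com/"})
checked = check_workflow_input(workflow_input)
print(checked["url"])
```

A payload such as {"url": 42} or {"link": "https://example.com/"} is rejected by this check, because the key name or the value type does not match the input variable.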

Output and output-input mapping

The output of a URL Converter block is a JSON object with the following structure:

{
    content: extracted text,
    description: value of the description <meta> tag,
    domain: domain name,
    image: URL of a descriptive image found in the page,
    keywords: value of the keywords <meta> tag,
    language: ISO 639-1 code of the page language,
    title: page title,
    url: page URL
}

For example:

{
    content:"Saturday, January 29, 2022 - The incumbent President of Italy Sergio Mattarella was re-elected for a second seven-year term yesterday in the eighth round of voting for a potential successor.
    Aged 80, Mattarella repeatedly expressed his desire to leave the position, including renting an apartment in Rome in anticipation of a move from the presidential Quirinal Palace (Quirinale). However, he relented after key figures, including Prime Minister Mario Draghi, urged him to stay on for the "stability" of the Republic. His first term was set to expire on February 3.
    Parliamentarians who went to Quirinale to ask him to remain quoted Mattarella as saying "I had other plans, but if needed, I am at your disposition". Seven rounds of fruitless voting to determine a successor involved an electoral college of 1009 "grand electors". They comprise 321 Senators, 630 Members of the Chamber of Deputies (MPs) and 58 regional delegates.",
    description:"",
    domain:"en.wikinews.org",
    image:"https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Italy_%28orthographic_projection%29.svg/1200px-Italy_%28orthographic_projection%29.svg.png",
    keywords:"",
    language:"en",
    title:"Italian President Sergio Mattarella re-elected for second term, ending successor row",
    url:"https://en.wikinews.org/wiki/Italian_President_Sergio_Mattarella_re-elected_for_second_term,_ending_successor_row"
}
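For code that consumes this output outside the platform, the structure above can be written down as a typed dictionary. The sketch below is only an illustration. The names UrlConverterOutput and missing_keys are hypothetical, and treating every field as a string is an assumption based on the example, where all values are strings.

```python
from typing import TypedDict


class UrlConverterOutput(TypedDict):
    """Keys documented for the URL Converter output (all assumed to be strings)."""
    content: str
    description: str
    domain: str
    image: str
    keywords: str
    language: str
    title: str
    url: str


def missing_keys(output: dict) -> list[str]:
    """Return the documented keys that are absent from `output`, sorted by name."""
    return sorted(set(UrlConverterOutput.__annotations__) - set(output))


# Shortened version of the example output, with the long text abbreviated.
sample = {
    "content": "Saturday, January 29, 2022 - The incumbent President of Italy ...",
    "description": "",
    "domain": "en.wikinews.org",
    "image": "",
    "keywords": "",
    "language": "en",
    "title": "Italian President Sergio Mattarella re-elected for second term, ending successor row",
    "url": "https://en.wikinews.org/",
}
print(missing_keys(sample))
```

Empty strings, such as description and keywords in the example, still count as present. The check only reports keys that are missing altogether.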

Typically, in the workflow, the URL Converter block is followed by one model block or by several model blocks in parallel.
In these cases, inside the model block's configuration, the content property of the URL Converter block's output must be mapped to the text input variable.
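On the platform this mapping is set in the model block's configuration, not in code. The short Python sketch below only illustrates the resulting data flow, assuming a model block that reads a single text input. The function name to_model_input and the two parallel blocks are hypothetical.

```python
def to_model_input(converter_output: dict) -> dict:
    """Map the converter's `content` value to a model block's `text` input."""
    return {"text": converter_output["content"]}


# Abbreviated converter output; only `content` is used by the mapping.
converter_output = {
    "content": "Sergio Mattarella was re-elected for a second seven-year term.",
    "language": "en",
}

# Two hypothetical model blocks running in parallel receive the same text.
model_inputs = [to_model_input(converter_output) for _ in range(2)]
print(model_inputs[0]["text"])
```

The other output properties, such as title or language, are not passed on by this mapping. Only content reaches the text input variable.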