UGENE-6036

Add "Improve Classification with WEVOTE" workflow element

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: virogenesis
    • Fix Version/s: 1.31
    • Component/s: NGS, Workflow
    • Labels:
    • Story Points: 8
    • Epic Link:
    • Sprint: DEV-30-4
    • Affect Type: Userdefined

      Description

      Element name and description

      • Name of the element: "Improve Classification with WEVOTE"
      • Description of the element on the Scene: "Ensemble classification data, produced by other tools."
      • Description of the element in the Property Editor:
        "WEVOTE (WEighted VOting Taxonomic idEntification) is a metagenome shotgun sequencing DNA reads classifier based on an ensemble of other classification methods (Kraken, CLARK, etc.)."

      Input data

      There is one input port:

      Item | Value
      Port name in GUI | Input classification CSV file
      Port description | Input a CSV file in the following format: 1) a sequence name; 2) taxID from the first tool; 3) taxID from the second tool; 4) etc.
      Port ID in UWL | in
      Number of slots | 1
      Slot #1 name in GUI | Input URL
      Slot #1 ID in UWL | url
      Slot #1 data type | String
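
      The column layout above can be illustrated with a minimal reading sketch. It assumes a comma-separated file without a header row and integer taxIDs; the function name read_ensemble_csv and the sample rows are hypothetical and are not taken from UGENE or WEVOTE.

```python
import csv
import io


def read_ensemble_csv(handle):
    """Return {sequence_name: [taxID from tool 1, taxID from tool 2, ...]}."""
    votes = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        # First column is the sequence name, the rest are per-tool taxIDs.
        name, *tax_ids = row
        votes[name.strip()] = [int(t) for t in tax_ids]
    return votes


# Invented sample rows: one sequence name followed by three tool votes.
sample = io.StringIO("read_1,562,562,561\nread_2,1280,1280,1279\n")
votes = read_ensemble_csv(sample)
```

      Each value in the resulting dictionary holds one taxID per contributing tool, in column order.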

      Output data

      There is one output port:

      Item | Value
      Port name in GUI | WEVOTE-classified sequences
      Port description | A map of sequence names with the associated taxonomy IDs.
      Port ID in UWL | out
      Number of slots | 1
      Slot #1 name in GUI | Taxonomy classification data
      Slot #1 ID in UWL | tax-data
      Slot #1 data type | tax-classification

      Parameters

      # | Parameter | Description | Value in GUI | Default value
      1 | Penalty | Score penalty for disagreements (-k). | A spin box with integer values. | 2
      2 | Number of agreed tools | Specify the minimum number of tools that must agree on the WEVOTE decision (-a). | A spin box with 32-bit integer values >= 0. | 0
      3 | Score threshold | Score threshold (-s). | A spin box with 32-bit integer values >= 0. | 0
      4 | Number of threads | Use multiple threads (-n). | A spin box with values from 1 to the number of available cores. | Use the value from the Application Settings (the "Optimize for CPU count" option).
      5 | Output file | Specify the output text file name. | A line edit with the browse button. The value is mandatory ("Required"). | Auto (this is equal to "input_file_name_WEVOTE_Details.txt").
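
      As an illustration of how these parameters could map to a WEVOTE command line, here is a minimal sketch. The -k, -a, -s and -n flags come from the table above, and -i, -d and -p from the sample command in this issue; the function name build_wevote_args, its argument names and the validation rules are assumptions made for the example, not the actual UGENE implementation.

```python
def build_wevote_args(input_csv, taxonomy_dir, prefix,
                      penalty=2, min_agreed=0, score_threshold=0,
                      threads=1, executable="./WEVOTE"):
    """Build the argument list for a WEVOTE run (hypothetical helper)."""
    # The spec restricts these two spin boxes to values >= 0.
    if min_agreed < 0 or score_threshold < 0:
        raise ValueError("min_agreed and score_threshold must be >= 0")
    # The thread spin box starts at 1.
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return [
        executable,
        "-i", input_csv,
        "-d", taxonomy_dir,
        "-p", prefix,
        "-n", str(threads),
        "-k", str(penalty),
        "-a", str(min_agreed),
        "-s", str(score_threshold),
    ]


args = build_wevote_args("HC1_ensemble.csv", "./taxonomy", "HC1", threads=4)
```

      Returning a list rather than a single string means the arguments can be passed to a process launcher without shell quoting issues.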

      Data processing by the element

      • The element takes a CSV file as input, for example the output file of the "Ensemble Classification Data" workflow element (see UGENE-6035).
      • Use the common taxonomy data that comes with the framework.
      • Launch the WEVOTE executable with the specified parameters.
      • Rename the output file to the specified name. By default, the name is generated from the input file name, for example "HC1_WEVOTE_Details.txt".
      • This file should appear on the WD dashboard.
      • Parse the last column of the output file, create a new classification data map from it, and send the map to the output port of the element.
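
      The last step above can be sketched as follows. The issue only states that the last column is parsed; this example additionally assumes that the details file is tab-separated, that the sequence name is in the first column, and that the last column holds an integer taxID. The function name parse_wevote_details and the sample rows are hypothetical, and the middle columns are placeholders.

```python
import csv
import io


def parse_wevote_details(handle, delimiter="\t"):
    """Return {sequence_name: taxID taken from the last column}."""
    classification = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Assumption: first column is the sequence name,
        # last column is the final WEVOTE taxID.
        classification[row[0]] = int(row[-1])
    return classification


# Invented sample content; only the first and last columns matter here.
sample = io.StringIO("read_1\t562\t562\t561\t562\nread_2\t1280\t1280\t1279\t1280\n")
classification = parse_wevote_details(sample)
```

      The resulting dictionary corresponds to the map of sequence names and taxonomy IDs that the element sends to its output port.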

      Sample data

      See, for example, "HC1_ensemble.csv" and "HC1_WEVOTE_Details.txt" files on the file server (in the ".../virogenesis/tools_testing/wevote_without_classifiers" folder). The second file was produced from the first one by running:

      ./WEVOTE -i HC1_ensemble.csv -d ./taxonomy -p HC1 -n 4
      

              People

              Assignee:
              atiunov Aleksey Tiunov [X] (Inactive)
              Reporter:
              oigl Olga Golosova
              Assigned Tester:
              Eugenia Pushkova [X] (Inactive)
