UGENE-1020

Revise multiple alignment similarity/dissimilarity measure


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.1
    • Component/s: Basic-MSA
    • Labels:
      None
    • Affect Type:
      Userdefined

      Description

      Currently the multiple alignment similarity measure is incorrect.
      Hamming distance measures the dissimilarity of two sequences: it answers "How many substitutions are needed to get one sequence from the other?".

      There must be two distance algorithms: 1) "Hamming distance" for dissimilarity and 2) "Simple similarity" for similarity.
      They use the following weight schemes:
      1) Hamming distance:
      w("A", "T") = 1
      w("A", "-") = w("-", "A") = 0 or 1 (depending on the "Exclude gaps" option that will be added to the dialog)
      w("-", "-") = 0
      w("A", "A") = 0

      2) Simple similarity:
      w("A", "T") = 0
      w("A", "-") = w("-", "A") = 0
      w("-", "-") = 0
      w("A", "A") = 1

      The measure is the total weight over all pairs of characters at the same positions in the two sequences. Aligning the sequences first is recommended, since it gives a more meaningful value of the measure.
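
A minimal sketch of the two weight schemes and the resulting measures, written in Python for illustration only (it is not UGENE code). The function names and the `exclude_gaps` parameter are hypothetical, and the sketch assumes that enabling "Exclude gaps" makes a character/gap pair weigh 0, while disabling it makes such a pair weigh 1.

```python
GAP = "-"


def hamming_weight(a, b, exclude_gaps=True):
    """Scheme 1: dissimilarity weight of one aligned character pair."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        # Assumption: "Exclude gaps" on -> 0, off -> 1.
        return 0 if exclude_gaps else 1
    return 0 if a == b else 1


def similarity_weight(a, b):
    """Scheme 2: similarity weight of one aligned character pair."""
    if a == GAP or b == GAP:
        return 0
    return 1 if a == b else 0


def _check_aligned(seq1, seq2):
    # The measures compare characters position by position,
    # so both rows must have the same length.
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal (aligned) length")


def hamming_distance(seq1, seq2, exclude_gaps=True):
    """Total scheme-1 weight over all aligned positions."""
    _check_aligned(seq1, seq2)
    return sum(hamming_weight(a, b, exclude_gaps) for a, b in zip(seq1, seq2))


def simple_similarity(seq1, seq2):
    """Total scheme-2 weight over all aligned positions."""
    _check_aligned(seq1, seq2)
    return sum(similarity_weight(a, b) for a, b in zip(seq1, seq2))
```

For the rows `"AC-GT"` and `"ATCG-"`, the columns are one match (A/A), one substitution (C/T), a gap column, another match (G/G), and another gap column, so the gap option only changes the Hamming result.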

      There are two ways to show the measure: as the raw weight value, or as a similarity/dissimilarity estimate in percent. In the percentage case, the value must be calculated as the weight value divided by min(len1, len2), where len1 is the number of non-gap characters in the first sequence and len2 is the number of non-gap characters in the second sequence.
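
The percentage form can be sketched as follows, again in illustrative Python rather than UGENE code. The helper names are hypothetical. Two details are assumptions not stated in the description: the ratio is multiplied by 100 to express it in percent, and 0.0 is returned when one of the sequences consists only of gaps.

```python
def non_gap_length(seq, gap="-"):
    """Number of non-gap characters in a sequence row."""
    return sum(1 for c in seq if c != gap)


def as_percent(weight, seq1, seq2):
    """Express a raw weight as a percentage of min(len1, len2)."""
    denom = min(non_gap_length(seq1), non_gap_length(seq2))
    if denom == 0:
        # Assumption: an all-gap row yields 0 percent instead of an error.
        return 0.0
    return 100.0 * weight / denom
```

For example, the rows `"AC-GT"` and `"ATCG-"` each contain four non-gap characters, so a raw weight of 2 corresponds to `as_percent(2, "AC-GT", "ATCG-") == 50.0`.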

      The distance matrix view must also be revised. It must show similarity or dissimilarity, depending on the algorithm chosen.
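
One way to picture the revised matrix is a pairwise table filled by whichever measure is selected. The sketch below is illustrative Python, not the UGENE view. All names are hypothetical, and the compact `hamming_distance` here assumes gaps are excluded.

```python
def simple_similarity(seq1, seq2, gap="-"):
    """Count positions where both rows hold the same non-gap character."""
    return sum(1 for a, b in zip(seq1, seq2) if a == b and a != gap)


def hamming_distance(seq1, seq2, gap="-"):
    """Count substitutions; columns containing a gap are ignored (assumption)."""
    return sum(1 for a, b in zip(seq1, seq2)
               if a != b and a != gap and b != gap)


def distance_matrix(rows, measure):
    """Square matrix of measure(rows[i], rows[j]) for every pair of rows."""
    n = len(rows)
    return [[measure(rows[i], rows[j]) for j in range(n)] for i in range(n)]


rows = ["ACGT", "ACGA", "A-GT"]
similarity = distance_matrix(rows, simple_similarity)
dissimilarity = distance_matrix(rows, hamming_distance)
```

Passing the measure as a parameter keeps the matrix code identical for both algorithms; only the interpretation of the cells (similarity or dissimilarity) changes.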


            People

            Assignee:
            vaskin Yura Vaskin
            Reporter:
            vaskin Yura Vaskin