UGENE-6686

Correct sequence "common statistics"


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 33
    • Fix Version/s: 35
    • Component/s: Basic-Nucl
    • Labels:
      None
    • Story Points:
      3
    • Sprint:
      DEV-34-8, DEV-35-1, DEV-34-9, DEV-34-RELEASE
    • Affect Type:
      Userdefined

      Description

      Parameter names

      In English, the parameters should be named as follows:

      • Length
      • GC content
      • Molecular weight (dsDNA)
      • Molecular weight (ssDNA)
      • Extinction coefficient (dsDNA)
      • Extinction coefficient (ssDNA)
      • Melting temperature
      • nmole/OD260 (dsDNA)
      • nmole/OD260 (ssDNA)
      • μg/OD260 (dsDNA)
      • μg/OD260 (ssDNA)

      (show "260" in subscript)

      For RNA alphabets, "dsDNA"/"ssDNA" should be replaced with "dsRNA"/"ssRNA".

      In Russian, the parameters should be named:

      • Длина
      • GC-состав
      • Молекулярная масса (дцДНК)
      • Молекулярная масса (оцДНК)
      • Коэффициент экстинкции (дцДНК)
      • Коэффициент экстинкции (оцДНК)
      • Температура плавления
      • nmole/OD260 (дцДНК)
      • nmole/OD260 (оцДНК)
      • μg/OD260 (дцДНК)
      • μg/OD260 (оцДНК)

      (show "260" in subscript)

      For RNA alphabets, "дцДНК"/"оцДНК" should be replaced with "дцРНК"/"оцРНК".

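      The naming rules above can be captured in a small lookup table. The Python sketch below builds the eleven labels for a given UI language and alphabet type, swapping the strand tags for RNA. `parameter_labels`, `LABEL_TEMPLATES` and `STRAND_TAGS` are hypothetical names for illustration, not UGENE code, and rendering "260" as a subscript is left to the UI layer.

```python
from typing import List

# Label templates from the issue; {ds}/{ss} are replaced by the strand tags.
LABEL_TEMPLATES = {
    "en": [
        "Length", "GC content",
        "Molecular weight ({ds})", "Molecular weight ({ss})",
        "Extinction coefficient ({ds})", "Extinction coefficient ({ss})",
        "Melting temperature",
        "nmole/OD260 ({ds})", "nmole/OD260 ({ss})",
        "μg/OD260 ({ds})", "μg/OD260 ({ss})",
    ],
    "ru": [
        "Длина", "GC-состав",
        "Молекулярная масса ({ds})", "Молекулярная масса ({ss})",
        "Коэффициент экстинкции ({ds})", "Коэффициент экстинкции ({ss})",
        "Температура плавления",
        "nmole/OD260 ({ds})", "nmole/OD260 ({ss})",
        "μg/OD260 ({ds})", "μg/OD260 ({ss})",
    ],
}

# Strand tags per (language, is_rna).
STRAND_TAGS = {
    ("en", False): {"ds": "dsDNA", "ss": "ssDNA"},
    ("en", True): {"ds": "dsRNA", "ss": "ssRNA"},
    ("ru", False): {"ds": "дцДНК", "ss": "оцДНК"},
    ("ru", True): {"ds": "дцРНК", "ss": "оцРНК"},
}


def parameter_labels(lang: str, is_rna: bool) -> List[str]:
    """Return the parameter labels for a UI language and alphabet type."""
    tags = STRAND_TAGS[(lang, is_rna)]
    # str.format ignores unused keyword arguments, so labels without
    # {ds}/{ss} (e.g. "Length") pass through unchanged.
    return [template.format(**tags) for template in LABEL_TEMPLATES[lang]]
```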
      Units next to the parameter values

      Display the units described below next to the corresponding values, separated from the value by a single space character (" ").

      • For "Length", use "nt" (in Russian: "нт") for nucleotide alphabets, "aa" (in Russian: "аа") for amino acid alphabets, and nothing otherwise.
      • For "GC content", use the "%" sign.
      • For the molecular weight parameters, use "Da" (in Russian: "Да").
      • For the extinction coefficient parameters, use "l/(mol * cm)" (in Russian: "л/(моль * см)").
      • For "Melting temperature", use "°C".
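      A minimal sketch of the unit rules, in Python. The unit keys ("nt", "mw", ...), the alphabet names and the helpers `with_unit` and `length_unit_key` are hypothetical; since the issue gives no Russian form for "%" and "°C", the sketch assumes they are the same in both languages.

```python
from typing import Optional

# Unit strings from the list above. The short keys are hypothetical
# identifiers used only in this sketch.
UNITS = {
    "en": {"nt": "nt", "aa": "aa", "gc": "%", "mw": "Da",
           "ext": "l/(mol * cm)", "tm": "°C"},
    # Assumption: "%" and "°C" are not localized.
    "ru": {"nt": "нт", "aa": "аа", "gc": "%", "mw": "Да",
           "ext": "л/(моль * см)", "tm": "°C"},
}


def length_unit_key(alphabet: str) -> Optional[str]:
    """Length unit depends on the alphabet; other alphabets (e.g. raw) get none."""
    return {"nucleotide": "nt", "amino": "aa"}.get(alphabet)


def with_unit(value: str, unit_key: Optional[str], lang: str = "en") -> str:
    """Append the unit after exactly one space, or return the bare value."""
    if unit_key is None:
        return value
    return f"{value} {UNITS[lang][unit_key]}"
```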

      GC content

      This should be available for nucleotide alphabets only and calculated as follows:

      ((nG + nC + nS + 0.5*nM + 0.5*nK + 0.5*nR + 0.5*nY + (2/3)*nB + (1/3)*nD + (1/3)*nH + (2/3)*nV + 0.5*nN) / n ) * 100%
      

      Here nX is the number of occurrences of character X in the sequence and n is the total number of characters. Gaps are ignored (they are counted in neither nX nor n).
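      As an illustration, the formula above can be written as a per-character weight table, where each coefficient is the share of G/C among the bases an IUPAC code may stand for. This Python sketch assumes "-" is the only gap character and that characters without a coefficient in the formula (A, T, U, W) contribute 0; `gc_content` is a hypothetical name.

```python
# G/C share for each IUPAC code, taken from the coefficients in the formula.
# Codes not listed (A, T, U, W) contribute 0.
GC_WEIGHT = {
    "G": 1.0, "C": 1.0, "S": 1.0,
    "M": 0.5, "K": 0.5, "R": 0.5, "Y": 0.5, "N": 0.5,
    "B": 2 / 3, "V": 2 / 3,
    "D": 1 / 3, "H": 1 / 3,
}

GAP = "-"  # assumption: the only gap character


def gc_content(seq: str) -> float:
    """GC content of a nucleotide sequence in percent; gaps are ignored."""
    residues = [ch for ch in seq.upper() if ch != GAP]
    if not residues:
        raise ValueError("sequence contains no non-gap characters")
    gc = sum(GC_WEIGHT.get(ch, 0.0) for ch in residues)
    return gc / len(residues) * 100.0


print(f"{gc_content('ATGC-N'):.1f} %")  # prints "50.0 %"
```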

      Calculation of molecular weight (ssDNA, ssRNA)

      For DNA alphabets:

      nA*wA + nT*wT + ... + nS*wS + ... + nD*wD + ... + nN*wN + (n-1)*61.97
      

      Here:

      • wA = weight of A = 251.24
      • wT = 242.23
      • wC = 227.22
      • wG = 267.24
      • For characters of the extended alphabet, an average value is used, e.g.:
        • wS = (wC + wG) / 2
        • wD = (wA + wG + wT) / 3
        • wN = (wA + wT + wC + wG) / 4
      • nA, nT, nC, nG - the number of characters "A", "T", "C" and "G" in the sequence, respectively (likewise nS, nD, nN, ... for extended characters)
      • n - the total number of characters
        Note that gap characters are skipped.
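      The ssDNA formula above can be sketched in Python as follows. `mw_ssdna` and the other names are hypothetical; the sketch assumes "-" is the only gap character, extends the averaging rule to all standard IUPAC codes (the issue lists only S, D and N as examples), and returns 0 for an empty sequence, a case the issue does not define.

```python
# Nucleoside weights for DNA from the issue.
DNA_WEIGHT = {"A": 251.24, "T": 242.23, "C": 227.22, "G": 267.24}

# Standard IUPAC codes and the bases each one may stand for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

LINK_WEIGHT = 61.97  # added (n - 1) times, once per link between neighbours
GAP = "-"  # assumption: the only gap character


def char_weight(ch: str) -> float:
    """Mean weight of the bases an IUPAC character may represent."""
    bases = IUPAC_DNA[ch]
    return sum(DNA_WEIGHT[b] for b in bases) / len(bases)


def mw_ssdna(seq: str) -> float:
    """Molecular weight (ssDNA) in Da; gaps are skipped."""
    residues = [ch for ch in seq.upper() if ch != GAP]
    if not residues:
        return 0.0  # assumption: the issue does not define the empty case
    return sum(char_weight(ch) for ch in residues) + (len(residues) - 1) * LINK_WEIGHT
```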

      For RNA alphabets:

      The formula is the same as for DNA alphabets:

      nA*wA + nU*wU + ... + nN*wN + (n-1)*61.97
      

      However, use the following weight values:

      • wA = 267.24
      • wU = 244.20
      • wC = 243.22
      • wG = 283.24
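      The RNA variant differs only in the weight table and in using U instead of T in the extended codes. A Python sketch under the same assumptions as for DNA (gap character "-", extended codes averaged over all bases they may represent, 0 for an empty sequence); `mw_ssrna` is a hypothetical name.

```python
# Nucleoside weights for RNA from the issue.
RNA_WEIGHT = {"A": 267.24, "U": 244.20, "C": 243.22, "G": 283.24}

# IUPAC codes for RNA (U in place of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

LINK_WEIGHT = 61.97


def mw_ssrna(seq: str) -> float:
    """Molecular weight (ssRNA) in Da; "-" gaps are skipped."""
    residues = [ch for ch in seq.upper() if ch != "-"]
    if not residues:
        return 0.0  # assumption: the empty case is not defined in the issue
    total = 0.0
    for ch in residues:
        bases = IUPAC_RNA[ch]
        total += sum(RNA_WEIGHT[b] for b in bases) / len(bases)
    return total + (len(residues) - 1) * LINK_WEIGHT
```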

      Other alphabets:

      The parameter is not available for other alphabets (amino acid, raw).

      Calculation of molecular weight (dsDNA, dsRNA)

      The molecular weight of a double-stranded DNA is equal to the sum of the molecular weights of its two single strands (the sequence and its complementary sequence).

      Although RNA is usually single-stranded, there are exceptions (e.g. some RNA viruses), so UGENE should also calculate molecular weight (dsRNA). As for DNA, it is the sum of the two ssRNA molecular weights.
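      A Python sketch of the dsDNA rule: the complementary strand is built with the standard IUPAC complement pairs, and both strands are weighed with the ssDNA formula. Strand direction does not affect the weight, so the complement is not reversed. Function names are hypothetical; dsRNA would work the same way with the RNA weights and U in place of T.

```python
DNA_WEIGHT = {"A": 251.24, "T": 242.23, "C": 227.22, "G": 267.24}

IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements). Gaps are left unchanged.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def mw_ssdna(seq: str) -> float:
    """Molecular weight (ssDNA) in Da; "-" gaps are skipped."""
    residues = [ch for ch in seq.upper() if ch != "-"]
    if not residues:
        return 0.0
    total = sum(
        sum(DNA_WEIGHT[b] for b in IUPAC_DNA[ch]) / len(IUPAC_DNA[ch])
        for ch in residues
    )
    return total + (len(residues) - 1) * 61.97


def mw_dsdna(seq: str) -> float:
    """Molecular weight (dsDNA): the sequence plus its complementary strand."""
    complement = seq.upper().translate(COMPLEMENT)
    return mw_ssdna(seq) + mw_ssdna(complement)
```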

      Calculation of extinction coefficient (ssDNA, ssRNA)

      For DNA alphabets:

      If the sequence consists only of A, C, G and T characters, the formula is:

      sum(e of all dinucleotides) - sum(e of all INNER nucleotides)
      

      Here the dinucleotide and single-nucleotide e values should be taken from the DNA table at http://www.owczarzy.net/extinctionDNA.htm

      • eAA = 27400
      • eAC = 21200
      • ...
      • eA = 15400
      • ...
      • eT = 8700

      Example:

      E(ATGCA) = E(AT) + E(TG) + E(GC) + E(CA) - E(T) - E(G) - E(C) = 22800 + 19000 + 17600 + 21200 - 8700 - 11500 - 7400 = 53000
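      The example can be reproduced with a short Python sketch of the nearest-neighbour formula. It contains only the coefficients quoted in this issue, so it raises `KeyError` for other dinucleotides until the full table from the page above is filled in. Dropping gaps before forming dinucleotides and rejecting sequences shorter than two bases are assumptions; the issue does not cover these cases. `ext_ssdna` is a hypothetical name.

```python
# Only the coefficients quoted in this issue, in l/(mol * cm); the remaining
# dinucleotides must be copied from the DNA table at
# http://www.owczarzy.net/extinctionDNA.htm before real use.
E_DINUC = {"AA": 27400, "AC": 21200, "AT": 22800,
           "CA": 21200, "GC": 17600, "TG": 19000}
E_MONO = {"A": 15400, "C": 7400, "G": 11500, "T": 8700}


def ext_ssdna(seq: str) -> int:
    """Nearest-neighbour extinction coefficient of an A/C/G/T single strand."""
    s = seq.upper().replace("-", "")  # assumption: gaps are dropped first
    if len(s) < 2:
        raise ValueError("the formula needs at least two nucleotides")
    dinucleotides = sum(E_DINUC[s[i:i + 2]] for i in range(len(s) - 1))
    inner = sum(E_MONO[ch] for ch in s[1:-1])  # all but the two terminal bases
    return dinucleotides - inner


print(ext_ssdna("ATGCA"))  # prints 53000
```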
      

      For the extended DNA alphabet, average values should be used. For example:

      eMR = e(A or C, A or G) = (eAA + eAG + eCA + eCG)/4
      eM = e(A or C) = (eA + eC)/2
      
      eNN = (eAA + eAC + ... + eTT)/16
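      One way to implement the averaging, sketched in Python: each IUPAC code is expanded to the bases it may stand for, and the dinucleotide (or single-nucleotide) coefficients are averaged over all combinations, as in eMR and eM above. The tables are again partial (only values quoted in this issue), and the function names are hypothetical.

```python
from itertools import product

IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Partial tables: only the values quoted in this issue.
E_DINUC = {"AA": 27400, "AC": 21200, "AT": 22800,
           "CA": 21200, "GC": 17600, "TG": 19000}
E_MONO = {"A": 15400, "C": 7400, "G": 11500, "T": 8700}


def e_mono(code: str) -> float:
    """Average single-nucleotide coefficient over the bases `code` stands for."""
    bases = IUPAC_DNA[code]
    return sum(E_MONO[b] for b in bases) / len(bases)


def e_dinuc(code1: str, code2: str) -> float:
    """Average dinucleotide coefficient over all base pairs the two codes allow."""
    pairs = [a + b for a, b in product(IUPAC_DNA[code1], IUPAC_DNA[code2])]
    return sum(E_DINUC[p] for p in pairs) / len(pairs)


def ext_ssdna_extended(seq: str) -> float:
    """Same formula as for A/C/G/T, using averaged coefficients."""
    s = seq.upper().replace("-", "")
    if len(s) < 2:
        raise ValueError("the formula needs at least two nucleotides")
    dinucleotides = sum(e_dinuc(s[i], s[i + 1]) for i in range(len(s) - 1))
    inner = sum(e_mono(ch) for ch in s[1:-1])
    return dinucleotides - inner


print(e_dinuc("M", "A"))  # (eAA + eCA) / 2, prints 24300.0
```

With the full DNA table, `e_dinuc("M", "R")` gives eMR as defined above; with the partial table here it raises `KeyError` because eAG and eCG are missing.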
      

      For RNA alphabets:

      Use the same formula as for DNA, but take the values from the RNA table on the same page:

      • eAA = 27400
      • eAC = 21000
      • ...
      • eU = 9900

      Calculation of extinction coefficient (dsDNA, dsRNA)

      For a double-stranded DNA or RNA sequence, the molar extinction coefficient is calculated as the sum of the single-stranded extinction coefficients of the sequence and its complementary sequence, corrected for the hypochromicity h:

      (eSEQ + eCOMPL_SEQ) * (1 - h)
      

      Here:

      h = 0.287 * SEQ_AT-content + 0.059 * SEQ_GC-content

      where the AT and GC contents of SEQ are fractions between 0 and 1 (not percentages).
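      A Python sketch of the double-stranded formula for plain A/C/G/T sequences, reusing the example sequence ATGCA (the partial table below happens to cover both ATGCA and its reverse complement TGCAT). It assumes that the complementary sequence is the complementary strand read 5'->3', i.e. the reverse complement, and that the AT and GC contents are fractions. `ext_dsdna` and `ext_ss` are hypothetical names.

```python
# Partial tables: only the values quoted in this issue, in l/(mol * cm).
E_DINUC = {"AA": 27400, "AC": 21200, "AT": 22800,
           "CA": 21200, "GC": 17600, "TG": 19000}
E_MONO = {"A": 15400, "C": 7400, "G": 11500, "T": 8700}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def ext_ss(s: str) -> int:
    """Single-strand nearest-neighbour coefficient (dinucleotides minus inner bases)."""
    dinucleotides = sum(E_DINUC[s[i:i + 2]] for i in range(len(s) - 1))
    return dinucleotides - sum(E_MONO[ch] for ch in s[1:-1])


def ext_dsdna(seq: str) -> float:
    """(eSEQ + eCOMPL_SEQ) * (1 - h) for an A/C/G/T sequence of length >= 2."""
    s = seq.upper().replace("-", "")
    # Assumption: the complementary strand is read 5'->3' (reverse complement).
    compl = s.translate(COMPLEMENT)[::-1]
    at = sum(ch in "AT" for ch in s) / len(s)  # fractions, not percentages
    gc = sum(ch in "GC" for ch in s) / len(s)
    h = 0.287 * at + 0.059 * gc
    return (ext_ss(s) + ext_ss(compl)) * (1 - h)


print(round(ext_dsdna("ATGCA"), 2))  # (53000 + 46300) * 0.8042, prints 79857.06
```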
      


            People

            Assignee:
            Aleksey Tiunov (atiunov, inactive)
            Reporter:
            Olga Golosova (oigl)
            Assigned Tester:
            Dmitrii Sukhomlinov