Oligonucleotides: RNA
Nucleic Acid Sequences
OpenMS also supports the representation of RNA oligonucleotides using the NASequence
class:
from pyopenms import *

oligo = NASequence.fromString("AAUGCAAUGG")
prefix = oligo.getPrefix(4)
suffix = oligo.getSuffix(4)

print(oligo)
print(prefix)
print(suffix)
print()

print("Oligo length", oligo.size())
print("Total precursor mass", oligo.getMonoWeight())
print(
    "y1+ ion mass of",
    str(prefix),
    ":",
    prefix.getMonoWeight(NASequence.NASFragmentType.YIon, 1),
)
print()

seq_formula = oligo.getFormula()
print("RNA Oligo", oligo, "has molecular formula", seq_formula)
print("=" * 35)
print()

isotopes = seq_formula.getIsotopeDistribution(CoarseIsotopePatternGenerator(6))
for iso in isotopes.getContainer():
    print("Isotope", iso.getMZ(), ":", iso.getIntensity())
This will output:
AAUGCAAUGG
AAUG
AUGG
Oligo length 10
Total precursor mass 3206.4885302061
y1+ ion mass of AAUG : 1248.2298440331
RNA Oligo AAUGCAAUGG has molecular formula C97H119N42O66P9
===================================
Isotope 3206.4885302061 : 0.25567981600761414
Isotope 3207.4918850439003 : 0.31783154606819153
Isotope 3208.4952398817004 : 0.23069815337657928
Isotope 3209.4985947195 : 0.12306403368711472
Isotope 3210.5019495573 : 0.053163252770900726
Isotope 3211.5053043951 : 0.01956319250166416
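The printed values can be cross-checked without pyOpenMS. The following is a minimal sketch in plain Python: it recomputes the monoisotopic mass from the molecular formula string and summarizes the isotope peaks listed above. The helper name `monoisotopic_mass` and the element mass table are illustrative assumptions (standard monoisotopic reference masses, not values read from OpenMS), so small differences in the last digits are to be expected.

```python
import re

# Monoisotopic masses of the lightest stable isotopes in Da (assumed
# reference values, not taken from OpenMS).
MONO_MASS = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
}


def monoisotopic_mass(formula):
    """Sum element masses for a flat formula such as 'C97H119N42O66P9'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO_MASS[element] * (int(count) if count else 1)
    return total


# Values copied from the output above.
reported_mass = 3206.4885302061
isotopes = [
    (3206.4885302061, 0.25567981600761414),
    (3207.4918850439003, 0.31783154606819153),
    (3208.4952398817004, 0.23069815337657928),
    (3209.4985947195, 0.12306403368711472),
    (3210.5019495573, 0.053163252770900726),
    (3211.5053043951, 0.01956319250166416),
]

mass = monoisotopic_mass("C97H119N42O66P9")
print("Recomputed mass:", mass, "difference:", mass - reported_mass)

# The six coarse peaks are normalized and evenly spaced.
total_intensity = sum(intensity for _, intensity in isotopes)
spacings = [b[0] - a[0] for a, b in zip(isotopes, isotopes[1:])]
print("Summed intensity:", total_intensity)
print("Peak spacings:", spacings)
```

The first isotope peak coincides with the monoisotopic precursor mass, and the remaining peaks follow at roughly one mass unit each, which is what a coarse (nominal-mass resolution) isotope pattern is expected to look like.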
The NASequence object also supports iteration directly in Python:
oligo = NASequence.fromString("AAUGCAAUGG")
print(
    "The oligonucleotide", str(oligo), "consists of the following nucleotides:"
)
for ribo in oligo:
    print(ribo.getName())
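Iterating over the residues makes it easy to compute simple statistics such as the nucleotide composition. As a rough illustration that does not require pyOpenMS, the sketch below counts the one-letter codes of the same sequence as a plain string. It assumes an unmodified sequence; bracketed modification codes would need a real parser.

```python
from collections import Counter

sequence = "AAUGCAAUGG"  # same oligonucleotide as above, as a plain string

# Count how often each one-letter nucleotide code occurs.
composition = Counter(sequence)
for nucleotide in "ACGU":
    print(nucleotide, composition[nucleotide])

print("Total:", sum(composition.values()))
```

The total matches the oligo length of 10 reported earlier.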
Fragment Ions
As before with amino acid sequences, we can also generate internal fragment ions:
oligo = NASequence.fromString("AAUGCAAUGG")
suffix = oligo.getSuffix(4)

oligo.size()
oligo.getMonoWeight()

charge = 2
mass = suffix.getMonoWeight(NASequence.NASFragmentType.WIon, charge)
w4_formula = suffix.getFormula(NASequence.NASFragmentType.WIon, charge)
mz = mass / charge

print("=" * 35)
print("RNA Oligo w4++ ion", suffix, "has mz", mz)
print("RNA Oligo w4++ ion", suffix, "has molecular formula", w4_formula)
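The snippet above obtains the m/z by dividing a charged mass by the charge. The generic relationship between a neutral mass and m/z can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a description of pyOpenMS internals: it assumes protons as the only charge carriers, and the function name `mz_from_neutral_mass` is hypothetical. A negative charge is treated as loss of protons.

```python
PROTON_MASS = 1.007276466812  # mass of a proton in Da


def mz_from_neutral_mass(neutral_mass, charge):
    """Return m/z for a neutral mass, assuming (de)protonation only.

    A positive charge adds protons, a negative charge removes them.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return (neutral_mass + charge * PROTON_MASS) / abs(charge)


# Neutral monoisotopic precursor mass printed earlier for AAUGCAAUGG.
precursor_mass = 3206.4885302061

for z in (1, 2, 3, -1, -2):
    print("charge", z, "m/z", mz_from_neutral_mass(precursor_mass, z))
```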
Modified Oligonucleotides
Modified nucleotides can also be represented by the Ribonucleotide class. They are specified in square brackets using a unique string identifier that is present in the RibonucleotideDB. For example, [m1A] represents 1-methyladenosine. We can create a NASequence object by parsing a modified sequence as follows:
oligo_mod = NASequence.fromString("A[m1A][Gm]A")
seq_formula = oligo_mod.getFormula()
print(
    "RNA Oligo",
    oligo_mod,
    "has molecular formula",
    seq_formula,
    "and length",
    oligo_mod.size(),
)
print("=" * 35)

oligo_list = [oligo_mod[i].getOrigin() for i in range(oligo_mod.size())]
print(
    "RNA Oligo",
    oligo_mod.toString(),
    "has unmodified sequence",
    "".join(oligo_list),
)

r = oligo_mod[1]
r.getName()
r.getHTMLCode()
r.getOrigin()

for i in range(oligo_mod.size()):
    print(oligo_mod[i].isModified())
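To make the bracket notation concrete, the following plain-Python sketch splits a sequence string into residue codes and records whether each code was written in square brackets. It is only an illustration of the notation: the `tokenize` helper is hypothetical, it is not the parser used by NASequence, and it does not check the codes against the RibonucleotideDB.

```python
import re

# Either a bracketed code such as [m1A], or a single-letter code such as A.
TOKEN_PATTERN = re.compile(r"\[([^\]]+)\]|([A-Za-z])")


def tokenize(sequence):
    """Return (code, was_bracketed) pairs for a sequence string."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(sequence):
        bracketed, plain = match.groups()
        if bracketed is not None:
            tokens.append((bracketed, True))
        else:
            tokens.append((plain, False))
    return tokens


tokens = tokenize("A[m1A][Gm]A")
for code, was_bracketed in tokens:
    print(code, "(bracketed)" if was_bracketed else "(plain)")
print("Number of residues:", len(tokens))
```

The string contains four residues, two of them given as bracketed codes, which agrees with the length of 4 that the pyOpenMS example prints.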
DNA, RNA and Protein
We can also work with DNA and RNA sequences in combination with the BioPython library (which you can install with pip install biopython):
from pyopenms import *
from Bio.Seq import Seq

# Note: Bio.Alphabet was removed in Biopython 1.78, so the sequence is
# created without an alphabet argument.
bsa = FASTAEntry()
bsa.sequence = "ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT"
bsa.description = "BSA Bovine Albumin (partial sequence)"
bsa.identifier = "BSA"

entries = [bsa]

f = FASTAFile()
f.store("example_dna.fasta", entries)

# DNA -> RNA -> protein with BioPython
coding_dna = Seq(bsa.sequence)
coding_rna = coding_dna.transcribe()
protein_seq = coding_rna.translate()

# Convert the BioPython sequences to pyOpenMS sequence objects
oligo = NASequence.fromString(str(coding_rna))
aaseq = AASequence.fromString(str(protein_seq))

print(
    "The RNA sequence", str(oligo), "has mass", oligo.getMonoWeight(), "and \n"
    "translates to the protein sequence", str(aaseq),
    "which has mass", aaseq.getMonoWeight(),
)
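For readers who want to see what the transcription and translation steps do, here is a minimal plain-Python sketch of the same DNA to RNA to protein conversion. It assumes the standard genetic code and reads codons from the first base; the function names `transcribe` and `translate` are chosen for illustration and are not the BioPython implementation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G;
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Convert a coding-strand DNA string to RNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate complete codons from the first base; ignore a partial tail."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


dna = "ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT"
rna = transcribe(dna)
protein = translate(rna)

print("RNA:    ", rna)
print("Protein:", protein)
```

The resulting strings could then be passed to NASequence.fromString and AASequence.fromString in the same way as the BioPython results above.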