Oligonucleotides: RNA
=====================

Nucleic Acid Sequences
**********************

OpenMS also supports the representation of RNA oligonucleotides using the
:py:class:`~.NASequence` class:

.. code-block:: python
    :linenos:

    import pyopenms as oms

    oligo = oms.NASequence.fromString("AAUGCAAUGG")
    # First and last four nucleotides of the oligo
    prefix = oligo.getPrefix(4)
    suffix = oligo.getSuffix(4)

    print(oligo)
    print(prefix)
    print(suffix)
    print()

    print("Oligo length", oligo.size())
    print("Total precursor mass", oligo.getMonoWeight())
    print(
        "y1+ ion mass of",
        str(prefix),
        ":",
        prefix.getMonoWeight(oms.NASequence.NASFragmentType.YIon, 1),
    )
    print()

    seq_formula = oligo.getFormula()
    print("RNA Oligo", oligo, "has molecular formula", seq_formula)
    print("=" * 35)
    print()

    # Coarse isotope distribution of the molecular formula, up to 6 peaks
    isotopes = seq_formula.getIsotopeDistribution(oms.CoarseIsotopePatternGenerator(6))
    for iso in isotopes.getContainer():
        print("Isotope", iso.getMZ(), ":", iso.getIntensity())

This will output:

.. code-block:: output

    AAUGCAAUGG
    AAUG
    AUGG

    Oligo length 10
    Total precursor mass 3206.4885302061
    y1+ ion mass of AAUG : 1248.2298440331

    RNA Oligo AAUGCAAUGG has molecular formula C97H119N42O66P9
    ===================================

    Isotope 3206.4885302061 : 0.25567981600761414
    Isotope 3207.4918850439003 : 0.31783154606819153
    Isotope 3208.4952398817004 : 0.23069815337657928
    Isotope 3209.4985947195 : 0.12306403368711472
    Isotope 3210.5019495573 : 0.053163252770900726
    Isotope 3211.5053043951 : 0.01956319250166416

A :py:class:`~.NASequence` object can also be iterated over directly in Python:

.. code-block:: python
    :linenos:

    oligo = oms.NASequence.fromString("AAUGCAAUGG")
    print(
        "The oligonucleotide", str(oligo), "consists of the following nucleotides:"
    )
    # Each element of the sequence is a Ribonucleotide object
    for ribo in oligo:
        print(ribo.getName())

Fragment Ions
~~~~~~~~~~~~~

As with amino acid sequences, we can also compute the masses of fragment ions.
The following example calculates the doubly charged w4 ion, which contains the
four 3'-terminal nucleotides of the oligo:

.. code-block:: python
    :linenos:

    oligo = oms.NASequence.fromString("AAUGCAAUGG")
    suffix = oligo.getSuffix(4)  # the four 3'-terminal nucleotides

    charge = 2
    # Monoisotopic mass and formula of the w4 ion at the given charge
    mass = suffix.getMonoWeight(oms.NASequence.NASFragmentType.WIon, charge)
    w4_formula = suffix.getFormula(oms.NASequence.NASFragmentType.WIon, charge)
    mz = mass / charge

    print("=" * 35)
    print("RNA Oligo w4++ ion", suffix, "has mz", mz)
    print("RNA Oligo w4++ ion", suffix, "has molecular formula", w4_formula)

Modified Oligonucleotides
*************************

Modified nucleotides are also represented by the :py:class:`~.Ribonucleotide`
class. In a sequence string, a modified nucleotide is written as its unique
identifier from the :py:class:`~.RibonucleotideDB`, enclosed in square
brackets. For example, :chem:`[m1A]` represents 1-methyladenosine. We can
create a :py:class:`~.NASequence` object by parsing a modified sequence as
follows:

.. code-block:: python
    :linenos:

    oligo_mod = oms.NASequence.fromString("A[m1A][Gm]A")
    seq_formula = oligo_mod.getFormula()
    print(
        "RNA Oligo",
        oligo_mod,
        "has molecular formula",
        seq_formula,
        "and length",
        oligo_mod.size(),
    )
    print("=" * 35)

    # getOrigin() returns the unmodified base each nucleotide derives from
    oligo_list = [oligo_mod[i].getOrigin() for i in range(oligo_mod.size())]
    print(
        "RNA Oligo",
        oligo_mod.toString(),
        "has unmodified sequence",
        "".join(oligo_list),
    )

    # Inspect a single modified nucleotide (here m1A)
    r = oligo_mod[1]
    print(r.getName())
    print(r.getHTMLCode())
    print(r.getOrigin())

    # Report which nucleotides carry a modification
    for i in range(oligo_mod.size()):
        print(oligo_mod[i].isModified())

DNA, RNA and Protein
********************

We can also work with DNA and RNA sequences in combination with the Biopython
library (install it with ``pip install biopython``):

.. code-block:: python
    :linenos:

    import pyopenms as oms
    from Bio.Seq import Seq

    # Store a partial BSA coding sequence in a FASTA file
    bsa = oms.FASTAEntry()
    bsa.sequence = "ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT"
    bsa.description = "BSA Bovine Albumin (partial sequence)"
    bsa.identifier = "BSA"
    entries = [bsa]

    f = oms.FASTAFile()
    f.store("example_dna.fasta", entries)

    # Transcribe DNA to RNA and translate it to protein with Biopython.
    # Bio.Alphabet was removed in Biopython 1.78, so no alphabet is passed to Seq.
    coding_dna = Seq(bsa.sequence)
    coding_rna = coding_dna.transcribe()
    protein_seq = coding_rna.translate()

    # Load both sequences into pyOpenMS to compute their masses
    oligo = oms.NASequence.fromString(str(coding_rna))
    aaseq = oms.AASequence.fromString(str(protein_seq))

    print(
        "The RNA sequence", str(oligo), "has mass", oligo.getMonoWeight(), "and\n"
        "translates to the protein sequence", str(aaseq),
        "which has mass", aaseq.getMonoWeight(),
    )
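
If Biopython is not available, the transcription and translation steps in the
previous example can be approximated in plain Python. The following is a
minimal sketch and is not part of pyOpenMS or Biopython. The helpers
``transcribe`` and ``translate`` are hypothetical. The sketch assumes the input
is the coding strand and uses the standard genetic code. It writes stop codons
as ``*`` and ignores a trailing partial codon.

.. code-block:: python

    # Minimal sketch with hypothetical helpers (not a pyOpenMS or Biopython API):
    # transcribe coding-strand DNA and translate it with the standard genetic code.
    from itertools import product

    BASES = "UCAG"
    # Amino acids in codon order UUU, UUC, UUA, UUG, UCU, ..., GGG
    # (first base varies slowest, third base fastest).
    AMINO_ACIDS = (
        "FFLLSSSSYY**CC*W"
        "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR"
        "VVVVAAAADDEEGGGG"
    )
    CODON_TABLE = {
        "".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
    }


    def transcribe(dna):
        """Coding-strand DNA to RNA: replace thymine (T) with uracil (U)."""
        return dna.upper().replace("T", "U")


    def translate(rna):
        """Translate complete codons; stop codons become '*'."""
        usable = len(rna) - len(rna) % 3  # drop a trailing partial codon
        return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


    dna = "ATGAAGTGGGTGACTTTTATTTCTCTTCTCCTTCTCTTCAGCTCTGCTTATTCCAGGGGTGTGTTTCGT"
    rna = transcribe(dna)
    protein = translate(rna)
    print(rna)
    print(protein)  # MKWVTFISLLLLFSSAYSRGVFR

For the 69-nucleotide BSA fragment, this prints the 23-residue sequence
``MKWVTFISLLLLFSSAYSRGVFR``. Building the 64-entry table from
``itertools.product`` keeps the codon order explicit without writing out every
codon by hand.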
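
The square-bracket notation for modified nucleotides, introduced in the
Modified Oligonucleotides section, can be illustrated with a small tokenizer.
This is a hypothetical sketch and is not how pyOpenMS parses sequences. It
splits a string such as ``A[m1A][Gm]A`` into one code per nucleotide. It then
maps each code back to its unmodified base using a hand-written table,
``ORIGIN``. That table is an assumption covering only the two modifications in
this example. In pyOpenMS, the :py:class:`~.RibonucleotideDB` is the actual
source of this information.

.. code-block:: python

    import re
    from typing import List

    # Hypothetical table mapping modified-nucleotide codes to their unmodified
    # base; it covers only the two modifications in "A[m1A][Gm]A".
    ORIGIN = {"m1A": "A", "Gm": "G"}

    # A token is either a bracketed identifier or a single unmodified base.
    TOKEN = re.compile(r"\[([^\]]+)\]|([ACGU])")


    def tokenize(seq: str) -> List[str]:
        """Split a sequence string into one code per nucleotide."""
        tokens = []
        pos = 0
        while pos < len(seq):
            match = TOKEN.match(seq, pos)
            if match is None:
                raise ValueError(f"cannot parse {seq[pos:]!r}")
            # group(1) is a bracketed code, group(2) a plain base
            tokens.append(match.group(1) or match.group(2))
            pos = match.end()
        return tokens


    def unmodified(tokens: List[str]) -> str:
        """Map each code to its unmodified base; unknown codes pass through."""
        return "".join(ORIGIN.get(tok, tok) for tok in tokens)


    tokens = tokenize("A[m1A][Gm]A")
    print(tokens)  # ['A', 'm1A', 'Gm', 'A']
    print(len(tokens))  # 4
    print(unmodified(tokens))  # AAGA

Each bracketed identifier counts as a single nucleotide, so the four-token
result mirrors the one-nucleotide-per-position view that indexing
``oligo_mod[i]`` gives in pyOpenMS.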
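
The total precursor mass printed in the first example is a monoisotopic mass.
It is the sum, over all elements in the molecular formula, of the atom count
times the mass of that element's most abundant isotope. The sketch below
recomputes it in plain Python for ``C97H119N42O66P9``. The isotope masses are
standard reference values supplied here as an assumption. pyOpenMS uses its own
element database, so the result agrees with the printed ``3206.4885302061``
only to about four decimal places.

.. code-block:: python

    import re

    # Monoisotopic masses (Da) of the most abundant isotope of each element.
    # Standard reference values assumed for this sketch; OpenMS uses its own
    # element database, so the last digits can differ.
    MONOISOTOPIC_MASS = {
        "C": 12.0,
        "H": 1.00782503223,
        "N": 14.00307400443,
        "O": 15.99491461957,
        "P": 30.97376199842,
    }


    def parse_formula(formula):
        """Parse a simple formula such as 'C97H119N42O66P9' into element counts."""
        counts = {}
        for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
            counts[element] = counts.get(element, 0) + int(number or 1)
        return counts


    def monoisotopic_mass(formula):
        """Sum of atom count times monoisotopic element mass (neutral molecule)."""
        return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


    mass = monoisotopic_mass("C97H119N42O66P9")
    print(round(mass, 4))  # 3206.4885

The parser handles only plain formulas (no parentheses, isotope labels or
charges). That is enough to show where the neutral precursor mass comes from.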