mummer3 output format


Albert Vilella
Hi,

I am trying to understand how to transform MUMmer3's output format
into something I can pipe into another program, like MAF or similar.
How can I parse the results so that I can then do a write_aln into MAF
or similar?

Details:

If I run nucmer v.3.23 with the options below, I get an out.delta like this:

~/MUMmer3.23/nucmer -maxgap $g -l $l $ref $qry

------------------
Leishmania_major.LM2.12.dna.toplevel.fa
LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta
NUCMER
>LmjF.34 ULAVAL|LtaPseq521 1866748 641
959335 959806 169 640 91 91 0
20
17
-3
-2
-183
5
0
>LmjF.12 ULAVAL|LtaPseq501 675346 1438
322990 324081 1436 342 178 178 0
-45
-1
-1
-1

This doesn't look like any of the formats in t/AlignIO/mummer.t to me.
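
For reference, the layout above follows the delta format described in the MUMmer manual: each `>` line names the reference and query sequences with their lengths, each alignment line holds seven integers (reference start and end, query start and end, errors, similarity errors, stop codons), and the single integers that follow are indel distances terminated by a 0. A minimal Python sketch of a reader along those lines is below; `parse_delta` and `DeltaAlignment` are illustrative names, not part of MUMmer or BioPerl, and the sketch simply skips whatever precedes the first `>` line.

```python
from collections import namedtuple

# Hypothetical record type; the field names are chosen for illustration only.
DeltaAlignment = namedtuple(
    "DeltaAlignment",
    "ref qry ref_len qry_len ref_start ref_end qry_start qry_end errors deltas",
)

def parse_delta(lines):
    """Yield one DeltaAlignment per complete alignment in a nucmer delta file."""
    ref = qry = None
    ref_len = qry_len = 0
    coords = None   # the seven integers of the alignment being read
    deltas = []     # indel distances collected for that alignment
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            # Header: >ref_id qry_id ref_length qry_length
            ref, qry = fields[0][1:], fields[1]
            ref_len, qry_len = int(fields[2]), int(fields[3])
        elif ref is None:
            continue  # file paths and the NUCMER tag before the first header
        elif len(fields) == 7:
            coords = [int(f) for f in fields]
            deltas = []
        elif len(fields) == 1 and coords is not None:
            value = int(fields[0])
            if value == 0:
                # A 0 closes the alignment; keep the first five integers.
                yield DeltaAlignment(ref, qry, ref_len, qry_len,
                                     *coords[:5], tuple(deltas))
                coords = None
            else:
                deltas.append(value)

# First record of the out.delta excerpt quoted above.
sample = """\
NUCMER
>LmjF.34 ULAVAL|LtaPseq521 1866748 641
959335 959806 169 640 91 91 0
20
17
-3
-2
-183
5
0
"""

for aln in parse_delta(sample.splitlines()):
    print(aln.ref, aln.ref_start, aln.ref_end,
          aln.qry, aln.qry_start, aln.qry_end, aln.deltas)
```

A query start larger than the query end, as in the second record above (1436 342), marks a match on the reverse strand of the query. The second record in the excerpt is cut off before its closing 0, so this sketch would not emit it.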

I can also run:

~/MUMmer3.23/show-aligns out.delta $region1 $region2

Which gives me something that looks like a blast or exonerate output, like so:

------
Leishmania_major.LM2.12.dna.toplevel.fa
LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta

============================================================
-- Alignments between LmjF.34 and ULAVAL|LtaPseq521

-- BEGIN alignment [ +1 959335 - 959806 | +1 169 - 640 ]


959335     cacacgcctcgtagaggtctccttgctttcgcgcggtgc.c.tcacttg
169        cacacgcctcgtagagatc.ccctgccttcgcgcgg.gctcttcacttg
                           ^  ^  ^   ^         ^  ^ ^

959382     cgcatgcggtagtagaagagaatgctgtgggcccacccagcgtagttgc
216        cgcatgcggtagtagaagagaatgctgtgtgcccacccagcgtagttgc
                                        ^

959431     caaacagcttccggaaggcctcctgaatgacgttatgatgccgctcgta
265        caaacagtttccagaaggcatcctggataacattatgatgccgttcgta
                  ^    ^      ^     ^  ^  ^           ^

959480     caagggtgggacaggcgtttttcgtgaggcgcgcagcggggctgctgca
314        caggggcggcacaggtgttttccgtaaggcacgtgaagaggtcgttgca
             ^   ^  ^     ^     ^   ^    ^  ^^^^ ^  ^^ ^

959529     gagcttccaccttcctctatcgccttta.cggtcgctggcgacacgcct
363        gagcctccgtttcccttcaccgcccgcagcgat.gatgatgtcactcct
               ^   ^^^ ^   ^^ ^    ^^^ ^  ^ ^ ^  ^^ ^   ^

959577     ttcttaaccttgagaacctccgcctgcttcctccactccagcagcagat
411        ttcttcaccttgagagcctccgcctggttcttccactccaggagaagat
                ^         ^          ^   ^          ^  ^

959626     tatcccgtgagcgggcttcctcttcgggcaacggacaccctggacgaga
460        cagtgggtgcgcagacttcttcttcgcgcagtagagaccctgagcgaga
           ^ ^^^^   ^  ^ ^    ^      ^   ^^^  ^      ^^

959675     gcgcttacgacccaccgccgtcgcggcgcttggtgcggcaaggtactcc
509        acgctttcgacccgccgatgtcacggtgcttgcggtggcaagatactcc
           ^     ^      ^   ^^   ^   ^     ^^ ^      ^

959724     accgcaacttgcgccatgtgcgtgtccacggggacaatgtgggtgcggt
558        accgaaacctgcgccatgtgtgtgtccacggggacgatgtgggtgcggt
               ^   ^           ^              ^

959773     tgagcgcgaagagcgccacgcagtcagcaacttt
607        tgagagcaaagagcgccacgcaatccgccacttt
               ^  ^              ^  ^  ^


--   END alignment [ +1 959335 - 959806 | +1 169 - 640 ]

============================================================
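
The dots in the show-aligns rows are the same gaps that the delta file encodes as integers. Per the MUMmer manual, the absolute value of each integer is the distance to the next indel; a positive value marks an extra base in the reference (a gap in the query) and a negative value marks a gap in the reference. A small Python sketch, using a hypothetical helper named `gap_columns`, converts the delta list of the first alignment into 1-based alignment columns:

```python
def gap_columns(deltas):
    """Return (ref_gap_columns, qry_gap_columns) as 1-based alignment columns."""
    ref_gaps, qry_gaps = [], []
    column = 0
    for d in deltas:
        column += abs(d)             # distance from the previous indel
        if d > 0:
            qry_gaps.append(column)  # reference has a base, query has a gap
        else:
            ref_gaps.append(column)  # query has a base, reference has a gap
    return ref_gaps, qry_gaps

# Delta values of the LmjF.34 alignment, without the terminating 0.
ref_gaps, qry_gaps = gap_columns([20, 17, -3, -2, -183, 5])
print(ref_gaps)  # [40, 42, 225]
print(qry_gaps)  # [20, 37, 230]
```

Columns 20 and 37 are where the query row starting at 169 shows a dot in the first block above, and columns 40 and 42 are the two dots in the reference row starting at 959335.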

_______________________________________________
Bioperl-l mailing list
[hidden email]
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Re: mummer3 output format

Roy Chaudhuri-3
Hi Albert,

The show-coords program converts the delta file into a coords file which
is much easier to parse. It is run automatically if you provide the
--coords flag to nucmer/promer.
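
A sketch of reading such a coords file in Python, assuming it was produced with `show-coords -T -H out.delta` (tab-delimited, no header) from a nucmer run, so that each line holds S1, E1, S2, E2, LEN 1, LEN 2, % IDY and then the reference and query IDs. That column order is an assumption to check against your own output, since extra flags such as `-l` or `-c` add columns. `CoordsHit`, `parse_coords_line` and `read_coords` are illustrative names.

```python
from collections import namedtuple

# Hypothetical record type mirroring the assumed nine-column layout.
CoordsHit = namedtuple(
    "CoordsHit",
    "ref_start ref_end qry_start qry_end ref_aln_len qry_aln_len "
    "identity ref_id qry_id",
)

def parse_coords_line(line):
    """Parse one tab-delimited show-coords line into a CoordsHit."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    ints = [int(f) for f in fields[:6]]
    return CoordsHit(*ints, float(fields[6]), fields[7], fields[8])

def read_coords(handle):
    """Yield a CoordsHit for every non-empty line of an open coords file."""
    for line in handle:
        if line.strip():
            yield parse_coords_line(line)
```

With a file written by the command above, `for hit in read_coords(open("out.coords")): ...` would give one record per alignment; the file name is only an example.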

There was talk of a BioPerl MUMmer parser a while back but I'm not sure
if it got anywhere.

You might also look at Mugsy, which uses MUMmer and outputs MAF, so may
contain some code that can be recycled - it is written in Perl I think.
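
For the MAF side, each block is an `a` line followed by one `s` line per sequence: source name, zero-based start, number of aligned bases, strand, source length, and the gapped text with `-` for gaps. On the `-` strand the start is counted on the reverse-complemented source. A minimal Python sketch, assuming 1-based inclusive MUMmer coordinates and dot-gapped rows as printed by show-aligns; `maf_s_line` and `maf_block` are hypothetical helpers, not Mugsy or BioPerl code.

```python
def maf_s_line(src, start, end, src_size, text):
    """Build one MAF 's' line from 1-based inclusive MUMmer coordinates."""
    text = text.replace(".", "-")           # show-aligns prints gaps as dots
    size = len(text) - text.count("-")      # aligned bases, gaps excluded
    if size != abs(end - start) + 1:
        raise ValueError("aligned text does not span the given coordinates")
    if start <= end:
        strand, start0 = "+", start - 1     # forward strand, zero-based
    else:
        strand, start0 = "-", src_size - start  # counted on the reverse strand
    return f"s {src} {start0} {size} {strand} {src_size} {text}"

def maf_block(rows):
    """Join an 'a' line and the 's' lines of one alignment block."""
    return "\n".join(["a"] + [maf_s_line(*row) for row in rows]) + "\n"

# First 25 columns of the LmjF.34 alignment shown earlier in the thread.
block = maf_block([
    ("LmjF.34", 959335, 959359, 1866748, "cacacgcctcgtagaggtctccttg"),
    ("ULAVAL|LtaPseq521", 169, 192, 641, "cacacgcctcgtagagatc.ccctg"),
])
print("##maf version=1")
print(block)
```

The size check is there because a mismatch between the coordinates and the ungapped text length is the easiest mistake to make when slicing alignments by hand.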

Cheers,
Roy.

On 01/03/2012 15:45, Albert Vilella wrote:

> Hi,
>
> I am trying to understand how to transform Mummer3's output format
> into something I can pipe into another program, like MAF or similar.
> How can I parse the results so that I can then do a write_aln into MAF
> or similar?

Re: mummer3 output format

Roy Chaudhuri-3
Sorry, I'd completely missed Bio::AlignIO::mummer. However this seems to
be aimed at parsing the output of the mummer program (as opposed to
nucmer/promer) so I guess the advice about show-coords still stands.

On 01/03/2012 15:56, Roy Chaudhuri wrote:

> Hi Albert,
>
> The show-coords program converts the delta file into a coords file which
> is much easier to parse. It is run automatically if you provide the
> --coords flag to nucmer/promer.