Conversion of Phred 33 -> Phred 64 quality

Conversion of Phred 33 -> Phred 64 quality

Mark A. Jensen
Hi folks,
I know I could RTFM, but maybe someone knows off the top of their head:
I understand that Illumina at one time made a switch in the constant
added to quality scores to generate the FASTQ that comes off their
instruments. This leads to a certain incomparability of data before and
after that switch. This is about all I know of the issue; does anyone
here have experience with this? Are there any BP modules that do this
translation?
much appreciated-
MAJ
_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l

Re: Conversion of Phred 33 -> Phred 64 quality

Peter Cock
Yes, BioPerl's SeqIO understands the two legacy formats "fastq-solexa"
and "fastq-illumina" plus the original and now universal standard
"fastq-sanger".

See also http://dx.doi.org/10.1093/nar/gkp1137

Peter
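
For reference, the three variants differ in ASCII offset and score scale, as described in the NAR paper linked above: fastq-sanger stores Phred scores with offset 33, fastq-illumina stores Phred scores with offset 64, and fastq-solexa stores Solexa scores with offset 64. A minimal core-Perl sketch of decoding a single quality character follows. The names %VARIANT and decode_quality are made up for illustration and are not part of BioPerl.

```perl
use strict;
use warnings;

# ASCII offset and score scale per FASTQ variant.
# %VARIANT and decode_quality() are illustrative names, not BioPerl API.
my %VARIANT = (
    sanger   => { offset => 33, scale => 'phred'  },
    illumina => { offset => 64, scale => 'phred'  },
    solexa   => { offset => 64, scale => 'solexa' },
);

# Return the Phred score encoded by one quality character.
sub decode_quality {
    my ($char, $variant) = @_;
    my $v = $VARIANT{$variant} or die "unknown variant '$variant'\n";
    my $score = ord($char) - $v->{offset};
    return $score if $v->{scale} eq 'phred';
    # Solexa -> Phred: Q_phred = 10 * log10(10^(Q_solexa/10) + 1)
    return 10 * log(10 ** ($score / 10) + 1) / log(10);
}

printf "%s\n", decode_quality('I', 'sanger');    # 'I' is ASCII 73, so Phred 40
printf "%s\n", decode_quality('h', 'illumina');  # 'h' is ASCII 104, so Phred 40
```

The same character therefore means different scores depending on the variant, which is why the variant has to be stated when reading older files.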

On Mon, Jan 26, 2015 at 2:23 PM, Mark A. Jensen <[hidden email]> wrote:

> Hi folks,
> I know I could RTFM, but maybe someone knows off the top of their head: I
> understand that Illumina at one time made a switch in the constant added to
> quality scores to generate the FASTQ that comes off their instruments. This
> leads to a certain incomparability of data before and after that switch.
> This is about all I know of the issue; does anyone here have experience with
> this? Are there any BP modules that do this translation?
> much appreciated-
> MAJ

Re: Conversion of Phred 33 -> Phred 64 quality

Roy Chaudhuri-3
In reply to this post by Mark A. Jensen
Hi Mark,

Here's the relevant bit of the manual:
http://search.cpan.org/~cjfields/BioPerl/Bio/SeqIO/fastq.pm#FASTQ_and_Bio::Seq::Quality_mapping

There's also this article, which goes into the issue in depth:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217

A short Illumina->Sanger fastq converter would be:

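use strict;
use warnings;
use Bio::SeqIO;

# Read fastq-illumina (Phred+64) from the file named on the command line.
my $in  = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fastq', -variant => 'illumina');
# Write fastq-sanger (Phred+33); no -file is given, so output goes to STDOUT.
my $out = Bio::SeqIO->new(-format => 'fastq', -variant => 'sanger');
while (my $seq = $in->next_seq) { $out->write_seq($seq) }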

Cheers,
Roy.


Re: Conversion of Phred 33 -> Phred 64 quality

Mark A. Jensen

Roy, you are awesome, as usual-
Much appreciated! MAJ



Re: Conversion of Phred 33 -> Phred 64 quality

Fields, Christopher J
In reply to this post by Mark A. Jensen
This would be the so-called ‘fastq-illumina’ or ‘fastq-solexa’ variants.  BioPerl's FASTQ support does handle these, but modern Illumina pipelines generate Sanger-based FASTQ.

Bio::SeqIO::fastq can interconvert these, but it was painfully slow the last time I checked (mainly due to the underlying sequence quality classes).  They’re not really designed for tons of data and need refactoring to deal with large sequence data sets.  Lately, frankly, I’ve been using non-Bio* methods for this, namely seqtk:

    https://github.com/lh3/seqtk

chris
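
When only the offset changes (fastq-illumina to fastq-sanger), the conversion amounts to subtracting 31 (64 minus 33) from every quality character, so it can be done without building sequence or quality objects. A rough core-Perl sketch of that idea follows, assuming strict four-line FASTQ records (no wrapped sequence or quality lines) and Phred+64 input. The names illumina_to_sanger and convert_stream are hypothetical helpers, not part of BioPerl or seqtk, and the sketch does not cover fastq-solexa, whose scores need a nonlinear mapping rather than a plain shift.

```perl
use strict;
use warnings;

# Hypothetical helper: shift one Phred+64 quality string to Phred+33.
sub illumina_to_sanger {
    my ($qual) = @_;
    for my $c (split //, $qual) {
        # Characters below '@' (ASCII 64) cannot occur in Phred+64 data;
        # seeing one usually means the input is already Sanger-encoded.
        die "quality char '$c' is below the Phred+64 range\n" if ord($c) < 64;
    }
    return join '', map { chr(ord($_) - 31) } split //, $qual;
}

# Convert a FASTQ stream, assuming exactly four lines per record.
sub convert_stream {
    my ($in, $out) = @_;
    my $n = 0;
    while (defined(my $line = <$in>)) {
        $n++;
        if ($n % 4 == 0) {    # every fourth line is the quality string
            chomp $line;
            $line = illumina_to_sanger($line) . "\n";
        }
        print {$out} $line;
    }
}

# Demonstration on an in-memory record.
my $fastq = "\@read1\nACGT\n+\nhhhh\n";
open my $in, '<', \$fastq or die $!;
my $converted = '';
open my $out, '>', \$converted or die $!;
convert_stream($in, $out);
close $out;
print $converted;    # the quality line 'hhhh' becomes 'IIII'
```

For real files, the in-memory handles can be replaced with \*STDIN and \*STDOUT (or handles opened on files). The range check is there because running the shift on data that is already Phred+33 would silently corrupt it.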


Re: Conversion of Phred 33 -> Phred 64 quality

Fields, Christopher J
In reply to this post by Peter Cock
+1

Though, see my more detailed reply.  There are performance issues that need to be addressed.

chris
