Fwd: [Genbank-bb] Change to sequence display formats : Removal of GIs by June 2016

Fields, Christopher J
Something to keep in mind if parsing breaks (though we should be okay).  I’m more concerned about BLAST+ XML changes...

chris

Begin forwarded message:

From: "Cavanaugh, Mark (NIH/NLM/NCBI) [E]" <[hidden email]>
Date: June 26, 2015 at 5:13:50 PM CDT
Subject: [Genbank-bb] Change to sequence display formats : Removal of GIs by June 2016

Greetings GenBank Users,

 A very significant change affecting the GenBank, GenPept, and FASTA
display formats for sequence records at NCBI was announced in the June 2015
GenBank release notes: the removal of GI sequence identifiers.

 Because this change could have many impacts, it seems prudent to announce it
separately, so that as many users as possible are aware of it. Section 1.4.1
of the June release notes is therefore reproduced below.

Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS


1.4.1 GI sequence identifiers to be removed from GenBank/GenPept/FASTA formats

 As of 06/15/2016, the integer sequence identifiers known as "GIs" will no
longer be included in the GenBank, GenPept, and FASTA formats supported by
NCBI for the display of sequence records.

 As first described in the Release Notes for GenBank 199.0 in December 2013,
NCBI is in the process of moving to storage solutions which utilize only
Accession.Version identifiers. See Section 1.4.2 of these release notes for
additional background information about those developments.

 Although GI sequence identifiers served their purpose well for many years,
the Accession.Version system is completely equivalent (and much more
human-readable).

 Given the shift to non-GI-based systems, the importance of using
Accession.Version identifiers cannot be overstated. As an initial step, NCBI
will therefore stop displaying GI identifiers in the flatfile and FASTA views
of all sequence records.

 Previously assigned GI identifiers will continue to exist 'behind the scenes',
and NCBI services (including URLs, APIs, etc.) that accept GIs as inputs or
arguments will continue to be supported for the foreseeable future, for those
sequence records that have GIs.

 Over the next year, NCBI will identify all such services that do not yet
support Accession.Version identifiers and add that support. Users of those
services will then be encouraged to use Accession.Version rather than GIs.
For services that already support Accession.Version, NCBI encourages users
to begin transitioning away from GIs as soon as is practical.
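
As a rough illustration of moving a script from GIs to Accession.Version, the sketch below builds an NCBI E-utilities efetch URL keyed on an Accession.Version. It assumes your code retrieves records through E-utilities (efetch.fcgi with its db, id, rettype and retmode parameters); the helper name efetch_url and the dot check that rejects bare integer GIs are illustrative choices, not part of any NCBI tool. It only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (assumption: this is the service your code uses).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(identifier: str, db: str = "nuccore", rettype: str = "gb") -> str:
    """Build an efetch URL keyed on an Accession.Version rather than a GI.

    Hypothetical helper: the '.' check is a simple guard that rejects bare
    integer GIs so legacy identifiers are caught early.
    """
    if "." not in identifier:
        raise ValueError(f"expected Accession.Version, got {identifier!r}")
    query = urlencode({"db": db, "id": identifier, "rettype": rettype, "retmode": "text"})
    return f"{EFETCH_URL}?{query}"


if __name__ == "__main__":
    print(efetch_url("AF123456.2"))
    print(efetch_url("AAF19666.1", db="protein", rettype="fasta"))
```

Failing loudly on a GI, instead of silently passing it through, is a design choice that makes remaining GI usage easy to find during the transition.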

 In the sample record below, nucleotide sequence AF123456 has been assigned a
GI of 6633795, and the protein translation of its coding region feature has
been assigned a GI of 6633796:

LOCUS       AF123456                1510 bp    mRNA    linear   VRT 12-APR-2012
DEFINITION  Gallus gallus doublesex and mab-3 related transcription factor 1
           (DMRT1) mRNA, partial cds.
ACCESSION   AF123456
VERSION     AF123456.2  GI:6633795
....
    CDS             <1..936
                    /gene="DMRT1"
                    /note="cDMRT1"
                    /codon_start=1
                    /product="doublesex and mab-3 related transcription factor
                    1"
                    /protein_id="AAF19666.1"
                    /db_xref="GI:6633796"
                    /translation="PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSL
                    IAERQRVMAVQVALRRQQAQEEELGISHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPA
                    HSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSDLVVDSTYYSSFYQPSLYPYY
                    NNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQWQMKGMEN
                    RHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDS
                    GLGCLSSSESTKGDLECEPHQEPGAFAVSPVLEGE"
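
If existing data still refer to protein GIs, one option is to record the pairing between /protein_id and the GI /db_xref from current-format records while the GI is still displayed. A minimal sketch, assuming the text of a single CDS feature has already been isolated as a string; protein_gi_pair is a hypothetical helper that uses plain regular expressions rather than a full flatfile parser:

```python
import re
from typing import Optional, Tuple

# Qualifier patterns as they appear in the current (pre-June 2016) flatfile.
PROTEIN_ID_RE = re.compile(r'/protein_id="([^"]+)"')
GI_XREF_RE = re.compile(r'/db_xref="GI:(\d+)"')  # ignores other db_xrefs


def protein_gi_pair(cds_text: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (protein Accession.Version, GI) from one CDS feature's text.

    Either element is None when its qualifier is absent, e.g. the GI once
    it is no longer displayed.
    """
    pid = PROTEIN_ID_RE.search(cds_text)
    gi = GI_XREF_RE.search(cds_text)
    return (pid.group(1) if pid else None, int(gi.group(1)) if gi else None)


if __name__ == "__main__":
    cds = '''    CDS             <1..936
                    /protein_id="AAF19666.1"
                    /db_xref="GI:6633796"
'''
    print(protein_gi_pair(cds))  # ('AAF19666.1', 6633796)
```

Run against a record in the new format, the same function returns None for the GI, which makes the missing value easy to detect.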

 After June 15, 2016, the GI value on the VERSION line and the GI /db_xref
qualifier of the coding region feature will no longer be displayed:

LOCUS       AF123456                1510 bp    mRNA    linear   VRT 12-APR-2012
DEFINITION  Gallus gallus doublesex and mab-3 related transcription factor 1
           (DMRT1) mRNA, partial cds.
ACCESSION   AF123456
VERSION     AF123456.2
....
    CDS             <1..936
                    /gene="DMRT1"
                    /note="cDMRT1"
                    /codon_start=1
                    /product="doublesex and mab-3 related transcription factor
                    1"
                    /protein_id="AAF19666.1"
                    /translation="PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSL
                    IAERQRVMAVQVALRRQQAQEEELGISHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPA
                    HSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSDLVVDSTYYSSFYQPSLYPYY
                    NNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQWQMKGMEN
                    RHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDS
                    GLGCLSSSESTKGDLECEPHQEPGAFAVSPVLEGE"
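
Code that reads the VERSION line can accept both layouts by treating the trailing GI as optional. A sketch, assuming the caller passes one line of flatfile text at a time; parse_version_line is an illustrative name. The same pattern also covers the GenPept VERSION line:

```python
import re
from typing import Optional, Tuple

# "VERSION" + Accession.Version, optionally followed by "GI:<integer>".
# The GI part is expected to disappear after 06/15/2016.
VERSION_RE = re.compile(r"^VERSION\s+(\S+)(?:\s+GI:(\d+))?\s*$")


def parse_version_line(line: str) -> Tuple[str, Optional[int]]:
    """Return (Accession.Version, GI or None) from a VERSION line."""
    m = VERSION_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    acc_ver, gi = m.groups()
    return acc_ver, int(gi) if gi else None


if __name__ == "__main__":
    print(parse_version_line("VERSION     AF123456.2  GI:6633795"))  # ('AF123456.2', 6633795)
    print(parse_version_line("VERSION     AF123456.2"))  # ('AF123456.2', None)
```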

 Similarly, the GI value will be removed from the VERSION line of the GenPept
format. Currently:

LOCUS       AAF19666                 311 aa            linear   VRT 12-APR-2012
DEFINITION  doublesex and mab-3 related transcription factor 1, partial [Gallus
           gallus].
ACCESSION   AAF19666
VERSION     AAF19666.1  GI:6633796
DBSOURCE    accession AF123456.2
....
    CDS             1..311
                    /gene="DMRT1"
                    /coded_by="AF123456.2:<1..936"

As of 06/15/2016:

LOCUS       AAF19666                 311 aa            linear   VRT 12-APR-2012
DEFINITION  doublesex and mab-3 related transcription factor 1, partial [Gallus
           gallus].
ACCESSION   AAF19666
VERSION     AAF19666.1
DBSOURCE    accession AF123456.2
....
    CDS             1..311
                    /gene="DMRT1"
                    /coded_by="AF123456.2:<1..936"

 Note that the coding region feature in GenPept format has never displayed
nucleotide GI values.
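
Because the GenPept /coded_by qualifier already refers to the nucleotide record by Accession.Version (AF123456.2 in the example), it can link a protein back to its nucleotide without any GI. A hedged sketch that handles only the simple ACCESSION.VERSION:location form seen here; more complex values such as complement(...) or join(...) locations are not handled, and parse_coded_by is a hypothetical name:

```python
import re
from typing import Tuple

# Simple form only: /coded_by="<Accession.Version>:<location>".
# complement(...) / join(...) values are deliberately out of scope.
CODED_BY_RE = re.compile(r'/coded_by="([A-Za-z0-9_]+\.\d+):([^"]+)"')


def parse_coded_by(qualifier: str) -> Tuple[str, str]:
    """Return (nucleotide Accession.Version, location string)."""
    m = CODED_BY_RE.search(qualifier)
    if m is None:
        raise ValueError(f"unsupported /coded_by value: {qualifier!r}")
    return m.group(1), m.group(2)


if __name__ == "__main__":
    print(parse_coded_by('/coded_by="AF123456.2:<1..936"'))  # ('AF123456.2', '<1..936')
```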

For FASTA format, GI values will be removed from the FASTA header/defline:

Currently:

>gi|6633795|gb|AF123456.2| Gallus gallus doublesex and mab-3 related transcription factor 1 (DMRT1) mRNA, partial cds
CCGGCGGCGGGCAAGAAGCTGCCGCGTCTGCCCAAGTGTGCCCGCTGCCGCAACCACGGCTACTCCTCGC
CGCTGAAGGGGCACAAGCGGTTCTGCATGTGGCGGGACTGCCAGTGCAAGAAGTGCAGCCTGATCGCCGA
[....]

>gi|6633796|gb|AAF19666.1| doublesex and mab-3 related transcription factor 1, partial [Gallus gallus]
PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSLIAERQRVMAVQVALRRQQAQEEELGI
SHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPAHSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSD
LVVDSTYYSSFYQPSLYPYYNNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQ
WQMKGMENRHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDSGLGC
LSSSESTKGDLECEPHQEPGAFAVSPVLEGE

As of 06/15/2016:

>gb|AF123456.2| Gallus gallus doublesex and mab-3 related transcription factor 1 (DMRT1) mRNA, partial cds
CCGGCGGCGGGCAAGAAGCTGCCGCGTCTGCCCAAGTGTGCCCGCTGCCGCAACCACGGCTACTCCTCGC
CGCTGAAGGGGCACAAGCGGTTCTGCATGTGGCGGGACTGCCAGTGCAAGAAGTGCAGCCTGATCGCCGA
[....]

>gb|AAF19666.1| doublesex and mab-3 related transcription factor 1, partial [Gallus gallus]
PAAGKKLPRLPKCARCRNHGYSSPLKGHKRFCMWRDCQCKKCSLIAERQRVMAVQVALRRQQAQEEELGI
SHPVPLPSAPEPVVKKSSSSSSCLLQDSSSPAHSTSTVAAAAASAPPEGRMLIQDIPSIPSRGHLESTSD
LVVDSTYYSSFYQPSLYPYYNNLYNYSQYQMAVATESSSSETGGTFVGSAMKNSLRSLPATYMSSQSGKQ
WQMKGMENRHAMSSQYRMCSYYPPTSYLGQGVGSPTCVTQILASEDTPSYSESKARVFSPPSSQDSGLGC
LSSSESTKGDLECEPHQEPGAFAVSPVLEGE
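
A FASTA reader that splits the identifier on '|' can handle both deflines by treating a leading gi|<integer>| pair as optional. The sketch below assumes deflines starting with '>' and covers only the two layouts in this announcement (gi|...|gb|... and gb|...); other identifier styles may need extra handling. parse_ncbi_defline is an illustrative name:

```python
from typing import Optional, Tuple


def parse_ncbi_defline(defline: str) -> Tuple[Optional[int], str, str, str]:
    """Split a defline into (GI or None, database, Accession.Version, description).

    Handles '>gi|6633795|gb|AF123456.2| desc' and '>gb|AF123456.2| desc'.
    """
    if not defline.startswith(">"):
        raise ValueError("FASTA deflines start with '>'")
    # Identifier runs up to the first space; the rest is the description.
    ident, _, description = defline[1:].rstrip("\n").partition(" ")
    fields = ident.split("|")
    gi: Optional[int] = None
    if fields[0] == "gi":  # legacy prefix, absent after 06/15/2016
        gi = int(fields[1])
        fields = fields[2:]
    if len(fields) < 2:
        raise ValueError(f"unrecognised identifier: {ident!r}")
    return gi, fields[0], fields[1], description


if __name__ == "__main__":
    print(parse_ncbi_defline(">gi|6633796|gb|AAF19666.1| doublesex and mab-3 related transcription factor 1, partial [Gallus gallus]")[:3])
    print(parse_ncbi_defline(">gb|AAF19666.1| doublesex and mab-3 related transcription factor 1, partial [Gallus gallus]")[:3])
```

Keying downstream lookups on the returned Accession.Version rather than on the GI keeps the same code working before and after the change.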

Please direct any inquiries about these changes to the NCBI Service Desk:

 [hidden email]



_______________________________________________
Genbankb mailing list
[hidden email]
http://www.bio.net/biomail/listinfo/genbankb


_______________________________________________
Bioperl-l mailing list
[hidden email]
http://mailman.open-bio.org/mailman/listinfo/bioperl-l