Re: Can't find gene sequence in chromosome sequence

Re: Can't find gene sequence in chromosome sequence

hz5
NM_ accessions are mRNA sequences. On the genomic sequence the exons are
separated by introns, so the mRNA will not match as one contiguous stretch.
Did you take this into account when you searched?
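
To make the point concrete, here is a minimal Python sketch with made-up toy sequences (not real mouse data; all names are invented for illustration). It shows why an exact substring search for a spliced transcript fails against genomic sequence even though each exon is present:

```python
# Toy sequences, invented for illustration only (not real mouse data).
exon1 = "ATGGCCATTG"
intron = "GTAAGTCCCTTTTCAG"   # hypothetical intron sitting between the exons
exon2 = "TAATGGGCCG"

# Genomic layout: flank + exon1 + intron + exon2 + flank
genomic = "CCTTAA" + exon1 + intron + exon2 + "GGATCC"

# Spliced transcript: the intron has been removed
mrna = exon1 + exon2

# The spliced mRNA is not a contiguous substring of the genomic sequence ...
print(genomic.find(mrna))    # -1

# ... but each exon is found on its own.
print(genomic.find(exon1))   # 6
print(genomic.find(exon2))   # 32
```

Perl's index() behaves the same way as str.find() here: both return -1 unless the whole query occurs as one uninterrupted substring.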

Quoting Sam Al-Droubi <[hidden email]>:

> All,
>
> I downloaded the FASTA sequence for a mouse gene from
> GenBank with accession number NM_01167.  I also
> downloaded the mouse chromosome 3 FASTA file from
> NCBI
> (ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/Assembled_chromosomes/mm_chr3.fa.gz).
>
> The problem is that I cannot find the gene sequence
> in the chromosome sequence. I used Perl
> index($chr_obj->seq,$seq_obj->seq) and I get -1,
> meaning no match.  I then searched by hand using grep
> and emacs and, to my surprise, the gene sequence is not
> in the mm_chr3.fa file. What am I doing wrong?  Do I
> have the wrong chromosome file?  I am positive that
> this gene is on this chromosome according to GenBank.
> By the way, I am doing this so that I can extract the
> promoter region right before the gene starts on the
> chromosome.
>
> Thank you in advance.
>
>  
>
> Sincerely,
> Sam Al-Droubi, M.S.
> [hidden email]
> _______________________________________________
> Bioperl-l mailing list
> [hidden email]
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>



=========================================================
Haibo Zhang, PhD
Computational Biology
http://www.cyberpostdoc.org/
Share postdoc information in cyberspace. Welcome your stories, suggestions and
advice!
Re: Can't find gene sequence in chromosome sequence

Andrew Walsh
If you look at the entry in the .gbs file (release 34.1), the exon
coordinates for that mRNA are on the negative strand.  Are you using the
transcript sequence or the gene sequence?  If you are using the gene
sequence, reverse complementing should do the trick.  If you are using
the transcript sequence, this will not work since you are missing the
introns.
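
As a rough illustration of the strand issue, the following Python sketch searches both strands for an exact match and then takes the bases immediately upstream of the hit, relative to the gene's own orientation. The function names and toy sequences are invented for this example, and exact matching only makes sense for unspliced (genomic) sequence:

```python
# Translation table for complementing DNA (upper and lower case, plus N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_either_strand(chrom, query):
    """Return (0-based start in forward-strand coordinates, strand) or None."""
    pos = chrom.find(query)
    if pos != -1:
        return pos, "+"
    pos = chrom.find(reverse_complement(query))
    if pos != -1:
        return pos, "-"
    return None


def upstream_region(chrom, start, length, strand, size):
    """Return up to `size` bases upstream of the hit, written 5'->3' for the gene."""
    if strand == "+":
        return chrom[max(0, start - size):start]
    # On the negative strand, "upstream" lies to the right of the hit in
    # forward-strand coordinates and must be reverse complemented.
    end = start + length
    return reverse_complement(chrom[end:end + size])


# Toy data, invented for illustration.
chrom = "TTTTCCCCGGATTACAGGGAAAAC"
gene = "CTGTAATCC"                      # reverse complement of GGATTACAG

hit = find_on_either_strand(chrom, gene)
print(hit)                              # (8, '-')

start, strand = hit
print(upstream_region(chrom, start, len(gene), strand, 4))   # TTCC
```

For a negative-strand gene the promoter therefore comes from the region after the gene's last forward-strand coordinate, not before its first.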

Andrew




--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Senior Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Tatzberg 47
01307 Dresden
Germany
Tel. +49-351-4173 137
Fax  +49-351-4173 109

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------

Re: Can't find gene sequence in chromosome sequence

Sean Davis-3



On 1/6/06 3:35 AM, "Andrew Walsh" <[hidden email]> wrote:

> If you look at the entry in the .gbs file (release 34.1), the exon
> coordinates for that mRNA are on the negative strand.  Are you using the
> transcript sequence or the gene sequence?  If you are using the gene
> sequence, reverse complementing should do the trick.  If you are using
> the transcript sequence, this will not work since you are missing the
> introns.

Another option that is readily available and more robust is BLAT at the
UCSC Genome Browser: simply paste the sequence in and BLAT it.  In addition
to the complexities already noted, note that mRNA sequence does NOT
necessarily match the associated genomic sequence base-for-base because of
SNPs, lower-quality sequence reads, etc.  Finally, since you have the
accession, you could simply look it up at UCSC and get the (curated) BLAT
results on the RefSeq track.
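
To see why a single base difference defeats an exact search, here is a naive Python sketch of a mismatch-tolerant scan. It is not how BLAT works (BLAT uses an index and produces gapped alignments); the function name and toy sequences are invented purely for illustration:

```python
def find_with_mismatches(target, query, max_mismatches):
    """Return (start, mismatches) for the first window of `target` that differs
    from `query` at no more than `max_mismatches` positions, or None.

    Substitutions only (no gaps); runs in O(len(target) * len(query)) time,
    so it is only suitable for small toy inputs.
    """
    n, m = len(target), len(query)
    for start in range(n - m + 1):
        mismatches = 0
        for a, b in zip(target[start:start + m], query):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return start, mismatches
    return None


# Toy data: the query differs from the genomic sequence at one base.
genomic = "AACCGGTTACGTAGCTAGGATCC"
exon = "ACGTAGCTTGG"

print(genomic.find(exon))                        # -1
print(find_with_mismatches(genomic, exon, 1))    # (8, 1)
```

A scan like this is far too slow for a whole chromosome, which is one more reason to use an alignment tool for the real data.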

Sean

