Bioinformatics - protocols D

To locate the required tools, look for a golden D in the list of links or use the shortcuts provided.

Basic sequence manipulation

You have just sequenced the following Arabidopsis cDNA fragment. Note the format of the sequence: a header line beginning with ">" (e.g. >mysequence), followed by one or more lines of sequence data. This is the FASTA format.

>cDNA_C_At
CAAGCAACCTTTACACATATAGAAGAAGAAAAACACTTCTTTGTTTCTGTCATTAATTCCCTCCCTCTAT
ATATATATATTTAAATCTATTATGACAAAACAATCCAATACTGGATACTTTTTACAACAACATGCACAGA
CAAAGCTGAGATCTCACCTTAAGAAACTAATGAGATTGAGTTATGTTCGTCTTCATCAGACGAAGAGGTT
GAAGACGACGACGAAGAAGAAGAAGATTGTCTTCGTCCAACGAGTCCAGGAAGAGGTTGTGGCATCATTG
GATTCACTGGAACAGGAAACTTATGAGCAGAACTAACCATTGTTCTTTCGTTTATCATCCCTACTTCTTT
GCAAACTCTGTCTACTACTCCAAGGAAGTCTCTAACCACCAAGAATATTCTAAACGGATGCGCTTCTTCT
TTAGCCGAGTTTCCATGGAAATACTCTGTGATTTCTTTTACAAGTGATAACGCTACGCTCTCTTGAGCTT
GTACTCTGATGATCTCTTCCTCAGCTCTTTTCAGAAACNTTTTCATCGATTCCGNNAACCTCTGACTGTT
GCTTTCTTCTGTGATTGTTGATTGGACTTGGATTGCTTCGTTGATCTTGGCAATGNNTTGAGGAAAGCTT
GGAGANGTAGCTGCTTAGTACTTCTGAGTCCATCNNAGC
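
As a rough illustration of the FASTA layout described above, the following Python sketch reads FASTA text into (header, sequence) pairs and performs two simple manipulations: counting ambiguous bases (such as the N characters in the read above) and computing the reverse complement. The function names and the toy record are assumptions made for this example; they are not part of the course tools.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined and upper-cased; lines that appear
            # before the first header are ignored.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Complement table for A, C, G, T and N; any other symbol is left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_ambiguous(seq):
    """Count positions that are not one of A, C, G or T (for example N)."""
    return sum(1 for base in seq if base not in "ACGT")


# Toy record (hypothetical, not the cDNA above) to show the calls.
toy_text = ">toy\nACGTN\nacgt\n"
toy_records = parse_fasta(toy_text)
for name, seq in toy_records:
    print(name, len(seq), count_ambiguous(seq), reverse_complement(seq))
```

To apply the sketch to the cDNA above, paste the complete record (header line and all sequence lines) into a string and pass it to parse_fasta.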

Sequence similarity searches and domain structure analysis

Construction and interpretation of a protein sequence alignment

Using MACAW, create an alignment of the sequences you downloaded from the FASTA output.

Gene building: searching for coding sequences in chromosomal DNA

The following DNA sequence corresponds to the sense strand of the A. thaliana genomic locus from which your initial cDNA has been derived:

>D_gene
TTTAATAAAATAAAAATCCACTCGCATTTTTATTTTCAACATTGTGCGTA
CGGTGCAATTCAATGAACAGTGTTTACTTTCAGTGTGTACACTTCTGCGG
ACTATTACAAAGTCCACGTCTTATCCTACGTGTTATAATCTCATATGTTA
CTGTCTGAAATGGACCCCACTACGTAAAAATAAAATTAAGAATCAACCAC
TCTTCTTCCATCACCTCTTTTGGCTTTCTCTCTACTCTCTCTACTACTCT
CTCACCATCACTGAGTTAAGAGAACAAACCAAAAACAAAATTATCAAACC
ATCACCAGCAGAATCTTAGCTGGATTCATCACTCTATTCAAAAAGTTTCT
CTCTTCTCTTTTCTCAGATCTTGAACTCTTGAAGAAGAAAGAAGAAGATA
ACACAATGCTCTTCTTCTTATTCTTCTTCTACTTACTCTTATCTTCATCC
TCCGATCTAGTCTTCGCCGACCGTCGTGTACTCCACGAACCATTCTTCCC
TATAGATTCACCACCACCGTCACCACCATCACCACCACCACTTCCTAAAC
TACCATTCTCTTCAACCACTCCTCCATCTTCATCAGACCCAAATGCTTCT
CCTTTCTTCCCTTTATACCCTTCATCTCCACCACCACCTTCTCCAGCCTC
CTTCGCTTCTTTTCCGGCGAATATCTCATCTCTAATCGTCCCTCACGCCA
CTAAATCCCCACCTAACTCCAAAAAACTCCTTATCGTCGCTATCTCCGCC
GTTTCCTCCGCTGCTTTAGTCGCTCTACTTATCGCTTTACTCTATTGGCG
AAGAAGCAAACGTAACCAAGATCTTAACTTCTCCGATGATAGCAAAACAT
ACACCACCGACAGTAGCCGCCGTGTCTACCCTCCTCCTCCGGCAACGGCG
CCTCCAACACGACGCAATGCGGAGGCTAGAAGTAAACAGAGGACCACCAC
GAGCTCCACCAATAACAACAGCTCTGAGTTTCTTTACTTAGGAACAATGG
TGAATCAAAGAGGAATCGATGAACAATCTCTTAGTAATAATGGATCAAGC
TCAAGAAAACTTGAATCTCCAGATCTTCAACCACTTCCTCCATTGATGAA
ACGAAGTTTCCGTTTAAATCCAGATGTTGGTTCAATCGGAGAAGAAGATG
AAGAAGATGAGTTTTACTCTCCACGTGGCTCACAAAGCGGGCGAGAACCG
TTAAACCGGGTCGGACTTCCGGGTCAAAATCCTAGATCTGTTAACAATGA
CACTATCTCTTGCTCATCTTCAAGCTCTGGTTCACCAGGAAGATCAACAT
TTATCAGTATCTCTCCTTCAATGAGTCCTAAGAGATCTGAACCAAAACCG
CCGGTTATCTCCACACCAGAACCGGCGGAGTTAACCGATTATAGATTTGT
TCGGTCTCCGTCACTGTCGTTAGCTTCTTTATCGTCGGGATTGAAAAACT
CCGATGAAGTAGGATTGAATCAAATCTTTAGATCTCCGACGGTTACATCT
CTAACAACTTCACCGGAGAATAACAAAAAAGAGAACTCTCCATTATCATC
TACTTCAACTTCACCGGAACGACGACCAAATGATACACCAGAAGCTTACT
TGAGATCTCCGTCGCATTCTTCTGCTTCTACATCACCGTATAGATGTTTT
CAGAAATCTCCGGAGGTCTTACCGGCGTTTATGAGTAATCTCCGGCAAGG
TTTGCAATCTCAGTTACTATCTTCTCCTTCTAACTCTCATGGAGGACAAG
GTTTCCTTAAGCAGTTAGATGCATTACGTTCTCGTTCACCGTCGTCGTCT
TCTTCTTCTGTTTGTTCTTCACCGGAGAAAGCTTCTCATAAGTCACCAGT
TACATCTCCTAAGTTATCTTCCCGGAATTCGCAGTCTCTATCATCTTCTC
CGGATAGAGATTTTAGTCATAGCTTAGATGTATCACCACGGATATCGAAC
ATTTCACCTCAAATTTTACAGTCTCGTGTGCCTCCGCCTCCTCCTCCTCC
CCCACCGTTGCCGTTGTGGGGACGACGGAGTCAGGTGACTACTAAAGCGG
ACACAATCTCGAGACCGCCTTCTCTTACACCGCCTTCACATCCTTTTGTG
ATCCCATCTGAAAACTTACCAGTGACTTCGTCTCCTATGGAGACTCCAGA
GACGGTTTGTGCGAGTGAGGCGGCGGAGGAAACTCCGAAACCGAAGCTAA
AGGCGTTACATTGGGATAAAGTTAGAGCAAGTTCGGATCGTGAGATGGTT
TGGGATCATCTTCGATCAAGCTCTTTCAAGTGAGTTAATGTGACATACTC
GTTTATATGATACTATATGCTTTTAGTGAGAATGTGGTTGTTGAGATTAT
GAATGTGGTTTGCAGATTAGATGAGGAGATGATTGAGACGTTGTTTGTGG
CGAAGTCGTTAAACAACAAACCAAATCAGAGTCAGACAACTCCAAGATGT
GTTCTCCCGAGCCCGAACCAAGAGAACAGAGTCCTGGACCCGAAGAAGGC
TCAGAATATTGCCATCTTGCTTCGTGCACTTAATGTCACTATAGAAGAAG
TTTGTGAGGCTCTTCTTGAAGGTAAACTATGCTGTCACATACATAGTTTC
TCATTTTCTTCTCCTTTGATCTCCAGAATTAGAGTTCTTATGCATTTGTT
AATGGTTTTTCGATGATATGGTTGAGTTATTCTGAAAGCTTTGCTTCTTT
GATGGTGTGGAGATTCTTGGTTACATTGATGTTCTTAGTTATGCTTTTTC
AGGCAATGCTGATACACTGGGGACTGAACTTCTTGAGAGCTTACTGAAGA
TGGCACCGACAAAAGAAGAAGAGCGCAAGTTGAAAGCGTACAATGATGAT
TCGCCTGTTAAGCTTGGACATGCTGAGAAATTCCTTAAGGCAATGTTGGA
CATCCCTTTCGCCTTTAAAAGAGTTGATGCAATGCTCTATGTAGCCAACT
TTGAGTCCGAGGTTGAATACTTGAAGAAATCTTTTGAGACTCTTGAGGTA
TATATTACAAGCTATTCTCTCTCTTTTTACCATATGGTTGTATTGTAACA
GATTATGACTTCATTTCTATTGTTTGTGTAGGCTGCTTGTGAAGAACTGA
GGAACAGTAGGATGTTCTTAAAGCTTCTTGAAGCGGTTCTAAAGACAGGA
AACCGTATGAACGTTGGAACAAACCGAGGAGATGCACATGCGTTCAAGCT
TGATACACTTCTCAAGCTAGTCGATGTCAAAGGCGCTGATGGGAAAACAA
CTCTCTTGCATTTCGTTGTACAAGAGATAATCCGAGCAGAAGGCACACGT
CTCTCAGGTAACAATACACAAACAGATGACATTAAATGCCGGAAACTAGG
TCTCCAAGTTGTATCAAGTCTCTGTTCTGAGCTTAGTAACGTCAAGAAAG
CTGCTGCGATGGACTCAGAAGTACTAAGCAGCTACGTCTCCAAGCTTTCT
CAAGGCATTGCCAAGATCAACGAAGCAATCCAAGTCCAATCAACAATCAC
AGAAGAAAGCAACAGTCAGAGGTTTTCGGAATCGATGAAAACGTTTCTGA
AAAGAGCTGAGGAAGAGATCATCAGAGTACAAGCTCAAGAGAGCGTAGCG
TTATCACTTGTAAAAGAAATCACAGAGTATTTCCATGGAAACTCGGCTAA
AGAAGAAGCGCATCCGTTTAGAATATTCTTGGTGGTTAGAGACTTCCTTG
GAGTAGTAGACAGAGTTTGCAAAGAAGTAGGGATGATAAACGAAAGAACA
ATGGTTAGTTCTGCTCATAAGTTTCCTGTTCCAGTGAATCCAATGATGCC
ACAACCTCTTCCTGGACTCGTTGGACGAAGACAATCTTCTTCTTCTTCGT
CGTCGTCTTCAACCTCTTCGTCTGATGAAGACGAACATAACTCAATCTCA
TTAGTTTCTTAAGGTGAGATCTCAGCTTTGTCTGTGCATGTTGTTGTAAA
AAGTATCCAGTATTGGATTGTTTTGTCATAATAGATTTAAATATATATAT
ATAGAGGGAGGGAATTAATGACAGAAACAAAGAAGTGTTTTTCTTTTCTG
CATTTGTGTAAAAAAAATAATATAGGTTTACCTTAAAATTTGTTCATCTT
AAATTAATAATTTAAGAATCAAATAAATTTGTTTATCTGAACCGTGTGTA
CCACGAAAGAATGTGAGAGCAAACATATTACTTACTTACCCTTCGTTGCT
GAATATAATGATCATTATAAATCACTACCTCCAGTACCTTCTACCTTCTT
CAAAGAACCTTGTTGGATTTGAACCAAAGTTGGAACATAATTGACGAGAG
GTGAGCATCTAGATTCTGCATCGTGATGATGATCCACTTTTATCTATTTA

Try to predict the exon-intron structure using the MIT GenScan server (don't forget to select organism = Arabidopsis!) and Gene Finder. (If any of the servers is down, the results can be found on the SOS page.) Compare the locations of the introns and exons found by both programs, as well as by NetGene2/WebGene (contact group E or use the links to the results, if possible).
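
One possible way to organise the comparison is sketched below in Python. It assumes you have copied the predicted exons from each program by hand into lists of (start, end) positions (1-based, inclusive, on the same strand). The coordinates shown are invented placeholders, not real output from any of the servers, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_predictions(exons_a, exons_b):
    """Sort the exons of two predictions into agreement classes."""
    set_b = set(exons_b)
    # Exons predicted with exactly the same boundaries by both programs.
    identical = [a for a in exons_a if a in set_b]
    # Pairs of exons that overlap but differ in at least one boundary.
    partial = [(a, b) for a in exons_a for b in exons_b
               if a != b and overlaps(a, b)]
    # Exons with no overlapping counterpart in the other prediction.
    only_a = [a for a in exons_a if not any(overlaps(a, b) for b in exons_b)]
    only_b = [b for b in exons_b if not any(overlaps(a, b) for a in exons_a)]
    return {"identical": identical, "partial": partial,
            "only_a": only_a, "only_b": only_b}


def introns_from_exons(exons):
    """Derive intron intervals as the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Placeholder coordinates for illustration only.
prediction_a = [(100, 250), (400, 520), (700, 900)]
prediction_b = [(100, 250), (410, 520), (1000, 1100)]

result = compare_predictions(prediction_a, prediction_b)
for category, items in result.items():
    print(category, items)
print("introns in prediction A:", introns_from_exons(prediction_a))
```

Exons reported in the "partial" class are the most informative ones to inspect by eye, since the programs agree that an exon exists there but disagree on a splice site.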
 
