Bioinformatics - protocols  A

To locate the required tools, look for a red A in the list of links or use the shortcuts provided.

Basic sequence manipulation

You have just sequenced an Arabidopsis cDNA fragment cloned in the EcoRI site of pBluescriptII SK- vector and obtained the following raw sequence:

CCGCGGTGGCGGCCGCTCTAGAACTAGTGGATCCCCCGGGCTGCAGGAAT
TCGGTGACCCCGGCAAAGCTTGCTTAATCCGAAGACGTTTCTGTTTCATC
TTCTTAAATCCGGGCCAACNGCGTTTACGAGACTAAACGCGTTTCTCTTT
AGGGCTTAATTATTATCCAGAGATGGCTCATCATAGCAAATGTTTACAAA
CGTTGGATTTAGCTTGTAAAGAGCTGAGATCTCGTGGCTTGTTTGTGAAG
CTTTTGGAGGCAATACTTAAAGCTGGAAACAGAATGAACGCGGGTACCGC
GAGAGGAAACGCTCAAGCGTTTAATCTAACCGCGCTTTTGAAGCTTTCGG
ATGTTAAAAGCGTTGATGGGAAGACTTCTTTGCTTAACTTTGTAGTGGAG
GAAGTTGTTAGATCGGAAGGAAAACGTTGTGTTATGAATAGAAGAAGCCA
TAGCTTAACACGAAGCGGTAGTAGTAACTACAATGGTGGTAATAGTAGTC
TTCAGGTTATGTCGAAAGAAGAGCAAGAGAAAGAGTACTTGAAGCTTGGT
TTACCAGTTGTTGGTGGATTGAGCTCTGAGTTTTCAAACGTGAAGAAAGC
TGCTTGTGTGGACTATGAAACGGTTGTTGCAACTTGTTCTGCTCTTGCGG
TTAGAGCGAAAGATGCGAAAACGGTGATTGGAGAATGTGAAGATGGAGAA
GGAGGGAGGTTTGTGAAAACGATGATGACGTTTCNTGATTCGGTAGAGGA
AGAGGTGAAAATAGCGAAAGGTGAAGAGAGGAAAGTGATCCCTGA
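
A common first step with a read like this is to find the EcoRI recognition site (GAATTC) that marks the junction between the pBluescript polylinker and the cDNA insert. The sketch below is only an illustration under stated assumptions: the helper names are invented for this example, only the first 100 bases of the read are pasted in, and ambiguous bases (N) or a partial site at the end of a read are not handled.

```python
def clean(raw):
    """Join a wrapped sequence into one uppercase string without whitespace."""
    return "".join(raw.split()).upper()


def find_sites(seq, site="GAATTC"):
    """Return the 0-based start position of every occurrence of `site`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def reverse_complement(seq):
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# First two lines of the raw read shown above (the rest is omitted here).
read_start = clean("""
CCGCGGTGGCGGCCGCTCTAGAACTAGTGGATCCCCCGGGCTGCAGGAAT
TCGGTGACCCCGGCAAAGCTTGCTTAATCCGAAGACGTTTCTGTTTCATC
""")

sites = find_sites(read_start)
print("EcoRI sites at 0-based positions:", sites)

if sites:
    # Everything after the first site is taken as insert sequence.
    insert = read_start[sites[0] + len("GAATTC"):]
    print("Insert begins:", insert[:20])
    print("Reverse complement of that stretch:", reverse_complement(insert[:20]))
```

Because the site spans a line break in the listing, the wrapped lines have to be joined before searching, which is what `clean` does.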

Sequence similarity searches and domain structure analysis

Gene building: searching for coding sequences in chromosomal DNA

The following DNA sequence corresponds to the sense strand of the locus in the A. thaliana Chromosome 1 from which your initial cDNA has been derived:

>locusA
cccctataaaaagtattaaaaaggactgatacaataatgtatataaatat
cctaaaagatcttaattttgtaaatttattgttgtatattctaaacccgc
aatattagaatgatgatttagtaaacaagaaagacaaaataaataattaa
ttttagctagaaaagatgaaataaacactcatgatttaagccatacaaat
cgaagccccttgggttcagcatttctcaccaagtaaataccatcacctct
ggaaacccatttacgtacttgaccacatcttttattagcggctcctctgt
atgctctccatatgttatacacactatgatgccttaagatttattcacga
cgatttaatcagatacgcttatggattgccaaagatgatgccatctactt
agagaaaaacaatggaaagcgagaacgcatgtataattggaataaaaatt
aatatggttttcatatatctaaaaaattggacatttgaagccttaataaa
ttatactatgtaaaaatacttgtttatgaatgtaaattataataaattac
gatttaattagggaaatattgactatatatttcacccaaatattgaatgt
aaattttattttccaatacttttgcacatttaagaaattttcggatgtat
ttcctaaagaatattaccttttttgttttttaaaccatgcctttttgttt
tacacgttcataaatgcatgttccatacgcattaccataatttaatttga
acttaattttctctaggaatggtgatgatccactaccactatcattgatt
tcattccatattcctttgaccgactgaaattacgttggaaatagtatatt
ttgatgaataatttatttactcggaaaaaagaggtcaagttattaatagt
aagtacatatacattatcaattaagaattcaattgagttttaaggaaaat
cctattaatttgtttggtattcggtatttgttagttctaaggaattgaat
ttcccgattatacatcattataacgttctcaagttccaaacttgcaaccc
acattttgtcgatattctcaaatgtgaattcattcaatttcccatagaaa
acataaatttgcacttaaagttaacaattgaaatcgtatctaaatgggaa
tgtttttggcttttagtgttagacttccaaagcgtcaaaaatatttctag
aaagagcacaaaaaataagcaacgccactacttttggacaaagtcaacga
taacacacatcaaccgcaccagctccataaaagtccatctcacgaaaacg
attctagtcaaactacctaaaacacccttatatttacatacaacccaatc
ccactaacaagggtattttcgtcaatcacaaaatttatcaccgacccggg
aagaagaagaagaacagatcaactaatttctgctttcaactccacattaa
accaaaacctccaaaaagaatcatttatttaaattatcttcccgttttaa
gttcctgagatttttgggaattgtaaatttgaagaaaattaaacaaagac
gtgttttcatttttttttttgtttcctttattgatctctctctatctctc
taaatgagctaaatcgttaatggctgccatgtttaatcatccatggccta
atttaaccctaatttacttcttcttcatcgtcgttttaccattccaatca
ctttctcaatttgattctcctcaaaatatcgaaactttcttccccatctc
ttcactctcccctgttccaccaccgcttcttccaccttcgtcaaacccat
ctccgccgtcgaataattcatcatcttcggataaaaaaacaatcaccaaa
gctgtccttataacagcagcaagtactttacttgtagctggagttttctt
cttctgcctccaaagatgtatcatcgcacggagacggagagacagagttg
gaccagtcagagtcgaaaacactttacctccgtatcctcctcctccgatg
acgtcggcggcggtgactacgactactttggctagagaaggattcacgag
gtttggtggtgtgaaaggtttgattcttgatgagaatggtcttgatgtgt
tgtattggagaaagctacagagtcagagagaaagaagtgggagtttcagg
aaacagatcgtcaccggagaagaagaagacgagaaagaagttatttatta
caagaacaagaagaaaacagagcccgttacagagattcctcttcttagag
gaagatcatctacttctcacagtgttatccataacgaagatcatcagccg
ccaccgcaggtgaaacagagtgaaccaacaccaccaccgccaccaccgtc
aattgcggtgaaacagagtgcaccaacgccatcgccacctcctccgatta
agaagggttcttcaccatcgccaccgccacctccaccggtgaaaaaggtt
ggagctttatcatcatcagcttcgaaaccaccacctgcgccggttagagg
agcaagtggaggagagacttcgaaacaagtaaagttgaagcctttacatt
gggataaagtaaaccctgattccgatcattcaatggtttgggacaaaatc
gatcgtggatcattcaggtatatatttatttcgaaagttagggcttttgc
ttcaatcaattgaaaaaaccctaatttgtttttgtttcttctcagtttcg
atggcgatttaatggaagctctgtttggatacgttgccgtggggaagaaa
tcaccagaacaaggcgatgagaaaaaccctaaatcaacgcaaatattcat
acttgatccgagaaagtctcaaaacacagcgattgtgctcaaatcattag
gtatgacacgtgaagagcttgttgaatcactcatagaaggaaacgatttc
gtgccagacactcttgagaggttagctagaatagctccaacgaaagaaga
acaatcagccattcttgaattcgacggtgacacggcaaagcttgctgatg
cggagacgtttctgtttcatcttcttaaatccgtgccaaccgcgtttacg
agactaaacgcgtttctctttagggctaattattatccagagatggctca
tcatagcaaatgtttacaaacgttggatttagcttgtaaagagctgagat
ctcgtggcttgtttgtgaagcttttggaggcaatacttaaagctggaaac
agaatgaacgcgggtaccgcgagaggaaacgctcaagcgtttaatctaac
cgcgcttttgaagctttcggatgttaaaagcgttgatgggaagacttctt
tgcttaactttgtagtggaggaagttgttagatcggaaggaaaacgttgt
gttatgaatagaagaagccatagcttaacacgaagcggtagtagtaacta
caatggtggtaatagtagtcttcaggttatgtcgaaagaagagcaagaga
aagagtacttgaagcttggtttaccagttgttggtggattgagctctgag
ttttcaaacgtgaagaaagctgcttgtgtggactatgaaacggttgttgc
aacttgttctgctcttgcggttagagcgaaagatgcgaaaacggtgattg
gagaatgtgaagatggagaaggagggaggtttgtgaaaacgatgatgacg
tttcttgattcggtagaggaagaggtgaaaatagcgaaaggtgaagagag
gaaagtgatggagcttgtgaaacgtacaacggattattatcaagcaggag
ctgttacaaaggggaagaatccacttcatttgtttgttatcgttagagat
tttcttgccatggttgataaagtttgcttagatattatgagaaatatgca
gaggaggaaggttggtagtccgatatcgccttcttcgcagcggaatgcgg
tgaaattcccggttttgcctccgaatttcatgtcggacagagcttggagt
gattctggtgggtcggattctgatatgtgagagtcaagatttgttatatg
taaatactaaatagtagaagcattttgggtattgattagcattgaaagat
gttgaattgtttatagatttatcagtccaaagcattggacttgagtataa
tttgttccttgtataaataaacaattttgctttaagacctttccatgttt
atgaacatgtcttctttaacttcacatagaccttttgtttacgtaagaac
taataatactaaattgtttgataattctaaatgtgaaagtgaaccactat
atagtgtgaacttggctttattgaattctttttaaaaaaatttctccaga
gctttagatgtaggagttaatattttcacctaacatagcctcttttttat
gtttctctatcaactaacactaaatttgtggatgaagactaaattaacat
aagtttatctattaactaacaacctaccagtttgatgcttgtaaatatga
aacttcaacgttataaagactatatggtgtgaactttttatccatcttta
ttgacttttaaaattttcttaatttgagtaaacaaaagcagaagcttttt
aaaggatgcaggagttgatttttgtatatgaacaaaacatatacttctcc
cttagacgaatttggagctatcattcttggtttcaaactttttaataatt
tgagctttaaagcaaaatggcaactttatattgatcactagtccacaaca
ctttctctgccttttcctcaatagcaacgcgtagtcaagaagaagaacgt
gtttaacatggaccaatcttgattaagataatagtatgatcaaatgctta
tataaacacactaaaaaggaatcaaatttaaccattccacaaatcaccaa
caaaatttaatgaatcatgtctctgcttctaaagatgttattattttcct
tattcttcttctatatggcttcaatttctcaatgctcagacccaaccggt
ggacagtttagcttcaacggttacttgtacaccgatggagttgcggatct
aaacccggacggtttgttcaaactcataacttcaaagaca

Try to predict the exon-intron structure using the MIT GenScan server (don't forget to select organism = Arabidopsis!) and Gene Finder. (If either server is down, the results can be found on the SOS page.) Compare the positions of the introns and exons found by the two programs, and also with those found by NetGene2/WebGene (contact group B or use the links to the results, if possible).
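
Once both programs have returned exon coordinates, the comparison can be done by hand or with a few lines of code. The following is a minimal sketch, assuming each prediction has been typed in as a list of (start, end) pairs, 1-based and inclusive. The coordinates in it are placeholders invented for illustration, not real GenScan or Gene Finder output, and the function names are hypothetical.

```python
def compare_exons(pred_a, pred_b):
    """Compare two exon predictions given as lists of (start, end) pairs.

    Returns four lists: exons identical in both predictions, exons found
    only in A, exons found only in B, and pairs (exon_a, exon_b) that
    overlap but differ in at least one boundary.
    """
    set_a, set_b = set(pred_a), set(pred_b)
    shared = sorted(set_a & set_b)
    only_a = sorted(set_a - set_b)
    only_b = sorted(set_b - set_a)
    partial = [(a, b) for a in only_a for b in only_b
               if a[0] <= b[1] and b[0] <= a[1]]
    return shared, only_a, only_b, partial


def introns(exons):
    """Derive intron coordinates as the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# Placeholder coordinates only; replace them with the exons you obtained.
program_a = [(100, 250), (400, 520), (700, 900)]
program_b = [(100, 250), (410, 520), (700, 900)]

shared, only_a, only_b, partial = compare_exons(program_a, program_b)
print("Identical exons:", shared)
print("Exons with differing boundaries:", partial)
print("Introns implied by program A:", introns(program_a))
```

Exons listed as having differing boundaries are the ones worth checking against the cDNA sequence, since they usually reflect a disagreement about a splice site.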
 

Construction and interpretation of a protein sequence alignment

Create an alignment of the Arabidopsis protein sequences obtained from your BLAST search. Below you will find the sequences of a conserved domain from several related metazoan and fungal proteins. Save these sequences in an *.aa file, import them into your MACAW alignment and try to align them with your results. (Save your alignment as a new file so that you do not lose the data if the program crashes.)
>p140mDia
YKPEVQLRRPNWSKFVAEDLSQDCFWTKVKEDRFENNELFAKLTLAFSAQTKTSKAKKDQEGGEEKKSVQ
KKKVKELKVLDSKTAQNLSIFLGSFRMPYQEIKNVILEVNEAVLTESMIQNLIKQMPEPEQLKMLSELKE
EYDDLAESEQFGVVMGTVPRLRPRLNAILFKLQFSEQVENIKPEIVSVTAACEELRKSENFSSLLELTLL
VGNYMNAGSRNAGAFGFNISFLCKLRDTKSADQKMTLLHFLAELCENDHPEVLKFPDELAHVEKASRVSA
ENLQKSLDQMKKQIADVERDVQNFPAATDEKDKFVEKMTSFVKDAQEQYNKLRMMHSNMETLYKELGDYF
VFDPKKLSVEEFFMDLHNFRNMFLQAVKENQKRRETEEKMRRAKLAKEKAEKERLEKQQKREQLIDMNAE
GDETGVMDSLLEALQSGAAFRRKR
>Diaphanous Drosophila
WDVKNPMKRANWKAIVPAKMSDKAFWVKCQEDKLAQDDFLAELAVKFSSKPVKKEQKDAVDKPTTLTKKN
VDLRVLDSKTAQNLAIMLGGSLKHLSYEQIKICLLRCDTDILSSNILQQLIQYLPPPEQLKRLQEIKAKG
EPLPPIEQFAATIGEIKRLSPRLHNLNFKLTYADMVQDIKPDIVAGTAACEEIRNSKKFSKILELILLLG
NYMNSGSKNEAAFGFEISYLTKLSNTKDADNKQTLLHYLADLVEKKFPDALNFYDDLSHVNKASRVNMDA
IQKAMRQMNSAVKNLETDLQNNKVPQCDDDKFSEVMGKFAEECRQQVDVLGKMQLQMEKLYKDLSEYYAF
DPSKYTMEEFFADIKTFKDAFQAAHNDNVRVREELEKKRRLQEAREQSAREQQERQQRKKAVVDMDAPQT
QEGVMDSLLEALQTGSAFGQRNRQARRQRPAGAERRAQLSRSRSRTRVTNGQLMTREMILNEVLGSA
>Fugu formin
IKTKFRLPVFNWTALKPNQINGTVFNEIDDERELELERFEELFKTRAQGPIMDLSCTKSKVAQKAVNKVT
ILDANRSKNLAITLRKANKTFDLKTLPVDFVECLMRFLPTEMEVKALRQYERERRPLDQLAEEDRFMLLF
SKIERLTQRMNIITFIGNFSDNVAMLTPQLNAIIAASASVKSSPKLKRMLEIILALGNYMNSSKRGCVYG
FKLQSLDLLLDTKSTDRKMTLLHYIALIVKEKYPELANFYNELHFVDKAAAVSLENVLLDVRELGKGMDL
IRRECSLHDHSVLKGFLQASDTQLDKVQRDAKTAEEAFNNVVNYFGESAKTAPPSVFFPVFVRFLKAYKD
AVEENELRKKQEQAMREKLLAEEAKQQDPKVQAQKKRQQQHELIAELRKRQAKDHRPVYEGKDGTIEDII
TVLK
>Cappuccino Drosophila
PPTAPPATKEIWTEIEETPLDNIDEFTELFSRQAIAPVSKPKELKVKRAKSIKVLDPERSRNVGIIWRSL
HVPSSEIEHAIYHIDTSVVSLEALQHMSNIQATEDELQRIKEAAGGDIPLDHPEQFLLDISLISMASERI
SCIVFQAEFEESVTLLFRKLETVSQLSQQLIESEDLKLVFSIILTLGNYMNGGNRQRGQADGFNLDILGK
LKDVKSKESHTTLLHFIVRTYIAQRRKEGVHPLEIRLPIPEPADVERAAQMDFEEVQQQIFDLNKKFLGC
KRTTAKVLAASRPEIMEPFKSKMEEFVEGADKSMAKLHQSLDECRDLFLETMRFYHFSPKACTLTLAQCT
PDQFFEYWTNFTNDFKDIWKKEITSLLNELMKKSKQAQIESRRNVSTKVEKSGRISLKERMLMRRSKN
>Bni1 yeast
PRPHKKLKQLHWEKLDCTDNSIWGTGKAEKFADDLYEKGVLADLEKAFA
AREIKSLASKRKEDLQKITFLSRDISQQFGINLHMYSSLSVADLVKKILN
CDRDFLQTPSVVEFLSKSEIIEVSVNLARNYAPYSTDWEGVRNLEDAKPP
EKDPNDLQRADQIYLQLMVNLESYWGSRMRALTVVTSYEREYNELLAKLR
KVDKAVSALQESDNLRNVFNVILAVGNFMNDTSKQAQGFKLSTLQRLTFI
KDTTNSMTFLNYVEKIVRLNYPSFNDFLSELEPVLDVVKVSIEQLVNDCK
DFSQSIVNVERSVEIGNLSDSSKFHPLDKVLIKTLPVLPEARKKGDLLED
EVKLTIMEFESLMHTYGEDSGDKFAKISFFKKFADFINEYKKAQAQNLAA
EEEERLYIKHKKIVEEQQKRAQEKEKQKENSNSPSSEGNEEDEAEDRRAV
MDKLLEQLKNA
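
Before importing, it can be worth checking that the pasted sequences were read as intended. The sketch below parses FASTA-formatted text, joins the wrapped lines of each record, and writes the records back out with a uniform line width. It rests on two assumptions: the function names are invented for this example, and replacing spaces in the names with underscores is a precaution (many programs keep only the first word of a FASTA header) rather than a documented MACAW requirement. The two records in the example are cut down to their first lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line  # join wrapped sequence lines
    return records


def to_fasta(records, width=60):
    """Format records as FASTA text with a fixed line width."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name.replace(" ", "_"))
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Truncated example input: only the first line(s) of two records above.
example = """>Fugu formin
IKTKFRLPVFNWTALKPNQINGTVFNEIDDERELELERFEELFKTRAQGPIMDLSCTKSKVAQKAVNKVT
>Bni1 yeast
PRPHKKLKQLHWEKLDCTDNSIWGTGKAEKFADDLYEKGVLADLEKAFA
AREIKSLASKRKEDLQKITFLSRDISQQFGINLHMYSSLSVADLVKKILN
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), "residues")

# The formatted text could then be written to a file of your choosing,
# e.g. open("domains.aa", "w").write(to_fasta(records))  # hypothetical name
print(to_fasta(records))
```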
  • If you had difficulties with the alignment (which is likely), try a different scoring matrix: save the alignment, create a new project with the PAM250 or BLOSUM80 matrix, and open your file from within the new project (MACAW will now remember the matrix setting).
  • In the resulting alignment, try to select regions that would be suitable for phylogenetic analysis.
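
One simple way to shortlist regions for phylogenetic analysis is to look for stretches of alignment columns in which no sequence has a gap. The sketch below does this for an alignment exported as equal-length strings. It is a heuristic under stated assumptions, not a MACAW feature: the function name, the `min_len` threshold and the toy alignment are all made up for illustration, and gap-free blocks still need to be judged by eye for how well conserved they are.

```python
def conserved_blocks(alignment, min_len=5, gap_chars="-."):
    """Return (start, end) column ranges with no gap in any sequence.

    Columns are 0-based and `end` is exclusive. Blocks shorter than
    `min_len` columns are discarded.
    """
    rows = list(alignment.values())
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    blocks = []
    start = None
    # Run one column past the end so that a trailing block is closed.
    for col in range(n_cols + 1):
        gap_free = col < n_cols and all(row[col] not in gap_chars for row in rows)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


# Hand-made toy alignment for illustration only.
toy = {
    "seq1": "MK--LGNYMNAGS-RK",
    "seq2": "MKAALGNYMNSGSKR-",
    "seq3": "M---LGNFMNDTSKRK",
}

for start, end in conserved_blocks(toy):
    print("columns", start, "to", end - 1)
    for name, row in toy.items():
        print(" ", name, row[start:end])
```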