FASTA format description

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.
The description line is distinguished from the sequence data by a greater-than (">") symbol at the beginning of the line.
Sequences are expected to use the standard IUB/IUPAC amino acid and nucleic acid codes; U and * are also acceptable letters. N represents an unknown nucleic acid residue, and X represents an unknown amino acid residue.
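
The letter rules above could be checked with a small helper along the following lines. This is only a sketch: the exact letter sets are an assumption based on the commonly cited IUPAC tables, the names NUCLEIC_CODES, AMINO_CODES and invalid_letters are hypothetical, and folding lower-case letters to upper-case is a choice made here, not something this description states.

```python
# Assumed letter sets, based on the commonly cited IUPAC tables.
# Adjust them if your tools accept a different alphabet.
NUCLEIC_CODES = set("ACGTURYKMSWBDHVN")  # N = unknown nucleic acid residue
AMINO_CODES = set("ABCDEFGHIKLMNPQRSTVWYZX") | {"U", "*"}  # X = unknown residue


def invalid_letters(sequence, alphabet):
    """Return the distinct letters of `sequence` that are not in `alphabet`, sorted."""
    # Assumption: lower-case input is compared as upper-case.
    return sorted(set(sequence.upper()) - alphabet)
```

An empty list means every letter was found in the chosen alphabet; for instance, invalid_letters("ACGJ", NUCLEIC_CODES) reports only the letter J.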

An example sequence in FASTA format is:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
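
As an illustration of the layout described above, a reader for this format might look like the sketch below. It is not a reference implementation: the function name parse_fasta is hypothetical, and skipping blank lines and accepting several records in one input are assumptions that this description does not cover.

```python
def parse_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of text lines."""
    description = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:]  # drop the leading ">"
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before a description line")
        else:
            chunks.append(line)  # sequence data may span several lines
    if description is not None:
        yield description, "".join(chunks)


# Usage with two made-up toy records (not real database entries).
toy = ">seq1 hypothetical example\nACGT\nNNAC\n>seq2\nMKV*\n"
records = list(parse_fasta(toy.splitlines()))
```

Sequence lines are joined without separators, so a record split over many lines, as in the example above, comes back as a single string.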