Table 3 Most commonly used data formats in bioinformatics

From: Data integration in biological research: an overview

| Data format class | General data-interchange formats | Nucleotide sequence data | Protein sequence data | Structural data | Sequence alignment | Other data types (PPI, etc.) |
|---|---|---|---|---|---|---|
| Table | CSV, TSV | BED; GFF | GFF, Uniprot-GFF | PSF(D), MMCIF(D) | SAM(D) | |
| FASTA-like | | FASTA; FASTQ | FASTA, PIR | | SAM(M) | Wig |
| GenBank-like | | GenBank; EMBL | Uniprot-TEXT | PDB, PSF(M), MMCIF(D) | CLUSTAL, MSF, PHYLIP(D) | |
| Tag-structured | HTML; XML; JSON | SBOL-XML | Uniprot-XML; Uniprot-RDF/XML | | | PSI MI-XML; PSI-PAR |
D = data; M = metadata. Formats appearing in more than one class are a mixture of classes.
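
As an illustration of a format that mixes classes, the sketch below splits a SAM-style text into its metadata part and its data part. It rests on one plausible reading of the table: that the metadata of SAM is its `@`-prefixed header lines and the data is its tab-separated alignment records. The sample text and the function name `split_sam` are made up for this example and do not come from the table's source.

```python
# Tiny made-up SAM-style text, for illustration only (not real alignment data).
SAM_TEXT = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
)


def split_sam(text):
    """Separate '@'-prefixed header lines (metadata) from tab-separated records (data).

    Hypothetical helper: it assumes header lines start with '@' and that every
    other non-empty line is one tab-delimited alignment record.
    """
    header_lines = []
    records = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            header_lines.append(line)  # metadata part
        else:
            records.append(line.split("\t"))  # table-like data part
    return header_lines, records


header_lines, records = split_sam(SAM_TEXT)
print(len(header_lines), "header lines;", len(records), "alignment record(s)")
print("reference name of first record:", records[0][2])
```

The same two-pass idea (peel off the metadata block, then treat the remainder as a table) would apply to the other formats marked with both (D) and (M), although their delimiters and header conventions differ.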