
Table 3 Most commonly used data formats in bioinformatics

From: Data integration in biological research: an overview

| Data format class | General data-interchange formats | Nucleotide sequence data | Protein sequence data | Structural data | Sequence alignment | Other data types (PPI, etc.) |
|---|---|---|---|---|---|---|
| Tabular | CSV, TSV | BED; GFF | GFF, Uniprot-GFF | PSF(D), MMCIF(D) | SAM(D) | |
| FASTA-like | | FASTA; FASTQ | FASTA, PIR | | SAM(M) | Wig |
| GenBank-like | | GenBank; EMBL | Uniprot-TEXT | PDB, PSF(M), MMCIF(D) | CLUSTAL, MSF, PHYLIP(D) | |
| Tag-structured | HTML; XML; JSON | SBOL-XML | Uniprot-XML; Uniprot-RDF/XML | | | PSI MI-XML; PSI-PAR |

D = data; M = metadata. Formats appearing in more than one class are a mixture of classes.
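To make the difference between the tabular and FASTA-like classes concrete, the following is a minimal Python sketch (not taken from the original article) that reads the three required BED columns and simple FASTA and FASTQ records. It assumes well-formed, uncompressed input and unwrapped four-line FASTQ records; the function names (`parse_bed`, `parse_fasta`, `parse_fastq`, `_record_id`) are hypothetical. Real projects would normally rely on an established parser such as Biopython's `Bio.SeqIO`.

```python
from typing import Dict, List, Optional, Tuple


def _record_id(header: str) -> str:
    # Hypothetical helper: first whitespace-delimited token after '>' or '@'.
    parts = header[1:].split()
    return parts[0] if parts else ""


def parse_bed(text: str) -> List[Tuple[str, int, int]]:
    """Tabular class: read the three required BED columns.

    Fields are tab-separated; start is 0-based and end is exclusive,
    so an interval's length is end - start. Extra columns are ignored.
    """
    intervals: List[Tuple[str, int, int]] = []
    for line in text.splitlines():
        # Skip blank lines plus comment, 'track' and 'browser' header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def parse_fasta(text: str) -> Dict[str, str]:
    """FASTA-like class: map each record ID to its full sequence.

    A '>' header starts a record; following (possibly wrapped) lines are
    concatenated until the next header. Lines before the first header
    are ignored, and a repeated ID overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = _record_id(line)
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def parse_fastq(text: str) -> List[Tuple[str, str, str]]:
    """FASTA-like class with qualities: parse unwrapped 4-line records.

    Each record is '@id', sequence, '+', quality string; an incomplete
    trailing record is ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records: List[Tuple[str, str, str]] = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        records.append((_record_id(header), seq, qual))
    return records


if __name__ == "__main__":
    print(parse_bed("chr1\t100\t200\tfeatureA\nchr2\t0\t50\n"))
    # [('chr1', 100, 200), ('chr2', 0, 50)]
    print(parse_fasta(">seq1 example\nACGT\nGGCC\n>seq2\nTTAA\n"))
    # {'seq1': 'ACGTGGCC', 'seq2': 'TTAA'}
    print(parse_fastq("@r1 desc\nACGT\n+\nIIII\n"))
    # [('r1', 'ACGT', 'IIII')]
```

The split between readers mirrors the table's grouping: tabular formats are parsed column by column, while FASTA-like formats are parsed by record markers (`>` or `@`) followed by sequence lines.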