
Table 1 Terminology

From: Data integration in biological research: an overview

Schema

A structured and “queryable” way of storing data

Database

A single schema or a collection of schemata

Sources

A number of databases that contain data. The data in each source may duplicate and/or complement data in other sources

Data Integration

The process of combining data that reside in different sources, to provide users with a unified view of such data

Data Standards

Agreements on representation, format, and definition for common data

Data Formats

A structured way to represent data and metadata in a file

Data Warehousing

A model for integrating data in which the data from different sources reside in a central repository (a.k.a. data warehouse)

Federated Databases

A model for integrating data in which the data remain in the original sources and users are provided with a unified view of the data, built on mechanisms that map the information held in each source

Linked Data

The network of interlinked data that is available on the web. It is used to automatically share semantically rich information and represents the biggest attempt to convert significant amounts of human knowledge, across all fields, into a computer-readable format

Ontology

A structured way of describing data, often presented in a computer-readable format. In bioinformatics, ontologies are sets of unambiguous, universally agreed terms used to describe biological phenomena and “entities”, their properties and their relationships

Controlled Vocabulary

A collection of terms for describing a certain domain of interest

Unique Identifier

A unique representation for a biological entity (molecule, organism, ontology term, etc.). It is usually an alphanumeric string that is used to refer to this entity and distinguishes it from all others (much like an ID card or passport number identifies a person).

Metadata

Data describing data, i.e., additional information (e.g., a comment, an explanation, or attributes) about a specific biological entity or process. In the context of an ontology, for example, metadata specify significant properties of the ontology itself

Annotation

The process of attaching relevant information (metadata) to a raw biological entity

Automatic Annotation

Annotation performed by computer software, often by transferring information from one source to another. This is a way of producing large amounts of metadata

Manual Annotation

Annotation performed by a person rather than by software, as opposed to automatic annotation

GUI

Graphical User Interface. The means by which a user interacts with a computer through graphical icons and visual indicators such as buttons and forms. In the scope of this paper, we use the term GUI to refer to interfaces that allow biologists to search, read, and edit integrated biological data

API

Application Programming Interface. A set of tools and protocols that a power user can use to programmatically access functionality and/or data developed or gathered by another individual or organisation

UX

User eXperience. The process of improving user satisfaction by focusing on the usability of a given product.

Visualisation Tools

Applications that help biologists view the data in a more human-friendly way, such as 3D or graph representations of the data (e.g., Cytoscape for visualising complex networks)
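To make the difference between the data warehousing and federated database models in Table 1 concrete, here is a minimal sketch using Python's built-in `sqlite3` module, with two tiny in-memory databases standing in for sources. It is an illustration under stated assumptions, not a description of any particular system. The table names, columns, accessions (`ACC1` to `ACC3`) and gene symbols are all hypothetical. The warehouse copies both sources into one central, unified schema up front. The federated lookup leaves the data where it is and answers a query at request time through a mapping from unified field names to per-source queries.

```python
import sqlite3


# Two hypothetical sources with different schemata. Both identify genes by the
# same made-up accession, but store it under different column names.
def make_source_a() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE genes (accession TEXT PRIMARY KEY, symbol TEXT)")
    con.executemany("INSERT INTO genes VALUES (?, ?)",
                    [("ACC1", "geneA"), ("ACC2", "geneB")])
    return con


def make_source_b() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE notes (acc TEXT PRIMARY KEY, note TEXT)")
    con.executemany("INSERT INTO notes VALUES (?, ?)",
                    [("ACC1", "kinase activity"), ("ACC3", "membrane protein")])
    return con


def build_warehouse(a: sqlite3.Connection, b: sqlite3.Connection) -> sqlite3.Connection:
    """Warehousing: copy every source into one central store with a unified schema."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, symbol TEXT, note TEXT)")
    for acc, symbol in a.execute("SELECT accession, symbol FROM genes"):
        wh.execute("INSERT INTO entity (id, symbol) VALUES (?, ?)", (acc, symbol))
    for acc, note in b.execute("SELECT acc, note FROM notes"):
        # Complement an existing record if the identifier is already known,
        # otherwise add a new record for it.
        cur = wh.execute("UPDATE entity SET note = ? WHERE id = ?", (note, acc))
        if cur.rowcount == 0:
            wh.execute("INSERT INTO entity (id, note) VALUES (?, ?)", (acc, note))
    wh.commit()
    return wh


def federated_lookup(a: sqlite3.Connection, b: sqlite3.Connection, entity_id: str) -> dict:
    """Federated model: nothing is copied; each unified field is resolved on demand
    by running the mapped query against the source that holds it."""
    # Hypothetical mapping: unified field -> (source connection, source-specific SQL).
    mapping = {
        "symbol": (a, "SELECT symbol FROM genes WHERE accession = ?"),
        "note": (b, "SELECT note FROM notes WHERE acc = ?"),
    }
    view = {"id": entity_id}
    for name, (con, sql) in mapping.items():
        row = con.execute(sql, (entity_id,)).fetchone()
        view[name] = row[0] if row else None
    return view


a, b = make_source_a(), make_source_b()
wh = build_warehouse(a, b)
print(wh.execute("SELECT id, symbol, note FROM entity ORDER BY id").fetchall())
# → [('ACC1', 'geneA', 'kinase activity'), ('ACC2', 'geneB', None), ('ACC3', None, 'membrane protein')]
print(federated_lookup(a, b, "ACC1"))
# → {'id': 'ACC1', 'symbol': 'geneA', 'note': 'kinase activity'}
```

Here `ACC1` appears in both sources, which therefore complement each other for that entity. A real integration system would also have to reconcile duplicated but conflicting values, which this sketch deliberately ignores.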
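The Ontology, Controlled Vocabulary, Unique Identifier, Metadata and Annotation entries of Table 1 can likewise be illustrated with a small, self-contained Python sketch. Everything in it is hypothetical: the `EX:` term identifiers, the `DB1:`/`DB2:` record identifiers and the helper functions (`ancestors`, `annotate_manually`, `transfer_annotations`, `has_term`) are made up for illustration and do not correspond to any real ontology, database or library. The sketch shows a controlled vocabulary whose terms are linked by `is_a` relationships, manual annotation restricted to that vocabulary, and automatic annotation implemented as a simple transfer of terms from one record to another.

```python
from dataclasses import dataclass, field

# A toy ontology. The "EX:" identifiers are hypothetical. Each term has a unique
# identifier, a label from a controlled vocabulary, and "is_a" links to more
# general terms.
ONTOLOGY = {
    "EX:0001": {"label": "molecular function", "is_a": []},
    "EX:0002": {"label": "catalytic activity", "is_a": ["EX:0001"]},
    "EX:0003": {"label": "kinase activity", "is_a": ["EX:0002"]},
}


def ancestors(term_id: str) -> set:
    """Return every term reachable from term_id by following is_a links upwards."""
    seen = set()
    stack = list(ONTOLOGY[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ONTOLOGY[parent]["is_a"])
    return seen


@dataclass
class Entity:
    identifier: str  # unique identifier of the entity within its source
    # Metadata attached by annotation: ontology term id -> how it was assigned.
    annotations: dict = field(default_factory=dict)


def annotate_manually(entity: Entity, term_id: str) -> None:
    """Manual annotation by a person; only terms from the vocabulary are accepted."""
    if term_id not in ONTOLOGY:
        raise ValueError(f"{term_id} is not in the controlled vocabulary")
    entity.annotations[term_id] = "manual"


def transfer_annotations(source: Entity, target: Entity) -> None:
    """Automatic annotation: copy terms from one record to another (e.g. an
    equivalent entity in a different database) without overwriting existing ones."""
    for term_id in source.annotations:
        target.annotations.setdefault(term_id, "automatic")


def has_term(entity: Entity, term_id: str) -> bool:
    """True if the entity carries term_id itself or any more specific term below it."""
    return any(t == term_id or term_id in ancestors(t) for t in entity.annotations)


p1 = Entity("DB1:P001")  # hypothetical record in one source
annotate_manually(p1, "EX:0003")
p2 = Entity("DB2:Q042")  # hypothetical equivalent record in another source
transfer_annotations(p1, p2)
print(p2.annotations)           # → {'EX:0003': 'automatic'}
print(has_term(p2, "EX:0002"))  # → True (kinase activity is_a catalytic activity)
```

Because the terms are related, a query for a general term such as `EX:0002` also matches entities annotated only with a more specific one. This is one reason shared ontologies help when annotations from different sources are integrated.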