CABRI Flat File Specifications
The catalogues are submitted to CABRI in a flat file format. To facilitate indexing, and its subsequent automation,
this format must be standardised.
The guidelines for submitting a flat file to CABRI for indexing follow.
General Definitions
Before proceeding to the analysis of this text, it is important that you have a clear understanding of the following terms.
Catalogue |
A catalogue consists of a group of unique entries, each of which includes information related to one item (strain, cell line, plasmid, ...)
of the collection. |
Entry |
An entry consists of a set of fields for a particular product. |
Field |
A field in an entry specifies a particular piece of information about the product described in that entry. Fields are
composed of a label and the information contents, which must be separated by any number of spaces and tabs. All fields are included
in one of three predefined data sets: minimum, recommended or full. |
Field label |
The label of the field is a predefined string of characters that specifies the meaning of the following information contents.
Labels cannot include spaces or tabs, but can include the underscore symbol _.
Labels are predefined for each biological material and are listed at the end of this text.
Field labels different from the predefined ones are not allowed and related information contents won't be indexed. |
Information contents |
The information contents must adhere to the related data input procedures.
It should not include HTML tags. It can be broken by a newline, but all following lines of text must
either have a blank space or tab in the first column or include the field label. |
Minimum Data Set (MDS) |
It consists of mandatory information needed to identify a unique item of the collection: strains for which this information
is not available cannot be inserted into the catalogue since they lack some essential data. |
Recommended Data Set (RDS) |
It includes useful information for an improved description of the material. This data should always be included in
the catalogue, when available. Since it is not always available, strains can be listed in the CABRI catalogues when it is missing. |
Full Data Set (FDS) |
It provides all remaining information that is available at the collection for a strain or cell line. Since the original CABRI
catalogues were built independently, they do not share a common data set and each collection can have its own FDS, although
the information available in the FDS undergoes a homogenization effort. |
|
|
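As an illustration of the definitions above, the following Python sketch splits one line of a flat file into its field label and information contents. The function name `split_field` and the regular expression are assumptions made for this example; they are not part of the CABRI specification.

```python
import re

# Hypothetical helper: the label is taken to be the first run of
# non-whitespace characters at the start of the line, and the contents
# are whatever follows the separating spaces or tabs.
FIELD_LINE = re.compile(r"^(\S+)(?:[ \t]+(.*))?$")

def split_field(line):
    """Return (label, contents), or None if the line does not start a field."""
    match = FIELD_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None  # blank line, or a line indented with a space or tab
    label = match.group(1)
    contents = (match.group(2) or "").strip()
    return label, contents

print(split_field("Strain_number \t IHEM 306"))
```

A line that begins with a space or tab yields `None` here, because such a line continues the previous field rather than starting a new one.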
Practical rules
·
A catalogue shall consist of a collection of unique entries, an entry being a set of fields for a particular product.
|
Example of an entry from the BCCM/IHEM catalogue
Strain_number IHEM 306
Other_collection_numbers -
Organism_type Fungi
Name Aspergillus flavus Link : Fries
Restrictions pathogen class H2
Conditions_for_growth medium MDPA, 25C
Status -
History -
Pathogenicity mycotic sinusitis
Geographic_origin Belgium, Brussels
Isolated_by N. Nolard 1978
Determined_by N. Nolard 1978
Form_of_supply dried
|
|
·
Entries in the catalogue should be separated by at least one blank line, which makes the file easier to read.
|
Example of two consecutive entries from the bacteria catalogue
of CABI Bioscience
Strain_number IMI 185481
Other_collection_numbers NCIMB 9131; NCTC 9757; ATCC 12472
Name Chromobacterium violaceum, Bergonzini 1881
Organism_type Bacteria
Restrictions -
Status Type of Chromobacterium violaceum Bergonzini
History NCIMB > 1974 IMI
Isolated_from -
Conditions_for_growth MA, 23
Form_of_supply Freeze Dried

Strain_number IMI 104402
Other_collection_numbers -
Name Micropolyspora faeni, Cross et al. 1968
Organism_type Bacteria
Restrictions -
Status -
History J. Lacey A94 > 1964 IMI
Isolated_from Mouldy hay
Geographic_origin UK
Conditions_for_growth MA PDA MGT23 DL
Form_of_supply Freeze Dried
|
|
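The blank-line rule above makes it straightforward to cut a catalogue into entries. The sketch below is one possible way to do so in Python; `split_entries` is a hypothetical name, and the sample text is abbreviated from the CABI Bioscience example.

```python
def split_entries(text):
    """Group the non-blank lines of a catalogue into entries.

    One or more blank (or whitespace-only) lines are treated as a
    single separator between consecutive entries.
    """
    entries, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            entries.append(current)
            current = []
    if current:
        entries.append(current)
    return entries

sample = """Strain_number IMI 185481
Form_of_supply Freeze Dried

Strain_number IMI 104402
Form_of_supply Freeze Dried
"""
for entry in split_entries(sample):
    print(entry[0])
```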
·
Every entry in a catalogue shall start with the same field.
The contents of the first field of each entry must be unique in the catalogue; by convention,
CABRI therefore requires that some form of accession number is used as the first field.
The fields must follow the order given in the list below.
|
Acceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135

Accession_number 13
Brief_description Human urinary bladder carcinoma
Morphology epithelial-like adherent cells
Depositor Dr. A. Non, Somewhere, USA
Original_paper Nature 1988;32:765-771
|
Unacceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Brief_description
Human urinary bladder carcinoma
Accession_number 13
Morphology epithelial-like adherent cells
Depositor Dr. A. Non, Somewhere, USA
Original_paper Nature 1988;32:765-771
|
|
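The first-field rule above can be checked mechanically. The following Python sketch assumes each entry has already been parsed into a list of (label, contents) pairs; the function name `check_first_fields` and the default label `Accession_number` are illustrative assumptions.

```python
def check_first_fields(entries, key_label="Accession_number"):
    """Report entries whose first field is wrong or not unique.

    entries: list of non-empty entries, each a list of
    (label, contents) pairs in file order.
    """
    problems, seen = [], set()
    for index, fields in enumerate(entries, start=1):
        label, contents = fields[0]
        if label != key_label:
            problems.append(
                f"entry {index}: starts with {label}, expected {key_label}")
        elif contents in seen:
            problems.append(f"entry {index}: duplicate {key_label} {contents}")
        else:
            seen.add(contents)
    return problems

catalogue = [
    [("Accession_number", "12"), ("Brief_description", "Human B cell precursor leukemia")],
    [("Brief_description", "Human urinary bladder carcinoma"), ("Accession_number", "13")],
]
for problem in check_first_fields(catalogue):
    print(problem)
```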
·
Every catalogue shall have each new field commence on a new line, and these fields shall not be indented from the left margin.
|
Acceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
|
Unacceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
|
|
·
Every field shall commence with an appropriate field label. Lists of field labels are available below.
Labels start with a capital letter and don't include any other capital letter.
Every field name will consist of a single word with no blank spaces.
Special cases (e.g. acronyms) are considered where the whole field name can be upper case, but this only applies to fields of the Full Data Set.
|
|
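The capitalisation rules for labels can be expressed as two patterns. This Python sketch is only one reading of the rule above: the names are hypothetical, and allowing digits inside a label is an assumption, since the text mentions only letters and the underscore.

```python
import re

# One capital letter followed by lower-case letters, digits or underscores.
# Accepting digits is an assumption made for this example.
STANDARD_LABEL = re.compile(r"[A-Z][a-z0-9_]*")
# Fully upper-case labels (e.g. acronyms), accepted only in the FDS.
ACRONYM_LABEL = re.compile(r"[A-Z][A-Z0-9_]*")

def label_is_well_formed(label, data_set="MDS"):
    """Check the shape of a label; data_set is 'MDS', 'RDS' or 'FDS'."""
    if STANDARD_LABEL.fullmatch(label):
        return True
    return data_set == "FDS" and ACRONYM_LABEL.fullmatch(label) is not None

for candidate in ("Strain_number", "Strain_Number", "strain_number"):
    print(candidate, label_is_well_formed(candidate))
```

This checks only the form of a label. Whether a label is actually one of the predefined CABRI labels has to be verified against the published lists.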
·
Information contents within a field must be separated from the field label by white space; a single blank space is sufficient.
|
Unacceptable (fictitious example)
Accession_number 12
Brief_descriptionHuman B cell precursor leukemia
Morphology: small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
|
|
·
An entry must contain all the fields of the Minimum Data Set (MDS) particular to the organism type of a catalogue
(see the list of MDSs).
In some cases it is acceptable, for scientific reasons, that some information is missing, undefined or not appropriate
for a subgroup of items. In these cases, if an entry does not have any data for a field of the MDS, the field must still be
included in the catalogue, with a dash character ("-") in place of the missing information.
|
Acceptable (fictitious example) (if "Depositor" is included in the MDS and there is no information available)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor -
Original_paper Blood 1981;20:130-135
|
Unacceptable (fictitious example) (if "Depositor" is included in the MDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor
Original_paper Blood 1981;20:130-135
Equally unacceptable (fictitious example) (if "Depositor" is included in the MDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Original_paper Blood 1981;20:130-135
|
|
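A check for the MDS rule above might look like the following Python sketch. It assumes an entry is held as a dictionary from label to contents; `check_mds` and the example MDS list are hypothetical and do not reproduce any official CABRI data set.

```python
def check_mds(entry, mds_labels):
    """Report MDS fields that are absent or left empty.

    entry: dict mapping label -> contents.
    A dash ("-") counts as valid contents for an MDS field.
    """
    problems = []
    for label in mds_labels:
        if label not in entry:
            problems.append(f"{label}: missing MDS field")
        elif not entry[label].strip():
            problems.append(f"{label}: empty, use '-' for missing information")
    return problems

# Illustrative MDS, taken from the fictitious examples in this text.
mds = ["Accession_number", "Brief_description", "Depositor"]
entry = {"Accession_number": "12",
         "Brief_description": "Human B cell precursor leukemia",
         "Depositor": ""}
print(check_mds(entry, mds))
```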
·
An entry must also contain all the fields of the organism type specific Recommended Data Set (RDS) for which it has some data (see the
list of RDSs).
Conversely, it must not contain those fields of the RDS for which it has no information.
|
Acceptable (fictitious example) (if "Tissue" is included in the RDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Tissue blood
|
Unacceptable (fictitious example) (if "Tissue" is included in the RDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Tissue -
Equally unacceptable (fictitious example) (if "Tissue" is included in the RDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Tissue
|
|
·
An entry can also contain other fields, neither included in the MDS nor in the RDS, particular to the catalogue, for which it has some data.
These fields make up the Full Data Set (FDS) of the catalogue and must not be included when they do not contain any data.
|
Acceptable (fictitious example) (if "Stocks" is included in the catalogue's FDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Stocks 30
|
Unacceptable (fictitious example) (if "Stocks" is included in the catalogue's FDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Stocks -
Equally unacceptable (fictitious example) (if "Stocks" is included in the catalogue's FDS)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Stocks
|
|
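The RDS and FDS rules above share one requirement: a field outside the MDS may appear only when it carries data. The Python sketch below checks this under the assumption that an entry is a dictionary from label to contents; `check_optional_fields` is a hypothetical name.

```python
def check_optional_fields(entry, mds_labels):
    """Report RDS/FDS fields that are present but carry no data.

    entry: dict mapping label -> contents.
    mds_labels: labels of the MDS, which are exempt because a dash
    is allowed there.
    """
    problems = []
    for label, contents in entry.items():
        if label in mds_labels:
            continue
        if contents.strip() in ("", "-"):
            problems.append(f"{label}: omit RDS/FDS fields that have no data")
    return problems

entry = {"Accession_number": "12", "Tissue": "-", "Stocks": "30"}
print(check_optional_fields(entry, {"Accession_number"}))
```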
·
Fields may span multiple lines. However, if a field does span multiple lines, subsequent lines must either commence with the same field name or
be indented from the left margin with white space of at least one blank space.
|
Acceptable, preferred format (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Equally acceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor
Brief_description leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
|
Unacceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor
leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
|
|
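Both accepted ways of continuing a field, by indentation or by repeating the label, can be merged by a small parser. The following Python sketch is an assumption about how such a parser could work, not a description of the CABRI indexing software; `parse_entry` is a hypothetical name.

```python
def parse_entry(lines):
    """Turn the non-blank lines of one entry into (label, contents) pairs.

    A line that starts with a space or tab, or that repeats the label
    of the previous line, is appended to the previous field.
    """
    fields = []  # list of [label, contents]
    for line in lines:
        if line[:1] in (" ", "\t"):
            if not fields:
                raise ValueError("continuation line before any field")
            fields[-1][1] = (fields[-1][1] + " " + line.strip()).strip()
            continue
        parts = line.split(None, 1)
        label = parts[0]
        contents = parts[1].strip() if len(parts) > 1 else ""
        if fields and fields[-1][0] == label:
            fields[-1][1] = (fields[-1][1] + " " + contents).strip()
        else:
            fields.append([label, contents])
    return [tuple(field) for field in fields]

entry = ["Accession_number 12",
         "Brief_description Human B cell precursor",
         "  leukemia"]
print(parse_entry(entry))
```

With this approach, an unindented continuation line such as `leukemia` is read as a field whose label is `leukemia`, so it would surface later as an unknown label rather than being merged silently.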
·
Field labels must be consistent throughout the entries and adhere to the CABRI standard labels
(see below), with no discrepancy in spelling or case.
They must also be listed according to the agreed order.
|
Acceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135

Accession_number 13
Brief_description Human urinary bladder carcinoma
Morphology epithelial-like adherent cells
Depositor Dr. A. Non, Somewhere, USA
Original_paper Nature 1988;32:765-771
|
Unacceptable (fictitious example)
Accession_number 12
Brief_description Human B cell precursor leukemia
Morphology small, round cells in suspension
Depositor Dr. J. Soap, Anywhere, USA
Original_paper Blood 1981;20:130-135
Brief_descr
Human urinary bladder carcinoma
Accession_number 13
Morphology epithelial-like adherent cells
Depositor Dr. A. Non, Somewhere, USA
Reference_paper Nature 1988;32:765-771
|
|
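Consistency of labels and their order can be verified against the agreed list. The Python sketch below assumes that list is available as a Python sequence; `check_labels` and the sample order are illustrative and taken from the fictitious examples, not from the official CABRI lists.

```python
def check_labels(labels, standard_order):
    """Report labels that are unknown or appear out of the agreed order.

    labels: labels of one entry, in file order.
    standard_order: the agreed sequence of labels for the catalogue.
    """
    position = {label: i for i, label in enumerate(standard_order)}
    problems, last = [], -1
    for label in labels:
        if label not in position:
            problems.append(f"{label}: not a standard label")
            continue
        if position[label] < last:
            problems.append(f"{label}: out of order")
        last = max(last, position[label])
    return problems

order = ["Accession_number", "Brief_description", "Morphology",
         "Depositor", "Original_paper"]
entry_labels = ["Brief_descr", "Accession_number", "Morphology",
                "Depositor", "Reference_paper"]
for problem in check_labels(entry_labels, order):
    print(problem)
```

The comparison is case-sensitive, so a label that differs from the standard one only in case is reported as unknown.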
First submission of catalogues
·
When a catalogue is being submitted for indexing for the first time, a file containing the following information must accompany it.
The name of the accompanying file should be composed of the name of the catalogue and the extension ".inf",
e.g., "cbs_fil.inf" and "dsmz_bact.inf".
- The name and a brief description of the catalogue itself and what it contains, with some contact information.
Example:
The X-Lab Cell Culture Catalogue contains numerous cells from various organisms.
Contact J. Soap@anywhere.net
- The list of all possible fields within the catalogue.
- The MDS fields for that organism type must always be present.
- The RDS fields must be present if they are included at least for one entry.
- All FDS fields must be listed.
- Each field name must be followed by the acronym of the data set to which it belongs enclosed within parentheses.
Example:
Accession_number (MDS)
Brief_description (MDS)
Morphology (MDS)
Depositor (MDS)
Original_paper (MDS)
Tissue (RDS)
Stocks (FDS)
- A description of all FDS fields and their meaning.
Example:
Stocks
Meaning: the number of stocks maintained at the collection
- A list of any special symbols which are used within the catalogue,
e.g. accented letters (à, ü), other national letters (ñ, ß) and symbols (©, etc.).
Please remember that the language to be used is English and that Greek letters should be avoided.
See Symbols section for more information.
- A list of any fields which are to be linked to other fields within other reference lists submitted as a separate catalogue.
This would facilitate indexing in a manner which would provide, e.g., a hypertext link from each medium name in the Media field to the actual
composition of that medium in the Media catalogue.
Example:
"Media" field of "Cell Catalogue" links to "Name" field of "Media Catalogue"
- A list of fields for inclusion in the CABRI Search Set (CSS).
The CABRI Search Set (CSS) is designed to give users greater flexibility in searching CABRI while maintaining a simple search interface.
Interrogating the CSS allows for increased control over what sections of the catalogues are searched.
At present, the CSS has two input fields: Identification and Name.
When a catalogue is being submitted to CABRI, it is necessary to nominate one or more field(s) to be included
for each of the CSS fields.
Ideally the number of fields to be included under each of the umbrella CSS fields should be kept to a minimum as the intention is for the CSS to be a
search on specific aspects of a catalogue. However, at least one field must be nominated for each CSS field.
All the remaining fields of the catalogue will be considered by the CABRI search engine when searching through the "Other text" input field (see the CABRI
search page).
Example:
Fields for inclusion in CSS
Identification: Strain_number, Other_collection_numbers
Name: Name, Infrasubspecific_names
|
|
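The field list of the information file, as described above, pairs each field name with the acronym of its data set in parentheses. The Python sketch below reads such a list into a dictionary; `parse_field_list` is a hypothetical name, and the exact layout of a real ".inf" file may differ from what is assumed here.

```python
import re

# A field name, white space, then MDS, RDS or FDS in parentheses.
FIELD_DECL = re.compile(r"(\S+)\s+\((MDS|RDS|FDS)\)")

def parse_field_list(lines):
    """Map each declared field name to the data set it belongs to."""
    data_sets = {}
    for line in lines:
        match = FIELD_DECL.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"not a field declaration: {line!r}")
        data_sets[match.group(1)] = match.group(2)
    return data_sets

declared = parse_field_list([
    "Accession_number (MDS)",
    "Tissue (RDS)",
    "Stocks (FDS)",
])
print(declared)
```

The resulting mapping could then supply the MDS labels needed to validate the entries of the catalogue itself.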
·
After the first submission, the information file needs to be resubmitted only when the catalogue changes with respect to one or more of the
following points.
- The MDS or RDS have changed (according to some decision of the CABRI Technical Committee).
- The FDS has changed, i.e. some fields have been added to/removed from the FDS.
- Some essential information of the description has changed (e.g., the contact person).
- Some links are no longer used or new links have been added.
- The fields included in the CSS have changed.
|