Databanks

We provide access to a large set of public biological databanks in various formats (FASTA, GenBank, HMM, etc.). They are stored locally and are accessible to all users. They are updated either with the BioMAJ software package or manually.

Update system

We use BioMAJ to manage most of the databanks. BioMAJ (from the French "BIOlogie Mise A Jour") is a workflow engine dedicated to data synchronization and processing.
The software automates the update cycle and the supervision of the locally mirrored databank repository. Typical uses are downloading remote databanks (GenBank, for example) and applying transformations to them, such as BLAST or EMBOSS indexing.
Any script can be applied to the downloaded data. Once all processing steps have completed successfully, the bank is put into "production" in a dedicated release directory. Updates can be scheduled at regular intervals with cron jobs, and data are downloaded again only if a change is detected.
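BioMAJ's internals are not described on this page, so the following Python sketch only illustrates the general shape of the update cycle described above, under explicit assumptions: a change is detected by comparing the remote file's size and modification time with the values recorded at the previous run; a hypothetical fetch callback downloads and processes the data into a staging directory; and the new release is published by atomically switching a "current" symbolic link. The names needs_update, publish_release and update_bank are hypothetical and are not part of BioMAJ.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Optional


def needs_update(remote_size: int, remote_mtime: float, state: dict) -> bool:
    """Assumed change detection: compare remote metadata with the last recorded run."""
    return state.get("size") != remote_size or state.get("mtime") != remote_mtime


def publish_release(bank_dir: Path, release: str, staging: Path) -> Path:
    """Move a fully processed staging directory to bank_dir/<release>,
    then atomically repoint bank_dir/current to it."""
    release_dir = bank_dir / release
    if release_dir.exists():
        raise FileExistsError(release_dir)
    shutil.move(str(staging), str(release_dir))
    tmp_link = bank_dir / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()  # leftover from an interrupted publication
    # Relative link, resolved against bank_dir.
    tmp_link.symlink_to(release, target_is_directory=True)
    # rename(2) replaces the old "current" link in one step on POSIX systems,
    # so readers never see a missing or half-written release.
    os.replace(tmp_link, bank_dir / "current")
    return release_dir


def update_bank(bank_dir: Path, remote_size: int, remote_mtime: float,
                state: dict, release: str,
                fetch: Callable[[Path], None]) -> Optional[Path]:
    """One update cycle. Returns the new release directory, or None if unchanged."""
    if not needs_update(remote_size, remote_mtime, state):
        return None  # nothing changed upstream: no download at all
    staging = bank_dir / f".staging-{release}"
    if staging.exists():
        shutil.rmtree(staging)
    staging.mkdir(parents=True)
    # Hypothetical callback: download and post-process (e.g. indexing).
    # If it raises, "current" is left pointing at the previous release.
    fetch(staging)
    release_dir = publish_release(bank_dir, release, staging)
    state.update(size=remote_size, mtime=remote_mtime)
    return release_dir
```

Switching a symbolic link only after every processing step has succeeded means users reading through "current" always see a complete release, which matches the "production" behaviour described above; a cron job would simply call the update cycle periodically.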

Access

Available databanks are stored in a dedicated directory that is accessible from the front-end node and from all cluster nodes:

/db/

Each databank is stored in a directory named after the databank. Its directory tree follows the layout of the databank's remote source. A "current" link points to the most recently updated release. For example, to use the latest version of UniProt, the FASTA file is available at this path:

/db/uniprot/current/fasta/uniprot.fsa
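As an illustration of reading a bank through its "current" link, here is a minimal Python sketch. It assumes "current" is a symbolic link to a release directory (the exact release naming is not specified here) and uses a small hand-written FASTA reader; the function names current_release and read_fasta are hypothetical helpers, not tools installed on the cluster.

```python
from pathlib import Path
from typing import Iterator, Tuple

DB_ROOT = Path("/db")


def current_release(bank: str, root: Path = DB_ROOT) -> Path:
    """Resolve <root>/<bank>/current to the release directory it points to.
    Raises FileNotFoundError if the bank or its link does not exist."""
    return (root / bank / "current").resolve(strict=True)


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Stream (header, sequence) pairs from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []  # drop the leading ">"
            elif line:
                chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)
```

For example, current_release("uniprot").name tells you which release you are using, and read_fasta(Path("/db/uniprot/current/fasta/uniprot.fsa")) streams records without loading the whole file into memory, which matters for large banks. If Biopython is available in your environment, Bio.SeqIO.parse(path, "fasta") is a more complete alternative.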

Some tools require their own databanks and ship with them. These are located in the /db/outils/ directory.

Available databanks

Managed with BioMAJ

Manually updated

Ask for a databank

To request a new databank, first make sure the data are not subject to a restrictive license, then fill in the dedicated form.