Databanks
We provide access to a large set of public biological databanks in various formats (FASTA, GenBank, HMM, etc.). They are stored locally and are accessible to all users. They are updated either automatically with the BioMAJ software package or manually.
Update system
We use BioMAJ to manage most of the databanks. BioMAJ (BIOlogie Mise A Jour) is a workflow engine dedicated to data synchronization and processing.
The software automates the update cycle and the supervision of the locally mirrored databank repository. Typical uses are downloading remote databanks (GenBank, for example) and applying transformations to them (BLAST indexing, EMBOSS indexing, etc.).
Any script can be applied to the downloaded data. Once all processing steps have completed successfully, the bank is put into "production" in a dedicated release directory. Updates can be scheduled at regular intervals with cron tasks; data are downloaded again only if a change is detected.
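The "publish only after every step succeeds" behaviour described above can be illustrated with a short Python sketch. This is not BioMAJ's actual implementation: the function name publish_release, the step callables and the use of a symbolic link named current are assumptions made for illustration only.

```python
import os
from pathlib import Path
from typing import Callable, Iterable


def publish_release(bank_dir: Path, release: str,
                    steps: Iterable[Callable[[Path], None]]) -> bool:
    """Run every processing step on a release, then repoint `current`.

    Hypothetical helper: assumes each release lives in bank_dir/<release>
    and that `current` is a symbolic link (an assumption, not a documented
    BioMAJ detail).
    """
    release_dir = bank_dir / release
    try:
        for step in steps:
            step(release_dir)  # e.g. an indexing script wrapped in a function
    except Exception:
        # A step failed: the previous production release stays in place.
        return False

    # Build the new link under a temporary name, then rename it over
    # `current` so readers never see a missing or half-updated link.
    tmp_link = bank_dir / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    os.replace(tmp_link, bank_dir / "current")
    return True
```

With this approach a failed transformation never affects users, because the link is switched only at the very end and in a single rename.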
Access
Available databanks are stored in a specific directory that is accessible from the front end and from all cluster nodes.
/db/
Each databank is stored in a directory named after the databank. The directory tree mirrors that of the databank's remote source. A current link points to the most recently updated release. For example, if you want to use the latest version of UniProt, the FASTA file is accessible at this path:
/db/uniprot/current/fasta/uniprot.fsa
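The path convention above can be wrapped in a small Python helper so that scripts do not hard-code release directories. This is a minimal sketch: the function names current_path and current_release are hypothetical, and current_release assumes that current is a symbolic link to a release directory.

```python
from pathlib import Path

DB_ROOT = Path("/db")  # databank root described above


def current_path(bank: str, *parts: str, root: Path = DB_ROOT) -> Path:
    """Build <root>/<bank>/current/<parts...> without touching the filesystem."""
    return root.joinpath(bank, "current", *parts)


def current_release(bank: str, root: Path = DB_ROOT) -> str:
    """Return the name of the directory that `current` resolves to.

    Assumption: `current` is a symbolic link to a release directory.
    """
    return (root / bank / "current").resolve().name


if __name__ == "__main__":
    # Prints the UniProt FASTA path given as an example above.
    print(current_path("uniprot", "fasta", "uniprot.fsa"))
```

Recording the value returned by current_release alongside your results makes it easier to tell later which databank release an analysis was run against.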
Available databanks
Managed with BioMAJ
Manually updated
Ask for a databank
To request a new databank, first make sure the data are not restricted by a particular license, then fill in the dedicated form.