Scientific research never stands still, and biodiversity research is no exception. The most recent developments are having an enormous impact on the requirements placed on current and future data storage.
For example, to improve the reproducibility and transparency of research, increasingly strict requirements are being placed on the storage of research data. At the same time the research field is changing and new techniques are being introduced. From next-generation sequencing to 3-D modeling, the techniques increasingly used within Naturalis all require ever-larger amounts of reliable and rapidly available storage.
Besides the size of the data storage, new requirements are also being placed on making the data available to fellow researchers, research funding bodies, and the wider public. Data is therefore being accessed from beyond the Naturalis computer network, which places increasingly strict requirements on the availability of the storage facilities.
Scalable storage based on Ceph
The existing storage solution was no longer able to satisfy these growing requirements. A radically different approach was needed, and it was found in the form of Ceph.
Ceph is a relatively new storage solution based on a distributed object store that allows large quantities of data to be stored very reliably. The open source software can be installed on standard, 'commodity' servers, where it uses an algorithm to distribute pieces of data (objects) across the available servers. Furthermore, Ceph can meet the need for three important types of storage: block, object and file system.
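The idea of letting an algorithm, rather than a central lookup table, decide where each object lives can be illustrated with a small sketch. The example below is not Ceph's own placement algorithm (Ceph uses CRUSH, which also takes the physical layout of the cluster into account). It uses a simpler, well-known technique called rendezvous hashing, and the server names, object name, function names and replica count of three are all assumptions made for illustration.

```python
import hashlib


def score(object_name: str, server: str) -> int:
    """Deterministic pseudo-random score for an (object, server) pair."""
    digest = hashlib.sha256(f"{object_name}|{server}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, servers: list[str], replicas: int = 3) -> list[str]:
    """Return the servers that should hold a copy of the object.

    Every client that knows the server list computes the same answer,
    so no central directory of object locations is needed.
    """
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested number of replicas")
    # Rank servers by their score for this object and keep the top ones.
    ranked = sorted(servers, key=lambda s: score(object_name, s), reverse=True)
    return ranked[:replicas]


# Hypothetical server and object names, for illustration only.
servers = ["node-a", "node-b", "node-c", "node-d"]
print(place("specimen-0001.tiff", servers))
```

A useful property of this kind of placement is that adding a server moves only the objects for which the new server ranks among the top choices; all other copies stay where they are. This is the general reason why such a cluster can grow without reshuffling all of its data.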
By using Ceph as a bulk storage solution, Naturalis meets the needs of several important use cases:
- Block storage in the form of images and volumes that, in combination with fast local storage, can be used for the analysis of biodiversity data in the cloud environment;
- Object storage exposed through an S3 API for data preservation and public dissemination;
- Storage for the backup of primary data.
Ready for the future
By making the fundamental choice for an extremely scalable open source solution, Naturalis is now in a better position to respond to future demands. Growing demand can be met by adding storage capacity, without expensive changes to the architecture. Because the software is open source, it is also easier to respond to changing demands.
Furthermore, Ceph is a highly cost-effective solution. Instead of paying for expensive software licenses and then being bound to specific hardware, Naturalis can use cheaper commodity hardware and invest in internal knowledge and external support.