Jump to content Jump to navigation

Data tools

One of the tools Naturalis uses is an Extract, Transform and Load tool (ETL).

This software has been developed for the migration of 200 legacy databases from the Collection sector to the central registration system within Naturalis, named CRS. The tool is needed because the legacy systems have a strongly heterogeneous structure in terms of technology and data. 

The data from the legacy systems will be transformed by the ETL into values that can be imported into the CRS. Following this process the data can be exported as a load set and template that the import module of CRS can use. Sometimes fields can be transferred on a one-to-one basis and sometimes a transformation rule is needed that only transfers data if these satisfy certain conditions. The output of the ETL is a database that in terms of structure and restrictions is the same as the database behind the CRS. 

An example of a transformation rule that is programmed in the ETL:

Legacy database field name Aim database CRS field name Transformation value
storageContent_Locality AND storageObject_Locality LOCALITYTEXT If storagecontent_Locality is not empty, then: storagecontent_Locality. ELSE: If storageobject_Locality is not empty, then: storageobject_Locality

 

A vital aspect in the database migration procedure is the checking of the transformations made by the ETL. For this the ETL export can make a virtual registration number that seeks a record in the database is with an entered value for each transformation rule in the ETL. This results in an ‘object’ that does not exist but for which every database field is filled. It therefore forms and ‘ideal’ test record.

The setting up of an ETL has proven to be an important investment for safeguarding quality and efficiency during the migration of large heterogeneous datasets.