15 Oct, 2020

A tuff sample from East Africa's rift valley. Photo by Kevin Krajick/Earth Institute.

An Innovative Cyberinfrastructure Will Create 'The Internet of Samples'

A National Science Foundation award to the Lamont-Doherty Earth Observatory and partner institutions will integrate scientific samples into a digital data ecosystem. CyVerse users will be able to register samples and index their metadata through the CyVerse Data Store.

The National Science Foundation's (NSF) Cyberinfrastructure for Sustained Scientific Innovation program has awarded the Lamont-Doherty Earth Observatory and partner institutions $4 million to develop an innovative cyberinfrastructure that will integrate scientific samples into a digital data ecosystem: iSamples, the 'Internet of Samples.' The project is a collaboration with the University of Arizona (UArizona), the University of Kansas, and the UC Berkeley Data Science Institute, with subawards to Open Context, a data system for archaeology, and the Smithsonian Institution of Washington.

Lamont geoinformatics specialist Kerstin Lehnert is serving as principal investigator. All too often, she says, researchers go to great trouble and expense to collect samples at sea, in polar regions, or in other remote areas, only to have the samples gather dust or even get thrown away after they've served their original purpose. Over the next four years, the iSamples project will create a system where researchers across disciplines and throughout the world can share and access metadata on existing samples.

"This will establish an infrastructure that hopefully changes with culture. In many universities and institutions, collections get thrown out. With iSamples they will be [uniformly] registered and their use can be tracked, paving a way to demonstrate the value of these collections," said Lehnert.

The iSamples project comes at an especially opportune moment. Since March, the pandemic has curtailed fieldwork.

"A lot of researchers and students will have to rely on the access to information and data online in the context specifically of the iSamples project. With the COVID restrictions for travel, researchers haven't been able to go into the field to collect new samples and so the access to existing collections has never been more important," said Lehnert.

The first two years of the project will largely focus on the design and development of the cyberinfrastructure components, and years three and four will focus on implementing the iSamples services at several data systems serving different purposes and communities, including the Lamont-operated System for Earth Sample Registration (SESAR), the Smithsonian Institution of Washington's natural history collections database, and the Genomic Observatories Metadata Database (GEOME). There will be ongoing community engagement throughout the grant to gather feedback from other potential users.

Ramona Walls, a former CyVerse science informatician and now an assistant research professor at UArizona's BIO5 Institute, will lead the university's portion of the grant.

"My role is multifaceted," Walls said. She will oversee the development of parts of the project's technical architecture, as well as the development of an interdisciplinary materials-centered data model, ensuring that the model aligns with the way existing disciplines currently label and track their associated datasets.

Walls will act as a liaison to the life sciences community, particularly for groups such as the Biodiversity Information Standards and the Genomic Standards Consortium.

One implementation of the iSamples tiered infrastructure will be integrated with CyVerse, enabling CyVerse users to register samples, get an ID for them, and index their metadata as soon as they initiate a project.

"Scientists need to be able to record metadata about samples as soon as they are collected," Walls said. "This will allow them to do that and integrate sample-based data with their CyVerse projects."

After all, she added, "almost any project involves the collection of a sample."