CyVerse Engineer Creates iRODS Gocommands to Expedite Data Transfer for Astronomer's Research
Astronomer Joseph Long recently obtained his PhD from the University of Arizona with a thesis on giant exoplanets and high-contrast imaging. Long was part of the Extreme Wavefront Control Lab that developed MagAO-X (or Magellan Adaptive Optics Extreme), which is an adaptive optics instrument for the Magellan Clay telescope, a 6.5-meter telescope completed in 2002 and located at the Las Campanas Observatory in the mountains of Chile.
Long notes that their instrument records more data per night than a typical astronomy imaging instrument. "We take a lot of short exposures instead of a smaller number of long exposures," Long said. "There's a technical reason to do that, because the turbulence of the atmosphere is constantly changing and making stars twinkle. We want to take exposures short enough that they freeze the distortion from the atmosphere in an individual frame. Our instrument compensates for that distortion so we get sharper images and can see very faint things next to very bright things. It's called high-contrast imaging, and we use it to look for exoplanets, which are planets around other stars."
To get their data from the telescope, Long and his colleagues had been using the largest portable hard drives they could buy for their last three trips to Chile. Long said, "We originally planned to carry the hard drives back to Tucson and upload everything to the CyVerse Data Store from there. However, we were running into the limits of what a single portable hard drive can hold. On the last trip to Chile, I was working on how to maximize the data transfer speed between Las Campanas Observatory in Chile and the CyVerse Data Store."
As Long encountered obstacles moving large amounts of data between continents, he turned to Illyoung Choi, Research and Development Engineer at CyVerse. Choi created Gocommands for iRODS, which provides a faster interface to the CyVerse Data Store. Before this, Long had been telling his collaborators to connect remotely to the telescope itself and retrieve the data from there, which was highly inefficient. With Choi's tools, Long's collaborators will instead connect to the CyVerse Data Store over a U.S. research network rather than competing for the limited bandwidth at the observatory. The new method will allow data to be transferred automatically from the instrument to the Data Store, where Long and his team can grant users access.
Long said, "Illyoung's tool helps us get the data off the telescope. He has done some clever things to bundle up collections of small files and use multiple streams to maximize the use of the network link between the two locations, and this has allowed us to get close to the theoretical bandwidth limit between the locations."
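The two ideas Long mentions, bundling small files and using multiple streams, can be illustrated with a minimal Python sketch. This is not the Gocommands implementation (which the article does not describe in detail); the function names, the tar format, the size limit, and the caller-supplied `send` callable are all assumptions made for illustration.

```python
import io
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_bundles(files, max_bundle_bytes):
    """Greedily group (path, size) pairs into bundles under a size limit.

    A single file larger than the limit ends up in a bundle of its own.
    """
    bundles, current, current_size = [], [], 0
    for path, size in files:
        if current and current_size + size > max_bundle_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        bundles.append(current)
    return bundles


def build_tar(paths):
    """Pack the given files into one in-memory, uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for p in paths:
            tar.add(p, arcname=Path(p).name)
    return buf.getvalue()


def transfer_all(bundles, send, streams=4):
    """Send each bundle through `send`, using up to `streams` parallel workers.

    `send` is a hypothetical placeholder for a real upload call; it receives
    the archive bytes. Results come back in the same order as `bundles`.
    """
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return list(pool.map(lambda b: send(build_tar(b)), bundles))
```

Bundling reduces the per-file overhead that dominates when transferring many short exposures, and parallel streams help keep a long-distance link busy while any one stream is waiting on acknowledgements.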
Choi has been working with CyVerse for the past eight years. His first job, as a student research assistant while completing his PhD, involved working on a program called iRODS FUSE (Filesystem in Userspace), which allows users to access the CyVerse Data Store easily. Choi's thesis research focused on cloud storage for scientific research, specifically a way to transfer large-scale research data between cloud storage and compute clusters. After graduation, he joined CyVerse full-time. Choi's work at CyVerse involves developing various client programs for the Data Store, including Gocommands, the iRODS CSI Driver, and an SFTP (Secure File Transfer Protocol) interface.
Long said, "One of the things we were working on was a two-pronged approach to high-contrast imaging data management. We need to archive these data and make them searchable. The tool I built for that is still a prototype, but it can connect to CyVerse's Data Store, read into the files, and index the metadata, such as time of observation, conditions, and target coordinates, by pulling the FITS (Flexible Image Transport System) files." To make this tool usable for analysis, there is also a way to query for observations that match particular conditions.
This fall, Long will be working as a fellow at the Center for Computational Astrophysics at the Simons Foundation's Flatiron Institute on a tool that distributes computations on large datasets. The tool will allow him to start analysis processes that each read in just the data they are responsible for, directly from the cloud storage backend, which avoids the need to replicate the entire dataset to every computer when multiple computers perform the computation. This reduces duplicated network transfer and overhead. Long said, "The thing I need to make that work is the catalog and Data Store backend, and that's what Illyoung helped with."
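The partitioning idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Long's tool; the function names and the `read` and `analyze` callables are hypothetical stand-ins for fetching a file from cloud storage and running an analysis step.

```python
def shard(paths, worker_index, worker_count):
    """Return the subset of paths that one worker is responsible for.

    Sorting first means every worker computes the same assignment
    independently, without any coordination between machines.
    """
    if not 0 <= worker_index < worker_count:
        raise ValueError("worker_index must be in [0, worker_count)")
    return sorted(paths)[worker_index::worker_count]


def process_shard(paths, worker_index, worker_count, read, analyze):
    """Fetch and analyze only this worker's files.

    `read` and `analyze` are caller-supplied placeholders, e.g. a function
    that downloads one object from the storage backend and one that
    reduces a single frame.
    """
    return [analyze(read(p)) for p in shard(paths, worker_index, worker_count)]
```

With four workers, for example, each machine would fetch roughly a quarter of the files instead of a full copy of the dataset.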
Today's data-intensive research workflows require scientists like Dr. Long to learn the vocabulary and some of the techniques of software engineers. Cataloging data, transporting it to the computers where it will be analyzed, and archiving it for later research all become more complex as data volumes exceed the capacity of a single laptop or workstation. By delegating these functions to the CyVerse Data Store, the MagAO-X team can provide guest investigators with more convenient access to their data, while also enabling new kinds of analyses.