Empowering Agricultural and Ecological Research
CyVerse scientific analyst Tyson Swetnam partnered with NVIDIA to accelerate geoinformatic research.
Finding an answer for HPC resource accessibility
The field of remote sensing plays a critical role in helping us understand how flora and fauna change over time in response to changing environmental conditions. Today, we can observe the Earth from an ever-growing constellation of space-based satellites, from occupied aircraft carrying advanced sensor technology, and from small unoccupied aerial systems, or drones. High-performance computing (HPC) and artificial intelligence (AI) are key technologies that help us understand the present state of the environment and what kinds of changes are happening to the landscape.
The research areas in this field range from studying how different plant varieties survive in a specific environment to mapping homes that are in areas susceptible to wildfires. In the summer of 2020, wildfires burned millions of acres across the western United States, with many millions more burned globally. Feeding people, protecting homes and crops, and ensuring that forests can survive a wildfire require researchers to have up-to-date data on conditions out in the field. Data from remote sensing is critical to predicting where the next fire might occur and what risks it could pose to local towns and communities.
Mapping the terrain
It's a big challenge to study the patterns and variations that occur on land masses
over time. Researchers like Tyson Swetnam, an assistant professor of geoinformatics at the University of Arizona, fly drones fitted with cameras and light detection and ranging (lidar) sensors to map terrain and vegetation. Lidar is a remote sensing method that uses light pulses to measure variable distances on the Earth. These light pulses, combined with other data recorded by the airborne system, generate precise, three-dimensional (3D) data about the shape of the Earth and its surface characteristics. The pictures from the drone's cameras are also converted into 3D datasets, similar to the lidar data, using a technique called structure-from-motion multi-view stereo photogrammetry. Stereoscopy and photogrammetry have been around for over 150 years, but today, with advanced computing, we can convert the pixels from photographs into 3D point clouds.
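To make the idea concrete, the following minimal Python sketch (an illustration, not Swetnam's actual pipeline) reads a classified point cloud with the open-source laspy library and derives a rough canopy-height estimate from it; the file name, the 1-meter grid, and the use of ASPRS class 2 for ground returns are assumptions made for the example.

    # Minimal sketch: crude canopy height from a classified lidar/photogrammetry point cloud.
    # Assumes "survey.las" (placeholder name) has ground returns labeled with ASPRS class 2.
    import numpy as np
    import laspy  # pip install laspy

    las = laspy.read("survey.las")
    x, y, z = np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)
    is_ground = np.asarray(las.classification) == 2

    # Bin points onto a 1 m grid and keep the lowest ground return per cell
    # as a rough terrain surface.
    cell = 1.0
    ix = ((x - x.min()) / cell).astype(int)
    iy = ((y - y.min()) / cell).astype(int)
    terrain = {}
    for cx, cy, cz in zip(ix[is_ground], iy[is_ground], z[is_ground]):
        key = (cx, cy)
        terrain[key] = min(terrain.get(key, cz), cz)

    # Height above ground for every non-ground point whose cell has a terrain
    # estimate; the maximum approximates the tallest canopy in the survey.
    heights = [
        cz - terrain[(cx, cy)]
        for cx, cy, cz in zip(ix[~is_ground], iy[~is_ground], z[~is_ground])
        if (cx, cy) in terrain
    ]
    print(f"max canopy height: {max(heights):.1f} m")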
The drones fly pre-planned missions from 15 minutes to over 90 minutes long, with flights occurring during the middle of the day to make sure the terrain is evenly illuminated and shadows are reduced. Some agricultural researchers are interested in flying drones as often as every day during the growing season, while other scientists may collect data once or twice a year over a larger area.
Containers boost research
Leveraging the power of GPUs is only part of the overall solution stack used by Swetnam and his team. He needed an easy way to simplify the deployment of the software tools used for visualization on GPU-based clusters. He found it in the software containers offered in the NVIDIA NGC catalog.
GPU-optimized application containers remove much of the grunt work for HPC researchers. Applications and all of their relevant dependencies are packaged in self-contained environments that are agnostic to the underlying hardware and other installed software. Because of this, containers eliminate the installation process, and application deployments can be completed without impacting other applications on the cluster.
Swetnam used the OpenGL, TensorFlow, and PyTorch containers from the NGC catalog to help analyze the data on the GPU-based clusters. Aside from being portable, allowing Swetnam to move the containers between his T4-based compute node hosted on CyVerse's Visual and Interactive Computing Environment and UArizona's GPU cluster, the containers offered the added benefit of reproducibility. An important part of his research is to help others in the same research area by sharing his data, findings, and analytical pipelines. With NGC, other researchers can simply reuse Swetnam's containers for their own research or to corroborate his findings, without having to go through the complicated process of recreating his exact environment.
Containers also benefit system administrators who are responsible for providing the infrastructure required to run these HPC applications. They're optimized to support the most recent versions of applications and ensure maximum performance. They make it easy to put the latest features and capabilities in the hands of researchers who want them, but they also continue to support those who want to continue using older versions of apps. And they enable IT to deliver all of this relatively effortlessly.
"Containers are integral to our HPC services. In December 2020 alone, Singularity, our container technology of choice, was used 32,000 times. We have been using containers from the NGC catalog since 2018, and for us, it improves time to results," said Chris Reidy, principal HPC system administrator at UArizona. "For example, Tensorflow was typically difficult to build. But not with the NGC container. We were able to offer a Singularity container with Tensorflow to our researchers and update it regularly. Another case comes from the fact that much of our users' code is built on Ubuntu. It is trivial to run an Ubuntu container on our CentOS compute nodes as long as the underlying kernel is consistent."
Enabling precision agriculture
The 3D point cloud data generated by Swetnam's team is key to enabling other researchers, such as UArizona's precision agriculture team. They study plant phenotypes to understand which traits of a plant are expressed or suppressed due to external conditions, including temperature, soil quality, moisture, and more.
The team applies the 3D point cloud data to their machine learning models to predict characteristics such as plant maturation, canopy height, structure, and the onset of diseases like charcoal rot of sorghum. Their goal is to create a mobile application that lets farmers assess the onset of diseases by simply taking a picture from a mobile phone or flying their own drones.
The 3D point cloud data is an essential input to their machine learning models, as they're able to use the terrain data beyond their two-acre experimental site. Using the highly visual point cloud data allows them to better understand the parameters required to improve their machine learning models.
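A hypothetical sketch of that kind of model is shown below: per-plot height statistics computed from a point cloud feed a regression model that predicts a measured trait such as canopy height. The feature set, the synthetic data, and the random-forest choice are illustrative assumptions rather than the team's published pipeline.

    # Illustrative sketch: predict a plant trait from point cloud height statistics.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    def plot_features(z):
        """Summarize the height distribution of one plot's points."""
        return [z.mean(), z.max(), z.std(), np.percentile(z, 90)]

    # Placeholder data: one array of heights above ground per field plot,
    # paired with a synthetic "measured trait" label.
    rng = np.random.default_rng(0)
    plots = [rng.gamma(2.0, 0.4, size=5000) for _ in range(200)]
    X = np.array([plot_features(z) for z in plots])
    y = np.array([z.max() * 0.95 + rng.normal(0, 0.05) for z in plots])

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print(f"R^2 on held-out plots: {model.score(X_test, y_test):.2f}")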
HPC at the edge
Swetnam and his team are also working at the edge, in the field with drones that will process data as it's collected. By training ML models in agricultural and natural environments, the team plans to deploy drones that identify features from video and imagery in real time. In doing so, they will save farmers and foresters significant time in collecting, downloading, and generating imagery and 3D point cloud data. Containers and deep learning models from the NGC catalog, integrated through CyVerse's flexible computing platform, make this possible, playing a critical role in the team's overall development-to-deployment process.
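The sketch below illustrates, in broad strokes, what a frame-by-frame inference loop on such an edge payload could look like; the model architecture, the four-class labeling, and the stand-in test frame are assumptions made for the example, not details of the team's deployed system.

    # Illustrative sketch of on-drone, frame-by-frame inference with PyTorch.
    import torch
    from torchvision.models import mobilenet_v3_small
    from torchvision import transforms
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Hypothetical four-class model, e.g. crop / soil / weed / shadow.
    model = mobilenet_v3_small(num_classes=4).to(device).eval()

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    def classify_frame(frame: Image.Image) -> int:
        """Return the predicted class index for one captured video frame."""
        batch = preprocess(frame).unsqueeze(0).to(device)
        with torch.no_grad():
            logits = model(batch)
        return int(logits.argmax(dim=1))

    # In flight, frames would stream from the drone's camera; here a blank
    # test image stands in for a captured frame.
    print(classify_frame(Image.new("RGB", (640, 480))))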