NSF Backs Bioinformatics Approach to Understanding Plant RNA Modifications

Aug. 11, 2020

CyVerse co-principal investigator Eric Lyons co-leads the team of researchers who will identify RNA modifications and develop resources that may lead to hardier crops.


RNA performs a variety of functions in cells, helping with everything from regulating genes to building proteins. In recent years, it has become clear that chemical modifications to RNA help guide these functions, but only a handful of these modifications have been identified in plants.

On July 24, Andrew Nelson, a faculty member at the Boyce Thompson Institute, and collaborators received a $2 million award from the National Science Foundation (NSF) to identify and infer the functional significance of dozens of different types of RNA modifications in 15 diverse model and crop species. Resources developed by the project will make it easier for plant scientists to utilize and expand upon the discoveries.

 


BTI Assistant Professor Andrew Nelson. Photo by Anna Nelson Dittrich

The project also places a strong focus on building undergraduate curricula teaching biology as a data-driven science.

"If these RNA modifications have the impact that we think they will," Nelson explained, "researchers will be able to do some very targeted gene editing in their favorite species and potentially make more stress-tolerant crops, which is becoming increasingly important because of the effects of climate change."

Nelson is joined in this effort by project co-leaders Rebecca Murphy, an associate professor of biology at Centenary College of Louisiana; Brian Gregory, an associate professor of biology at the University of Pennsylvania; and Eric Lyons, an associate professor of plant sciences at the University of Arizona and co-principal investigator of CyVerse.

The first step will be led by Nelson at BTI. His team will map more than a petabyte of publicly available RNA sequence data from at least 15 different species, including important crops such as corn, rice, wheat, and cotton, back to their respective genomes. For perspective, a petabyte is approximately the same amount of data it would take to stream a playlist of music for 2,600 years.
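As a rough sanity check on that comparison, the short sketch below works out the audio bitrate implied by spreading one petabyte over 2,600 years of nonstop streaming. It assumes a decimal petabyte (10^15 bytes) and a 365.25-day year; neither assumption comes from the project team, and the function name is illustrative.

```python
# Back-of-the-envelope check: what constant bitrate fills one petabyte
# over 2,600 years of continuous streaming?
# Assumption: a decimal petabyte (10**15 bytes) and a 365.25-day year.
PETABYTE_BYTES = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def implied_bitrate_kbps(total_bytes: float, years: float) -> float:
    """Return the bitrate in kilobits per second that would consume
    `total_bytes` over `years` of uninterrupted streaming."""
    seconds = years * SECONDS_PER_YEAR
    bits = total_bytes * 8
    return bits / seconds / 1000


rate = implied_bitrate_kbps(PETABYTE_BYTES, 2600)
print(f"Implied bitrate: {rate:.1f} kbit/s")
```

Under these assumptions the result comes out a little below 100 kbit/s, which is in the same range as common compressed-audio streams, so the 2,600-year figure is a plausible order of magnitude.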

"The amount of publicly available RNA sequencing data for these 15 plants has tripled in the last two years," said Nelson. "It's an incredible resource."

After the data are processed, Gregory's team will run them through two different algorithms. The first algorithm, called HAMR, was developed by the Gregory lab. HAMR capitalizes on flaws in RNA sequencing technologies, and can identify up to 45 different modifications based on the pattern of mistakes. The second algorithm, called PEA, identifies two important RNA marks that HAMR cannot detect.
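The general idea of spotting candidate sites from patterns of sequencing mistakes can be illustrated with a toy example. The sketch below is not HAMR or PEA and does not reproduce their statistics; it simply flags positions where reads disagree with the reference more often than an assumed background error rate would explain. The function names, the 1% error rate, and the significance threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_candidate_sites(reference, pileups, error_rate=0.01, alpha=1e-3):
    """Flag positions whose mismatch count is unlikely under sequencing
    error alone. `pileups[i]` is a string of the bases that reads show at
    position i. Illustrative only: error_rate and alpha are assumed values."""
    flagged = []
    for pos, (ref_base, observed) in enumerate(zip(reference, pileups)):
        depth = len(observed)
        if depth == 0:
            continue  # no coverage, nothing to test
        counts = Counter(observed)
        mismatches = depth - counts[ref_base]
        p_value = binomial_tail(mismatches, depth, error_rate)
        if p_value < alpha:
            # Keep the per-base counts: the mix of substituted bases is the
            # "pattern of mistakes" a real tool would go on to interpret.
            flagged.append((pos, dict(counts), p_value))
    return flagged


# Made-up data: position 1 shows many mismatches spread over several bases.
reference = "ACGT"
pileups = [
    "A" * 50,
    "C" * 30 + "T" * 12 + "G" * 8,
    "G" * 49 + "A",
    "T" * 50,
]
sites = flag_candidate_sites(reference, pileups)
for pos, counts, p_value in sites:
    print(pos, counts)
```

In this made-up input only position 1 is flagged; the single mismatch at position 2 is consistent with ordinary sequencing error at the assumed rate.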

Once the modifications have been identified, Nelson will develop a pipeline for identifying the context in which they occur. Do the RNA modifications show up only in roots? Are they present on the same gene in many related species? Do certain genes get modified by a specific mark only under drought conditions? By answering questions such as these, he hopes to identify specific RNA modifications that underlie critical cellular processes.

All of these data, as well as the workflow used to process them, will be made available to scientists and the public. This effort, along with additional data analysis and management, will be headed by Lyons.

"We are going to release our data as a curated list that researchers can use to generate hypotheses," Lyons explained. "In addition, we will be releasing our code and workflows for others to replicate and reuse our work."

"One of the key challenges of this project will be to process approximately 1 petabyte of raw data. The data processing systems, which will use a combination of local and national cyberinfrastructure resources such as CyVerse and XSEDE, will be used for others wanting to process biological data on scales rarely reached by individual research groups."

The potential of the data generated by the project is vast, emphasized Gregory. "Hopefully, this large-scale resource will allow us and others to focus on the RNA modification sites that are truly important to crop plant stress responses," he said, "in turn allowing us to utilize the knowledge for future crop improvement."

Undergraduate involvement will be a key element of the project. Murphy will introduce students to bioinformatics, RNA sequencing, and genomics through coursework at her primarily undergraduate institution. In the summer, a number of these students will travel to BTI to participate in immersive bioinformatics training as well as in vivo biomolecular work.

"Students will be able to hone their computational and data analysis skills while making real contributions to cutting edge science," said Murphy.

Teaching coding skills to undergraduates is imperative, Nelson added: "Bioinformatics used to play a supporting role in plant biology. Now it is actually driving much of the discovery."

Nelson stressed that collaborative funding opportunities such as those offered by the NSF make ambitious projects like this practical, adding, "This project wouldn't be possible without three amazing collaborators. I think together we will probably uncover some very fundamental principles of RNA biology."

The NSF grant (no. IOS-2023310), entitled "TRTech-PGR: Identification and characterization of stress-responsive and evolutionary conserved epitranscriptomic modification sites in plant transcriptomes," is in the amount of $2,022,004.

 
