CyVerse ‘Power User’ Joins Boyce Thompson Institute
Andrew Nelson, once an aspiring veterinarian, changed his career plans when a professor told him he could get paid to do science.
The revelation inspired him to trade the pursuit of a DVM for a PhD. Having enjoyed laboratory work as an undergraduate at Southwestern Oklahoma State University, he went on to a graduate program in biochemistry and biophysics at Texas A&M.
As a graduate student, Nelson studied the telomerase complex, a ribonucleoprotein that maintains the ends of linear chromosomes, in Arabidopsis (thale cress, a small flowering plant related to broccoli and cabbage, for the less botanically inclined).
He brought his interest in RNA molecules with him to a postdoctoral position in plant sciences in the Beilstein Lab at the University of Arizona.
Nelson was especially interested in how lncRNAs (long non-coding RNAs, large and still mysterious RNA molecules that, unlike messenger RNAs, do not encode proteins) evolve in plant genomes. In humans, where lncRNAs have been studied most thoroughly, they have been found to play important roles in key developmental processes.
“I realized that the plant science community had a very poor understanding of how many lncRNAs there were in plant genomes,” Nelson said. “We didn’t have robust annotations, and where lncRNAs had been identified, those identification efforts were performed using completely different criteria – making it difficult to pick out the most interesting candidates for functional analysis.”
“There was an understanding that there were important genes out there that weren’t encoding for proteins, but we didn’t have a good understanding of what those looked like, or what they were,” he explained.
Nelson turned to the web in search of bioinformatics resources that could help him identify lncRNAs in existing, publicly available genomic datasets.
His search led him to CyVerse through the project's co-principal investigator Eric Lyons, a professor in the UA's School of Plant Sciences and lead of CoGe, a CyVerse-powered platform for open-access comparative genomics research.
“When Nelson approached me to learn more about comparative genomics and bioinformatics, I could see it was the beginning of a long and fruitful set of collaborations,” said Lyons. “Andrew’s deep knowledge of plant evolution and lncRNA identification was a natural fit for comparative genomics and large-scale re-analysis of the terabytes of RNA data generated by hundreds of labs over the past decade.”
CyVerse worked for Nelson.
“I like to call myself a CyVerse power user now,” he said, “because I spend so much time using the wonderful tools that CyVerse provides for RNA sequencing, data visualization, and many other applications.”
Nelson quickly found tools in CyVerse’s open-access platform that let him reanalyze RNA sequences from publicly available datasets (also stored in CyVerse). He was searching through information that had been there all along but had been overlooked because no one was looking for it.
He later developed two tools of his own, RMTA and Evolinc, that have been integrated into CyVerse’s Discovery Environment platform for software apps and tools. RMTA (Peri et al., under review) is an automated workflow for high-throughput RNA-Seq assembly and analysis, perfect for processing thousands of RNA-Seq datasets. Evolinc (Nelson et al., 2017) is a pipeline for rapid lncRNA identification from large volumes of RNA-Seq data.
Nelson then co-authored a proposal with CyVerse scientific analyst Upendra Devisetty and Rebecca Murphy of Centenary College of Louisiana to extend his RNA mining to the 15 most thoroughly sequenced plant species, and to use those data to identify lncRNAs and determine their functions through computational and comparative analyses. The proposal was funded in 2018 by the Plant Genome Research Program in the National Science Foundation’s Division of Integrative Organismal Systems. The investigation is ongoing, with more than 40,000 experiments already processed.
This past summer alone, working with a team of undergraduate students, Nelson took advantage of the CyVerse apps’ ability to run the same analysis over and over to process more than 22,000 sequences.
“That’s not going to happen with a resource other than CyVerse,” Nelson said. “The automation that the CyVerse team has put in place is incredible.”
Then there is the data storage. “We’re probably approaching half a petabyte of RNA sequencing data stored in CyVerse,” Nelson admitted, somewhat abashedly. “CyVerse is an amazing resource.”
Nelson is one of three people, chosen from more than 100 applicants, invited to join the prestigious Boyce Thompson Institute (BTI). The institute touted his bioinformatics expertise, a skill set he built through his work with CyVerse.
He’s looking forward to continuing his project to identify lncRNAs in those 15 plant species, and to beginning an investigation of how lncRNAs might bolster stress tolerance in tomatoes and in tomato relatives that are naturally resilient to stressors such as heat and drought. That work could have important agricultural applications as the climate changes.
“It’s nice to get paid to do something you like,” Nelson said.