Building Novel Genomes for New Study Species

Aug. 19, 2020

CyVerse collaborators hosted a workshop at this year's virtual Botany 2020 conference to help fellow researchers without genomics experience learn to build novel genomes using CyVerse.

Iochroma cyaneum, blue trumpet flower, is one of Susan Strickler's study species. Photo by D. Constantine.

It can be hard to study a species' genetics without its genome.

And for many researchers working on non-model organisms – those that are less commonly studied and therefore less well documented – genomes rarely exist. At least until you build them yourself.

Jacob Landis, a postdoctoral researcher at Cornell University, noticed that building novel genomes is a common challenge among phylogeneticists and systematics researchers, many of whom have never been trained in genomics.

"Almost all the researchers I spoke with wanted to start doing genome sequencing of their species of interest," Landis said, "but almost none had any idea what that actually entailed."

 

Jacob Landis

Landis set out to change that, recruiting collaborators from his community, including Boyce Thompson Institute (BTI) Computational Biology Center (BCBC) Director Susan Strickler, and BTI assistant professors Andrew Nelson and Fay-Wei Li.

Together, they designed, piloted, and ran a Non-Model Genomics (NMG) Workshop at this year's fully virtual Botany 2020 conference. Many conference attendees work on non-model organisms, Landis explained, so the workshop provided a welcome option – as reflected by the session's overwhelming popularity.

Capped at 15, the full-day online workshop accrued a waitlist of 120. Attendees included undergraduate and graduate students, postdoctoral researchers, and experienced faculty members.

"Workshops are more accessible to a wider variety of people when they are online," Strickler observed, but she lamented the loss of in-person interaction due to strictly digital communication.

By using CyVerse, the instructors were able to give attendees a level playing field despite vast differences in their career experience, geographical location, and access to technology. Programs and data were easily accessible in the cloud, regardless of what operating system or local compute speed the participants were using.

"Some people get frustrated when they have to start compiling and installing software, and CyVerse is a great way to keep people happier by providing them a starting point with the programs pre-installed," Strickler noted.

"CyVerse was a great integration for all the participants, and it was a really easy way to make the content reproducible," Landis noted. "We could teach this workshop time and again, and the hard part is already done."

The instructors introduced students to the suite of analytical tools available through CyVerse's Discovery Environment platform for genome construction and analysis, using test data available in the CyVerse Data Store.

"We included methods to gear the workshop toward researchers doing phylogenetics or population genetics, or other analyses of non-model systems, so that they could learn to build a genome from scratch," Landis said.

 

Students learned about genome assembly and annotation and then used CyVerse's cloud computing environment Atmosphere to do these analyses. Nelson, who is a longtime CyVerse collaborator, reviewed CoGe, or Comparative Genomics, a Powered-by-CyVerse project that provides a suite of tools for genomics researchers.

Studying Calochortus venustus, the butterfly mariposa lily, was an additional inspiration for Landis and Strickler to organize the non-model genomics workshop. Photo by Adriana Hernandez/Cornell University.

The plant genetics community is built on member contribution, and the tools presented during the workshop had all been developed by other research groups.

"We try to optimize the tools for our students' questions of interest," Landis said. "For any of these steps there are a ton of programs that can be used, so we try to give the students the most appropriate current workflow."

And, he added, "we use all of these programs and analyses in our own research, which spans the whole gamut: angiosperms, cultivated and wild species, gymnosperms – even a goat project."

 

Susan Strickler

"It's nice that these skills are transferable to so many different organisms," Strickler commented. Strickler is an alum of CyVerse's Container Camp, and she continued working with CyVerse systems for her own research after attending camp, sometimes bouncing inquiries off CyVerse team members. "It's been great working with the people on the CyVerse team," she said.

Like Container Camp, the workshop helped attendees create a community of practice, giving students long-term contacts and collaborators among both their fellow attendees and the instructors.

"The ground-up guidance provided throughout the workshop was extremely helpful," noted Vanessa Handley, Director of Collections and Research at the University of California Botanical Garden, who attended the workshop. "I am about to embark on a landscape scale genomics project for which I need to assemble a reference genome. This workshop was therefore particularly timely."

And, Handley added, "the creative resiliency inherent to delivering virtually deserves special kudos."

The instructors expect to continue offering the workshop at future Botanical Society of America conferences, and also independently. Visit the BCBC website for more information.

As CyVerse is refocusing the Atmosphere cloud computing platform to better serve development of cloud-native services, future workshops will likely integrate programs and curricula with CyVerse's Visual Interactive Computing Environment (VICE) and the National Science Foundation's Jetstream Cloud, a larger cloud infrastructure that fulfills the same (and more) functions pioneered by Atmosphere.

The NMG Workshop materials are available in the CyVerse Discovery Environment. Additional details, including the nature and origin of the data used, the programs installed, and the settings for CyVerse instances, are available on GitHub.

All workshop instructors are funded by awards from the National Science Foundation and consider the workshop to be part of the Broader Impacts category of their awards.
