The students learned to use tools and technologies that data scientists regularly utilize, preparing them to lead the next generation in data science.
14 Sep, 2018

Summer Interns Work with Drone Datasets, RNA Analyses

Different interests brought Tucson, Arizona, high school juniors Daniel Lee and Gary Li to CyVerse headquarters at the University of Arizona this summer. Both students participated in the seven-week KEYS research internship program, which lets high school students work alongside university faculty in top research labs.

“I’m really interested in physics and STEM sciences, and I felt that if I didn’t learn computer programming that would be a big hole in my repertoire of knowledge,” said Lee, of Oro Valley Basis high school.

“I took a class called Basics of Java at my high school, and that really got me interested in computer science,” said Li, of University High School. “During the KEYS internship I wanted to combine research that would involve coding, and working with CyVerse and Data7 seemed like a good opportunity.”

And code he did. Working with CyVerse’s director of infrastructure, Edwin Skidmore, and science informatician Upendra Devisetty, Li set out to make a popular analysis program called lnc pipeline reporter accessible over the web from any computer through the CyVerse Discovery Environment.

“Lnc pipeline reporter is a tool used to analyze data from lncRNA,” Li explained. “The goal is to produce interactive charts and graphs that scientists can use to analyze their results.”

Li had to learn to combine multiple software components into a single container, a package that bundles all of a program’s functionality into one unit so that it works reliably across different computational environments, from a laptop to large cloud computing platforms.

“The goal of my research is to make the process of bringing bioinformatics software into the CyVerse Discovery Environment easy and to facilitate the reproducibility of analyses,” Li said.

“My favorite thing has been learning new coding languages and seeing how different languages are similar. It becomes really easy once you’ve started learning one.” Li hopes to major in computer science in college after he graduates from high school.

Li added that he especially enjoyed the team aspect of working with CyVerse. “There are all these people who specialize in different things,” he said. “I can always find someone to help me with different aspects of the project.”

Both Li and Lee also worked with staff at the UA’s Data7 data science institute in addition to CyVerse. The two groups are colocated in the UA’s new Bioscience Research Laboratories building, which made interactions between the workgroups, and expert assistance for the interns, all the easier.

Lnc pipeline reporter is now publicly available on the Discovery Environment for any scientist to use.

Meanwhile, Lee worked with CyVerse science informatician Tyson Swetnam to develop a web application that compares publicly available data with an individual researcher’s own dataset.

Swetnam’s project with CyVerse focuses on integrating geographical imaging data collected by unmanned aerial drones with other public datasets, such as precipitation, wind speed and direction, or barometric pressure in a given area. With all of the data easy to compare, scientists can gain a better understanding of overarching ecological processes and interactions. The drone project is named “calliope” after a species of hummingbird.

Lee’s mission has been to get the project started by creating a map application, called “calliope view,” that can integrate and filter different types of data. To that end, Lee used the programming language R to create a map, added sample environmental data from the National Ecological Observatory Network (NEON), and then worked to integrate additional information from the CyVerse data store.

“Eventually we’ll have drone imaging data, and Tyson wants to be able to filter based upon altitude, position, or speed of the aircraft, so I’ve also created the framework for what we can eventually make as the drone interface,” Lee said.


For now, Lee’s map can be used to scroll to test data sites, look at species counts in specific locations, and filter out the species you don’t want to see.

From here, Lee noted, “it would be very easy to integrate this technology at global scale.”

“Our interns learned to use tools that data scientists use every day, such as Docker and the R programming language, and to manage their software using GitHub, and they worked with real scalable data and actual research questions. Their contributions will be used by many of the researchers who take advantage of our open access platform,” said Nirav Merchant, a CyVerse co-principal investigator and director of Data7. “The skills they acquired will be valuable in any discipline they will pursue.”

Li and Lee presented research posters on their projects for the KEYS program on July 20. Lee is continuing to work with CyVerse beyond his internship to replace the placeholder datasets with actual NEON data, bringing calliope view much closer to user readiness.