The Data Hidden in the Leaves
The demand for people with data science skills continues to grow across all industries. Data science, also referred to as data analytics, involves turning collected data into valuable information. With the proliferation of technology, massive amounts of data are being collected every day. From businesses wanting to increase sales profits, to sports organizations assessing a potential team's chances of success in an area, to healthcare systems tracking disease, data science turns the numbers into something usable. The Bachelor of Science in Data Science is one of the newest degree programs at the College of Coastal Georgia. The program, offered in cooperation with the Department of Mathematics and the School of Business and Public Management, is preparing students to fill thousands of available and well-paid positions in data science and analytics. One student, Travis Simmons, who is minoring in data science, is already using his newly acquired data skills in an internship with the University of Arizona (UArizona)'s School of Plant Sciences. His internship is giving him a glimpse of how data science and biology are intertwined.
Simmons is a senior at the College, majoring in biology with a concentration in general biology, and is earning two minors in environmental science and data science. He is continuing the virtual internship he did this summer with graduate students of UArizona's School of Plant Sciences, which is also being counted as credit towards his environmental science minor. The team is developing algorithms and analyzing data generated by the university's plant phenotyping system. UArizona is home to the world's largest outdoor plant phenotyping system, called the Field Scanner or gantry. This machine consists of state-of-the-art sensors for collecting plant phenotyping data. The imaging system covers over two acres of cultivated land and produces large volumes of data per day. One of the challenges is finding an effective way to process all the data so the information can be used to benefit plant scientists and farmers. CyVerse, the National Science Foundation's premier data science infrastructure project headquartered at UArizona, provides the perfect solution for the team's large-scale data processing endeavors.
"Our job is to take the data and get it down to numbers where we can do statistics and find drought resilient plants," Simmons said. "In order to run those tests, the plants need to be tracked through scans. If you know the information about one day of a plant, but can't tell if it's the same plant on day two, it will be hard to tell if the plant is doing better over time. That was my first project. I was writing a program to match up plants over days, which I was able to do and is in rotation in one of their pipelines. It was so much fun."
Simmons started his internship with limited knowledge of data science. This fall semester is actually the first time Simmons has taken data science classes. In preparation for his internship, he reached out to professors in the College's data science program with specific questions that would enable him to be a contributing member of the team. One professor who helped was Assistant Professor of Mathematics Dr. Renren Zhao, who taught Simmons the fundamentals of Python, a programming language. Simmons was then able to use Python to write sorting scripts to analyze the data generated by the gantry.
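To give a sense of what matching plants across scan days can involve, here is a minimal Python sketch. It is not the code Simmons wrote or the method used in the UArizona pipeline; it assumes each detected plant is reduced to an (x, y) field position in meters, and it pairs detections from two days by closest distance. The function name, the plant labels, and the 0.2-meter cutoff are all hypothetical, chosen only for illustration.

```python
import math

def match_plants(day1, day2, max_dist=0.2):
    """Pair day-1 plants with day-2 detections, closest pairs first.

    day1, day2: dicts mapping a label to an (x, y) position in meters.
    max_dist:   hypothetical cutoff; pairs farther apart are never matched.
    """
    # Collect every candidate pair within the cutoff, sorted by distance.
    candidates = sorted(
        (math.dist(p1, p2), id1, id2)
        for id1, p1 in day1.items()
        for id2, p2 in day2.items()
        if math.dist(p1, p2) <= max_dist
    )
    matches = {}
    used = set()
    for _, id1, id2 in candidates:
        # Skip pairs where either plant has already been claimed.
        if id1 in matches or id2 in used:
            continue
        matches[id1] = id2
        used.add(id2)
    return matches

# Made-up positions: plant C has no nearby detection on day two.
day1 = {"A": (0.00, 0.00), "B": (1.00, 0.00), "C": (2.00, 0.00)}
day2 = {"x": (1.03, 0.02), "y": (0.01, -0.04), "z": (5.00, 5.00)}
matches = match_plants(day1, day2)
print(matches)
```

With these made-up positions, plants A and B are each paired with a nearby day-two detection, while C is left unmatched because nothing falls within the cutoff. A real field pipeline would likely need to handle sensor drift, plant growth, and far larger volumes of data.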
Assistant Professor Dr. Duke Pauli of the School of Plant Sciences at UArizona and Associate Professor Dr. Eric Lyons lead the plant phenotyping project. Simmons works in Pauli's lab under his mentor, graduate student Emmanuel Gonzalez. Pauli, Lyons, and Gonzalez co-wrote a letter praising Simmons and describing how impressed they were with his work.
"His attention to detail and willingness to learn are a few qualities that make him a truly outstanding contributor. His first task was to develop a method to track individual plants throughout the field season. Despite having limited experience, he successfully developed, validated, and implemented his computational code which will be used in future gantry scans," the letter reads. "His success was rooted not only in hard work and dedication but also an impressive ability to consider the overall project goals in his work—which is no easy feat. Mr. Simmons has developed a well-rounded set of computational, analytical, and collaborative skills that will serve him well in future endeavors."
Now that Simmons has started his data science classes for his minor, he applies what he's learning in the classroom to his internship.
"There were very specific things that I needed to learn quickly for my internship. Although I'm taking a class learning Python, which I did almost every day over the summer, I'm learning new functions that I've never heard of before," he said. "I'm filling in all these knowledge gaps and I'm going back to apply all these new things to my code that I've already written. I'm improving all my code now based on what I'm learning now."
Simmons is also taking a database management class that focuses on how to move and manage data. Although Simmons is not involved in that aspect of the internship, because of the class he's able to understand what the other team members are talking about during their meetings.
"It's given me the vocabulary that they're using and I can now follow what they're saying. Every day I'm learning something that I can directly apply to my internship. The more I get into it, the more tools I learn," he said.
Simmons had another opportunity to apply what he's learning at the College to his internship. The imaging research project was recently featured in the Wall Street Journal. For a video accompanying the article, Simmons created a 3D model of a plant scan to be displayed on a computer screen in the video. He used what he learned in his geographic information system (GIS) class to create a point cloud visualization. GIS is a framework that provides the ability to capture and analyze spatial and geographical data. It organizes layers of information into visualizations using maps and 3D scenes.
"There's a laser that goes around all the plants and takes points in 3D space. I took those points and gave them a mask. It was a very cool visual of a plant that had been scanned and was spinning around," he said.
Simmons volunteered to generate the animation, something no one else on the team knew how to do. His work can be seen throughout the video.
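The idea behind a spinning point cloud can be sketched in a few lines of Python. This is an illustration only, not the tool or code Simmons used: it assumes the scan is a list of (x, y, z) points and produces one rotated copy of the cloud per animation frame by turning every point about the vertical (z) axis. The function name and the sample points are hypothetical.

```python
import math

def rotate_about_z(points, angle_deg):
    """Rotate (x, y, z) points about the vertical z axis by angle_deg degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Standard 2D rotation applied to x and y; height z is unchanged.
    return [(x * c - y * s, x * s + y * c, z) for x, y, z in points]

# Two made-up points standing in for a laser scan of a plant.
points = [(1.0, 0.0, 0.5), (0.0, 1.0, 1.0)]

# One frame every 90 degrees; a smooth animation would use smaller steps.
frames = [rotate_about_z(points, angle) for angle in range(0, 360, 90)]
print(len(frames))
```

Rendering each frame in sequence with a plotting or 3D visualization tool would give the impression of the plant turning in place.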
Data science fits right into what Simmons wants to do for a future career. He wants to be a research professor in the plant science field.
"Computational biology is relatively new and I found it super interesting. When I started the internship, I had no coding experience at all, so I've found that passion through this project, and it's something I'll definitely continue doing," Simmons said.
Simmons wants to be involved in research that links plants to computers in order to solve problems. His internship gives him insight into how data science is helping to identify drought resilient crops that can address food and clothing insecurity as the world's population continues to outgrow its resources.
"There are all these giant advancements in computational biology in the last couple of years. Many are still in the first stages of how to implement what they've gathered. I want to be a part of that to help solve some really tough questions," Simmons said.
Working with the world's largest plant scanner didn't happen through the traditional route of an application process. Simmons shared his uncertainty about what topic to study in graduate school with Dr. James Deemy, assistant professor of environmental science at Coastal Georgia. Deemy encouraged him to read scientific and academic papers to find what interested him, and to reach out to their authors. That is exactly what Simmons did. He contacted the author of a paper about the iPlant Collaborative, now called CyVerse, an advanced data management platform for plant sciences and other scientific disciplines. This led to Simmons being put in contact with Lyons. He and Lyons met over Zoom to talk about the project and get to know one another. Simmons submitted his resume and CV, which were shared among the graduate students. Gonzalez agreed to be Simmons' mentor, and his internship started the following week. Through the internship, Simmons is working not only with graduate students but also with electrical engineers, biologists, genetic biologists, neurobiologists, and data analysts. One of the many things he has enjoyed is realizing how supportive the scientific community can be and how excited people are to share their knowledge.
Simmons understands that data science can seem daunting because of the amount of information collected and analyzed. However, he assures students considering a data science minor that they will start from square one.
"It's very accessible. Data science applies to everything. Even if you have an office job, being able to run a small amount of analytics, like sales or job performance, is a powerful tool. Imagine going to your boss with a spreadsheet of not only what you've done, but how the work has been effective. Being able to provide that kind of information for people is powerful—no matter what field you're in," he said. "It gives you a toolset that not many people in your major are going to have."
He hopes other students will also take the initiative to start conversations with people already working in the fields they are pursuing. As in Simmons' case, one email can lead to new experiences, a newfound passion, and a world of possibilities.