ML/AI | CyVerse


The past decade has seen the pervasive application of Machine Learning (ML) and Artificial Intelligence (AI) based analysis methods across disciplines, making proficiency in their use a fundamental skill. The advent of foundation models, trained on extensive amounts of data and adaptable to diverse applications and data types, has empowered many disciplines to undertake analytical tasks previously considered infeasible.

CyVerse offers essential infrastructure components that integrate these cutting-edge ML/AI advancements into analyses and data exploration workflows. This integration encompasses familiar data science workbenches like Jupyter and RStudio, as well as rapid application development tools like Gradio and Streamlit that are accessible through the CyVerse Discovery Environment. Underpinning this infrastructure is the CyVerse Data Store, a robust data management platform that works with popular tools like DVC and MLflow. These tools collectively empower ML teams to efficiently manage extensive datasets, ensure project reproducibility, and foster effective collaboration. Equally important is the ability to leverage specialized hardware available at the institutional level, such as HPC clusters, academic cloud resources with GPUs, cutting-edge and no-cost NSF ACCESS resources, and commercial cloud infrastructure. Platform components like CyVerse CACAO provide the required cloud automation, allowing teams to harness these resources in a reliable, scalable, and reproducible manner.

Mastering data management, distributed analysis, and container-based workflows while keeping up with the ever-changing landscape of ML/AI tools, architectures, and platforms presents a significant challenge. To address this, CyVerse, in collaboration with the Data Science Institute and other partners, offers a comprehensive suite of hands-on training programs and workshops that cover a broad spectrum of topics. These programs range from introductory offerings like Container Camp to advanced subjects such as cloud-native technologies and Kubernetes for scientific analysis.

Give your team the pragmatic CyVerse advantage: CyVerse can help you incorporate the ML/AI components you need for your next proposal, grant, coursework, or workshop, and offers architectural guidance and customization services to cater to the unique requirements of your ML/AI analysis workflows.

Learn more about our capabilities and see what the community is building using CyVerse. Be sure to check out our ML/AI webinars, follow the UArizona Data Lab's webinars on ML/AI topics, and explore their Wiki and Workshops for Machine Learning.

Featured Project


Advances in Large Language Models (LLMs) have significantly impacted how we leverage AI technology in education and research. Platforms like ChatGPT offer user-friendly chatbot interfaces that support students in their academic pursuits and assist instructors in content creation. Researchers can also utilize LLMs to summarize complex content effectively.

However, challenges such as "hallucinations" persist, where these platforms might fabricate information, particularly when answering questions about content not readily available on the internet. To address this, techniques like Retrieval Augmented Generation (RAG) are employed, enabling LLMs to operate and answer questions based on data and documents provided by users.
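The retrieval step behind RAG can be illustrated with a minimal sketch. This is not how Chatür or any particular tool is implemented: it assumes a simple bag-of-words cosine similarity in place of the learned embeddings that real systems typically use, and the function names (`retrieve`, `build_rag_prompt`) and the sample documents are hypothetical. The sketch only assembles the prompt that would be sent to an LLM; it does not call one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k user-provided documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, documents, k=1):
    """Place the retrieved text in the prompt so the model answers from it."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical course material supplied by a user.
DOCS = [
    "The course final exam is held on December 12 in room 204.",
    "Office hours are Tuesdays from 2 to 4 pm.",
    "Homework is submitted through the course portal.",
]

print(build_rag_prompt("When is the final exam?", DOCS))
```

Because the model is instructed to answer only from the retrieved context, questions about private or unpublished material can be grounded in the user's own documents rather than in whatever the model memorized during training.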

In collaboration with the Data Science Institute (DSI), the Institute for Computation and Data-Enabled Insight (ICDI), and the Computer Science department at UArizona, CyVerse has harnessed open and freely available LLMs to develop Chatür. This secure and privacy-preserving chatbot enhances learning experiences and showcases the potential of integrating LLMs with traditional course material and educational systems to create personalized learning pathways.

Additionally, researchers can provide Chatür with a large corpus of text, including full-text papers and private copyrighted content, and obtain a conversational interface that can answer questions using that content. This capability empowers researchers to efficiently access and utilize information from various sources, fostering deeper insights and accelerated research progress.

By seamlessly blending the capabilities of LLMs and chatbots, Chatür offers a transformative tool for education and research. It enables students to engage with course material in a more interactive and personalized manner, while researchers can leverage LLMs to analyze and synthesize vast amounts of information more effectively.

Learn about Chatür and other open source RAG tools in our webinar “A Conversation With Your Own Data With Open LLMs Using NSF JetStream2 and CyVerse”.

Learn

Prompt Engineering

Prompt engineering refers to the crafting and refining of prompts or input data to generate desired outputs from AI models. It's a strategic approach to maximizing the effectiveness and accuracy of AI systems by tailoring the inputs to elicit the desired responses.

We're passionate about teaching prompt engineering to others, showing how to shape input to get the best AI responses. Take a look at this past workshop we held: Workshop: ChatGPT Prompt Engineering
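As a small illustration of what refining a prompt can look like in practice, the sketch below turns a vague request into a structured one by stating the task, audience, output format, and constraints explicitly. The helper name `refine_prompt` and the example values are hypothetical and are not taken from the workshop; this is one common pattern among many, not a prescribed method.

```python
def refine_prompt(task, audience, output_format, constraints):
    """Assemble a structured prompt from explicitly stated requirements."""
    lines = [
        f"Task: {task}",
        f"Audience: {audience}",
        f"Output format: {output_format}",
        "Constraints:",
    ]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


# A vague prompt leaves the model to guess scope, depth, and format.
vague = "Tell me about deep learning."

# A refined prompt spells those choices out.
refined = refine_prompt(
    task="Explain what deep learning is",
    audience="first-year biology students with no programming background",
    output_format="three short paragraphs followed by one concrete example",
    constraints=["Avoid mathematical notation", "Define every technical term"],
)

print(refined)
```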

Deep Learning

Deep learning is a subset of ML that uses multi-layered artificial neural networks, loosely inspired by the human brain, to recognize patterns and make decisions. It's the technology behind many AI advancements, from image and speech recognition to natural language processing.

If you're curious to learn more about deep learning, check out this past CyVerse webinar: A Deep Dive into Deep Learning Techniques: A First-of-its-kind Hands-on Workshop. It's a great introduction to the topic, covering its basics and applications.

Data Lab

The UArizona Data Lab, under the Data Science Institute and in partnership with the Institute for Computation & Data-Enabled Insight, acts as a dynamic hub for promoting interdisciplinary research in data science. It provides a collaborative space where scholars and learners from various fields work together to investigate, interpret, and derive insights from intricate datasets. Through interdisciplinary workshops, consultations, and a suite of tools and resources, the UArizona Data Lab enables researchers, students, and industry collaborators to leverage the power of data-driven exploration.

UArizona Data Lab ML/AI Webinar Playlist

Explore UArizona Data Lab Machine Learning Wiki

Explore UArizona Data Lab Machine Learning Workshops Wiki

Use Cases

iNaturalist's 70M+ Crowdsourced Images to Build an ML Insect Identification App


The iNaturalist Open Download tool created an impact by providing easy access to millions of labeled, crowd-sourced images of living organisms. This facilitated improved species identification and ecological monitoring through better-trained models. It also spurred innovation in techniques like Self-Supervised Learning (SSL). Moreover, it empowered citizen scientists to contribute to research and conservation efforts, fostering public engagement in biodiversity preservation. Overall, the tool democratized access to biodiversity data, supported innovation, and encouraged community involvement in nature conservation efforts.

HydroFRAME

By leveraging container technology, Condon's team overcomes barriers in hydrology research, making high-level models more accessible to researchers, educators, and policymakers. Containers streamline model sharing, facilitate collaborative research, and enhance educational tools, ultimately advancing understanding and management of groundwater resources nationwide.

COALESCE: Empowering Ultra-precision Ag with ML, Modeling & Robots

Researchers working to feed the world are applying and integrating layers of technologies (sensors, machine learning, AI, and high-throughput phenotyping platforms such as drones and small-scale rolling robots that can also fertilize, weed, and cull single plants in a field) with the ultimate goal of replacing farmers’ reliance on heavy machinery and broadcast spraying. COALESCE (for COntext Aware LEarning for Sustainable CybEr-agricultural systems), a collaboration involving CyVerse, Iowa State, UIllinois Urbana-Champaign, George Mason University, the Iowa Soybean Association, and Ohio State, aims to deliver ML that executes decisions directly and almost immediately in the field as data is gathered.