CyVerse and the University of Arizona Host ChatGPT Workshops
CyVerse recently hosted two ChatGPT workshops at the University of Arizona (UA): one on May 31 for UA faculty and staff, and a separate workshop on June 8 for the fifty newest UA BIO5 KEYS research interns.
Developed by OpenAI and released in November 2022, ChatGPT is an artificial intelligence (AI) chatbot. It pairs a generative pre-trained transformer (GPT) language model with a chat interface, which makes it well suited to conversational applications. GPT is a type of large language model (LLM): a neural network with billions of parameters, trained on large amounts of unlabeled text using self-supervised learning. In self-supervised learning, the training signal comes from the data itself rather than from human-written labels. For a language model, the task is typically to predict the next word in a passage, so every sentence in the training text supplies its own "answers."
The May 31 workshop, attended by 74 participants, was led by instructors Tyson L. Swetnam of CyVerse; Carlos Lizárraga-Celaya, Greg T. Chism, and Heidi Steiner of UArizona's Data Science Institute; and Andrew Bennett, an incoming Assistant Professor in Hydrology and Atmospheric Sciences at UA. The workshop combined presentations, open-mic discussions, and hands-on coding. The instructors also discussed ethical concerns about the accuracy of ChatGPT's output, as well as platform options and alternatives. In addition to the basic ChatGPT site available here, attendees learned about Bing Chat, Microsoft's chat assistant built into the Bing search engine, and Bard, a similar chatbot created by Google. Instructors also covered sites and platforms that extend what users can do with ChatGPT, such as Hugging Face, an AI community that hosts over 100,000 free AI models.
"Before training any Large Language Model, there is a data collection process to gather publicly available data from many sources, such as web crawling, GitHub, Wikipedia, Books, arXiv, and Stack Exchange," instructor Carlos Lizárraga observed. "The text data is then separated into tokens and a statistical language model is constructed to begin the training process. These trainings involve a vocabulary of tens of thousands of words and tens of billions of parameters. This requires a high-performance computing infrastructure and computing time on the order of months to produce a working model after thousands of iterations to predict what word comes next in a sentence. The models are not intelligent or reasoning; they only decide what to type next based on probabilities in the language model."
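To make the idea of "deciding what to type next based on probabilities" concrete, the short Python sketch below builds a toy bigram (word-pair) model from a few sentences of unlabeled text. It is only an illustration under simplifying assumptions: real LLMs such as GPT split text into subword tokens and use a transformer neural network with billions of parameters, not simple word counts. The function names and the sample corpus are hypothetical, chosen for this example.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer (hypothetical); real LLMs use subword tokens.
    return text.lower().split()


def train_bigram_model(text: str) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) from raw, unlabeled text."""
    tokens = tokenize(text)
    counts: dict[str, Counter] = defaultdict(Counter)
    # Self-supervision: the "label" for each word is simply the word that
    # follows it, so the training signal comes from the text itself.
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    # Convert raw counts into conditional probabilities.
    model: dict[str, dict[str, float]] = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {word: n / total for word, n in followers.items()}
    return model


def predict_next(model: dict[str, dict[str, float]], word: str) -> str | None:
    # Return the most probable next word: no reasoning, only probabilities.
    followers = model.get(word.lower())
    if not followers:
        return None
    return max(followers, key=followers.get)


corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
model = train_bigram_model(corpus)
print(model["the"])                # {'cat': 0.4, 'mat': 0.2, 'dog': 0.2, 'rug': 0.2}
print(predict_next(model, "the"))  # cat
```

Because "cat" follows "the" in two of five cases, the model picks it as the next word; an LLM does something analogous at vastly larger scale, scoring every token in its vocabulary rather than counting word pairs.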
Instructor Tyson L. Swetnam believes that teaching staff, students, and faculty how to use these platforms is critical. "Microsoft Bing Chat, GitHub Copilot, and Google Bard are changing the way we interact with computers and our daily work in profound ways," Swetnam said. "People are excited and even terrified at the potential of these AI platforms. We wanted to get out ahead of it and try to show how to use these tools effectively and in ways that improve our work productivity. We also wanted to allow for discussion about the ethical and moral implications of using AI for education and research."
Feedback from the event included comments such as "I enjoyed the examples of how people are using ChatGPT in their life and work and learning how to write more effective prompts" and "I enjoyed the interactive nature and broad spectrum of topics related to ChatGPT and AI in research." Another attendee commented that "the supporting materials and links to AI/ChatGPT resources were great! I also enjoyed hearing what others are using ChatGPT for, which was very enlightening."
A recording of the May 31 workshop can be viewed here, and an overview of the presentation, which includes prompt engineering tips and other ChatGPT guidance, can be viewed here. More information about CyVerse learning options, including workshops, webinars, and learning materials, can be found at this link.