microbe.cards is a comprehensive database of microbial phenotypes and predictions generated by large language models (LLMs). LLMs are advanced AI models trained on vast amounts of text data to understand and generate human-like text. We evaluate LLM output on a set of phenotypes using high-quality data from Bugphyzz. Please note that this page also contains LLM-generated descriptions that have not been validated.
Metadata on microbial phenotypes and their environments are key to studies of microbial taxa. However, such metadata are currently sparse for most taxa, with the exception of highly studied reference organisms. Recently, large language models (LLMs) have emerged as a groundbreaking approach to translate knowledge across scientific literature and databases for user-defined tasks. In this work, we systematically explore the quality of biological information embedded in publicly available LLMs and their potential to expand the biological knowledge available from traditional, manually compiled database approaches for microbial research. We evaluated the performance of state-of-the-art LLMs, including Anthropic's Claude, Meta's Llama 3, and OpenAI's GPT-4, for their efficiency in extracting knowledge for microbial research, demonstrating that LLMs accurately predict a range of phenotypic properties from bacterial species names alone, often outperforming sequence-based prediction methods. Moreover, we show that LLMs can effectively interpret gene names and draw conclusions about bacterial phenotypes. Ensemble models combining LLM outputs further improved prediction accuracy for certain phenotypes, demonstrating the value of integrating multiple models to capture complementary information. To make our data and methods accessible to the scientific community, we have developed an open-access web portal (https://microbe.cards) that offers LLM-based characterizations of 12,264 phenotypes for 2,312 species. Our study highlights the utility of LLMs as a powerful method for species-level characterizations in microbiology and facilitates the discovery of novel phenotypic associations.
If you have any questions or feedback, please don't hesitate to contact us at: pmu15@helmholtz-hzi.de