Extending Biology Models with Deep NLP over Scientific Articles

This paper describes R3 (Reading, Reasoning, and Reporting), our system for deep language understanding and model management in the biomedical domain. Starting from a base BioPAX model, we learn extensions to it by reading biomedical research articles from PubMed Central. We describe the particular issues for text understanding in this domain and how we use pre- and post-analysis reasoning to bridge the differences in how knowledge is packaged in a text versus a biomedical database. We close with a brief description of our first-year results, in which R3 was faster than all other reported systems, reading 1,000 articles in 15 minutes.

David McDonald, Scott Friedman, Amandalynne Paullada, Rusty Bobrow, and Mark Burstein. Extending Biology Models with Deep NLP over Scientific Articles. In AAAI-16 Workshop on Knowledge Extraction from Text, Forthcoming.