By Prof. Dr. Richard A. Bonneau, Professor of Biology and Computer Science; Director, NYU Center for Data Science, New York, USA
Vladimir Gligorijevic, P. Douglas Renfrew, and Richard Bonneau.
Because existing experimental methods for determining protein function are limited and costly, the vast majority of proteins across many organisms remain unannotated. Developing ML methods that combine large-scale, genome-wide heterogeneous data to extract useful protein feature representations for function prediction thus remains a key problem in biology. This problem also illustrates many general aspects of applying machine learning to biological sequences. We will review a few of our recent deep learning-based methods for predicting function from various data types, including protein sequences, structures, and protein-protein interaction networks. We will first present our recent integrative method, deepNF (deep network fusion), which integrates different protein-protein interaction networks using multi-modal network auto-encoders to construct a shared protein feature representation indicative of protein function. In the second part of the talk, we will focus on methods for predicting function by combining protein sequence and structure. This discussion will center on a method based on graph convolutional networks (GCNs), which extracts features from protein contact maps and sequences while learning to predict function. We will discuss the performance of the GCN in predicting functions of experimentally determined structures from the PDB and how we plan to apply this method to Rosetta-predicted structures. Our hope is to predict protein function in a manner that allows us to map the predicted function back to residues, locations on structures, and network neighborhoods in a way that facilitates experimental follow-up.
Curriculum vitae:
Dr. Bonneau is Group Leader for Systems Biology at the newly founded Flatiron Institute in New York City and also Director of the New York University Center for Data Science. His group works on inferring and modeling both biological and social networks (at the Simons Foundation and the SMaPP lab at NYU, respectively), developing new methods to learn very large networks from large collections of genomics data. His group actively participates in applying these methods to ongoing systems biology consortia efforts that span bacteria, model systems, the immune system, and crop plants. His group also develops methods for the prediction and design of biomolecular polymers (and polymers that mimic biological structure). To carry out this work, his group develops Rosetta as a core member of the RosettaCommons; Dr. Bonneau is a founding member of the RosettaCommons and a member of its executive board. Dr. Bonneau is committed to doing all he can to leverage his position as Director of the Center for Data Science to help increase diversity in the many computational fields that comprise what we today refer to as data science.
REGISTRATION:
For registration please contact Benedicta Frech: benedicta.frech@h-its.org
In case you are not able to attend in person, you can watch the talk afterwards on the HITS YouTube channel: https://www.youtube.com/user/TheHITSters.
The colloquium will be live-streamed; please use the following link: https://hitsmediaweb.h-its.org/Mediasite/Play/0d38e9bc22344589b572c45a711af3671d