Scalable Relational Learning for Large Heterogeneous Networks


Ryan Rossi
Rong Zhou

Relational models for heterogeneous network data are becoming increasingly important for many real-world applications. However, existing relational learning approaches are not parallel, have scalability issues, and are therefore unable to handle large heterogeneous network data. In this paper, we propose parallel collective matrix factorization (PCMF), a fast and flexible framework for jointly modeling large heterogeneous networks. The PCMF learning algorithm solves for a single parameter at a time while holding the others fixed, leading to a parallel scheme that is fast, flexible, and general across a variety of relational learning tasks and heterogeneous data types. The proposed approach is carefully designed to be (a) efficient for large heterogeneous networks (linear in the total number of observations from the set of input matrices), (b) flexible, as many components are interchangeable and easily adaptable, and (c) effective for a variety of applications and for different types of data. The experiments demonstrate the scalability, flexibility, and effectiveness of PCMF. For instance, we show that PCMF outperforms a recent state-of-the-art parallel approach in runtime, scalability, and prediction quality. Finally, the effectiveness of PCMF is shown on a number of relational learning tasks, such as serving predictions in a real-time streaming fashion.
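
To make the idea of solving for a single parameter given the others concrete, the following is a minimal sequential sketch of collective matrix factorization with coordinate-wise updates. It is not the authors' implementation: the abstract does not give the objective, so the squared loss, the L2 regularization, and every name here (`A` for a user-item matrix, `B` for a user-user matrix, the shared factor `U`, the factors `V` and `Z`, and the parameters `alpha` and `lam`) are assumptions made for illustration.

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve_coordinate(row, d, terms, lam):
    """Closed-form minimizer for row[d] with every other parameter fixed.
    terms: list of (observed_value, other_factor_row, weight)."""
    num, den = 0.0, lam
    for value, other, weight in terms:
        # Residual with the contribution of coordinate d removed.
        partial = value - dot(row, other) + row[d] * other[d]
        num += weight * partial * other[d]
        den += weight * other[d] ** 2
    return num / den

def objective(A, B, U, V, Z, alpha, lam):
    # Assumed objective: squared error on both matrices plus L2 penalty.
    loss = sum((a - dot(U[i], V[j])) ** 2 for i, j, a in A)
    loss += alpha * sum((b - dot(U[i], Z[k])) ** 2 for i, k, b in B)
    reg = sum(x * x for M in (U, V, Z) for row in M for x in row)
    return loss + lam * reg

def group(triples, pos):
    """Index (row, col, value) triples by the entry at position pos."""
    index = {}
    for t in triples:
        index.setdefault(t[pos], []).append(t)
    return index

def fit(A, B, n_users, n_items, rank=2, alpha=1.0, lam=0.1, epochs=20, seed=0):
    rng = random.Random(seed)

    def init(n):
        return [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n)]

    U, V, Z = init(n_users), init(n_items), init(n_users)
    A_by_user, A_by_item = group(A, 0), group(A, 1)
    B_by_row, B_by_col = group(B, 0), group(B, 1)
    history = [objective(A, B, U, V, Z, alpha, lam)]
    for epoch in range(epochs):
        for d in range(rank):
            # U is shared: its update sees observations from both A and B.
            for i in range(n_users):
                terms = [(a, V[j], 1.0) for _, j, a in A_by_user.get(i, [])]
                terms += [(b, Z[k], alpha) for _, k, b in B_by_row.get(i, [])]
                U[i][d] = solve_coordinate(U[i], d, terms, lam)
            for j in range(n_items):
                terms = [(a, U[i], 1.0) for i, _, a in A_by_item.get(j, [])]
                V[j][d] = solve_coordinate(V[j], d, terms, lam)
            for k in range(n_users):
                terms = [(b, U[i], alpha) for i, _, b in B_by_col.get(k, [])]
                Z[k][d] = solve_coordinate(Z[k], d, terms, lam)
        history.append(objective(A, B, U, V, Z, alpha, lam))
    return U, V, Z, history

# Toy data as (row, column, value) triples; the values are made up.
A = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
B = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]
U, V, Z, history = fit(A, B, n_users=3, n_items=3)
print(round(history[0], 3), "->", round(history[-1], 3))
```

In this sketch, each scalar update is an exact one-dimensional minimization, so the assumed objective never increases from one epoch to the next. For a fixed coordinate `d`, the updates for different rows of the same factor matrix do not read each other's values, which suggests how such a scheme could be run in parallel; the loops here are kept sequential for clarity.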
