Accelerating Matrix Computations for Mining Large Dynamic Complex Networks

Many common techniques in graph mining and machine learning are based on linear

algebra routines that are expensive on large data sets. A prime example is the computation of

many eigenpairs or even the full eigendecomposition of a graph’s Laplacian matrix. Executing such

operations on graphs with millions or billions of edges results in high running times or may even

be impractical. Moreover, library implementations of linear algebraic kernels do not take graph

changes over time into account. Since dynamic graphs such as the web graph or social interaction

networks are abundant these days, this omission leads to a waste of computing resources.

Our proposal aims at significantly faster (yet inexact) algorithms for linear algebra routines for dynamic

graph mining applications. We want to develop algorithms that monitor the graph’s and the

algorithm’s state over time with appropriate data structures; this avoids costly restarts from scratch.

Moreover, we use approximation in order to trade running time with (a still sufficient) accuracy and

exploit common structural features of complex networks. To this end, we want to combine results

and techniques from numerical linear algebra (NLA), combinatorial scientific computing and theoretical

computer science. As a significant part of the project, we demonstrate the improvement by

means of three common graph mining tasks in dynamic scenarios: clustering, similarity and representation.

Finally, the integration of our new algorithms into the open-source network analysis tool

NetworKit will help community adoption and speeds up other NLA-based graph mining routines.

Many common techniques in graph mining and machine learning are based on linear

algebra routines that are expensive on large data sets. A prime example is the computation of

many eigenpairs or even the full eigendecomposition of a graph’s Laplacian matrix. Executing such

operations on graphs with millions or billions of edges results in high running times or may even

be impractical. Moreover, library implementations of linear algebraic kernels do not take graph

changes over time into account. Since dynamic graphs such as the web graph or social interaction

networks are abundant these days, this omission leads to a waste of computing resources.

Our proposal aims at significantly faster (yet inexact) algorithms for linear algebra routines for dynamic

graph mining applications. We want to develop algorithms that monitor the graph’s and the

algorithm’s state over time with appropriate data structures; this avoids costly restarts from scratch.

Moreover, we use approximation in order to trade running time with (a still sufficient) accuracy and

exploit common structural features of complex networks. To this end, we want to combine results

and techniques from numerical linear algebra (NLA), combinatorial scientific computing and theoretical

computer science. As a significant part of the project, we demonstrate the improvement by

means of three common graph mining tasks in dynamic scenarios: clustering, similarity and representation.

Finally, the integration of our new algorithms into the open-source network analysis tool

NetworKit will help community adoption and speeds up other NLA-based graph mining routines.

Duration of project

Start date: 01/2020

End date: 12/2022

Research Areas

Massively Parallel and Data-Intensive Systems

Research Areas

Big Data