||While recent years have witnessed a steady trend of applying Deep Learning (DL) to networking systems, most of the underlying Deep Neural Networks (DNNs) suffer two major limitations. First, they fail to generalize to topologies unseen during training. This lack of generalizability hampers the ability of the DNNs to make good decisions every time the topology of the networking system changes. Second, existing DNNs commonly operate as "blackboxes" that are difficult to interpret by network operators, and hinder their deployment in practice. In this paper, we propose to rely on a recently developed family of graph-based DNNs to address the aforementioned limitations. More specifically, we focus on a network congestion prediction application and apply Graph Attention (GAT) models to make congestion predictions per link using the graph topology and time series of link loads as inputs. Evaluations on three real backbone networks demonstrate the benefits of our proposed approach in terms of prediction accuracy, generalizability, and interpretability.