Functional interpretation of omics data is a major challenge. Network-based algorithms have been used for this task for almost two decades, aiming to infer important sub-networks (or modules) from biological networks. Such modules are then subjected to various downstream analyses, including hypergeometric enrichment analysis against Gene Ontology annotations (GO terms). Unexpectedly, for some module discovery algorithms, many GO terms are reported as enriched even when the algorithm is rerun on randomly permuted omics datasets, suggesting that many of the reported terms are false positives. To address this bias, we developed the EMpirical Pipeline (EMP), which generates background distributions of hypergeometric enrichment scores for modules called on permuted datasets. We then designed empirically based criteria and used them to evaluate six popular network-based algorithms (NetBox, Bionet, jActiveModules-greedy, jActiveModules-SA, HotNet2, KeyPathwayMiner). Our evaluation indicates that NetBox outperforms the other algorithms on most criteria and, notably, has a lower rate of false calls.
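The idea of scoring a GO term against an empirical background can be illustrated with a minimal sketch. It is not the EMP implementation: the function names (`hypergeom_sf`, `enrichment_score`, `empirical_p_value`, `empirical_enrichment`), the use of -log10(p) as the enrichment score, and the add-one estimator for the empirical p-value are all assumptions made for illustration. The sketch also assumes that the modules called on permuted datasets are supplied by the caller; rerunning a module discovery algorithm is outside its scope.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    total = math.comb(N, n)
    tail = sum(
        math.comb(K, i) * math.comb(N - K, n - i)
        for i in range(max(k, 0), min(K, n) + 1)
    )
    return tail / total


def enrichment_score(module, term_genes, universe):
    """Assumed score: -log10 of the hypergeometric tail p-value."""
    module = set(module) & set(universe)
    term_genes = set(term_genes) & set(universe)
    overlap = len(module & term_genes)
    p = hypergeom_sf(overlap, len(universe), len(term_genes), len(module))
    return -math.log10(max(p, 1e-300))  # guard against float underflow


def empirical_p_value(observed, background_scores):
    """Fraction of background scores at least as large as the observed one.

    The add-one correction (an assumption here) keeps the estimate above zero.
    """
    exceed = sum(1 for s in background_scores if s >= observed)
    return (1 + exceed) / (1 + len(background_scores))


def empirical_enrichment(module, term_genes, universe, permuted_modules):
    """Score a term on the real module and compare it with the scores of
    modules called on permuted datasets (hypothetical caller-supplied input)."""
    observed = enrichment_score(module, term_genes, universe)
    background = [
        enrichment_score(m, term_genes, universe) for m in permuted_modules
    ]
    return observed, empirical_p_value(observed, background)
```

In this sketch, a term whose score on the real module is routinely matched or exceeded by scores from permuted runs receives an empirical p-value close to 1, even when its hypergeometric p-value looks significant. The smallest attainable empirical p-value is 1 / (number of permutations + 1), so the number of permuted runs bounds the resolution of the estimate.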