
Citing comprehensive and correct related work is crucial in academic writing. It not only supports the author's claims but also helps readers trace other related research. With the rapid growth of the scientific literature, it has become increasingly challenging to find high-quality citations and to write the text that introduces them. In this paper, we present an automatic writing assistant model, AutoCite, which infers potentially related work and, at the same time, automatically generates the citation context. Specifically, AutoCite combines a novel multi-modal encoder with a multi-task decoder. From multi-modal inputs, the encoder learns paper representations that capture both the citation network structure and the textual contexts. It aggregates information from both outer-neighbors and inner-neighbors, fully accounting for the direction of citation links, and extracts semantic information from the textual contexts attached to those links. The multi-task decoder couples citation prediction and context generation and learns them jointly in a unified manner. To join the encoder and decoder effectively, we introduce a novel representation fusion component, gated neural fusion, which takes the multi-modal representations from the encoder as input and adaptively produces the outputs consumed by the downstream multi-task decoder. Extensive experiments on five real-world citation network datasets validate the effectiveness of our model.
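
To make the idea of gated fusion concrete, the following is a minimal sketch of one common form of gating, not the exact component used in AutoCite, whose formulation is not given here. It assumes two equal-length vectors for a paper, one structural and one textual, and an element-wise sigmoid gate computed from their concatenation. The names `gated_fusion`, `h_struct`, `h_text`, `weights`, and `bias` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_struct, h_text, weights, bias):
    """Fuse two d-dimensional vectors with an element-wise gate.

    Assumed form (illustrative only, not taken from the paper):
        g = sigmoid(W [h_struct; h_text] + b)
        h = g * h_struct + (1 - g) * h_text

    weights is a d x 2d matrix given as a list of rows; bias has length d.
    """
    d = len(h_struct)
    if len(h_text) != d:
        raise ValueError("h_struct and h_text must have the same length")
    concat = list(h_struct) + list(h_text)
    fused = []
    for i in range(d):
        # Gate pre-activation for dimension i.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        # g near 1 favors the structural view; g near 0 favors the textual view.
        fused.append(g * h_struct[i] + (1.0 - g) * h_text[i])
    return fused


# With zero weights and zero bias every gate equals 0.5,
# so the fused vector is the plain average of the two inputs.
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_fusion([1.0, 2.0], [3.0, 6.0], zero_w, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned jointly with the encoder and decoder, so each dimension can weight network structure and text differently depending on the input.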