WSDM 2021

DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

Kunal Dahiya 1, Deepak Saini 2, Anshul Mittal 1, Ankush Shaw 1, Kushal Dave 3, Akshay Soni 3, Himanshu Jain 1, Sumeet Agarwal 4, Manik Varma 3
1 Indian Institute of Technology Delhi, India
2 Microsoft Research India, India
3 Microsoft, USA
4 IIT Delhi, India

Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework, which addresses these challenges by decomposing the deep extreme multi-label task into four non-extreme sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm, which could be 2-12% more accurate and 6-50x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on Bing for a range of short text applications, such as matching user queries to advertiser bid phrases and showing personalized ads, where it yielded very significant gains in click-through rates, coverage, revenue, and other online metrics over state-of-the-art techniques currently in production.