The problem of assigning ranking scores to items based on observed comparison data, such as paired comparisons, choices, and full rankings, has been of continued interest in a wide range of applications, including information search, aggregation of social opinions, electronic commerce, online gaming platforms, and, more recently, evaluation of machine learning algorithms. The key problem is to compute ranking scores, which are of interest for quantifying skill, relevance, or preference, and, when the scores are estimates of parameters of a statistical model of ranking outcomes, for predicting future outcomes. One of
the most popular statistical models of ranking outcomes is the Bradley-Terry model for paired comparisons (equivalent to a multinomial logit model), together with its extensions to choice and full ranking outcomes. Computing ranking scores under Bradley-Terry models amounts to estimating the model parameters.
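As a concrete illustration, under the Bradley-Terry model each item has a positive score, and item i beats item j with probability proportional to item i's score. A minimal sketch (the function name is ours, not from the paper):

```python
def bt_win_prob(theta_i: float, theta_j: float) -> float:
    """Bradley-Terry probability that item i beats item j,
    given positive ranking scores theta_i and theta_j."""
    return theta_i / (theta_i + theta_j)

# An item with twice the score of its opponent wins two thirds of the time.
p = bt_win_prob(2.0, 1.0)
```

Estimating the scores from observed comparison outcomes is then a standard parameter estimation problem for this likelihood.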
In this paper, we study a popular method for inference of Bradley-Terry model parameters, namely the MM algorithm, for maximum likelihood estimation and maximum a posteriori probability estimation. This class of models includes the Bradley-Terry model of paired comparisons, the Rao-Kupper model of paired comparisons that allows for tie outcomes, the Luce choice model, and the Plackett-Luce ranking model. We establish tight characterizations of the convergence rate of the MM algorithm and show that it is essentially
equivalent to that of a gradient descent algorithm. For maximum likelihood estimation, the convergence is shown to be linear, with the rate crucially determined by the algebraic connectivity of the matrix of item-pair co-occurrences in the observed comparison data. For Bayesian inference, the convergence rate is also shown to be linear, with the rate determined by a parameter of the prior distribution in a way that can make the convergence arbitrarily slow for small values of this parameter. We propose a simple modification of the classical MM algorithm that avoids this slow convergence issue and accelerates the convergence. The
key component of the accelerated MM algorithm is a parameter rescaling, performed at each iteration, that is carefully chosen based on a theoretical characterization of the convergence rate.
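For intuition, the classical MM update for Bradley-Terry maximum likelihood estimation (in Hunter's well-known form) followed by a per-iteration rescaling can be sketched as below. The simple sum-normalization here is an illustrative stand-in, not the paper's carefully chosen rescaling, and all names are ours:

```python
import numpy as np

def mm_step(theta, wins, counts):
    """One classical MM update for Bradley-Terry maximum likelihood.

    theta        -- current positive score estimates
    wins[i]      -- total number of wins of item i
    counts[i, j] -- number of comparisons between items i and j
    """
    n = len(theta)
    new = np.empty(n)
    for i in range(n):
        denom = sum(counts[i, j] / (theta[i] + theta[j])
                    for j in range(n) if j != i)
        new[i] = wins[i] / denom
    return new

def mm_step_rescaled(theta, wins, counts):
    """MM update followed by a rescaling of the parameter vector.

    Illustrative assumption: we rescale so the scores sum to n;
    the paper's accelerated algorithm chooses the rescaling from
    its convergence-rate analysis instead.
    """
    new = mm_step(theta, wins, counts)
    return new * (len(new) / new.sum())
```

Since the Bradley-Terry likelihood is invariant to a common rescaling of all scores, the rescaling step changes only the iterates' trajectory, not the estimated ranking.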
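The algebraic connectivity mentioned above is the second-smallest eigenvalue of the graph Laplacian of the item-pair co-occurrence matrix; it is zero when the comparison graph is disconnected and grows with better-connected data. A minimal sketch of this standard construction (the function name is ours):

```python
import numpy as np

def algebraic_connectivity(counts):
    """Second-smallest eigenvalue of the Laplacian of a symmetric
    co-occurrence matrix (diagonal entries are ignored)."""
    A = np.array(counts, dtype=float)
    np.fill_diagonal(A, 0.0)
    # Graph Laplacian: degree matrix minus adjacency matrix.
    L = np.diag(A.sum(axis=1)) - A
    # eigvalsh returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(L)[1]
```

For example, a path graph on three items has algebraic connectivity 1, while the fully connected triangle has connectivity 3, reflecting the intuition that denser comparison data yields faster convergence.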
Our experimental results, on both synthetic and real-world data, demonstrate the identified slow convergence issue of the classical MM algorithm and show that significant efficiency gains can be obtained by our newly proposed method.