Authors: Min Xie, Raymond Chi-Wing Wong, Peng Peng and Vassilis Tsotras
Name of Conference: IEEE International Conference on Data Engineering (ICDE 2020), April 20-24, 2020, Dallas, Texas, USA
Date of Publication: April 20, 2020
When faced with a database containing millions of products, a user may be only interested in a (typically much) smaller representative subset. Various approaches were proposed to create a good representative subset that fits the user’s needs which are expressed in the form of a utility function (e.g., the top-k and diversification query). Recently, a regret minimization query was proposed: it does not require users to provide their utility functions and returns a small set of tuples such that any user’s favorite tuple in this subset is guaranteed to be not much worse than his/her favorite tuple in the whole database. In a sense, this query finds a small set of tuples that makes the user happy (i.e., not regretful) even if s/he gets the best tuple in the selected set but not the best tuple among all tuples in the database. In this paper, we study the min-size version of the regret minimization query; that is, we want to determine the least tuples needed to keep users happy at a given level. We term this problem as the α-happiness query where we quantify the user’s happiness level by a criterion, called the happiness ratio, and guarantee that each user is at least α happy with the set returned (i.e., the happiness ratio is at least α) where α is a real number from 0 to 1. As this is an NP-hard problem, we derive an approximate solution with theoretical guarantee by considering the problem from a geometric perspective. Since in practical scenarios, users are interested in achieving higher happiness levels (i.e., α is closer to 1), we performed extensive experiments for these scenarios, using both real and synthetic datasets. Our evaluations show that our algorithm outperforms the best-known previous approaches in two ways: (i) it answers the α-happiness query by returning fewer tuples to users and, (ii) it answers much faster (up to two orders of magnitude times improvement for large α).