[Jairsubscribers] 7 new articles published by JAIR
jair-ed at isi.edu
Wed Aug 2 08:25:17 PDT 2017
Dear JAIR subscriber:
This message lists papers that have been recently published in JAIR and describes how to access them. (If you wish to remove yourself from this mailing list, see instructions at the end of this message.)
If you are receiving this email in a digest format of multiple messages, you may wish to turn off the digest option to get shorter emails every month by visiting http://mailman.isi.edu/mailman/listinfo/jairsubscribers
and setting the digest option to "Off".
I. New JAIR Articles
Asrar Ahmed, Pradeep Varakantham, Meghna Lowalekar, Yossiri Adulyasak and Patrick Jaillet (2017)
"Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)",
Volume 59, pages 229-264
For quick access go to <http://www.jair.org/papers/paper5242.html>
Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR) that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, performs better than directly optimizing the regret.
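To make the idea of regret over sampled models concrete, here is a minimal Python sketch. It is not the paper's mixed integer linear program, nor CEMR or OSR: it simply enumerates every deterministic stationary policy of a tiny finite-horizon MDP and keeps the one whose worst-case regret across the samples is smallest. The function names, the nested-list layout of the samples (T[s][a] is a next-state distribution, R[s][a] a reward) and the toy numbers are all assumptions made for illustration.

```python
from itertools import product


def optimal_value(T, R, horizon, s0):
    """Finite-horizon optimal value from s0 via backward induction."""
    v = [0.0] * len(T)
    for _ in range(horizon):
        v = [max(R[s][a] + sum(p * v[t] for t, p in enumerate(T[s][a]))
                 for a in range(len(T[s])))
             for s in range(len(T))]
    return v[s0]


def policy_value(T, R, policy, horizon, s0):
    """Finite-horizon value of a fixed stationary policy from s0."""
    v = [0.0] * len(T)
    for _ in range(horizon):
        v = [R[s][policy[s]]
             + sum(p * v[t] for t, p in enumerate(T[s][policy[s]]))
             for s in range(len(T))]
    return v[s0]


def min_max_regret_policy(samples, horizon, s0=0):
    """Brute force: return (policy, max regret over samples) with minimal max regret."""
    n_states = len(samples[0][0])
    n_actions = len(samples[0][0][0])
    best = None
    for policy in product(range(n_actions), repeat=n_states):
        worst = max(optimal_value(T, R, horizon, s0)
                    - policy_value(T, R, policy, horizon, s0)
                    for T, R in samples)
        if best is None or worst < best[1]:
            best = (policy, worst)
    return best


# Hypothetical toy data: action 0 moves to state 0, action 1 moves to state 1.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
sample_a = (T, [[1.0, 0.0], [0.0, 2.0]])   # state 1 pays well in this sample
sample_b = (T, [[1.0, 0.0], [0.0, 0.5]])   # state 1 pays poorly in this sample
best_policy, best_regret = min_max_regret_policy([sample_a, sample_b], horizon=3)
```

Enumeration grows exponentially with the number of states, which is one way to see why scalable formulations such as those described in the abstract are needed.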
Gabriele Farina and Nicola Gatti (2017)
"Adopting the Cascade Model in Ad Auctions: Efficiency Bounds and Truthful Algorithmic Mechanisms",
Volume 59, pages 265-310
For quick access go to <http://www.jair.org/papers/paper5438.html>
Sponsored Search Auctions (SSAs) are one of the most successful applications of microeconomic mechanisms, with a revenue of about $72 billion in the US alone in 2016. However, the problem of designing the best economic mechanism for sponsored search auctions is far from being solved, and, given the amount at stake, it is no surprise that it has received growing attention over the past few years. The most common auction mechanism for SSAs is the Generalized Second Price (GSP). However, the GSP is known not to be truthful: the agents participating in the auction might have an incentive to report false values, generating economic inefficiency and suboptimal revenues in turn. Superior, efficient truthful mechanisms, such as the Vickrey-Clarke-Groves (VCG) auction, are well known in the literature. However, while the VCG auction is currently adopted for the strictly related scenario of contextual advertising, e.g., by Google and Facebook, companies are reluctant to extend it to SSAs, fearing prohibitive switching costs. Other than truthfulness, two issues are of paramount importance in designing effective SSAs. First, the choice of the user model; not only does an accurate user model better target ads to users, it also is a critical factor in reducing the inefficiency of the mechanism. Often an antagonist to this, the second issue is the running time of the mechanism, given the performance pressure these mechanisms undertake in real-world applications. In our work, we argue in favor of adopting the VCG mechanism based on the cascade model with ad/position externalities (APDC-VCG). Our study includes both the derivation of inefficiency bounds and the design and the experimental evaluation of exact and approximate algorithms.
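The difference between GSP and VCG payments can be illustrated with a short Python sketch. It assumes the simple separable position model, in which each slot has a fixed click-through rate, and not the cascade model with ad/position externalities studied in the paper. The function names and the example numbers are hypothetical; the slot click-through rates are assumed to be sorted in decreasing order and there are at least as many bidders as slots.

```python
def gsp_payments(bids, ctrs):
    """Expected GSP payment per slot: slot i pays the next-highest bid per click."""
    ranked = sorted(bids, reverse=True)
    pays = []
    for i, ctr in enumerate(ctrs):
        next_bid = ranked[i + 1] if i + 1 < len(ranked) else 0.0
        pays.append(ctr * next_bid)
    return pays


def vcg_payments(bids, ctrs):
    """Expected VCG payment per slot: the welfare loss imposed on lower bidders."""
    ranked = sorted(bids, reverse=True)
    k = len(ctrs)
    padded = list(ctrs) + [0.0]  # a virtual slot with zero clicks below the last one
    pays = []
    for i in range(k):
        total = 0.0
        for j in range(i, k):
            next_bid = ranked[j + 1] if j + 1 < len(ranked) else 0.0
            # Bidder j+1 would move up one slot if the bidder in slot i left.
            total += (padded[j] - padded[j + 1]) * next_bid
        pays.append(total)
    return pays


# Hypothetical example: three bidders, two slots.
bids = [10.0, 6.0, 2.0]
ctrs = [0.5, 0.3]
gsp = gsp_payments(bids, ctrs)
vcg = vcg_payments(bids, ctrs)
```

In this model, for the same bids, the VCG payment for each slot never exceeds the GSP payment, and the two coincide for the last slot.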
Tamir Tassa, Tal Grinshpoun and Roie Zivan (2017)
"Privacy Preserving Implementation of the Max-Sum Algorithm and its Variants",
Volume 59, pages 311-349
For quick access go to <http://www.jair.org/papers/paper5504.html>
One of the basic motivations for solving DCOPs is maintaining agents' privacy. Thus, researchers have evaluated the privacy loss of DCOP algorithms and defined corresponding notions of privacy preservation for secured DCOP algorithms. However, no secured protocol was proposed for Max-Sum, which is among the most studied DCOP algorithms. As part of the ongoing effort of designing secure DCOP algorithms, we propose P-Max-Sum, the first private algorithm that is based on Max-Sum. The proposed algorithm has multiple agents performing the role of each node in the factor graph, on which the Max-Sum algorithm operates. P-Max-Sum preserves three types of privacy: topology privacy, constraint privacy, and assignment/decision privacy. By allowing a single call to a trusted coordinator, P-Max-Sum also preserves agent privacy. The two main cryptographic means that enable this privacy preservation are secret sharing and homomorphic encryption. In addition, we design privacy-preserving implementations of four variants of Max-Sum. We conclude by analyzing the price of privacy in terms of runtime overhead, both theoretically and by extensive experimentation.
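For readers unfamiliar with secret sharing, the following Python sketch shows the textbook additive scheme over a prime field. It is only an illustration of the general primitive and is not the P-Max-Sum protocol; the modulus, the function names and the example values are assumptions chosen for this sketch.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime used as the field modulus (an arbitrary choice)


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret; requires all shares."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no secret is revealed."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Two hypothetical private costs, each split among three parties.
cost_a = share(10, 3)
cost_b = share(32, 3)
total = reconstruct(add_shared(cost_a, cost_b))
```

Because shares can be added locally, sums of private values can be computed without any single party seeing the inputs, which is the kind of operation a message-passing algorithm that accumulates costs relies on.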
Lars Otten and Rina Dechter (2017)
"AND/OR Branch-and-Bound on a Computational Grid",
Volume 59, pages 351-435
For quick access go to <http://www.jair.org/papers/paper5456.html>
We present a parallel AND/OR Branch-and-Bound scheme that uses the power of a computational grid to push the boundaries of feasibility for combinatorial optimization. Two variants of the scheme are described, one of which aims to use machine learning techniques for parallel load balancing. In-depth analysis identifies two inherent sources of parallel search space redundancies that, together with general parallel execution overhead, can impede parallelization and render the problem far from embarrassingly parallel. We conduct extensive empirical evaluation on hundreds of CPUs, the first of its kind, with overall positive results. In a significant number of cases parallel speedup is close to the theoretical maximum and we are able to solve many very complex problem instances orders of magnitude faster than before; yet analysis of certain results also serves to demonstrate the inherent limitations of the approach due to the aforementioned redundancies.
Yuqian Li and Vincent Conitzer (2017)
"Game-Theoretic Question Selection for Tests",
Volume 59, pages 437-462
For quick access go to <http://www.jair.org/papers/paper5413.html>
Conventionally, the questions on a test are assumed to be kept secret from test takers until the test. However, for tests that are taken on a large scale, particularly asynchronously, this is very hard to achieve. For example, TOEFL iBT and driver's license test questions are easily found online. This also appears likely to become an issue for Massive Open Online Courses (MOOCs, as offered for example by Coursera, Udacity, and edX). Specifically, the test result may not reflect the true ability of a test taker if questions are leaked beforehand.
In this paper, we take the loss of confidentiality as a fact. Even so, not all hope is lost as the test taker can memorize only a limited set of questions' answers, and the tester can randomize which questions to let appear on the test. We model this as a Stackelberg game, where the tester commits to a mixed strategy and the follower responds. Informally, the goal of the tester is to best reveal the true ability of a test taker, while the test taker tries to maximize the test result (pass probability or score). We provide an exponential-size linear program formulation that computes the optimal test strategy, prove several NP-hardness results on computing optimal test strategies in general, and give efficient algorithms for special cases (scored tests and single-question tests). Experiments are also provided for those proposed algorithms to show their scalability and the increase of the tester's utility relative to that of the uniform-at-random strategy. The increase is quite significant when questions have some correlation---for example, when a test taker who can solve a harder question can always solve easier questions.
Shaowei Cai, Jinkun Lin and Chuan Luo (2017)
"Finding A Small Vertex Cover in Massive Sparse Graphs: Construct, Local Search, and Preprocess",
Volume 59, pages 463-494
For quick access go to <http://www.jair.org/papers/paper5443.html>
The problem of finding a minimum vertex cover (MinVC) in a graph is a well known NP-hard combinatorial optimization problem of great importance in theory and practice. Due to its NP-hardness, there has been much interest in developing heuristic algorithms for finding a small vertex cover in reasonable time. Previously, heuristic algorithms for MinVC have focused on solving graphs of relatively small size, and they are not suitable for solving massive graphs as they usually have high-complexity heuristics. This paper explores techniques for solving MinVC in very large scale real-world graphs, including a construction algorithm, a local search algorithm and a preprocessing algorithm. Both the construction and search algorithms are based on low-complexity heuristics, and we combine them to develop a heuristic algorithm for MinVC called FastVC. Experimental results on a broad range of real-world massive graphs show that our algorithms are very fast and have better performance than previous heuristic algorithms for MinVC. We also develop a preprocessing algorithm to simplify graphs for MinVC algorithms. By applying the preprocessing algorithm to local search algorithms, we obtain two efficient MinVC solvers called NuMVC2+p and FastVC2+p, which show further improvement on the massive graphs.
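As an illustration of what a low-complexity construction heuristic for vertex cover can look like, here is a minimal Python sketch. It is a generic edge-based greedy pass followed by a redundancy-removal pass, written for this note; it is not claimed to be the FastVC construction algorithm, and the function name and input format (a list of undirected edges in a simple graph without self-loops) are assumptions.

```python
def construct_vertex_cover(edges):
    """Greedy cover: for each uncovered edge add its higher-degree endpoint, then prune."""
    degree = {}
    neighbors = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u if degree[u] >= degree[v] else v)

    # Prune: a vertex is redundant if every one of its neighbors is in the cover.
    for w in sorted(cover, key=lambda x: degree[x]):
        if all(n in cover for n in neighbors[w]):
            cover.remove(w)
    return cover


# Hypothetical example graphs.
star = [(0, 1), (0, 2), (0, 3)]
path = [(1, 2), (2, 3), (3, 4)]
star_cover = construct_vertex_cover(star)
path_cover = construct_vertex_cover(path)
```

Both passes touch each edge a constant number of times apart from the sort, which is what makes heuristics of this shape usable on very large sparse graphs; the result is a valid cover but not necessarily a minimum one.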
Ramya Ramakrishnan, Chongjie Zhang and Julie Shah (2017)
"Perturbation Training for Human-Robot Teams",
Volume 59, pages 495-541
For quick access go to <http://www.jair.org/papers/paper5390.html>
In this work, we design and evaluate a computational learning model that enables a human-robot team to co-develop joint strategies for performing novel tasks that require coordination. The joint strategies are learned through "perturbation training," a human team-training strategy that requires team members to practice variations of a given task to help their team generalize to new variants of that task. We formally define the problem of human-robot perturbation training and develop and evaluate the first end-to-end framework for such training, which incorporates a multi-agent transfer learning algorithm, human-robot co-learning framework and communication protocol. Our transfer learning algorithm, Adaptive Perturbation Training (AdaPT), is a hybrid of transfer and reinforcement learning techniques that learns quickly and robustly for new task variants. We empirically validate the benefits of AdaPT through comparison to other hybrid reinforcement and transfer learning techniques aimed at transferring knowledge from multiple source tasks to a single target task.
We also demonstrate that AdaPT's rapid learning supports live interaction between a person and a robot, during which the human-robot team trains to achieve a high level of performance for new task variants. We augment AdaPT with a co-learning framework and a computational bi-directional communication protocol so that the robot can co-train with a person during live interaction. Results from large-scale human subject experiments (n=48) indicate that AdaPT enables an agent to learn in a manner compatible with a human's own learning process, and that a robot undergoing perturbation training with a human results in a high level of team performance. Finally, we demonstrate that human-robot training using AdaPT in a simulation environment produces effective performance for a team incorporating an embodied robot partner.
II. Unsubscribing from our Mailing List
To remove yourself from the JAIR subscribers mailing list, visit our
Web site (http://www.jair.org/), follow the link "notify me of new
articles", enter your email address in the form at the bottom of the
page, and follow the directions. In the event that you've already
deleted yourself from the list and we keep sending you messages like
this one, send mail to jair-ed at isi.edu.