Review for NeurIPS 2020 paper: Consequences of Misaligned AI

Summary and Contributions: The paper provides a simplified setting in which an agent with an incompletely specified version of a ground-truth human reward function can perform arbitrarily poorly according to the true human reward. They prove a theorem saying, roughly, that an optimal agent will perform arbitrarily poorly if there is an arbitrarily poor region of the space. They then discuss several mitigations: a low-impact agent, an agent with human interaction, an agent with human interaction that moves slowly enough for the interaction to work, and a combination of low impact and interaction.

Strengths: The paper discusses an important topic, and tries to provide a concrete theoretical model of the ideas discussed in Russell's Human Compatible and associated prior literature. I also like the various ties into the incomplete-contracts literature. The setting is simple enough that all proofs are relatively straightforward.

Weaknesses: The theoretical setting makes quite strong assumptions, and doesn't really discuss the intuition behind them, so it would be easy for a cursory reader to infer that more is happening than really is. In particular, the various component-wise strict-increase assumptions are doing a lot of work.

Here are the various results translated into prose:

Theorem 1: If moving in a particular direction D strictly increases the utility available from moving in other directions, an optimal agent will move as far as possible along D.

Theorem 2: The only way moving arbitrarily far can't arbitrarily decrease utility is if one can move arbitrarily far without arbitrarily decreasing utility.

Proposition 1: We can decrease utility by moving arbitrarily far if the boundary shape vs.

Proposition 2: If we fix some dimensions, we can compute utility ignoring the fixed dimensions.

Proposition 3: An agent that is allowed to move arbitrarily far in one step is basically the same as a non-interactive agent.

Theorem 3: Something like a gradient-descent step in a small neighborhood increases utility even if we take only the top-k gradient components.

Proposition 4: With the assumptions of Theorem 3 we roughly have gradient descent, and sufficiently small gradient-descent steps on a nonzero gradient always produce local improvement, even when taking only the top-k components.

Proposition 5: (This one is more complicated.)

I'm torn on whether it is a plus or a minus that these results can be reframed as reasonably simple intuitions. It's certainly good to turn intuitions into theorems in toy settings, but I don't think the setting chosen is particularly natural. It's too restricted to apply directly to any real RL problem, and in some sense it tries to seem more general than it really is.
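To make the core result concrete, here is a toy numeric sketch of the failure mode (my own construction, not the paper's actual model): true utility is strictly increasing in every attribute, the proxy only sees a subset of the attributes, and a fixed resource budget forces a trade-off between them. The log utilities, the `allocate` helper, and `eps` are all illustrative assumptions.

```python
import numpy as np

L, J, budget = 4, [0, 1], 10.0     # 4 attributes; the proxy only "mentions" the first two

def utility(s, idx):
    # log utility per attribute: strictly increasing, unbounded below as s_i -> 0
    return sum(np.log(s[i]) for i in idx)

def allocate(eps):
    # leave eps of the budget in each unmentioned attribute, split the rest over J
    s = np.full(L, eps)
    s[J] = (budget - eps * (L - len(J))) / len(J)
    return s

for eps in [1.0, 0.1, 0.01, 0.001]:
    s = allocate(eps)
    print(f"eps={eps:6.3f}  proxy={utility(s, J):6.3f}  true={utility(s, range(L)):8.3f}")
```

As eps shrinks, the proxy keeps improving while true utility diverges toward minus infinity, which is the qualitative content of the "arbitrarily poor" theorem.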
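The gloss of Theorem 3 and Proposition 4 above amounts to a standard observation: a sufficiently small step along the top-k components of a nonzero gradient has positive inner product with the full gradient, so it locally improves a smooth utility. A minimal sketch, assuming a quadratic utility of my own choosing:

```python
import numpy as np

def top_k_step(grad, k, lr):
    # keep only the k largest-magnitude gradient components, zero out the rest
    step = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]
    step[idx] = grad[idx]
    return lr * step

# a smooth utility with nonzero gradient away from its optimum
target = np.arange(5.0)
U = lambda x: -np.sum((x - target) ** 2)
grad_U = lambda x: -2.0 * (x - target)

x = np.zeros(5)
for _ in range(3):
    before = U(x)
    x = x + top_k_step(grad_U(x), k=2, lr=0.1)
    print(f"U: {before:9.4f} -> {U(x):9.4f}")   # each sufficiently small step improves U
```

The guarantee is local: it relies on the step size being small relative to the curvature, which is where the strong smoothness-style assumptions do their work.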
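Finally, one way to read the low-impact mitigation is as subtracting a penalty on distance from the starting state from the proxy objective. In the same toy setup (again my own construction; the squared-distance penalty and the weight `lam` are illustrative assumptions, not the paper's definitions), even a small penalty stops the agent from driving the unmentioned attributes to zero:

```python
import numpy as np

L, J, budget = 4, [0, 1], 10.0
s0 = np.full(L, budget / L)          # starting state: resources spread evenly

def alloc(eps):
    s = np.full(L, eps)
    s[J] = (budget - eps * (L - len(J))) / len(J)
    return s

proxy  = lambda s: sum(np.log(s[i]) for i in J)
true_u = lambda s: float(np.sum(np.log(s)))

def best_eps(lam):
    # crude grid search over how much budget the agent leaves in unmentioned attributes
    grid = np.linspace(1e-4, budget / L, 2000)
    scores = [proxy(alloc(e)) - lam * np.sum((alloc(e) - s0) ** 2) for e in grid]
    return grid[int(np.argmax(scores))]

for lam in [0.0, 0.1, 1.0]:
    e = best_eps(lam)
    print(f"lam={lam:4.1f}  kept eps={e:.4f}  true utility={true_u(alloc(e)):8.3f}")
```

With lam = 0 the agent reproduces the unmitigated failure; any positive lam bounds how much it will trade the unmentioned attributes away, at the cost of some proxy value.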