New Kind of Memory for AI

AI researchers have typically tried to get around the problems posed by Montezuma’s Revenge and Pitfall! by instructing reinforcement-learning algorithms to explore randomly at times, while adding rewards for exploration, a technique known as “intrinsic motivation.”
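The reward-for-exploration idea can be sketched with a simple count-based novelty bonus. This is one common form of intrinsic motivation, not Uber’s specific method; the 1/√N(s) decay used here is an illustrative assumption.

```python
from collections import defaultdict

# Count-based exploration bonus: the agent earns extra ("intrinsic")
# reward for visiting states it has rarely seen, and the bonus decays
# as the visit count for that state grows.
visit_counts = defaultdict(int)

def intrinsic_bonus(state, scale=1.0):
    """Novelty bonus of scale / sqrt(N(s)) for the s-th visit to `state`."""
    visit_counts[state] += 1
    return scale / (visit_counts[state] ** 0.5)

def shaped_reward(state, extrinsic_reward, scale=1.0):
    # Total reward the learner optimizes: game score plus novelty bonus.
    return extrinsic_reward + intrinsic_bonus(state, scale)
```

The first visit to a state yields the full bonus; repeated visits yield ever-smaller bonuses, so the learner is nudged toward states it has not seen before.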

But the Uber researchers believe this fails to capture an important aspect of human curiosity. “We hypothesize that a major weakness of current intrinsic motivation algorithms is detachment, wherein the algorithms forget about promising areas they have visited, meaning they do not return to them to see if they lead to new states,” they write.

The team’s new family of reinforcement-learning algorithms, dubbed Go-Explore, remembers where it has been and deliberately returns to promising areas or tasks later to see whether they lead to better overall results. The researchers also found that adding a little domain knowledge, by having human players highlight interesting or important areas, sped up the algorithms’ learning by a remarkable amount. This matters because in many real-world situations you would want an algorithm and a person to work together to solve a hard task.
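The remember-and-return loop described above can be sketched on a toy problem. This is a minimal illustration, not the paper’s implementation: the corridor environment, the cell representation, and the least-visited selection heuristic are all assumptions made for the example, and it relies on the environment being deterministic and resettable to a saved state.

```python
import random

class Corridor:
    """Toy deterministic environment: reach position 20 to score.

    Stands in for a sparse-reward game; `restore` mimics resetting an
    emulator to a previously saved state.
    """
    GOAL = 20

    def __init__(self):
        self.pos = 0

    def restore(self, state):
        self.pos = state

    def step(self, action):  # action is -1 or +1
        self.pos = max(0, self.pos + action)
        return self.pos, (1.0 if self.pos == self.GOAL else 0.0)

def go_explore(iterations=2000, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    # Archive of visited "cells": cell -> saved state and visit count.
    # Nothing is ever deleted, so promising areas are never forgotten.
    archive = {0: {"state": 0, "visits": 0}}
    best_score = 0.0
    for _ in range(iterations):
        # "Go": pick a cell from the archive, favoring rarely visited
        # ones (random jitter breaks ties), and return to it directly.
        cell = min(archive, key=lambda c: archive[c]["visits"] + rng.random())
        archive[cell]["visits"] += 1
        env.restore(archive[cell]["state"])
        # "Explore": a short burst of random actions from that cell.
        for _ in range(5):
            state, reward = env.step(rng.choice([-1, 1]))
            best_score = max(best_score, reward)
            if state not in archive:  # remember every newly reached cell
                archive[state] = {"state": state, "visits": 0}
    return best_score, len(archive)
```

Because the frontier cells start with zero visits, the loop keeps returning to the edge of explored territory and pushes past it, which is exactly the behavior that random exploration alone struggles to produce under sparse rewards.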

Their code scores an average of 400,000 points in Montezuma’s Revenge—an order of magnitude higher than the average for human experts. In Pitfall! it racks up 21,000 on average, far better than most human players.


Folksonomies: artificial intelligence technology ai

 Uber has cracked two classic ’80s video games by giving an AI algorithm a new type of memory
Periodicals>Journal Article:  Knight, Will (2018-11-26), Uber has cracked two classic ’80s video games by giving an AI algorithm a new type of memory, MIT Technology Review, Retrieved on 2019-03-02
  • Source Material []
  • Folksonomies: artificial intelligence technology