Jürgen Schmidhuber
Jürgen Schmidhuber is a computer scientist and artist known for his work on machine learning, universal artificial intelligence (AI), artificial neural networks, digital physics, and low-complexity art. His contributions also include generalizations of Kolmogorov complexity and the Speed Prior. From 2004 to 2009 he was professor of Cognitive Robotics at the Technical University of Munich. Since 1995 he has been co-director of the Swiss AI Lab IDSIA in Lugano, and since 2009 also professor of Artificial Intelligence at the University of Lugano. In honor of his achievements he was elected to the European Academy of Sciences and Arts in 2008.
Recurrent neural networks
The dynamic recurrent neural networks developed in his lab are simplified mathematical models of the biological neural networks found in human brains. A particularly successful model of this type is called long short-term memory (LSTM). From training sequences it learns to solve numerous tasks unsolvable by previous such models. Applications range from automatic music composition to speech recognition, reinforcement learning, and robotics in partially observable environments. As of 2010, his group has the best results on benchmarks in automatic handwriting recognition, obtained with deep neural networks and recurrent neural networks.
Artificial evolution / genetic programming
As an undergraduate at TUM, Schmidhuber evolved computer programs through genetic algorithms. The method was published in 1987 as one of the first papers in the emerging field that later became known as genetic programming. In the same year he published the first work on meta-genetic programming. Since then he has co-authored numerous additional papers on artificial evolution. Applications include robot control, soccer learning, drag minimization, and time series prediction. He received several best paper awards at scientific conferences on evolutionary computation.
Neural economy
In 1989 he created the first learning algorithm for neural networks based on principles of the market economy (inspired by John Holland's bucket brigade algorithm for classifier systems). Adaptive neurons compete to be active in response to certain input patterns. Those that are active when there is external reward get stronger synapses, but active neurons have to pay those that activated them by transferring parts of their synapse strengths, thus rewarding "hidden" neurons that set the stage for later success.
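The payment scheme described above can be illustrated with a small sketch. This is not the 1989 algorithm itself, only a toy version of the bucket-brigade idea under simplifying assumptions: each unit has a single scalar strength, units fire in a fixed chain, and the names `bucket_brigade_step` and `pay_fraction` are hypothetical.

```python
def bucket_brigade_step(strengths, chain, reward, pay_fraction=0.1):
    """Toy bucket-brigade update (illustrative assumption, not the published method).

    strengths    -- list of per-unit strengths
    chain        -- unit indices in the order they became active
    reward       -- external reward received by the last active unit
    pay_fraction -- share of its strength that each active unit pays
                    to the unit that activated it
    """
    new = list(strengths)
    # Each active unit pays the unit that activated it.
    for earlier, later in zip(chain, chain[1:]):
        payment = pay_fraction * new[later]
        new[later] -= payment
        new[earlier] += payment
    # Only the unit active at reward time is paid by the environment.
    new[chain[-1]] += reward
    return new
```

Starting from strengths `[1.0, 1.0, 1.0]` with the chain `[0, 1, 2]` and a reward of `1.0`, the "hidden" unit 0 ends up stronger even though it never receives external reward directly. Total strength changes only by the external reward.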
Artificial curiosity and creativity
In 1990 he published the first in a long series of papers on artificial curiosity and creativity for an autonomous agent. The agent is equipped with an adaptive predictor that tries to predict future events from the history of previous events and actions. A reward-maximizing, reinforcement learning, adaptive controller steers the agent and receives curiosity reward for executing action sequences that improve the predictor. This discourages it from executing actions leading to boring outcomes that are either predictable or totally unpredictable. Instead, the controller is motivated to learn actions that help the predictor learn new, previously unknown regularities in its environment, thus improving its model of the world, which in turn can greatly help to solve externally given tasks. This has become an important concept of developmental robotics. Schmidhuber argues that his corresponding formal theory of creativity explains essential aspects of art, science, music, and humor.
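One way to make the curiosity reward concrete is to treat it as the recent drop in the predictor's error. The sketch below is a simplified illustration, not Schmidhuber's exact formulation: the function name `learning_progress` and the use of two adjacent averaging windows are assumptions made for this example.

```python
def learning_progress(errors, window):
    """Curiosity reward as the drop in mean prediction error.

    errors -- chronological list of the predictor's errors
    window -- number of steps averaged on each side of the comparison
    Returns the mean error of the older window minus the mean error of
    the most recent window (0.0 if there is not enough history).
    """
    if len(errors) < 2 * window:
        return 0.0
    older = errors[-2 * window:-window]
    recent = errors[-window:]
    return sum(older) / window - sum(recent) / window


# Already predictable: error stays at zero, so there is nothing to learn.
predictable = [0.0] * 8
# Totally unpredictable: error stays high and never improves.
noise = [0.5] * 8
# Learnable regularity: error shrinks as the predictor improves.
learnable = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
```

With a window of 4, both the predictable and the noisy history yield zero reward, while the learnable history yields a positive one. This matches the claim that both kinds of "boring" outcome are discouraged.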
Unsupervised learning / factorial codes
During the early 1990s Schmidhuber also invented a neural method for nonlinear independent component analysis (ICA) called predictability minimization. It is based on co-evolution of adaptive predictors and initially random, adaptive feature detectors processing input patterns from the environment. For each detector there is a predictor trying to predict its current value from the values of neighboring detectors, while each detector simultaneously tries to become as unpredictable as possible. It can be shown that the best the detectors can do is to create a factorial code of the environment, that is, a code that conveys all the information about the inputs such that the code components are statistically independent, which is desirable for many pattern recognition applications.
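The defining property of a factorial code is that the joint probability of a code vector equals the product of its components' marginal probabilities. The following sketch checks that property empirically for binary codes. It does not implement predictability minimization itself, and the helper name `factorial_gap` is hypothetical.

```python
from collections import Counter
from itertools import product


def factorial_gap(codes):
    """Largest |P(code) - product of marginals| over all binary code vectors.

    codes -- list of equal-length tuples of 0/1 values, one per input pattern
    A gap of 0 means the components are statistically independent on this
    sample, i.e. the code is factorial.
    """
    n = len(codes)
    dims = len(codes[0])
    joint = Counter(codes)
    # Marginal probability that each component equals 1.
    p_one = [sum(code[i] for code in codes) / n for i in range(dims)]
    gap = 0.0
    for vec in product((0, 1), repeat=dims):
        independent = 1.0
        for i, bit in enumerate(vec):
            independent *= p_one[i] if bit else 1.0 - p_one[i]
        gap = max(gap, abs(joint[vec] / n - independent))
    return gap
```

For example, the four equally frequent codes `(0,0), (0,1), (1,0), (1,1)` give a gap of zero, whereas a code that only ever produces `(0,0)` and `(1,1)` has fully redundant components and a gap of 0.25.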
Kolmogorov complexity / computer-generated universe
In 1997 Schmidhuber published a paper based on Konrad Zuse's assumption (1967) that the history of the universe is computable. He pointed out that the simplest explanation of the universe would be a very simple Turing machine programmed to systematically execute all possible programs computing all possible histories for all types of computable physical laws. He also pointed out that there is an optimally efficient way of computing all computable universes based on Leonid Levin's universal search algorithm (1973). In 2000 he expanded this work by combining Ray Solomonoff's theory of inductive inference with the assumption that quickly computable universes are more likely than others. This work on digital physics also led to limit-computable generalizations of algorithmic information or Kolmogorov complexity and the concept of Super Omegas, which are limit-computable numbers that are even more random (in a certain sense) than Gregory Chaitin's number of wisdom Omega.
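The idea of systematically executing all programs can be sketched as a time-allocation schedule in the style commonly used to present Levin's universal search: in phase k, every program of length l ≤ k is given 2^(k−l) steps, so shorter programs receive exponentially more time. The sketch below only generates the schedule and does not run any programs. The function name `levin_schedule` and the encoding of programs as bit strings are assumptions for illustration.

```python
from itertools import product


def levin_schedule(max_phase):
    """Yield (phase, program_bits, step_budget) triples.

    In phase k, each bit string of length l <= k is allotted
    2 ** (k - l) steps, so every length class receives the same
    total time within a phase.
    """
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for bits in product("01", repeat=length):
                yield phase, "".join(bits), budget


# Programs and budgets scheduled in phase 3.
phase3 = [(prog, steps) for ph, prog, steps in levin_schedule(3) if ph == 3]
```

In phase 3 the one-bit programs each get 4 steps, the two-bit programs 2 steps, and the three-bit programs 1 step, for 24 steps in total.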
Universal AI
Important research topics of his group include universal learning algorithms and universal AI (see Gödel machine). Contributions include the first theoretically optimal decision makers living in environments obeying arbitrary unknown but computable probabilistic laws, and mathematically sound general problem solvers such as the asymptotically fastest algorithm for all well-defined problems, by his former postdoc Marcus Hutter. Based on the theoretical results obtained in the early 2000s, Schmidhuber actively promotes the view that in the new millennium the field of general AI has matured and become a real formal science.
Low-complexity art / theory of beauty
Schmidhuber's low-complexity art works (since 1997) can be described by very short computer programs containing very few bits of information, and reflect his formal theory of beauty based on the concepts of Kolmogorov complexity and minimum description length.
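Kolmogorov complexity is not computable, but the length of a compressed encoding gives a crude upper bound that conveys the intuition. The sketch below compares an image generated by a short rule with random noise of the same size, using `zlib` as a stand-in compressor. The image generators and the choice of compressor are assumptions for illustration and are not taken from Schmidhuber's work.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a rough complexity proxy)."""
    return len(zlib.compress(data, 9))


def regular_image(width=64, height=64):
    """Grayscale checkerboard produced by a one-line rule."""
    return bytes(
        255 if (x // 8 + y // 8) % 2 == 0 else 0
        for y in range(height)
        for x in range(width)
    )


def noise_image(width=64, height=64, seed=0):
    """Grayscale image of pseudo-random pixel values."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(width * height))
```

Both images contain 4096 pixels, yet the checkerboard compresses to a small fraction of the size of the noise image, in line with the idea that a low-complexity image has a short description.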
Schmidhuber writes that since age 15 or so his main scientific ambition has been to build an optimal scientist, then retire. First he wants to build a scientist better than himself (he quips that his colleagues claim that should be easy) who will then do the remaining work. He claims he "cannot see any more efficient way of using and multiplying the little creativity he's got".