
Yoshua Bengio 
For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
Yoshua Bengio was born to two college students in Paris, France. His parents had rejected their traditional Moroccan Jewish upbringings to embrace the 1960s counterculture’s focus on personal freedom and social solidarity. He attributes his comfort in following his “scientific intuition” to this upbringing.[1] In search of a more inclusive society, the family moved to Montreal, in the French-speaking Canadian province of Quebec, when Yoshua was twelve years old.
Bengio spent his childhood as a self-described “typical nerd,” bored by high school and reading alone in the library. Like many in his generation he discovered computers during his teenage years, pooling money earned from newspaper delivery with his brother to purchase Atari 800 and Apple II personal computers. This led him to study computer engineering at McGill. Unlike a typical computer science curriculum, this included significant training in physics and continuous mathematics, providing essential mathematical foundations for his later work in machine learning.
After earning his first degree in 1986, Bengio remained at McGill to follow up with a master’s degree in 1988 and a Ph.D. in computer science in 1991. His study was funded by a graduate scholarship from the Canadian government. He was introduced to the idea of neural networks when reading about massively parallel computation and its application to artificial intelligence. Discovering the work of Geoffrey Hinton, his co-awardee, awakened an interest in the question “what is intelligence?” This chimed with his childhood interest in science fiction, in what he called a “watershed moment” for his career. Bengio found a thesis advisor, Renato De Mori, who studied speech recognition and was beginning to transition from classical AI models to statistical approaches.
As a graduate student he was able to attend conferences and workshops to participate in the tight-knit but growing community interested in neural networks, meeting what he called the “French mafia of neural nets” including co-awardee Yann LeCun. He describes Hinton and LeCun as his most important career mentors, though he did not start working with Hinton until years later. He first did a one-year postdoc at MIT with Michael I. Jordan, which helped him advance his understanding of probabilistic modeling and recurrent neural networks. Then, as a postdoctoral fellow at Bell Labs, he worked with LeCun to apply techniques from his Ph.D. thesis to handwriting analysis. This contributed to a groundbreaking AT&T automatic check processing system, based around an algorithm that read the numbers written by hand on paper checks by combining neural networks with probabilistic models of sequences.
Bengio returned to Montreal in 1993 as a faculty member at its other major university, the University of Montreal. He won rapid promotion, becoming a full professor in 2002. Bengio suggests that Canada’s “socialist” commitment to spreading research funding widely and towards curiosity-driven research explains its willingness to support his work on what was then an unorthodox approach to artificial intelligence. This, he believes, laid the groundwork for Canada’s current strength in machine learning.
In 2000 he made a major contribution to natural language processing with the paper “A Neural Probabilistic Language Model.” Training networks to distinguish meaningful sentences from nonsense was difficult because there are so many different ways to express a single idea, with most combinations of words being meaningless. This causes what the paper calls the “curse of dimensionality,” demanding infeasibly large training sets and producing unworkably complex models. The paper introduced high-dimensional word embeddings as a representation of word meaning, letting networks recognize the similarity between new phrases and those included in their training sets, even when the specific words used are different. The approach has led to a major shift in machine translation and natural language understanding systems over the last decade.
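The core of that model can be conveyed in a few lines. The sketch below is a minimal modern illustration, assuming PyTorch; the vocabulary size, embedding width, and other numbers are purely illustrative rather than taken from the paper. It shows the essential structure: a learned table of word embeddings feeding a small network that scores every possible next word.

```python
# A minimal sketch (not Bengio's original code) of the idea behind
# "A Neural Probabilistic Language Model": each word maps to a learned
# embedding vector, and a small network predicts the next word from the
# concatenated embeddings of the preceding context words.
import torch
import torch.nn as nn

class TinyNeuralLM(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, context=3, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # shared table of word features
        self.hidden = nn.Linear(context * embed_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)           # a score for every word in the vocabulary

    def forward(self, context_ids):                        # (batch, context) word indices
        e = self.embed(context_ids)                        # (batch, context, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))          # combine the context embeddings
        return self.out(h)                                 # next-word logits (softmax is applied in the loss)

model = TinyNeuralLM()
fake_context = torch.randint(0, 10_000, (8, 3))            # illustrative batch of 3-word contexts
fake_targets = torch.randint(0, 10_000, (8,))              # illustrative "next word" labels
loss = nn.functional.cross_entropy(model(fake_context), fake_targets)
loss.backward()                                            # gradients also update the embeddings themselves
```

Because similar words end up with similar embedding vectors, the model can assign reasonable probability to word sequences it has never seen, which is how it escapes the curse of dimensionality described above.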
Bengio’s group further improved the performance of machine translation systems by combining neural word embeddings with attention mechanisms. “Attention” is another term borrowed from human cognition. It lets a network focus, at each stage of the translation, on only the parts of the source sentence that are relevant, taking account of context such as what a pronoun or article is referring to.
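The following sketch, assuming NumPy, illustrates the core attention computation. The translation work from Bengio’s group scored source positions with a small feed-forward network; the simpler dot-product scoring used here is a stand-in, but the principle is the same: compute a weight for every source word and concentrate on the most relevant ones.

```python
# A minimal sketch of one attention step (illustrative, not the original model).
import numpy as np

def attend(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (source_len, d)."""
    scores = encoder_states @ decoder_state        # one relevance score per source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax: the weights sum to 1
    context = weights @ encoder_states             # a blend dominated by the relevant words
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))                      # 5 source words, 8-dim states (illustrative sizes)
ctx, w = attend(rng.normal(size=8), enc)
print(w.round(2))                                  # how much attention each source word receives
```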
Together with Ian Goodfellow, one of his Ph.D. students, Bengio developed the concept of “generative adversarial networks.” Whereas most networks were designed to recognize patterns, a generative network learns to generate objects that are difficult to distinguish from those in the training set. The technique is “adversarial” because a network learning to generate plausible fakes can be trained against another network learning to identify fakes, allowing for a dynamic learning process inspired by game theory. The process is often used to facilitate unsupervised learning. It has been widely used to generate images, for example to automatically generate highly realistic photographs of non-existent people or objects for use in video games.
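A minimal sketch of that adversarial loop, assuming PyTorch, is shown below. Real generative adversarial networks use deep image-generating architectures; the tiny networks and toy data distribution here are illustrative, but the alternation between the two competing networks follows the same pattern.

```python
# A minimal sketch of adversarial training: a generator G and a discriminator D
# are trained against each other (sizes and data are purely illustrative).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))    # sample -> real/fake score
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + 2.0          # stand-in "training set" distribution
    fake = G(torch.randn(64, 16))

    # The discriminator learns to tell real samples from generated ones.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # The generator learns to make the discriminator call its fakes real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```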
Bengio has been central to the institutional development of machine learning in Canada. In 2004, a program in Neural Computation and Adaptive Perception was funded within the Canadian Institute for Advanced Research (CIFAR). Hinton was its founding director, but Bengio was involved from the beginning as a Fellow of the institute. So was LeCun, with whom Bengio has been co-directing the program (now renamed Learning in Machines and Brains) since 2014. The name reflects its interdisciplinary cognitive science agenda, with a two-way passage of ideas between neuroscience and machine learning.
Thanks in part to Bengio, the Montreal area has become a global hub for work on what Bengio and his co-awardees call “deep learning.” He helped to found Mila, the Montreal Institute for Learning Algorithms (now the Quebec Artificial Intelligence Institute), to bring together researchers from four local institutions. Bengio is its scientific director, overseeing a federally funded center of excellence that co-locates faculty and students from participating institutions on a single campus. It boasts a broad range of partnerships with major global companies and an increasing number of local machine learning startup firms. As of 2020, Google, Facebook, Microsoft and Samsung had all established satellite labs in Montreal. Bengio himself has co-founded several startup firms, most notably Element AI in 2016, which develops industrial applications for deep learning technology.
Author: Thomas Haigh
[1] Personal details and quotes are from Bengio’s Heidelberg Laureate interview - https://www.youtube.com/watch?v=PHhFI8JexLg.
Geoffrey E Hinton 
For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
When Geoffrey Everest Hinton decided to study science he was following in the tradition of ancestors such as George Boole, the Victorian logician whose work underpins the study of computer science and probability. Geoffrey’s great-grandfather, the mathematician and bigamist Charles Hinton, coined the word “tesseract” and popularized the idea of higher dimensions, while his father, Howard Everest Hinton, was a distinguished entomologist. Their shared middle name, Everest, celebrates a relative after whom the mountain was also named (to commemorate his service as Surveyor General of India).
Hinton began his time at Cambridge University with plans to study physiology and physics, dabbled in philosophy along the way, and received a degree in experimental psychology in 1970, concluding that none of these sciences had yet done much to explain human thought. He made a brief career shift into carpentry, in search of more tangible satisfactions, before being drawn back to academia in 1972 by the promise of artificial intelligence, which he studied at the University of Edinburgh.
By the mid-1970s an “AI winter” of high profile failures had reduced funding and enthusiasm for artificial intelligence research. Hinton was drawn to a particularly unfashionable area: the development of networks of simulated neural nodes to mimic the capabilities of human thought. This willingness to ignore conventional wisdom was to characterize his career. As he put it, “If you think it’s a really good idea and other people tell you it’s complete nonsense then you know you are really onto something.”[1]
The relationship of computers to brains had captivated many computer pioneers of the 1940s, including John von Neumann who used biological terms such as “memory,” “organ” and “neuron” when first describing the crucial architectural concepts of modern computing in the “First Draft of a Report on the EDVAC.” This was influenced by the emerging cybernetics movement, particularly the efforts of Warren McCulloch and Walter Pitts to equate networks of stylized neurons with statements in Boolean logic. That inspired the idea that similar networks might, like human brains, be able to learn to recognize objects or carry out other tasks. Interest in this approach had declined after Turing Award winner Marvin Minsky, working with Seymour Papert, demonstrated that a heavily promoted class of neural networks, in which inputs were connected directly to outputs, had severe limits on its capabilities.
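The McCulloch-Pitts proposal is easy to illustrate: a single threshold unit over binary inputs can realize simple Boolean statements, though no single layer of such units can compute functions like exclusive-or, the kind of limitation Minsky and Papert analyzed. The snippet below is a modern Python illustration, not historical code.

```python
# A small illustration of the McCulloch-Pitts idea: a threshold "neuron"
# over binary inputs can stand in for a Boolean statement.
def mp_neuron(inputs, weights, threshold):
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

AND = lambda a, b: mp_neuron([a, b], [1, 1], threshold=2)
OR  = lambda a, b: mp_neuron([a, b], [1, 1], threshold=1)
NOT = lambda a:    mp_neuron([a],    [-1],   threshold=0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and NOT(1) == 0
# No single unit of this kind can compute XOR -- an example of the limitation
# shown for networks whose inputs connect directly to outputs.
```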
Graduating in 1978, Hinton followed in the footsteps of many of his forebears by seeking opportunities in the United States, joining a group of cognitive psychologists as a Sloan Foundation postdoctoral researcher at the University of California, San Diego. Their work on neural networks drew on a broad shift in the decades after the Second World War towards Bayesian approaches to statistics, which treat probabilities as degrees of belief, updating estimates as data accumulates.
Most work on neural networks relies on what is now called a “supervised learning” approach, exposing an initially random network configuration to a “training set” of input data. Its initial responses would have no systematic relationship to the features of the input data, but the algorithm would reconfigure the network as each guess was scored against the labels provided. Thus, for example, a network trained on a large set of photographs of different species of fish might develop a reliable ability to recognize whether a new picture showed a carp or a tuna. This required a learning algorithm to automatically reconfigure the network to identify “features” in the input data that correlated with correct outputs.
Working with David Rumelhart and Ronald J. Williams, Hinton popularized what they termed a “back-propagation” algorithm in a pair of landmark papers published in 1986. The term reflected a phase in which the algorithm propagated measures of the errors produced by the network’s guesses backwards through its neurons, starting with those directly connected to the outputs. This allowed networks with intermediate “hidden” neurons between input and output layers to learn efficiently, overcoming the limitations noted by Minsky and Papert.
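The sketch below, assuming NumPy, is a compact modern illustration of both ideas: a labelled training set, a hidden layer, and a backward pass that propagates the output error through the network’s weights. The task, exclusive-or, is one that no network without hidden neurons can learn; the layer sizes and learning rate are illustrative.

```python
# A minimal supervised-learning example trained with back-propagation
# (an illustration of the technique, not the 1986 code).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # labelled training set
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)                  # input -> hidden weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)                  # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10_000):
    # Forward pass: the network's current guesses.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error from the output layer to the hidden layer.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Adjust the weights in the direction that reduces the error.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]
```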
Their paper describes the use of the technique to perform tasks including logical and arithmetic operations, shape recognition, and sequence generation. Others had worked independently along similar lines, including Paul J. Werbos, without much impact. Hinton attributes the impact of his work with Rumelhart and Williams to the publication of a summary of their work in Nature, and the efforts they made to provide compelling demonstrations of the power of the new approach. Their findings began to revive enthusiasm for the neural network approach, which has increasingly challenged other approaches to AI such as the symbol processing work of Turing Award winners John McCarthy and Marvin Minsky and the rule-based expert systems championed by Edward Feigenbaum.
By the time the papers with Rumelhart and Williams were published, Hinton had begun his first faculty position, in Carnegie-Mellon’s computer science department. This was one of the leading computer science programs, with a particular focus on artificial intelligence going back to the work of Herb Simon and Allen Newell in the 1950s. But after five years there Hinton left the United States in part because of his opposition to the “Star Wars” missile defense initiative. The Defense Advanced Research Projects Agency was a major sponsor of work on AI, including Carnegie-Mellon projects on speech recognition, computer vision, and autonomous vehicles. Hinton first became a fellow of the Canadian Institute for Advanced Research (CIFAR) and moved to the Department of Computer Science at the University of Toronto. He spent three years from 1998 until 2001 setting up the Gatsby Computational Neuroscience Unit at University College London and then returned to Toronto.
Hinton’s research group in Toronto made a string of advances in what came to be known as “deep learning”, named as such because it relied on neural networks with multiple layers of hidden neurons to extract higher level features from input data. Hinton, working with David Ackley and Terry Sejnowski, had previously introduced a class of network known as the Boltzmann machine, which in a restricted form was particularly well-suited to this layered approach. His ongoing work to develop machine learning algorithms spanned a broad range of approaches to improve the power and efficiency of systems for probabilistic inference. In particular, his joint work with Radford Neal and Richard Zemel in the early 1990s introduced variational methods to the machine learning community.
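As an illustration of why the restricted form matters, the sketch below, assuming NumPy, performs one simplified training update on a restricted Boltzmann machine using the contrastive-divergence rule Hinton later introduced. Biases and many practical details are omitted, and the layer sizes are illustrative.

```python
# A minimal sketch of one contrastive-divergence (CD-1) update for a restricted
# Boltzmann machine. "Restricted" means connections run only between the visible
# and hidden layers, which makes layer-by-layer stacking practical.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, lr=0.1):
    """One simplified CD-1 step on a batch of binary visible vectors v0 (batch, n_visible)."""
    h0_prob = sigmoid(v0 @ W)                                   # hidden activations given the data
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)    # sample binary hidden states
    v1_prob = sigmoid(h0 @ W.T)                                 # "reconstruction" of the data
    h1_prob = sigmoid(v1_prob @ W)
    # Raise the probability of the data, lower that of the reconstruction.
    return W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)

batch = (rng.random((8, n_visible)) < 0.5).astype(float)        # illustrative binary data
W = cd1_update(batch, W)
```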
Hinton carried this work out with dozens of Ph.D. students and post-doctoral collaborators, many of whom went on to distinguished careers in their own right. He shared the Turing Award with one of them, Yann LeCun, who spent 1987-88 as a post-doctoral fellow in Toronto after Hinton served as the external examiner on his Ph.D. in Paris. From 2004 until 2013 he was the director of the program on "Neural Computation and Adaptive Perception" funded by the Canadian Institute for Advanced Research. That program included LeCun and his other co-awardee, Yoshua Bengio. The three met regularly to share ideas as part of a small group. Hinton has advocated for the importance of senior researchers continuing to do hands-on programming work to effectively supervise student teams.
Hinton has long been recognized as a leading researcher in his field, receiving his first honorary doctorate from the University of Edinburgh in 2001, three years after he became a fellow of the Royal Society. In the 2010s his career began to shift from academia to practice as the group’s breakthroughs underpinned new capabilities for object classification and speech recognition appearing in widely used systems produced by cloud computing companies such as Google and Facebook. Their potential was vividly demonstrated in 2012 when a program developed by Hinton with his students Alex Krizhevsky and Ilya Sutskever greatly outperformed all other entrants to ImageNet, an image recognition competition involving a thousand different object types. It used graphics processor chips to run code combining several of the group’s techniques in a network of “60 million parameters and 650,000 neurons” composed of “five convolutional layers, some of which are followed by max-pooling layers, and three globally-connected layers with a final 1000-way softmax.”[2] The “convolutional layers” were an approach originally conceived of by LeCun, to which Hinton’s team had made substantial improvements.
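The quoted architecture can be pictured as a stack of layer types that are now standard. The sketch below, assuming PyTorch, is a drastically scaled-down schematic of that layer pattern, not the 2012 network itself; every size here is illustrative.

```python
# A schematic, much smaller than the actual ImageNet-winning network, of the
# layer pattern described in the quotation: stacked convolutional layers, some
# followed by max-pooling, then fully connected layers and a 1000-way softmax.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
    nn.Linear(256, 1000),                     # one score per object category
)

logits = net(torch.randn(1, 3, 224, 224))     # a single 224x224 colour image
probs = logits.softmax(dim=1)                 # the final "1000-way softmax"
print(probs.shape)                            # torch.Size([1, 1000])
```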
This success prompted Google to acquire a company, DNNresearch, founded by Hinton and the two students to commercialize their achievements. The system allowed Google to greatly improve its automatic classification of photographs. Following the acquisition, Hinton became a vice president and engineering fellow at Google. In 2014 he retired from teaching at the university to establish a Toronto branch of Google Brain. Since 2017, he has held a volunteer position as chief scientific advisor to Toronto’s Vector Institute for the application of machine learning in Canadian health care and other industries. Hinton thinks that in the future teaching people how to train computers to perform tasks will be at least as important as teaching them how to program computers.
Hinton has been increasingly vocal in advocating for his long-standing belief in the potential of “unsupervised” training systems, in which the learning algorithm attempts to identify features without being provided large numbers of labelled examples. As well as being useful these unsupervised learning methods have, Hinton believes, brought us closer to understanding the learning mechanisms used by human brains.
Yann LeCun 
For conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
Yann LeCun spent his early life in France, growing up in the suburbs of Paris. (His name was originally Le Cun, but he dropped the space after discovering that Americans were confused and treated Le as his middle name.) His father was an engineer, whose interests in electronics and mechanics were passed on to Yann during a boyhood of tinkering. As a teenager he enjoyed playing in a band as well as science and engineering. He remained in the region to study, earning the equivalent of a master’s degree from the École Supérieure d'Ingénieurs en Électrotechnique et Électronique, one of France’s network of competitive and specialized non-university schools established to train the country’s future elite. His work there focused on microchip design and automation.
LeCun attributes his longstanding interest in machine intelligence to seeing the murderous mainframe HAL, whom he encountered as a young boy in the movie 2001. He began independent research on machine learning as an undergraduate, making it the centerpiece of his Ph.D. work at the Sorbonne Université (then called Université Pierre et Marie Curie). LeCun’s research closely paralleled discoveries made independently by his co-awardee Geoffrey Hinton. Like Hinton he had been drawn to the then-unfashionable neural network approach to artificial intelligence, and like Hinton he discovered that the well-publicized limitations of simple neural networks could be overcome with what was later called the “back-propagation” algorithm, able to efficiently train “hidden” neurons in intermediate layers between the input and output nodes.
A workshop held in Les Houches in the French Alps in 1985 first brought LeCun into direct contact with the international research community working along these lines. It was there that he met Terry Sejnowski, a close collaborator of Hinton’s whose work on backpropagation was not yet published. A few months later when Hinton was in Paris he introduced himself to LeCun, which led to an invitation to a summer workshop at Carnegie Mellon and a post-doctoral year with Hinton’s new research group in Toronto. This collaboration endured: two decades later, in 2004, he worked with Hinton to establish a program on Neural Computation and Adaptive Perception through the Canadian Institute for Advanced Research (CIFAR). Since 2014 he has co-directed it, now renamed Learning in Machines & Brains, with his co-awardee Yoshua Bengio.
At the conclusion of the fellowship, in 1988, LeCun joined the staff of Bell Labs, a renowned center of computer science research. Its Adaptive Systems Research department, headed by Lawrence D. Jackel, focused on machine learning. Jackel was heavily involved in establishing the Neural Networks for Computing workshop series, later run by LeCun and renamed the “Learning Workshop”. It was held annually from 1986 to 2012 at the Snowbird resort in Utah. The invitation-only event brought together an interdisciplinary group of researchers to exchange ideas on the new techniques and learn how to apply them in their own work.
LeCun’s work at Bell Labs focused on neural network architectures and learning algorithms. His most far-reaching contribution was a new approach, called the “convolutional neural network.” Many networks are designed to recognize visual patterns, but a simple learning model trained to respond to a feature in one location (say the top left of an image) would not respond to the same feature in a different location. The convolutional network is designed so that a filter or detector is swept across the grid of input values. As a result, higher level portions of the network are alerted to the pattern wherever it occurs in the image. This made training faster and reduced the overall size of networks, boosting their performance. This work was an extension of LeCun’s earlier achievements, because convolutional networks rely on backpropagation techniques to train their hidden layers.
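The sweeping of a single filter can be shown directly. The sketch below, assuming NumPy, applies one hand-made edge detector to an image and to a shifted copy of it; the detector produces the same peak response in both cases, which is the translation invariance described above. The filter and images are toy examples.

```python
# A small illustration of convolution: the same filter weights are applied at
# every position, so a feature is detected wherever it appears.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)   # same weights everywhere
    return out

vertical_edge = np.array([[1.0, -1.0],
                          [1.0, -1.0]])        # a tiny hand-made edge detector

image = np.zeros((6, 6))
image[:, 1] = 1.0                              # a vertical stripe near the left
shifted = np.roll(image, 3, axis=1)            # the same stripe near the right

print(convolve2d(image, vertical_edge).max(),
      convolve2d(shifted, vertical_edge).max())
# Both response maps peak at the same value: the detector fires regardless of location.
```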
As well as developing the convolutional approach, LeCun pioneered its application in “graph transformer networks” to recognize printed and handwritten text. This was used in a widely deployed system to read numbers written on checks, produced in the early 1990s in collaboration with Bengio, Léon Bottou and Patrick Haffner. At that time handwriting recognition was enormously challenging, despite an industry-wide push to make it work reliably in “slate” computers (the ancestors of today’s tablet systems). Automated check clearing was an important application, as millions were processed daily. The job required very high accuracy, but unlike general handwriting analysis required only digit recognition, which reduced the number of valid symbols. The technology was licensed by specialist providers of bank systems such as National Cash Register. LeCun suggests that at one point it was reading more than 10% of all the checks written in the US.
Check processing work was carried out in centralized locations, which could be equipped with the powerful computers needed to run neural networks. Increases in computer power made it possible to build more complex networks and deploy convolutional approaches more widely. Today, for example, the technique is used on Android smartphones to power the speech recognition features of the Google Assistant such as real-time transcription, and the camera-based translation features of the translation app.
His other main contribution at Bell Labs was the development of "Optimal Brain Damage" regularization methods. This evocatively named concept identifies ways to simplify neural networks by removing unnecessary connections. Done properly, this “brain damage” could produce simpler, faster networks that performed as well or better than the full-size version.
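A simplified sketch of the idea, assuming NumPy, appears below. The original method estimates each connection’s “saliency” from second derivatives of the training error; here those estimates are simply taken as given, the least salient fraction of weights is removed, and the smaller network would then be retrained. The layer size and pruning fraction are illustrative.

```python
# A simplified stand-in for the "Optimal Brain Damage" pruning idea: remove the
# connections whose estimated contribution to the error is smallest.
import numpy as np

def prune_by_saliency(weights, hessian_diag, fraction=0.3):
    """Zero out the given fraction of weights with the smallest estimated
    saliency (~ 0.5 * h_kk * w_k**2), keeping the rest unchanged."""
    saliency = 0.5 * hessian_diag * weights**2
    cutoff = np.quantile(saliency, fraction)
    mask = saliency > cutoff                     # keep only the salient connections
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))                      # one layer's weight matrix (illustrative)
H = np.abs(rng.normal(size=(8, 8)))              # stand-in second-derivative estimates
W_pruned, mask = prune_by_saliency(W, H, fraction=0.3)
print(f"{(~mask).mean():.0%} of connections removed")   # roughly 30% pruned, then fine-tune
```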
In 1996 AT&T, which had failed to establish itself in the computer industry, spun off most of Bell Labs and its telecommunications hardware business into a new company, Lucent Technologies. LeCun stayed behind to run an AT&T Labs group focused on image processing research. His primary accomplishment there was the DjVu image compression technology, developed with Léon Bottou, Patrick Haffner, and Paul G. Howard. High-speed Internet access was still rare, so as a communications company AT&T stood to benefit if large documents could be downloaded more quickly. LeCun’s algorithm compressed files more effectively than Adobe’s Acrobat software, but lacked the latter’s broad support. It was extensively used by the Internet Archive in the early 2000s.
LeCun left industrial research in 2003, for a faculty position as a professor of computer science at New York University’s Courant Institute of Mathematical Sciences, the leading center for applied mathematical research in the US. It has a strong presence in scientific computation and a particular focus on machine learning. He took the opportunity to restore his research focus on neural networks. At NYU LeCun ran the Computational and Biological Learning Lab, which continued his work on algorithms for machine learning and applications for computer vision. He is still at NYU, though as his reputation has grown he has added several new titles and additional appointments. Most notable of these is the Silver endowed professorship awarded to LeCun in 2008, funded by a generous bequest from Polaroid co-founder Julius Silver to allow NYU to attract and retain top faculty.
LeCun has retained his love of building things, with hobbies that include constructing airplanes, electronic musical instruments, and robots. At NYU he combined this interest in robotics with his work on convolutional networks for computer vision to participate in DARPA-sponsored projects for autonomous navigation. His most important institutional initiative was work in 2011 to create the NYU Center for Data Science, which he directed until 2014. The center offers undergraduate and graduate degrees and functions as a focal point for data science initiatives across the university.
By the early 2010s the leading technology companies were scrambling to deploy machine learning systems based on neural networks. Like other leading researchers, LeCun was courted by the tech giants, and in December 2013 he was hired by Facebook to create FAIR (Facebook AI Research), which he led until 2018 from New York, dividing his time between NYU and FAIR. That made him the public face of AI at Facebook, broadening his role from a researcher famous within several fields to a tech industry leader frequently discussed in newspapers and magazines. In 2018, he stepped down from the director role and became Facebook’s Chief AI Scientist to focus on strategy and scientific leadership.
Author: Thomas Haigh






