September 14, 2016
MTL Seminar Series

Hardware for Deep Learning

William Dally, Stanford University

Bill is Chief Scientist and Senior Vice President of Research at NVIDIA Corporation and a Professor (Research) and former chair of Computer Science at Stanford University. Bill and his group have developed system architecture, network architecture, signaling, routing, and synchronization technology that can be found in most large parallel computers today.

While at Bell Labs, Bill contributed to the BELLMAC32 microprocessor and designed the MARS hardware accelerator. At Caltech he designed the MOSSIM Simulation Engine and the Torus Routing Chip, which pioneered wormhole routing and virtual-channel flow control. At the Massachusetts Institute of Technology his group built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanisms from programming models and demonstrated very low-overhead synchronization and communication mechanisms. At Stanford University his group developed the Imagine processor, which introduced the concepts of stream processing and partitioned register organizations; the Merrimac supercomputer, which led to GPU computing; and the ELM low-power processor.

Bill is a Member of the National Academy of Engineering, a Fellow of the IEEE, a Fellow of the ACM, and a Fellow of the American Academy of Arts and Sciences. He has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, the ACM Maurice Wilkes Award, and the IPSJ FUNAI Achievement Award. He currently leads projects on computer architecture, network architecture, circuit design, and programming systems. He has published over 200 papers in these areas, holds over 100 issued patents, and is an author of the textbooks Digital Design: A Systems Approach, Digital Systems Engineering, and Principles and Practices of Interconnection Networks.

Deep Neural Networks (DNNs) have come to dominate application areas including speech recognition, image understanding, and natural language processing. Most of the technology behind DNNs was developed by 1990; however, they were not widely applied until after 2010, when large data sets and powerful GPUs for training became available. These networks place heavy demands on computing hardware for both training and inference. GPUs are ideally suited to training DNNs because of their high floating-point efficiency and memory bandwidth. Efficient communication is essential to scale training across multiple GPUs. For inference, hardware accelerators can offer advantages, particularly on sparse and compressed networks. This talk will examine the current state of the art in hardware for deep learning.
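To make the sparsity point concrete, here is a minimal sketch (an illustration added for this page, not material from the talk) of a sparse matrix-vector product over layer weights stored in compressed sparse row (CSR) form. Skipping pruned, zero-valued weights is the kind of work and memory-traffic reduction that sparse-inference accelerators exploit. The function name csr_matvec and the toy 3x4 weight matrix are hypothetical.

    # Illustrative sketch: only nonzero weights are stored, fetched, and multiplied.
    def csr_matvec(values, col_indices, row_ptr, x):
        """y = W @ x, where W is stored in CSR form (values, col_indices, row_ptr)."""
        n_rows = len(row_ptr) - 1
        y = [0.0] * n_rows
        for i in range(n_rows):
            # Only the nonzero weights of row i contribute to output i.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                y[i] += values[k] * x[col_indices[k]]
        return y

    # Example: a 3x4 weight matrix with 4 nonzeros (two thirds of the entries pruned).
    values      = [0.5, -1.2, 0.3, 2.0]
    col_indices = [1,    3,   0,   2  ]
    row_ptr     = [0, 2, 3, 4]   # row i spans values[row_ptr[i]:row_ptr[i+1]]
    x = [1.0, 2.0, 3.0, 4.0]
    print(csr_matvec(values, col_indices, row_ptr, x))  # [-3.8, 0.3, 6.0]

In this toy case the multiply-accumulate count drops from 12 to 4; dedicated hardware can additionally keep the compressed weights on-chip and skip zero activations, which is where the accelerator advantage mentioned above comes from.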
