NIPS 2017 Workshop: Deep Learning At Supercomputer Scale
Five years ago, it took more than a month to train a state-of-the-art image recognition model on the ImageNet dataset. Earlier this year, Facebook demonstrated that such a model could be trained in an hour. However, if we could parallelize this training problem across the world’s fastest supercomputers (~100 PFlops), it would be possible to train the same model in under a minute. This workshop is about closing that gap: how can we turn months into minutes and increase the productivity of machine learning researchers everywhere?
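For intuition, here is a rough back-of-envelope version of the scaling argument above, written as a small Python sketch. It assumes ideal linear strong scaling and a sustained throughput of roughly 1.5 PFlops for the one-hour ImageNet run; both numbers are illustrative assumptions, not figures from the workshop.

```python
# Back-of-envelope for the "hour to under a minute" projection above.
# Assumes ideal (linear) strong scaling and a hypothetical ~1.5 PFlops of
# sustained throughput for the one-hour ImageNet baseline; both numbers
# are illustrative assumptions, not measurements.

baseline_minutes = 60.0        # ImageNet training time in the one-hour result
baseline_pflops = 1.5          # assumed sustained throughput of that run
supercomputer_pflops = 100.0   # ~100 PFlops, as quoted above

speedup = supercomputer_pflops / baseline_pflops
projected_minutes = baseline_minutes / speedup

print(f"Ideal-scaling projection: {projected_minutes:.2f} minutes")
# -> roughly 0.9 minutes, i.e. under a minute if the scaling were perfect
```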
This one-day workshop will facilitate active debate and interaction across many different disciplines. The conversation will range from algorithms to infrastructure to silicon, with invited speakers from Cerebras, DeepMind, Facebook, Google, OpenAI, and other organizations. When should synchronous training be preferred over asynchronous training? Are large batch sizes the key to reaching supercomputer scale, or is it possible to fully utilize a supercomputer at batch size one? How important is sparsity in enabling us to scale? Should sparsity patterns be structured or unstructured? To what extent do we expect to customize model architectures for particular problem domains, and to what extent can a “single model architecture” deliver state-of-the-art results across many different domains? How can new hardware architectures unlock even higher real-world training performance?
Our goal is to bring together people who are trying to answer any of these questions, in the hope that cross-pollination will accelerate progress towards deep learning at true supercomputer scale.
Confirmed Speakers
- Priya Goyal - “ImageNet in 1 Hour” - Facebook Research
- Timothy Lillicrap - “Scalable RL & AlphaGo” - DeepMind
- Nitish Keskar - “Generalization Gap” - Salesforce Research
- Scott Gray - “Small World Network Architectures” - OpenAI
- Matthew Johnson & Daniel Duckworth - “KFAC and Natural Gradients” - Google Brain
- Tim Salimans - “Evolutionary Strategies” - OpenAI
- Elad Hoffer, Itay Hubara - “Closing the Generalization Gap” - Technion
- Michael James - “Scaling with Small Batches” - Cerebras
- Azalia Mirhoseini - “Learning Device Placement” - Google Brain
- Gregory Diamos - “Scaling is Predictable” - Baidu
- Simon Knowles - “Scalable Silicon Compute” - Graphcore
- Sam Smith - “Don’t Decay the Learning Rate, Increase the Batchsize” - Google Brain
- Shankar Krishnan - “Neumann Optimizer” - Google Brain
- Ujval Kapasi - “Practical Scaling Techniques” - NVIDIA
- Thorsten Kurth - “Scaling Deep Learning to 15 PetaFlops” - Lawrence Berkeley National Labs
- Chris Ying - “ImageNet is the New MNIST” - Google Brain
Important Dates
- 3 November 2017: Submissions due, 4:59pm PST
- 10 November 2017: Decisions announced
- 9 December 2017: Workshop
Accepted Posters
- Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes - Takuya Akiba, Shuji Suzuki, Keisuke Fukuda
- MPIML: A High-Performance Sparse Communication Layer for Machine Learning - Cedric Renggli, Dan Alistarh, Torsten Hoefler
- Deep Gradient Compression: Reducing Communication Bandwidth for Distributed Training - Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization - Jiong Zhang, Qi Lei, Inderjit S. Dhillon
- Scalable Language Modeling: WikiText-103 on a Single GPU in 12 hours - Stephen Merity, Nitish Shirish Keskar, James Bradbury, Richard Socher
- Supercomputing for Deep Learning: Lessons Learned and Future Directions - Sreenivas R. Sukumar, Peter J. Mendygral, Jacob Balma, Jeffrey Dawson, Alessandro Rigazzi, Aaron Vose, Michael F. Ringenberg, Kristyn J. Maschhoff and Steve L. Scott
- Scaling GRPC Tensorflow on up to 512 nodes of Cori Supercomputer - Amrita Mathuriya, Thorsten Kurth, Vivek Rane, Mustafa Mustafa, Lei Shao, Debbie Bard, Prabhat, Victor W Lee
- Communication Analysis of Hybrid Model and Data Parallelism in Training Neural Networks - Amir Gholami, Ariful Azad, Kurt Keutzer, Aydın Buluc
- Block-Sparse Recurrent Neural Networks - Sharan Narang, Eric Undersander, Greg Diamos
- Implementing a Sparse Prediction Machine on the Trinity Supercomputer - Boram Yoon, Pete Schultz, Garrett Kenyon
- Sparse Persistent RNNs - Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie
- Adaptive Memory Networks - Daniel Li, Asim Kadav
- ImageNet Training in Minutes - Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, Kurt Keutzer
Registration Awards
We will be offering NIPS workshop registrations to the four best papers submitted by students at academic institutions. Please indicate in your submission if you would like to apply for this award.
Feel free to reach us at supercomputersfordl2017@gmail.com with any questions you might have.
Schedule
- 8:00 - 8:10 - Welcome
- 8:10 - 8:30 - Nitish Keskar - “Generalization Gap”
- 8:30 - 8:50 - Itay Hubara, Elad Hoffer - “Closing the Generalization Gap”
- 8:50 - 9:10 - Sam Smith - “Don’t Decay the Learning Rate, Increase the Batchsize”
- 9:10 - 9:30 - Priya Goyal - “ImageNet in 1 Hour”
- 9:30 - 9:50 - Chris Ying - “ImageNet is the New MNIST”
- Coffee Break
- 10:10 - 10:30 - Matthew Johnson & Daniel Duckworth - “KFAC and Natural Gradients”
- 10:30 - 10:50 - Shankar Krishnan - “Neumann Optimizer”
- 10:50 - 11:10 - Tim Salimans - “Evolutionary Strategies”
- 11:15 - 12:00 - Panel on Future Hardware Directions
- Greg Diamos, Baidu Research, Moderator
- Jeff Dean, Google Senior Fellow, Panelist
- Michael James, Cerebras Chief Software Architect and Co-founder, Panelist
- Simon Knowles, Graphcore CTO and Co-founder, Panelist
- Scott Gray, OpenAI, Panelist
- Lunch Break (and posters)
- 1:30 - 1:50 - Azalia Mirhoseini - “Learning Device Placement”
- 1:50 - 2:10 - Gregory Diamos - “Scaling and Sparsity”
- 2:10 - 2:30 - Scott Gray - “Small World Network Architectures”
- 2:30 - 2:50 - Timothy Lillicrap - “Scalable RL & AlphaGo”
- Break
- 3:20 - 3:40 - Thorsten Kurth - “Scaling Deep Learning to 15 PetaFlops”
- 3:40 - 4:00 - Simon Knowles - “Scalable Silicon Compute”
- 4:00 - 4:20 - Ujval Kapasi - “Practical Scaling Techniques”
- 4:20 - 4:40 - Michael James - “Designing for Supercompute-Scale Deep Learning”
- 4:45 - 5:30 - Posters and Mingling
Organizers
- Erich Elsen, Google Brain
- Danijar Hafner, University College London
- Zak Stone, Google Brain
- Brennan Saeta, Google Brain
- Nicolas Le Roux, Google Brain