Machine Learning Systems at Scale:
OpenAI is a non-profit research company, discovering and enacting the path to safe artificial general intelligence. As part of our work, we regularly push the limits of scalability in cutting-edge ML algorithms. We’ve found that in many cases, designing the systems around the core algorithms is as important as designing the algorithms themselves. This means that many systems engineering areas, such as distributed computing, networking, and orchestration, are crucial for machine learning to succeed on large problems requiring thousands of computers. As a result, engineers and researchers at OpenAI work closely together to build these large systems, rather than keeping a strict researcher/engineer split. In this talk, we will go over some of the lessons we’ve learned and how they come together in the design and internals of our system for learning-based robotics research.
Bio: Jonas leads technology development for OpenAI’s robotics group, where he develops methods to apply machine learning and AI to robots. He also helped build the infrastructure to scale OpenAI’s distributed ML systems to thousands of machines.