
MLCommons: Better ML for Everyone

MLCommons aims to accelerate machine learning to benefit everyone.

MLCommons will build a common set of tools for ML practitioners, including:

Benchmarks to measure progress: MLCommons will leverage MLPerf (built on DAWNBench) to measure speed, but will also expand benchmarking to other aspects of ML, such as accuracy and algorithmic efficiency. ML models continue to increase in size and, consequently, cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).

Public datasets to fuel research: MLCommons' new People's Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude (86K hours of labeled speech), better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet's impact on the field of computer vision.

Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons' MLCube project provides a common container interface for machine learning models that makes them easier to share, experiment with (including benchmarking), develop, and ultimately deploy.

  1. MLCommons: Better ML for Everyone (David Kanter, Executive Director)
  2. MLCommons™ in 6 questions: 1. What is MLCommons? 2. Why benchmarks? 3. Why datasets? 4. Why best practices? 5. What's next? 6. How can I get involved?
  3. 1. What is MLCommons?
  4. Machine learning (ML) could benefit everyone: information access, business productivity, health, safety. (Icon: Ætoms; photos: Ian Maddox, Katrina.Tuliao)
  5. ...and has a huge potential market.
  6. But machine learning is a young industry.
  7. Young industries need things to grow!
  8. MLCommons is a new open engineering organization to create better ML for everyone, at the intersection of open engineering organizations and AI/ML organizations.
  9. MLCommons is supported by industry and academics, including academics from Harvard University, Indiana University, Polytechnique Montreal, Stanford University, the University of California, Berkeley, the University of Toronto, the University of Tübingen, the University of York (United Kingdom), and Yonsei University.
  10. MLCommons is the work of many people, with many others contributing ideas and code.
  11. MLCommons creates better ML through three pillars: benchmarks, datasets, and best practices, all fueling research.
  12. 2. Why benchmarks?
  13. "What gets measured, gets improved." (Peter Drucker) Benchmarking aligns research with development, engineering with marketing, and competitors across the industry in pursuit of the same clear objective. Benchmarks drive progress and transparency.
  14. MLCommons will host MLPerf™, an industry standard that drives progress and transparency. (Selected MLPerf result press coverage shown.)
  15. MLPerf progress, 2018-2021: increasing breadth across Training, Training - HPC, Inference - Datacenter, Inference - Edge, Inference - Mobile, and Inference - Tiny (IoT), plus an improving technical approach. New training/inference benchmarks: recommendation (DLRM + 1TB dataset), medical imaging (U-Net), speech-to-text (RNN-T). Standardized methodology for Training: optimizer definitions, hyperparameter definitions, convergence expectations (WIP). Adding power measurement to Inference. Launched the Mobile App (early alpha release).
  16. 3. Why datasets?
  17. ML needs an ImageNet++ for everything. ImageNet ($300K) helped launch modern ML, and ~80% of research papers by leading ML companies cite public datasets. ML innovation needs datasets that are large, CC-licensed (or similar), redistributable, diverse, and continually improving, but most public datasets are small, legally restricted, not redistributable, not diverse, and static.
  18. MLCommons is starting with speech-to-text. Voice interfaces will reach most of Earth's 8 billion people by 2025, so we need bigger datasets that support more diverse languages and accents. (Chart: Earth's population grouped by native language; https://commons.wikimedia.org/wiki/File:List_of_languages_by_number_of_native_speakers.png)
  19. People's Speech: 10 years of speech, CC-BY. Read text and conversation plus noise, in diverse languages and accents (English now; 60+ other languages as future work). ~10 years of labeled speech (>10TB); CC-BY license (likely), redistributable; undergoing evaluation by MLCommons members; aiming for public release in 1H2021; a living dataset.
  20. 4. Why best practices?
  21. ML has too much friction. Example: found an ML model you want to use? What's the interface (how do you even run it)? Software dependencies? Dataset? Platform compatibility? All solved after a couple of days of hard work, and then it converges to 81.6% of the claimed accuracy. (Photo: Unsplash.com)
  22. MLCube™ is a shipping container for ML models: complex infrastructure, complex contents, simple interface = low friction. (Photos: cargo ships, Unsplash.com; shipping container: KMJ; medicines: Ralf Roletschek; electronics: DustyDingo)
  23. MLCube makes it easier to share models. Basically, it is a Docker container with a consistent command line and metadata (really an abstract interface for any container), with simple runners for the local machine, multiple clouds, and Kubernetes, or you can incorporate it into your own infrastructure. Learn more at https://github.com/mlcommons/mlcube
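The core idea on slide 23, one consistent interface over interchangeable runner backends, can be sketched in a few lines of Python. This is a simplified illustration only, not MLCube's actual API; the `Runner`, `LocalRunner`, and `execute` names here are hypothetical (see the GitHub repo for the real interface):

```python
from abc import ABC, abstractmethod

class Runner(ABC):
    """Abstract backend: local machine, a cloud, Kubernetes, etc."""
    @abstractmethod
    def run(self, task: str, params: dict) -> str: ...

class LocalRunner(Runner):
    """A toy local backend: builds the command it would invoke."""
    def run(self, task: str, params: dict) -> str:
        args = " ".join(f"--{k}={v}" for k, v in sorted(params.items()))
        return f"run {task} {args}"

def execute(runner: Runner, task: str, **params) -> str:
    # Callers see one interface no matter which backend runs the task.
    return runner.run(task, params)

print(execute(LocalRunner(), "train", epochs=3, data="/tmp/data"))
# → run train --data=/tmp/data --epochs=3
```

Swapping `LocalRunner` for a cloud or Kubernetes backend would leave the calling code unchanged, which is the low-friction property the shipping-container analogy is after.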
  24. 5. What's next?
  25. MLCommons Research. Algorithmic Research Working Group: benchmarks for algorithms to improve efficiency (better accuracy per compute). Medical Research Working Group: federated evaluation across distributed data (research ~= clinical practice). Scientific Research Working Group: better datasets and software for science. (Your idea here.)
  26. 6. How can I get involved?
  27. We welcome people who want to make ML better: join our mailing list, attend community events, become a member (free for academics), participate in working groups, and submit benchmark results. Join us at mlcommons.org!
  28. Feedback: your feedback is important to us. Don't forget to rate and review the sessions.
