MLCommons: Better ML for Everyone
David Kanter, Executive Director
MLCommons aims to accelerate machine learning to benefit everyone.

MLCommons will build a common set of tools for ML practitioners, including:

Benchmarks to measure progress: MLCommons will leverage MLPerf (built on DAWNBench) to measure speed, but also expand benchmarking to other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and, consequently, cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
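To make the speed metric concrete: MLPerf Training-style benchmarks are typically scored as time to a target accuracy rather than raw throughput. The following is a minimal, hypothetical Python sketch of that idea; train_one_epoch, evaluate, and the target value are placeholders supplied by the user, not MLPerf's actual reference harness.

```python
import time

def time_to_accuracy(train_one_epoch, evaluate, target_accuracy, max_epochs=100):
    """Sketch of a time-to-target-accuracy measurement.

    train_one_epoch() runs one pass over the training data;
    evaluate() returns the current validation accuracy.
    Both are assumed to be supplied by the benchmark submitter.
    """
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        accuracy = evaluate()
        if accuracy >= target_accuracy:
            elapsed = time.perf_counter() - start
            return {"epochs": epoch, "seconds": elapsed, "accuracy": accuracy}
    raise RuntimeError("target accuracy not reached within max_epochs")
```

Scoring on wall-clock time to a fixed quality bar is what lets results from very different hardware and software stacks be compared on a single axis.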

Public datasets to fuel research: MLCommons' new People's Speech project seeks to develop a public dataset that, in addition to being more than an order of magnitude larger than any other public speech dataset (86K hours of labeled speech), better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet's impact on the field of computer vision.
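As a quick sanity check, the 86K-hour figure and the "10 years of speech" framing used later in the deck describe the same quantity:

```python
# Back-of-the-envelope: 86,000 hours of audio expressed as years of continuous playback.
hours = 86_000
years = hours / (24 * 365)
print(f"{hours:,} hours ≈ {years:.1f} years of speech")  # ≈ 9.8, i.e. roughly 10 years
```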

Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons' MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmarking), develop, and ultimately deploy.
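To illustrate the "common container interface" idea, here is a minimal Python sketch: one abstract task interface with swappable runners for different environments. The class names, the example image, and the task/parameter shapes are assumptions for illustration only, not MLCube's actual API; the real configuration format and CLI are documented at https://github.com/mlcommons/mlcube.

```python
from abc import ABC, abstractmethod
import subprocess

class TaskRunner(ABC):
    """Abstract interface: every runner exposes the same run(task, params) call."""

    @abstractmethod
    def run(self, task: str, params: dict) -> None:
        ...

class LocalDockerRunner(TaskRunner):
    """Runs the packaged model as a local Docker container (illustrative only)."""

    def __init__(self, image: str):
        self.image = image

    def run(self, task: str, params: dict) -> None:
        args = [f"--{key}={value}" for key, value in params.items()]
        subprocess.run(["docker", "run", "--rm", self.image, task, *args], check=True)

if __name__ == "__main__":
    # Hypothetical image name; a Kubernetes or cloud runner would accept the same call shape,
    # so sharing a model means sharing an image plus task metadata, not setup scripts.
    runner = LocalDockerRunner(image="example/speech-model:latest")
    runner.run("train", {"data_dir": "/data", "epochs": "3"})
```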


MLCommons: Better ML for Everyone

1. MLCommons: Better ML for Everyone. David Kanter, Executive Director
2. MLCommons™ in 6 questions:
   1. What is MLCommons?
   2. Why benchmarks?
   3. Why datasets?
   4. Why best practices?
   5. What's next?
   6. How can I get involved?
3. 1. What is MLCommons?
4. Machine learning (ML) could benefit everyone
   ● Information access
   ● Business productivity
   ● Health
   ● Safety
   (Icon: Ætoms; photos: Ian Maddox, Katrina.Tuliao)
5. And has a huge potential market
6. But machine learning is a young industry.
7. Young industries need things to grow!
8. MLCommons is a new open engineering organization to create better ML for everyone. (Diagram: open engineering organizations + AI/ML organizations → MLCommons)
9. MLCommons is supported by industry and academics. Academics from educational institutions including: Harvard University, Indiana University, Polytechnique Montreal, Stanford University, University of California, Berkeley, University of Toronto, University of Tübingen, University of York (United Kingdom), and Yonsei University.
10. MLCommons is the work of many people... and many others contributing ideas and code.
11. MLCommons creates better ML through three pillars: benchmarks, datasets, and best practices (supported by research).
12. 2. Why Benchmarks?
13. "What gets measured, gets improved." — Peter Drucker
    Benchmarking aligns research with development, engineering with marketing, and competitors across the industry in pursuit of the same clear objective. Benchmarks drive progress and transparency.
14. MLCommons will host MLPerf™, an industry standard that drives progress and transparency. (Selected MLPerf result press coverage shown.)
15. MLPerf progress, 2018 to 2021: increasing breadth (Training, Training - HPC, Inference - Datacenter, Inference - Edge, Inference - Mobile, Inference - Tiny (IoT)) and an improving technical approach
    ● New training/inference benchmarks: Recommendation (DLRM + 1TB dataset), Medical imaging (U-Net), Speech-to-text (RNN-T)
    ● Standardized methodology for Training: optimizer definitions, hyperparameter definitions, convergence expectations (WIP)
    ● Adding power measurement to Inference
    ● Launched Mobile App (early alpha release)
16. 3. Why Datasets?
17. ML needs ImageNet++ for everything
    ● ImageNet: $300K → modern ML
    ● ~80% of research papers by leading ML companies cite public datasets
    ● ML innovation needs datasets that are: large, CC-licensed (or similar), redistributable, diverse, and continually improving
    ● But most public datasets are: small, legally restricted, not redistributable, not diverse, and static
18. MLCommons is starting with speech-to-text
    ● Voice interfaces will reach most of Earth's 8 billion people by 2025
    ● Need bigger datasets that support more diverse languages and accents
    (Figure: Earth's population grouped by native language; source: https://commons.wikimedia.org/wiki/File:List_of_languages_by_number_of_native_speakers.png)
19. People's Speech: 10 years of speech, CC-BY
    ● Read text and conversation + noise
    ● Diverse languages/accents: English now, 60+ other languages as future work
    ● ~10 years of labeled speech (>10 TB)
    ● CC-BY license (likely), redistributable
    ● Undergoing evaluation by MLCommons members
    ● Aiming for public release in 1H2021
    ● Living dataset
20. 4. Why Best Practices?
21. ML has too much friction. Example: found an ML model you want to use? Interface (how do you even run it)? Software dependencies? Dataset? Platform compatibility? All solved after a couple of days of hard work! And then it converges to 81.6% of the claimed accuracy?
22. MLCube™ is a shipping container for ML models: complex infrastructure, complex contents, simple interface = low friction.
    (Photo credits: cargo ship: Unsplash.com; shipping container: KMJ; medicines: Ralf Roletschek; electronics: DustyDingo)
23. MLCube makes it easier to share models
    ● Basically, a Docker container with a consistent command line and metadata (really an abstract interface for any container)
    ● Simple runners for: local machine, multiple clouds, Kubernetes; or incorporate it into your own infrastructure
    ● Learn more at: https://github.com/mlcommons/mlcube
24. 5. What's Next?
25. MLCommons Research
    ● Algorithmic Research Working Group: benchmarks for algorithms to improve efficiency (better accuracy/compute)
    ● Medical Research Working Group: federated evaluation across distributed data (research ~= clinical practice)
    ● Scientific Research Working Group: better datasets and software for science
    ● (Your idea here)
26. 6. How can I get involved?
27. We welcome people who want to make ML better.
    ● Join our mailing list
    ● Attend community events
    ● Become a member (free for academics)
    ● Participate in working groups
    ● Submit benchmark results
    Join us at mlcommons.org!
28. Feedback: your feedback is important to us. Don't forget to rate and review the sessions.

