AI Library - An Open Source Machine Learning Framework

Prasanth Anbalagan, Senior Software Engineer (QE and Analysis) on the Artificial Intelligence Center of Excellence Team at Red Hat

Published in: Technology

  1. AI Library: An Open Source Machine Learning Framework. Prasanth Anbalagan, AI Center of Excellence, Red Hat. MLconf 2018.
  2. Machine Learning
     ● Challenges faced in adopting machine learning
       ○ Implementation
         ■ Data science expertise is required to implement the models.
       ○ Infrastructure
         ■ Choice of infrastructure vs. deployment vs. management.
       ○ Accessibility
         ■ Ease of use.
  3. AI Library
     ● AI-Library
       ○ An open source collection of AI components
         ■ machine learning algorithms
         ■ machine learning solutions to common use cases
         ■ part of Open Data Hub
           ● a “machine learning-as-a-service” platform built on top of OpenShift and Kubernetes
       ○ Allows rapid prototyping of ideas.
  4. AI Components: Association Rule Learning, Correlation Analysis, Duplicate Bug Detection, Sentiment Analysis, Flake Analysis, Matrix Factorization
  5. Understanding the Workflow (diagram labels: Save Data, Use Results, OpenWhisk, ML models, Run Model, Python modules, Container Application Platform, AWS)
  6. AI Library (diagram)
     AI Components: Association Rule Learning, Correlation Analysis, Duplicate Bug Detection, Sentiment Analysis, Flake Analysis, Matrix Factorization
     Storage: Object Storage (S3 Compatible)
     Actions: OpenWhisk + OpenShift
     Ansible: Deployment using playbooks
  7. Saving Data (diagram labels: RADOS Command Line Interface, Open Data Hub, Reports & Visuals, ETL, Model Training, APIs, Apache Kafka, Big Data Storage, Data Streams, AWS)
  8. Models
  9. Duplicate Bug Detection: Duplicate Bugs Statistics
     Product                         Percentage
     Red Hat OpenStack               12%
     Red Hat Enterprise Linux        13%
     Red Hat Ceph Storage            10%
     OpenShift Container Platform    10%
  10. Duplicate Bug Detection (diagram labels: Topic Modeling, Existing bugs)
  11. Duplicate Bug Detection (diagram labels: Topic Modeling, Similarity Measure, score1, score2, score3, score n, sort, Top matches, New bug)
  12. Duplicate Bug Detection (diagram labels: Duplicate Bug Detection, Existing bugs, New bugs, Recommendation on duplicates, Software Bot)
  13. Flake Analysis: What are flakes?
      ● A test fails, but the software functions correctly and there is no bug.
  14. Flake Analysis (diagram labels: Test Logs, Clustering, Test Logs)
  15. Flake Analysis (diagram labels: Classification, Clusters of Test Logs, New Test Log)
  16. Flake Analysis (diagram labels: Clusters of Test Logs, Probability of a test being a flake in the chosen cluster, Flake)
  17. Run a Model
  18. Container Application Platform (diagram)
      Project 1: OpenWhisk
      Project 2: ML models
      Jobs workflow: read data; invoke action (training, prediction, poll, etc.); save data or store results
      Arrow labels: submit jobs, poll status
  19. Sample Training Data - Flake Analysis (flags whether a test failure was a false positive or not)
      {
        "status": "failure",
        "log":
          Traceback (most recent call last):
            File "/build/cockpit/bots/../test/verify/check-networking-team", line 81, in testTeam
              b.wait_present("#network-interface-slaves tr[data-interface='%s']" % iface1)
            File "/build/cockpit/test/common/testlib.py", line 230, in wait_present
              return self.wait_js_func('ph_is_present', selector)
              raise Error(res['error'])
          Error: timeout\n\nWrote TestNetworking-testTeam-rhel-7-4-127.0.0.2-2301-FAIL.png
          Wrote TestNetworking-testTeam-rhel-7-4-127.0.0.2-2301-FAIL.html
        "test": "testTeam (check_networking_team.TestNetworking)",
        "flake": true
      }
  20. Model Training - Flake Analysis
      curl -u <Auth> "https://openwhisk.openshift.com/api/v1/namespaces/_/actions/ai-library/flake-analysis-training?" \
        -X POST -H "Content-Type: application/json" \
        -d '{ "name": "flakes-training-10982", "app_args": "-s3Path=flake-analysis/datasets/training/records -s3Destination=flake-analysis/models/testflakes.model" }'
  21. Model Training - Flake Analysis
      [#]$ wsk action get /whisk.system/ai-library/flake-analysis-training
      "key": "docker_image", "value": "docker.io/panbalag/ailibrary"
      "key": "job_cpu", "value": "8000m"
      "key": "job_memory", "value": "16000Mi"
      Slide annotations: 8 cores; 16 GB; Apache Spark, Tensorflow, Scikit-learn, Scipy, Gensim, ...
  22. Sample Prediction Data - Flake Analysis (logs from test failures)
      # ----------------------------------------------------------------------
      # testNotRemovingDisks (check_storage_mdraid.TestStorage)
      # ----------------------------------------------------------------------
      [0608/105412.651574:ERROR:gpu_process_transport_factory.cc(1019)] Lost UI shared context.
      DevTools listening on ws://127.0.0.1:9406/devtools/browser/e6259bcd-d4d3-4014-ac9e-1d7526cd2771
      Traceback (most recent call last):
        File "/build/cockpit/bots/../test/verify/check-storage-mdraid", line 298, in testNotRemovingDisks
          b.wait_not_in_text('#detail-sidebar', "DISK3")
        File "/build/cockpit/test/common/testlib.py", line 261, in wait_js_cond
          self.raise_cdp_exception("timeout\nwait_js_cond", cond, result["exceptionDetails"], trailer)
        File "/build/cockpit/test/common/testlib.py", line 166, in raise_cdp_exception
          raise Error("%s(%s): %s" % (func, arg, msg))
      Error: timeout
  23. Prediction - Flake Analysis
      curl -u <Auth> "https://openwhisk.openshift.com/api/v1/namespaces/_/actions/ai-library/flake-analysis-prediction?" \
        -X POST -H "Content-Type: application/json" \
        -d '{ "name": "flakes-prediction-012", "app_args": "-model=flake-analysis/models/testflakes.model -s3Path=flake-analysis/prediction-data/failures/records -s3Destination=flake-analysis/predictions" }'
  24. Sample Result - Flake Analysis
      # ----------------------------------------------------------------------
      # testNotRemovingDisks (check_storage_mdraid.TestStorage)
      # ----------------------------------------------------------------------
      [0608/105412.651574:ERROR:gpu_process_transport_factory.cc(1019)] Lost UI shared context.
      ...
        File "/build/cockpit/test/common/testlib.py", line 261, in wait_js_cond
          self.raise_cdp_exception("timeout\nwait_js_cond", cond, result["exceptionDetails"], trailer)
        File "/build/cockpit/test/common/testlib.py", line 166, in raise_cdp_exception
          raise Error("%s(%s): %s" % (func, arg, msg))
      Error: timeout
      # Flake likely: 0.89
  25. Demo: Using OpenShift and Ceph Storage
  27. Conclusion
      ● Challenges in adopting machine learning
      ● AI-Library
      ● Open Data Hub
  28. References
      AI-Library: https://gitlab.com/opendatahub/ai-library
      Open Data Hub: https://opendatahub.io/
  29. THANK YOU
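
The duplicate bug detection slides describe a flow in which a new bug is compared against existing bugs, each comparison yields a similarity score, the scores are sorted, and the top matches are returned. A minimal sketch of that ranking step follows. It is not the AI Library implementation: plain bag-of-words counts stand in for the topic model, cosine similarity is an assumed choice of similarity measure, and the bug IDs and summaries are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words counts; an assumed stand-in for a topic-model vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(new_bug, existing_bugs, k=3):
    """Score the new bug against every existing bug, sort, keep the top k."""
    query = vectorize(new_bug)
    scored = [(cosine(query, vectorize(text)), bug_id)
              for bug_id, text in existing_bugs.items()]
    scored.sort(reverse=True)
    return [(bug_id, round(score, 3)) for score, bug_id in scored[:k]]


# Hypothetical bug summaries, invented for illustration only.
existing = {
    "BUG-1": "installer crashes when disk is full",
    "BUG-2": "network team interface not shown in UI",
    "BUG-3": "crash in installer on full disk",
}
matches = top_matches("installer crash with full disk", existing, k=2)
print(matches)
```

Swapping `vectorize` for a real topic model (for example one built with Gensim, which the slides list among the bundled libraries) would leave the scoring and sorting steps unchanged.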

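The model training call on slide 20 is shown as a curl command. The same request can be sketched in Python with only the standard library. The URL, job name, and `app_args` format are taken from the slide; the function name `build_training_request` and the `user:secret` credentials are hypothetical, and HTTP Basic authentication is assumed because that is what `curl -u` sends. The sketch only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_training_request(auth, name, s3_path, s3_destination):
    """Build (but do not send) the POST request shown on the training slide.

    `auth` is a "user:password" string, as passed to `curl -u`.
    """
    url = ("https://openwhisk.openshift.com/api/v1/namespaces/_/actions/"
           "ai-library/flake-analysis-training")
    payload = {
        "name": name,
        "app_args": f"-s3Path={s3_path} -s3Destination={s3_destination}",
    }
    token = base64.b64encode(auth.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder credentials; replace with real ones before sending.
req = build_training_request(
    "user:secret",
    "flakes-training-10982",
    "flake-analysis/datasets/training/records",
    "flake-analysis/models/testflakes.model",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would submit the job; that step is left out here so the sketch runs without network access.
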
×
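
The flake analysis slides describe three steps: test logs are clustered, a new test log is classified into one of the clusters, and the result is the probability of a test being a flake in the chosen cluster. The sketch below illustrates only the last two steps under simplifying assumptions: the clusters are given up front, Jaccard overlap of word sets stands in for whatever classifier the library uses, and the probability is taken as the share of flake-labeled logs in the chosen cluster. All cluster names and log lines are invented.

```python
import re


def tokens(log):
    """Lowercased word set of a log; a crude stand-in for feature extraction."""
    return set(re.findall(r"[a-z_]+", log.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flake_probability(new_log, clusters):
    """Choose the cluster whose logs best match new_log, then return
    (cluster name, share of that cluster's logs labeled as flakes)."""
    new_tokens = tokens(new_log)

    def avg_similarity(name):
        members = clusters[name]
        total = sum(jaccard(new_tokens, tokens(log)) for log, _ in members)
        return total / len(members)

    best = max(clusters, key=avg_similarity)
    flakes = sum(1 for _, is_flake in clusters[best] if is_flake)
    return best, flakes / len(clusters[best])


# Hypothetical clusters of (log text, flake label) pairs.
clusters = {
    "timeout": [
        ("Error: timeout in wait_present", True),
        ("Error: timeout in wait_js_cond", True),
        ("Error: timeout waiting for selector", False),
    ],
    "assertion": [
        ("AssertionError: expected 3 got 4", False),
        ("AssertionError: values differ", False),
    ],
}
cluster, prob = flake_probability("raise Error: timeout in wait_js_cond", clusters)
print(cluster, round(prob, 2))
```

The labels play the role of the `"flake": true` field in the sample training data, which is what lets a per-cluster flake share be computed at all.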