K-CAI NEURAL API is a Keras-based neural network API for machine learning that lets you prototype with many of the possibilities of TensorFlow! Python, Free Pascal and Delphi together in Google Colab, Git or the Community Edition.
MLflow 1.0 is coming soon as the first stable release of MLflow. It also packs many cleanups and improvements, such as simpler metadata management, search APIs and HDFS support. In this talk, we’ll present these new features in detail, and then discuss additional MLflow components that Databricks and other companies are working on for the rest of 2019. These new tools include a model registry to share and track models, as well as a multi-step workflow abstraction, both of which were announced at Spark + AI Summit 2019.
Title
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU
Video
https://youtu.be/vaB4IM6ySD0
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store and Pachyderm
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Related Links
1. PipelineAI Home: https://pipeline.ai
2. PipelineAI Community Edition: http://community.pipeline.ai
3. PipelineAI GitHub: https://github.com/PipelineAI/pipeline
4. Advanced Spark and TensorFlow Meetup (SF-based, Global Reach): https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup
5. YouTube Videos: https://youtube.pipeline.ai
6. SlideShare Presentations: https://slideshare.pipeline.ai
7. Slack Support: https://joinslack.pipeline.ai
8. Web Support and Knowledge Base: https://support.pipeline.ai
9. Email Support: support@pipeline.ai
Reuse, Reduce, Recycle in Serverless World - Dmitri Zimine
Slides for the talk at @ServerlessConf San Francisco 2018
Reuse is fundamental to any software development. Serverless development, however, still lacks a coherent end-to-end reusability story. AWS Application Repository, Serverless Components from @goserverless, and LogicApps' Connectors are all steps in the right direction. But we are still far away from the npm/pip install developer's paradise. What is missing, and what is the path forward?
In this talk, I reflect on the current state of reusability in Serverless, share relevant learnings from establishing reusability in DevOps tools, and show working code: a proof of concept for an open-source catalog of reusable Serverless functions. How exactly? We recycled StackStorm Exchange - a mature open-source action catalog - with a plugin to the Serverless Framework. Come and see the details, and bring your ideas to discuss how we promote reusability in Serverless.
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das, Databricks
“In Spark 2.0, we have extended DataFrames and Datasets to handle real-time streaming data. This not only provides a single programming abstraction for batch and streaming data, it also brings support for event-time based processing, out-of-order/delayed data, sessionization and tight integration with non-streaming data sources and sinks. In this talk, I will take a deep dive into the concepts and the API and show how this simplifies building complex “Continuous Applications”.” - T.D.
Databricks Blog: "Structured Streaming In Apache Spark 2.0: A new high-level API for streaming"
https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html
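The event-time processing and out-of-order handling mentioned in the abstract can be illustrated with a small plain-Python sketch. This is not Spark's API or its exact watermark rule; the function name `tumbling_window_counts` and the parameter `allowed_lateness` are hypothetical names chosen for illustration. The idea shown: events are grouped by the time they occurred (not the time they arrived), and an event is dropped only when its window ended at or before a watermark that trails the largest event time seen so far.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds, allowed_lateness):
    """Count events per (window_start, key) using event time.

    events: iterable of (event_time_seconds, key) tuples in ARRIVAL order,
            which may differ from event-time order.
    Returns (counts, dropped), where dropped lists events that arrived
    after their window was already behind the watermark.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")

    for event_time, key in events:
        # The watermark trails the largest event time observed so far.
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness

        # Assign the event to a fixed-size (tumbling) window by event time.
        window_start = (event_time // window_seconds) * window_seconds
        window_end = window_start + window_seconds

        if window_end <= watermark:
            # Too late: this window is considered finalized (simplified rule).
            dropped.append((event_time, key))
            continue

        counts[(window_start, key)] += 1

    return dict(counts), dropped


# Arrival order differs from event-time order: 3 arrives after 12, 4 after 25.
arrivals = [(1, "a"), (12, "a"), (3, "a"), (25, "a"), (4, "a")]
counts, dropped = tumbling_window_counts(arrivals, window_seconds=10, allowed_lateness=10)
print(counts)   # {(0, 'a'): 2, (10, 'a'): 1, (20, 'a'): 1}
print(dropped)  # [(4, 'a')]
```

The event at time 3 is late but still within the allowed lateness, so it is counted in the first window; the event at time 4 arrives after the watermark has passed that window and is dropped.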
// About the Presenter //
Tathagata Das is an Apache Spark Committer and a member of the PMC. He’s the lead developer behind Spark Streaming, and is currently employed at Databricks. Before Databricks, you could find him at the AMPLab of UC Berkeley, researching datacenter frameworks and networks with professors Scott Shenker and Ion Stoica.
Follow T.D. on -
Twitter: https://twitter.com/tathadas
LinkedIn: https://www.linkedin.com/in/tathadas
Flink Forward SF 2017: Eron Wright - Introducing Flink TensorFlow - Flink Forward
This session will introduce a new open-source project - Flink TensorFlow - that enables Flink programs to operate on data using TensorFlow machine learning models. Applications include real-time image processing, NLP, and anomaly detection. The session will:
- Introduce TensorFlow and describe its component model which allows for model reuse across environments
- Demonstrate how to use TensorFlow models in Flink ML and Flink Streaming environments
- Present a roadmap and provide opportunities to contribute
Video: https://youtu.be/T0L0JxDaPkc
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, Airflow, and MLflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning and data engineering.
MLflow is a lightweight experiment-tracking system recently open-sourced by Databricks, the creators of Apache Spark. MLflow supports Python, Java/Scala, and R - and offers native support for TensorFlow, Keras, and Scikit-Learn.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
The link will be sent a few hours before the start of the workshop.
Only registered users will receive the link.
If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Run Multiple Experiments with MLflow Experiment Tracking
12. Reproduce Model Training with TFX Metadata Store
13. Deploy the Model to Production with TensorFlow Serving and Istio
14. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
Kubeflow at Spotify (For the Kubeflow Summit) - Josh Baer
A lightning talk discussing some important challenges facing ML engineers and how the introduction of Kubeflow Pipelines will help.
Full slides w/ speaker notes here: https://docs.google.com/presentation/d/12dwhS_x4568G6XQjI9SEUacD-n4hFQczBcRBLdbHNEM/edit
Apache Airflow (incubating) NL HUG Meetup 2016-07-19 - Bolke de Bruin
Introduction to Apache Airflow (Incubating), best practices and roadmap. Airflow is a platform to programmatically author, schedule and monitor workflows.
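To make "programmatically author, schedule and monitor workflows" concrete, here is a minimal plain-Python sketch of the underlying idea: tasks declared in code, ordered by their dependencies, with a status recorded for each. It deliberately does not use Airflow's API; `run_workflow`, `tasks`, and `upstream` are hypothetical names for illustration, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, upstream):
    """Run callables in dependency order and record a status per task.

    tasks:    dict mapping task name -> zero-argument callable
    upstream: dict mapping task name -> iterable of task names it depends on
    Execution stops at the first failing task.
    """
    # Every task gets an entry, even if it has no upstream dependencies.
    graph = {name: set(upstream.get(name, ())) for name in tasks}
    status = {}
    for name in TopologicalSorter(graph).static_order():
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
            break
    return status


executed = []
tasks = {
    "load": lambda: executed.append("load"),
    "extract": lambda: executed.append("extract"),
    "transform": lambda: executed.append("transform"),
}
upstream = {"transform": ["extract"], "load": ["transform"]}

status = run_workflow(tasks, upstream)
print(executed)  # ['extract', 'transform', 'load']
print(status)    # {'extract': 'success', 'transform': 'success', 'load': 'success'}
```

A real orchestrator adds scheduling, retries, and persistence of task state on top of this ordering step; the sketch only shows why a workflow is naturally expressed as a directed acyclic graph.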
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...Provectus
What's a machine learning workflow? What open source tools can you use to automate ML workflow?
Reproducible ML pipelines in research and production with monitoring insights from live inference clusters could enable and accelerate the delivery of AI solutions for enterprises. There is a growing ecosystem of tools that augment researchers and machine learning engineers in their day to day operations.
Still, there are big gaps in the machine learning workflow when it comes to training dataset versioning, training performance and metadata tracking, integration testing, inferencing quality monitoring, bias detection, concept drift detection and other aspects that prevent the adoption of AI in organizations of all sizes.
Fast and Reliable Apache Spark SQL Engine - Databricks
Building the next generation Spark SQL engine at speed poses new challenges to both automation and testing. At Databricks, we are implementing a new testing framework for assessing the quality and performance of new developments as they are produced. With more than 1,200 contributors worldwide, Apache Spark follows a rapid pace of development. At this scale, new testing tooling such as random query and data generation, fault injection, longevity stress, and scalability tests is essential to guarantee a reliable and performant Spark later in production. We will demonstrate the effectiveness of our testing infrastructure by drilling down into cases where correctness and performance regressions were found early. In addition, we will show how they were root-caused and fixed to prevent regressions in production and boost the continuous delivery of new features.
ESUG 2014, Cambridge
Wed, August 20, 11:00am – 11:45am
Video:
Part1: https://www.youtube.com/watch?v=_Mv7SX-8Vlk
Part2: https://www.youtube.com/watch?v=qdZq2IZBm4k
Description
Abstract: In this talk we will present the advances and new features in Pharo 3.0. We will present the current work on Pharo 4.0 and beyond.
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS - Databricks
Abstract: We will introduce RAPIDS, a suite of open source libraries for GPU-accelerated data science, and illustrate how it operates seamlessly with MLflow to enable reproducible training, model storage, and deployment. We will walk through a baseline example that incorporates MLflow locally, with a simple SQLite backend, and briefly introduce how the same workflow can be deployed in the context of GPU enabled Kubernetes clusters.
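As a rough illustration of what local experiment tracking on a SQLite backend involves, the sketch below records parameters and metrics per run using only Python's standard `sqlite3` module and then picks the best run. It is not MLflow's API or its database schema; the table layout and the helper names `open_store`, `log_run`, and `best_run` are assumptions made for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a tiny tracking store; ':memory:' keeps it ephemeral."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS runs (run_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS params (run_id INTEGER, key TEXT, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS metrics (run_id INTEGER, key TEXT, value REAL)")
    return conn


def log_run(conn, name, params, metrics):
    """Record one training run with its parameters and metrics."""
    run_id = conn.execute("INSERT INTO runs (name) VALUES (?)", (name,)).lastrowid
    conn.executemany(
        "INSERT INTO params VALUES (?, ?, ?)",
        [(run_id, k, str(v)) for k, v in params.items()],
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [(run_id, k, float(v)) for k, v in metrics.items()],
    )
    conn.commit()
    return run_id


def best_run(conn, metric, higher_is_better=True):
    """Return (run_name, metric_value) for the best run, or None if no run has the metric."""
    order = "DESC" if higher_is_better else "ASC"
    return conn.execute(
        "SELECT r.name, m.value FROM runs r JOIN metrics m ON r.run_id = m.run_id "
        f"WHERE m.key = ? ORDER BY m.value {order} LIMIT 1",
        (metric,),
    ).fetchone()


conn = open_store()
log_run(conn, "depth-3", {"max_depth": 3}, {"accuracy": 0.81})
log_run(conn, "depth-5", {"max_depth": 5}, {"accuracy": 0.86})
print(best_run(conn, "accuracy"))  # ('depth-5', 0.86)
```

Pointing `open_store` at a file path instead of `:memory:` would persist runs across sessions, which is the property that makes a file-backed store convenient for a local baseline before moving to a shared tracking server.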
Using Terraform to manage the configuration of a Cisco ACI fabric - Joel W. King
Terraform is an open-source infrastructure as code software tool created by HashiCorp. It is written in GoLang. Cisco has developed an ACI terraform provider used to interact with the Cisco APIC. Network engineers define and provision the ACI infrastructure using a declarative configuration language known as HCL, HashiCorp Configuration Language.
This session will begin with a short presentation on Terraform and how it can be used to manage resources in an ACI fabric. There is a companion GitLab repository (https://gitlab.com/joelwking/terraform_aci) which will be used as a demo environment. Attendees can download Vagrant and VirtualBox to their laptop and execute the demonstration using the Cisco DevNet Always-on ACI sandbox.
Suneel Marthi - Deep Learning with Apache Flink and DL4J - Flink Forward
http://flink-forward.org/kb_sessions/deep-learning-with-apache-flink-and-dl4j/
Deep Learning has become very popular over the last few years in areas such as Image Recognition, Fraud Detection, and Machine Translation. It has proved to be very useful in handling unstructured data and extracting value from it. A big challenge in building deep learning models has been the high cost of training them. With the recent advent of distributed frameworks like Apache Flink and Apache Spark, it is faster to train Deep Learning models in parallel on modern platform architecture. In this talk, we'll show how to use Apache Flink Streaming with the open-source Deep Learning framework DeepLearning4j to perform large-scale deep learning model training. We will show a demo of a Recurrent Neural Net that is trained for language modeling and have it generate text.
Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS... - Databricks
Transformer-based pretrained language models such as BERT, XLNet, RoBERTa and ALBERT significantly advance the state of the art in NLP and open doors for solving practical business problems with high-performance transfer learning. However, operationalizing these models with production-quality continuous integration/delivery (CI/CD) end-to-end pipelines that cover the full machine learning life cycle stages of train, test, deploy and serve, while managing associated data and code repositories, is still a challenging task.
TensorFlow meetup: Keras - PyTorch - TensorFlow.js - Stijn Decubber
Slides from the TensorFlow meetup hosted on October 9th at the ML6 offices in Ghent. Join our Meetup group for updates and future sessions: https://www.meetup.com/TensorFlow-Belgium/
Hybrid Cloud, Kubeflow and TensorFlow Extended [TFX] - Animesh Singh
Kubeflow Pipelines and TensorFlow Extended (TFX) together form an end-to-end platform for deploying production ML pipelines. It provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system. In this talk we describe how to run TFX in hybrid cloud environments.
One bite and all your dreams will come true: Analyzing and Attacking Apple Ke... - Priyanka Aash
"Though many security mechanisms are deployed in Apple's macOS and iOS systems, some old-fashioned or poor-quality kernel code still leaves the door wide open to attackers. In particular, device drivers, as critical kernel components, are frequently exploited to attack Apple systems. Bug hunting in Apple kernel drivers is not easy, since they are mostly closed-source and rely heavily on object-oriented programming. In this talk, we will share our experience of analyzing and attacking Apple kernel drivers. Specifically, we will introduce a new tool called Ryuk. Ryuk employs static analysis techniques to discover bugs by itself or to assist manual review.
In addition, we combine static analysis with dynamic fuzzing for bug hunting in Apple drivers. Specifically, we will describe how we integrate Ryuk with the state-of-the-art Apple driver fuzzer, PassiveFuzzFrameworkOSX, to find exploitable bugs.
Most importantly, we will illustrate Ryuk's power with several new vulnerabilities that were recently discovered by Ryuk. Specifically, we will show how we exploit these vulnerabilities for privilege escalation on macOS 10.13.3 and 10.13.2. We will not only explain why these bugs occur and how we find them, but also demonstrate how we exploit them with innovative kernel exploitation techniques."
Machine learning techniques are powerful, but building and deploying such models for production use require a lot of care and expertise.
A lot of books, articles, and best practices have been written and discussed on machine learning techniques and feature engineering, but putting those techniques to use in a production environment is usually forgotten and underestimated. The aim of this talk is to shed some light on current machine learning deployment practices and go into detail on how to deploy sustainable machine learning pipelines.
Building a Feature Store around Dataframes and Apache Spark - Databricks
A Feature Store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so, they need a common data processing platform.
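The register/discover/use cycle described above can be sketched in a few lines of plain Python. This is an in-memory toy, not the feature store presented in the talk and not built on Spark; the class `FeatureStore`, the `Feature` dataclass, and the example feature names are all hypothetical. The point it illustrates is that training and inference call the same registered transformation, which is what keeps feature engineering consistent between the two.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    name: str
    description: str
    transform: Callable[[dict], float]  # raw record -> feature value


class FeatureStore:
    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    def register(self, feature: Feature) -> None:
        """Add a feature definition; names must be unique."""
        if feature.name in self._features:
            raise ValueError(f"feature already registered: {feature.name}")
        self._features[feature.name] = feature

    def discover(self, keyword: str) -> List[str]:
        """Return names of features whose description mentions the keyword."""
        kw = keyword.lower()
        return sorted(n for n, f in self._features.items() if kw in f.description.lower())

    def build_vector(self, record: dict, names: List[str]) -> List[float]:
        """Apply the registered transforms; used identically for training and inference."""
        return [self._features[n].transform(record) for n in names]


store = FeatureStore()
store.register(Feature("amount_usd", "Order amount in US dollars",
                       lambda r: r["amount_cents"] / 100))
store.register(Feature("is_weekend", "1.0 if the order was placed on a weekend",
                       lambda r: 1.0 if r["weekday"] >= 5 else 0.0))

print(store.discover("order"))  # ['amount_usd', 'is_weekend']
print(store.build_vector({"amount_cents": 1250, "weekday": 6},
                         ["amount_usd", "is_weekend"]))  # [12.5, 1.0]
```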
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ... - Databricks
A long time ago, there was Caffe and Theano, then came Torch and CNTK and TensorFlow, Keras and MXNet and PyTorch and Caffe2... a sea of deep learning tools, but none for Spark developers to dip into. Finally, there was BigDL, a deep learning library for Apache Spark. While BigDL is integrated into Spark and extends its capabilities to address the challenges of Big Data developers, will a library alone be enough to simplify and accelerate the deployment of ML/DL workloads on production clusters? From high-level pipeline API support to feature transformers to pre-defined models and reference use cases, a rich repository of easy-to-use tools is now available with the 'Analytics Zoo'. We'll unpack the production challenges and opportunities with ML/DL on Spark and what the Zoo can do.
April 2016 HUG: CaffeOnSpark: Distributed Deep Learning on Spark Clusters - Yahoo Developer Network
Deep learning is a critical capability for gaining intelligence from datasets. Many existing frameworks require a separate cluster for deep learning, and multiple programs have to be created for a typical machine learning pipeline. Separate clusters require large datasets to be transferred between them, and introduce unwanted system complexity and latency for end-to-end learning.
Yahoo introduced CaffeOnSpark to alleviate those pain points and bring deep learning onto Hadoop and Spark clusters. By combining salient features from the deep learning framework Caffe and the big-data framework Apache Spark, CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. The framework is complementary to the non-deep-learning libraries MLlib and Spark SQL, and its data-frame style API provides Spark applications with an easy mechanism to invoke deep learning over distributed datasets. Its server-to-server direct communication (Ethernet or InfiniBand) achieves faster learning and eliminates scalability bottlenecks.
Recently, we released CaffeOnSpark at github.com/yahoo/CaffeOnSpark under the Apache 2.0 License. In this talk, we will provide a technical overview of CaffeOnSpark, its API and its deployment on a private cloud or public cloud (AWS EC2). An IPython notebook demo will also be given to show how CaffeOnSpark works with other Spark packages (e.g. MLlib).
Speakers:
Andy Feng is a VP Architecture at Yahoo, leading the architecture and design of big data and machine learning initiatives. He has architected major platforms for personalization, ads serving, NoSQL, and cloud infrastructure.
Jun Shi is a Principal Engineer at Yahoo who specializes in machine learning platforms and large-scale machine learning algorithms. Prior to Yahoo, he was designing wireless communication chips at Broadcom, Qualcomm and Intel.
Mridul Jain is Senior Principal at Yahoo, focusing on machine learning and big data platforms (especially realtime processing). He has worked on trending algorithms for search, unstructured content extraction, realtime processing for central monitoring platform, and is the co-author of Pig on Storm.
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... - Databricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this session, we introduce MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size. In this deep-dive session, through a complete ML model life-cycle example, you will walk away with:
MLflow concepts and abstractions for models, experiments, and projects
How to get started with MLflow
Understand aspects of MLflow APIs
Using tracking APIs during model training
Using MLflow UI to visually compare and contrast experimental runs with different tuning parameters and evaluate metrics
Package, save, and deploy an MLflow model
Serve it using MLflow REST API
What’s next and how to contribute
StackStorm: If-This-Then-That for DevOps Automation - Dmitri Zimine
Slides for my talk at Scale15x: https://www.socallinuxexpo.org/scale/15x/presentations/stackstorm-if-devops-automation
DevOps automation, open source
The demo was at the core of the talk; the video is at https://youtu.be/3TjhBGshvvY?t=3h31m5s
Combining the Strengths of Python and Delphi
Links, replay and more
https://blogs.embarcadero.com/combining-the-strengths-of-delphi-and-python/
Python4Delphi repository
https://github.com/pyscripter/python4delphi
Part 1
https://blogs.embarcadero.com/webinar-replay-python-for-delphi-developers-part-1-introduction/
Kubernetes has become the de facto standard platform for container orchestration. Its extensibility and many integrations have paved the way for a wide variety of data science and research tooling to be built on top of it.
From all-encompassing tools like Kubeflow, which make it easy for researchers to build end-to-end machine learning pipelines, to specific orchestration of analytics engines such as Spark, Kubernetes has made the deployment and management of these things easy. This presentation will showcase some of the larger research tools in the ecosystem and go into how Kubernetes has enabled this easy form of application management.
In the last sessions we have seen that P4D (Python 4 Delphi) is powerful enough to offer components, Python packages or libraries in Delphi or Lazarus (FPC). This time we go the other way in usage and integration: how does the Python or web world in the shell benefit from the VCL components as GUI controls? We create a Python extension module from Delphi classes, packages or functions. Building Delphi’s VCL library as a specific Python module in a console or editor and launching a complete Windows GUI from a script can be the start of a long journey.
The flood of Open APIs is now so blatant that we take a closer look at some basics and principles. Of course, the best way to understand how APIs work is to try them. While most APIs require access via API keys or have complicated authentication and authorization methods, there are also open APIs with no requirements or licenses whatsoever. This is especially useful for beginners as we can start exploring different APIs right away. It’s also useful for web developers who want easy access to a sample dataset for their app; e.g. most weather apps get their weather forecast data from a weather API instead of building weather stations themselves.
Faker is a Python library that generates fake data. Fake data is often used for testing or filling databases with some dummy data. Faker is heavily inspired by PHP's Faker, Perl's Data::Faker, and by Ruby's Faker.
Many of the applications and organizations provide avatar features. Finally, synthetic datasets can minimize privacy concerns. Attempts to anonymize data can be ineffective, as even if sensitive/identifying variables are removed from the dataset
Python for Delphi (P4D) is a set of free components that wrap the Python DLL for Delphi and Lazarus (FPC). They let you easily execute Python scripts and create new Python modules and new Python types. You can create Python extensions as DLLs and much more, such as scripting. P4D provides different levels of functionality:
Low-level access to the python API
High-level bi-directional interaction with Python
Access to Python objects using Delphi custom variants (VarPyth.pas)
Wrapping of Delphi objects for use in python scripts using RTTI (WrapDelphi.pas)
Creating python extension modules with Delphi classes and functions
Generate Scripts in maXbox from Python Installation
With the following report I show how to host and execute a deep learning project in the cloud. The cloud is hosted by Google Colab and enables working and testing in teams. Lazarus and Free Pascal are also built in Colab, and the deep learning network is compiled and trained as well, in a Jupyter notebook with Python scripts.
The portable pixmap format (PPM), the portable graymap format (PGM) and the portable bitmap format (PBM) are image file formats designed to be easily exchanged between platforms. They are also sometimes referred to collectively as the portable anymap format (PNM). These formats are a convenient (simple) method of saving image data. The format is not even limited to graphics: its definition allows it to be used for arbitrary three-dimensional matrices or cubes of unsigned integers.
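To make the simplicity of these formats concrete, here is a minimal Python sketch that writes and reads the plain-text PPM variant (magic number `P3`): a header with width, height and maximum sample value, followed by the samples as decimal numbers. The function names `encode_ppm` and `decode_ppm` and the 2x2 test image are hypothetical and only serve the illustration; binary variants and the other anymap types are not covered.

```python
def encode_ppm(pixels, max_value=255):
    """Encode rows of (r, g, b) tuples as a plain-text (P3) PPM string."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    lines = ["P3", f"{width} {height}", str(max_value)]
    for row in pixels:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


def decode_ppm(text):
    """Parse a plain-text (P3) PPM string back into rows of (r, g, b) tuples."""
    # Strip '#' comments, then split everything into whitespace-separated tokens.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P3":
        raise ValueError("not a plain PPM file")
    width, height, max_value = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height * 3:
        raise ValueError("pixel data does not match the header")
    pixels = []
    for y in range(height):
        row = []
        for x in range(width):
            i = (y * width + x) * 3
            row.append(tuple(values[i:i + 3]))
        pixels.append(row)
    return pixels, max_value


# Hypothetical 2x2 image: red, green / blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
ppm_text = encode_ppm(image)
```

Writing `ppm_text` to a file with a `.ppm` extension should give an image that common viewers can open; `decode_ppm(ppm_text)` returns the original rows together with the maximum sample value.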
This tutorial takes a trip to the kingdom of object recognition with computer vision knowledge and an image classifier.
Object detection has been witnessing a rapid revolutionary change in some fields of computer vision. Its involvement in the combination of object classification as well as object recognition makes it one of the most challenging topics in the domain of machine learning & vision.
How can we visualize data in machine learning with VS Code? This is a C# wrapper for the GraphViz graph generator for dotnet core. Further bindings for Python GraphViz are shown and exports to MS Power BI all in MS Visual Code, Jupyter and dotnet core.
Software is changing the world. CGC is Common Gateway Coding; as the name says, it is a "common" language approach for almost everything. I want to show how a multi-language approach to infrastructure as code using general purpose programming languages lets cloud engineers and code producers unlock the same software engineering techniques commonly used for applications.
Code Review Checklist: How far is a code review going? "Metrics measure the design of code after it has been written, a Review proofs it and Refactoring improves code."
This paper shows a document structure and tips for a code review.
Some checks fit with your existing tools and simply raise a hand when the quality or security of your codebase is impaired.
Open LDAP as a directory service is a system for storing and retrieving information in a tree-like structure with the following key properties:
Optimized for reading
Distributed storage model
Extensible data storage types
Advanced search capabilities
Consistent replication possibilities
Closures are blocks of code plus the bindings to the environment they came from (RagusaIdiom).
Closures are reusable blocks of code that capture the environment and can be passed around as method arguments for immediate or deferred execution.
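A short Python sketch may help to see both aspects, captured environment and passing a closure around as an argument. All names here (`make_counter`, `make_scaler`, `apply_twice`) are hypothetical and chosen only for this illustration.

```python
def make_counter(start=0):
    """Return a function that remembers and updates `count` between calls."""
    count = start

    def increment(step=1):
        nonlocal count  # refer to the variable of the enclosing scope
        count += step
        return count

    return increment


def make_scaler(factor):
    """Return a closure that multiplies its argument by the captured factor."""
    return lambda x: x * factor


def apply_twice(func, value):
    """Take a closure as an argument and execute it later, twice."""
    return func(func(value))


counter = make_counter(10)
first = counter()        # the captured count goes from 10 to 11
second = counter(5)      # the same environment is updated again, to 16
scaled = apply_twice(make_scaler(3), 2)  # (2 * 3) * 3
```

Each call to `make_counter` creates a fresh environment, so two counters never share their state.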
This tutorial explains a solution to attach a console to your app. Basically we want an app to have two modes, a GUI mode and a non-GUI mode, for any humans and robots. A NoGUI app provides a mechanism for storage and retrieval of data and functions by means other than the normal GUI used in operating systems.
An introduction to using machine learning in Python and Pascal to do such a thing as training on prime numbers, even though algorithms already exist to determine prime numbers. See a dataframe, feature extraction and a few plots to search again for another hot experiment to predict prime numbers.
This tutorial shows the train and test set split with binary classifying, clustering and 3D plots, and discusses a probability density function in scikit-learn on synthetic datasets. The dataset is very simple, as a reference for understanding.
This tutorial shows the train and test set split with a histogram and a probability density function in scikit-learn on synthetic datasets. The dataset is very simple, as a reference for understanding.
In this article you will learn how to use the TensorFlow Softmax Classifier estimator to classify the MNIST dataset in one script.
This paper also introduces the basic idea of an artificial neural network.
The term “machine learning” is used to describe one kind of “artificial intelligence” (AI) where a machine is able to learn and adapt through its own experience. We crawled and collected 30 top overview diagrams which show the topics of methods, algorithms and concepts.
TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning faster and easier and eases the process of acquiring data, training models, serving predictions, and refining future results.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
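One of the ideas above, skipping in-identical vertices, can be sketched in plain Python. This is a minimal power-iteration sketch and not the STICD implementation: it assumes a simple directed graph without dangling nodes, and the function name `pagerank`, its parameters and the 4-vertex test graph are hypothetical. Vertices that share the same set of in-links always receive the same rank, so their incoming contribution is summed only once per group.

```python
def pagerank(n, edges, damping=0.85, tol=1e-10, max_iter=500):
    """Power-iteration PageRank that computes each in-identical group once."""
    edges = set(edges)  # ignore duplicate edges
    in_links = [[] for _ in range(n)]
    out_degree = [0] * n
    for u, v in edges:
        in_links[v].append(u)
        out_degree[u] += 1
    if 0 in out_degree:
        raise ValueError("this sketch assumes a graph without dangling nodes")

    # Group vertices by their set of in-neighbours.
    groups = {}
    for v in range(n):
        groups.setdefault(frozenset(in_links[v]), []).append(v)

    ranks = [1.0 / n] * n
    for iteration in range(1, max_iter + 1):
        new_ranks = [0.0] * n
        for sources, members in groups.items():
            # One contribution sum per group instead of one per vertex.
            incoming = sum(ranks[u] / out_degree[u] for u in sources)
            value = (1.0 - damping) / n + damping * incoming
            for v in members:
                new_ranks[v] = value
        error = sum(abs(a - b) for a, b in zip(new_ranks, ranks))
        ranks = new_ranks
        if error < tol:
            break
    return ranks, iteration


# Hypothetical graph: vertices 1 and 2 both have the single in-link 0.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 0)]
ranks, iterations = pagerank(4, example_edges)
```

In this small graph vertices 1 and 2 form one group, so only three contribution sums are evaluated per iteration instead of four; on graphs with many in-identical vertices the saving grows accordingly.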
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
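The float vs bfloat16 storage comparison listed above can be tried out without a GPU. The following Python sketch is an assumption-laden stand-in for the report's experiments: it emulates bfloat16 by keeping only the upper 16 bits of the float32 bit pattern (plain truncation, whereas real hardware typically rounds), accumulates in 64-bit floats, and uses a hypothetical test vector. The helper names are made up for this example.

```python
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x):
    """Keep only the upper 16 bits of the float32 pattern (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def element_sum(values, store):
    """Sum the values after converting each one to the given storage type.

    The accumulator stays a 64-bit float; only the storage is reduced.
    """
    return sum(store(v) for v in values)


values = [1.0 / (i + 1) for i in range(1000)]  # hypothetical test vector
exact = sum(values)
error_float32 = abs(element_sum(values, to_float32) - exact)
error_bfloat16 = abs(element_sum(values, to_bfloat16) - exact)
```

Comparing `error_float32` with `error_bfloat16` shows the accuracy cost of the smaller storage type; the performance side of the trade-off (half the memory traffic) is not measured by this sketch.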
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha) Common Method... - 2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
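For the reliability analysis item in the overview above, Cronbach's alpha is commonly computed as k/(k-1) * (1 - sum of item variances / variance of the total scores), where k is the number of items. The following Python sketch applies that formula with sample variances; the function name `cronbach_alpha` and the survey answers are hypothetical and not taken from the deck.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))                      # one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to three Likert-scale items.
answers = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(answers)
```

For these made-up answers the item variances sum to 5.1 and the total-score variance is 13.5, which gives an alpha of about 0.93.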
6. FANN
• Fast Artificial Neural Network (FANN) Library is a free open source neural network library, which implements multilayer artificial neural networks in C with support for both fully connected and sparsely connected networks.
• Cross-platform execution in both fixed and floating point is supported. It includes a framework for easy handling of training data sets. It is easy to use, versatile, well documented, and fast.
• Bindings to more than 15 programming languages are available.
• https://github.com/libfann/fann
7. Delphi Wrapper
• Fast Artificial Neural Network Library (FANN) v2.2.0 (Original: https://github.com/libfann/fann)
• Fast Artificial Neural Network Library (FANN) v2.2.0 (https://github.com/hatsunearu/fann with FANN_RELU and FANN_LEAKY_RELU)
• TensorFlow 1.3.0 → Demo
• https://github.com/Laex/Delphi-Artificial-Neural-Network-Library
8. FANN Scripting
NN:= TFannNetwork.create(self);
with NN do begin
  Layers.add('2');    // input layer with 2 neurons
  Layers.add('3');    // hidden layer with 3 neurons
  Layers.add('1');    // output layer with 1 neuron
  LearningRate:= 0.699999988079071100;
  ConnectionRate:= 1.000;    // fully connected network
  TrainingAlgorithm:= taFANN_TRAIN_RPROP;
  ActivationFunctionHidden:= afFANN_SIGMOID;
  ActivationFunctionOutput:= afFANN_SIGMOID;
end;
C:maXboxEKON24examples814_FANN_XorSample2.pas
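To illustrate what a 2-3-1 topology with sigmoid activations computes, here is a small Python sketch of the forward pass. It is not FANN code: the weights are picked by hand for XOR rather than trained with RPROP, the standard logistic function stands in for FANN's sigmoid, and the names `forward` and `XOR_NET` are hypothetical.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate inputs through fully connected sigmoid layers.

    Each layer is a list of neurons; each neuron is a (weights, bias) pair.
    """
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations


# Hand-picked weights (an assumption, not a training result):
# hidden 1 acts like OR, hidden 2 like NAND, hidden 3 is unused,
# and the output neuron combines the first two like AND.
XOR_NET = [
    [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0), ([0.0, 0.0], 0.0)],
    [([20.0, 20.0, 0.0], -30.0)],
]

outputs = {
    (a, b): forward([a, b], XOR_NET)[0]
    for a in (0, 1)
    for b in (0, 1)
}
```

Rounding each value in `outputs` gives the XOR truth table; a trained network would arrive at different weights but is evaluated in the same way.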
9. CAI NEURAL API
• K-CAI NEURAL API is a Keras-based neural network API for machine learning that lets you prototype with many of the possibilities of TensorFlow. Python, Free Pascal and Delphi work together in Google Colab, Git or the Community Edition.
• https://github.com/joaopauloschuler/neural-api
10. CAI NEURAL API II
• CAI NEURAL API is a Pascal-based neural network API optimized for the AVX, AVX2 and AVX512 instruction sets plus OpenCL-capable devices including AMD, Intel and NVIDIA. This API has been tested under Windows and Linux.
• This project is a subproject of a bigger and older project called CAI and is a sister to the Keras-based K-CAI NEURAL API.
https://github.com/joaopauloschuler/neural-api
11. Colab as Universal Platform
Simple Image Classification with any Dataset: this example shows how to create a model and train it with a dataset (samples and features) passed as a parameter. Open in Colab:
https://colab.research.google.com/github/maxkleiner/maXbox/blob/master/Copy_of_simple_image_classification_with_any_dataset.ipynb
12. CIFAR-10 Image Classifier
This example has interesting aspects to look at:
Its source code is very small.
Layers are added sequentially.
Training hyper-parameters are defined before calling the fit method.
Model parameters are saved as hdf5 EKONSimpleImageClassifier.nn
https://github.com/maxkleiner/maXbox/blob/master/EKON24_SimpleImageClassificationCPU.ipynb
and the same in colab.research:
https://colab.research.google.com/github/maxkleiner/maXbox/blob/master/EKON24_SimpleImageClassificationCPU.ipynb
15. Save the model
• Keras separates the concerns of saving your model architecture and saving your model weights.
• Model weights are saved to HDF5 format. This is a grid format that is ideal for storing multi-dimensional arrays of numbers.
Layer 11 Max Output: 0.812 Min Output: 0.000 TNNetSoftMax 10,1,1 Times: 0.00s 0.00s Parent:10
Starting Testing.
Epochs: 50 Examples seen:2000000 Test Accuracy: 0.8383 Test Error: 0.4463 Test Loss: 0.4969 Total time: 162.32min
Epoch time: 2.7 minutes. 100 epochs: 4.5 hours.
Epochs: 50. Working time: 2.71 hours.
Finished.
16. Save files locally
import os
import tarfile
import urllib.request

# Download and unpack the CIFAR-10 binary batches only if they are missing.
if not os.path.isfile('cifar-10-batches-bin/data_batch_1.bin'):
    print("Downloading CIFAR-10 Files")
    url = 'https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'
    urllib.request.urlretrieve(url, './file.tar')

    # tarfile.open detects the gzip compression of the archive by itself.
    with tarfile.open("file.tar") as tar:
        tar.extractall()
17. Finally you can get it with git
C:maXboxmX399100maxbox4maxbox42maxbox4>git clone https://github.com/joaopauloschuler/k-neural-api.git k
Cloning into 'k'...
remote: Enumerating objects: 65, done.
remote: Counting objects: 100% (65/65), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 356 (delta 38), reused 38 (delta 18), pack-reused 291
Receiving objects: 100% (356/356), 224.47 KiB | 1.57 MiB/s, done.
Resolving deltas: 100% (225/225), done.
http://docs.codehaus.org/display/SONAR/Developers%27+Seven+Deadly+Sins