TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning faster and easier by streamlining the process of acquiring data, training models, serving predictions, and refining future results.
In this session Max Kleiner presents the four groups of ML: regression, dimensionality reduction, clustering and classification. ML recognizes patterns and regularities in the training data. Most ML projects reportedly fail for lack of data consolidation and for lack of a clear hypothesis. On the basis of the well-known Iris dataset, the four groups are worked through with four algorithms each, so that these shortcomings are avoided.
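The four groups can be sketched on the Iris data with scikit-learn; one illustrative algorithm per group is shown here (the session itself covers four per group, and the specific choices below are ours, not the presenter's):

```python
# One algorithm per ML group, run on the Iris dataset with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression      # regression
from sklearn.decomposition import PCA                  # dimensionality reduction
from sklearn.cluster import KMeans                     # clustering
from sklearn.svm import SVC                            # classification

X, y = load_iris(return_X_y=True)

# Regression: predict petal width from the other three features.
reg = LinearRegression().fit(X[:, :3], X[:, 3])

# Dimensionality reduction: project the 4 features onto 2 components.
X2 = PCA(n_components=2).fit_transform(X)

# Clustering: group the samples without looking at the labels.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# Classification: learn the species labels from the features.
clf = SVC().fit(X, y)
print(round(clf.score(X, y), 2))
```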
This document discusses machine learning pipelines. It explains that ML pipelines involve multiple steps like data preprocessing, feature extraction, training models with different hyperparameters, and testing models. The goal is to build the best model by evaluating many variations systematically. Key steps are often cross-validation to evaluate models and hyperparameter tuning to find the best configuration. Well-designed ML pipelines can help improve model performance and make the process more efficient and reproducible.
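A minimal sketch of such a pipeline, assuming scikit-learn; the preprocessing step, model, and parameter grid are illustrative stand-ins:

```python
# An ML pipeline with cross-validated hyperparameter tuning:
# preprocessing and model are chained into one estimator, and
# GridSearchCV evaluates each candidate configuration with CV.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Try several hyperparameter settings with 5-fold cross-validation.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Because the scaler sits inside the pipeline, it is re-fit on each CV training fold, which keeps the evaluation honest and the whole search reproducible.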
Dbms plan - A Swiss army knife for performance engineers (Riyaj Shamsudeen)
This document discusses dbms_xplan, a tool for performance engineers to analyze execution plans. It provides options for displaying plans from the plan table, from the shared SQL area in memory, and from AWR history. Dbms_xplan provides more detailed information than traditional tools like tkprof, including predicates, notes, bind values, and plan history. It requires privileges on dictionary views for displaying plans from memory and AWR. The document also demonstrates usage examples and output formats for dbms_xplan.
This document discusses why SQL optimizers sometimes produce suboptimal query plans. It begins by introducing concepts like selectivity, cardinality, and histograms which are important for query optimization. It then describes issues like correlation between predicates that can cause underestimation of cardinality. The document recommends gathering statistics on columns and indexes, using histograms, and explains how Oracle 11g's extended statistics feature can help address correlation problems.
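The correlation problem can be shown with a toy calculation (the table and numbers are invented for illustration): when two columns are perfectly correlated, multiplying their individual selectivities underestimates the combined cardinality.

```python
# Why the independence assumption breaks on correlated predicates.
rows = [
    # (country, currency) -- the two columns are perfectly correlated
    ("US", "USD"), ("US", "USD"), ("DE", "EUR"), ("DE", "EUR"),
] * 250  # 1000 rows in total

total = len(rows)
sel_country = sum(r[0] == "US" for r in rows) / total    # 0.5
sel_currency = sum(r[1] == "USD" for r in rows) / total  # 0.5

# Optimizer estimate under independence: 0.5 * 0.5 * 1000 = 250 rows.
estimated = sel_country * sel_currency * total

# Actual cardinality of country = 'US' AND currency = 'USD': 500 rows.
actual = sum(r == ("US", "USD") for r in rows)
print(estimated, actual)
```

Extended statistics on the column group let the optimizer learn the joint selectivity (0.5 here) instead of multiplying the individual ones.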
The document discusses administering parallel execution in Oracle databases. It describes how parallel query uses slave processes to perform work across instances, and how the placement of slaves can be controlled using services or parallel instance groups. It provides an example execution plan showing how slaves perform different tasks like scanning and sorting. It also covers best practices, new features in Oracle 11g like parallel statement queueing, and how parallel DML works.
TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
Performance and how to measure it - ProgSCon London 2016 (Matt Warren)
Starting with the premise that "Performance is a Feature", this session will look at how to measure, what to measure and how to get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
This document discusses challenges with running containers at scale and how artificial intelligence for IT operations (AIOps) can help address those challenges. It defines AIOps and outlines how it utilizes techniques like machine learning and analytics to provide proactive, personalized insights for infrastructure and application monitoring. Specific challenges covered include reactive monitoring of dynamic container environments, metrics explosions, and performing proactive tasks like capacity planning, cluster scheduling, and dynamic configuration optimization. The document provides examples of how AIOps has helped companies optimize infrastructure usage through techniques like exhaustive testing of hardware/software combinations, live traffic load testing, bottleneck identification, batch scheduling, and controlled resource oversubscription while maintaining service level objectives.
This document summarizes research conducted by Preferred Networks to train a ResNet-50 model on the ImageNet dataset using a minibatch size of 32,000 across 1024 Tesla P100 GPUs. They were able to complete 90 training epochs in 15 minutes, achieving a validation accuracy of 74.9%. To enable training with such a large minibatch, they employed techniques like RMSprop warmup, slow start learning rates, and batch normalization without moving averages. The training was conducted on Preferred Networks' in-house cluster MN-1 which has 128 nodes each with 8 GPUs connected via InfiniBand FDR interconnect.
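The warm-up idea can be sketched as a simple schedule function; the shape below (linear warm-up followed by step decay) and all of its numbers are illustrative, not the exact recipe from the research:

```python
# Learning-rate schedule with a gradual "slow start" warm-up, as used
# when training with very large minibatches (values are hypothetical).
def lr_at_epoch(epoch, base_lr=0.1, warmup_epochs=5):
    """Linear warm-up, then a flat rate decayed 10x at epochs 30/60/80."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    decay = sum(epoch >= e for e in (30, 60, 80))
    return base_lr * (0.1 ** decay)

schedule = [lr_at_epoch(e) for e in range(90)]
print(schedule[0], schedule[4], schedule[89])
```

Starting at a fraction of the target rate avoids the instability that a large minibatch would otherwise cause in the first epochs.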
Learning Predictive Modeling with TSA and Kaggle (Yvonne K. Matos)
This document summarizes Yvonne Matos' presentation on learning predictive modeling by participating in Kaggle challenges using TSA passenger screening data.
The key points are:
1) Matos started with a small subset of 120 images from one body zone to build initial neural network models and address challenges of large data sizes and compute requirements.
2) Through iterative tuning, her best model achieved good performance identifying non-threat images but had a high false negative rate for threats.
3) Her next steps were to reduce the false negative rate, run models on Google Cloud to handle full data sizes, and prepare the best model for real-world use.
The document describes an approach called Snorkel that can generate training data for machine learning models from unlabeled text documents without requiring manual labeling. It works by encoding domain knowledge into labeling functions or rules and using those rules to assign weak labels to candidate examples. These weak labels are then used to train an underlying machine learning model like logistic regression. The approach is presented as an alternative to manual labeling that scales more easily. Key steps include writing rules, validating rules, running learning algorithms on the weakly labeled data, and iterating to improve the rules. Examples of using Snorkel for relationship extraction tasks are also provided.
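The write-rules-then-combine idea can be sketched in a few lines. The labeling functions and the majority-vote combiner below are simplified illustrations of the concept, not the actual Snorkel API (which learns to weight the functions rather than voting):

```python
# Weak supervision sketch: domain rules vote on unlabeled examples.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

def lf_keyword_spouse(text):
    return POSITIVE if "spouse" in text else ABSTAIN

def lf_keyword_married(text):
    return POSITIVE if "married" in text else ABSTAIN

def lf_keyword_colleague(text):
    return NEGATIVE if "colleague" in text else ABSTAIN

LFS = [lf_keyword_spouse, lf_keyword_married, lf_keyword_colleague]

def weak_label(text):
    """Majority vote over the labeling functions that did not abstain."""
    votes = [lf(text) for lf in LFS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(weak_label("Alice married Bob"))
print(weak_label("Alice is Bob's colleague"))
```

The weak labels produced this way would then train a downstream model such as logistic regression, and misfiring rules are found by validating them against a small hand-labeled set.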
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod... (Codemotion)
We are going to talk about Prometheus and how to use it to monitor micro-service "Cloud-Native" applications. We will dive deep into the Prometheus monitoring model, see which components are behind this system and how they integrate with each other to provide an efficient, modern monitoring system. We will also glance at Prometheus's native integrations for cloud-native environments such as Kubernetes.
Query Optimization with MySQL 8.0 and MariaDB 10.3: The Basics (Jaime Crespo)
Query optimization tutorial for Beginners using MySQL 8.0 and MariaDB 10.3 presented at the Open Source Database Percona Live Europe 2018 organized in Frankfurt. The source can be found and errors can be reported at https://github.com/jynus/query-optimization
Material URL moved to: http://jynus.com/dbahire/pleu18
Starting with the premise that "Performance is a Feature", Matt Warren will show you how to measure, what to measure and how to get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
The presentation covers:
Why we should care about performance
Pitfalls to avoid when measuring performance
How the .NET Garbage Collector can hurt performance
Real-world performance lessons from open-source code
The webinar recording can be found here: http://www.postsharp.net/blog/post/webinar-recording-performance-is-a-feature
Nyc open-data-2015-advanced-sklearn-expanded (Vivian S. Zhang)
Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners.
This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning.
Apart from metrics for model evaluation, we will cover how to evaluate model complexity, and how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
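Out-of-core learning with feature hashing can be sketched as follows; the toy batches stand in for data too large to fit in memory (HashingVectorizer needs no fitted vocabulary, so each batch can be transformed independently with a fixed memory footprint):

```python
# Streaming text classification: hash features, update the model per batch.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2 ** 10)  # fixed-size feature space
clf = SGDClassifier()

# Batches arrive one at a time; partial_fit updates the model incrementally.
batches = [
    (["good film", "great movie"], [1, 1]),
    (["bad plot", "awful acting"], [0, 0]),
]
for texts, labels in batches:
    clf.partial_fit(vec.transform(texts), labels, classes=[0, 1])

print(clf.predict(vec.transform(["great film"])))
```

The trade-off is that hashing is one-way: you lose the ability to map a feature index back to the word that produced it.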
---------------------------------------------------------
Andreas is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he worked as a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and maintained it for several years.
Material will be posted here:
https://github.com/amueller/pydata-nyc-advanced-sklearn
Blog:
peekaboo-vision.blogspot.com
Twitter:
https://twitter.com/t3kcit
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B... (Databricks)
We all know what they say – the bigger the data, the better. But when the data gets really big, how do you mine it and what deep learning framework to use? This talk will survey, with a developer’s perspective, three of the most popular deep learning frameworks—TensorFlow, Keras, and PyTorch—as well as when to use their distributed implementations.
We’ll compare code samples from each framework and discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data) as well as help you answer questions such as:
As a developer, how do I pick the right deep learning framework?
Do I want to develop my own model or should I employ an existing one?
How do I strike a trade-off between productivity and control through low-level APIs?
What language should I choose?
In this session, we will explore how to build a deep learning application with TensorFlow, Keras, or PyTorch in under 30 minutes. After this session, you will walk away with the confidence to evaluate which framework is best for you.
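As a taste of one of the three frameworks, a minimal Keras model can be stood up in a few lines; the data, layer sizes, and training settings below are placeholders of ours, not examples from the talk:

```python
# A tiny Keras classifier end to end: define, compile, fit, predict.
import numpy as np
from tensorflow import keras

# Toy data: 120 samples of 4 features, 3 classes.
X = np.random.rand(120, 4).astype("float32")
y = np.random.randint(0, 3, size=120)

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=2, verbose=0)

print(model.predict(X[:1], verbose=0).shape)  # one probability per class
```

PyTorch expresses the same model with an explicit training loop, which trades Keras's brevity for finer control; that control/productivity trade-off is exactly the question the talk raises.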
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Performance is a feature! - London .NET User Group (Matt Warren)
Starting with the premise that "Performance is a Feature", this session will look at how to measure, what to measure and how to get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
Towards a Unified Data Analytics Optimizer with Yanlei Diao (Databricks)
Today’s big data analytics systems are best effort only: despite the wide adoption, they still lack the ability to take user monetary constraints and performance goals, and automatically configure an analytic job to achieve those goals. Our work aims to take a step further towards building a new data analytics optimizer that works for arbitrary dataflow programs and determines the job configuration in an automated manner based on user objectives regarding latency, throughput, monetary cost, etc.
At the core of the optimizer are a principled multi-objective optimization framework that enables one to explore the tradeoffs between different objectives, and a deep learning-based modeling approach that can learn a model for each user objective as complex as necessary for the user computing environment. Using both SQL-like and machine learning jobs in Spark, we show that our techniques can learn a model of each objective with high accuracy, and the multi-objective optimizer can automatically recommend new configurations that significantly improve performance from the configurations manually set by engineers.
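The multi-objective core can be illustrated with a toy Pareto-front computation over candidate job configurations; the configuration names and (latency, cost) numbers below are invented for the example:

```python
# Keep only Pareto-optimal configurations for (latency, cost), both minimized.
configs = {
    "small":  (120.0, 1.0),   # (latency in seconds, cost in dollars)
    "medium": (60.0, 2.0),
    "large":  (40.0, 5.0),
    "waste":  (130.0, 6.0),   # dominated: slower AND costlier than "small"
}

def pareto_front(points):
    """A point is kept unless some other point is at least as good on
    every objective and different on at least one."""
    front = {}
    for name, (lat, cost) in points.items():
        dominated = any(
            l <= lat and c <= cost and (l, c) != (lat, cost)
            for l, c in points.values()
        )
        if not dominated:
            front[name] = (lat, cost)
    return front

print(sorted(pareto_front(configs)))
```

The optimizer described above goes further: it learns a model per objective and then searches this trade-off surface automatically instead of enumerating hand-set configurations.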
Summarizing Software API Usage Examples Using Clustering Techniques (Nikos Katirtzis)
This document presents CLAMS, an approach for automatically mining API usage examples from client code. It clusters API usage sequences, generates summarized snippets from the top clusters, and selects the most representative snippet from each cluster. The methodology can be easily adapted to new programming languages as it relies on abstract syntax trees and API call sequences rather than detailed semantic analysis. The system is evaluated on datasets from several Java libraries and is shown to produce more concise, readable snippets that better match handwritten examples compared to approaches that output API call sequences or less summarized snippets. The clustering approach that allows similar rather than just identical sequences leads to improved results.
Java In-Process Caching - Performance, Progress and Pitfalls (Jens Wilke)
How to speed up in-process caching and why you should not use LRU. Comparison of EHCache, Google Guava, Caffeine and cache2k. With benchmarks of throughput and eviction efficiency.
Java In-Process Caching - Performance, Progress and Pitfalls (cruftex)
This document discusses Java in-process caching and summarizes benchmarks of various caching libraries. It finds that Caffeine and cache2k have faster read throughput than Google Guava Cache and EHCache3 when the number of threads increases. Cache2k is the fastest overall. Benchmarking eviction quality shows Caffeine and cache2k have more efficient eviction algorithms than LRU. While Clock is O(n) in theory, cache2k optimizes it to have little increase in scan counts even for large caches. Modern caching libraries use improved algorithms over LRU to achieve better performance.
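The Clock idea can be sketched in a few lines; this is a simplified illustration of the classic algorithm, not the optimized cache2k implementation:

```python
# Clock eviction: a reference bit per entry gives a "second chance",
# approximating LRU without maintaining an ordered list on every access.
class ClockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}   # key -> value
        self.ref = {}    # key -> reference bit
        self.ring = []   # keys arranged in a circular buffer
        self.hand = 0

    def get(self, key):
        if key in self.data:
            self.ref[key] = 1      # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value
            self.ref[key] = 1
            return
        if len(self.data) >= self.capacity:
            # Sweep the hand, clearing reference bits until an
            # unreferenced victim is found.
            while True:
                victim = self.ring[self.hand]
                if self.ref[victim]:
                    self.ref[victim] = 0            # second chance
                    self.hand = (self.hand + 1) % len(self.ring)
                else:
                    del self.data[victim], self.ref[victim]
                    self.ring[self.hand] = key      # reuse the slot
                    self.hand = (self.hand + 1) % len(self.ring)
                    break
        else:
            self.ring.append(key)
        self.data[key] = value
        self.ref[key] = 0

cache = ClockCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" earns a second chance
cache.put("c", 3)     # evicts "b", the unreferenced entry
print(cache.get("b"), cache.get("a"))
```

Each access only sets a bit, which is why Clock variants scale better under concurrency than a strict LRU list.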
Caching and tuning fun for high scalability (Wim Godden)
Caching has been a 'hot' topic for a few years. But caching takes more than merely taking data and putting it in a cache: the right caching techniques can improve performance and reduce load significantly. We'll also look at some major pitfalls, showing that caching the wrong way can bring down your site. If you're looking for a clear explanation of various caching techniques and tools like Memcached, Nginx and Varnish, as well as ways to deploy them efficiently, this talk is for you.
This document provides an overview of machine learning concepts and code examples in Python. It discusses the typical 5 steps of machine learning projects: collaboration, data collection, clustering, classification, and conclusion. Code snippets demonstrate each step, including collecting data with Scrapy, clustering with k-means, classification with support vector machines, and evaluating results with a confusion matrix. Dimensionality reduction techniques like principal component analysis are also covered.
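The classification and evaluation steps mentioned above can be sketched with scikit-learn; Iris stands in here for the scraped data, since the document's own dataset is not available:

```python
# Train an SVM classifier and evaluate it with a confusion matrix.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC().fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te), labels=[0, 1, 2])
print(cm)  # rows = true class, columns = predicted class
```

The diagonal counts correct predictions per class; off-diagonal cells show exactly which classes get confused, which is more informative than a single accuracy number.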
Starting with the premise that "Performance is a Feature", this session will look at how to measure, what to measure and how to get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
Shifu (www.shifu.ml) is a fast and scalable machine learning platform. This presentation briefly describes how to convert Logistic Regression and Neural Network models in Encog, Mahout, and Spark.
In the last sessions we have seen that P4D (Python 4 Delphi) is powerful enough to offer components, Python packages or libraries in Delphi or Lazarus (FPC). This time we go the other way of usage and integration: how does the Python or web world, from the shell, benefit from VCL components as GUI controls? We create a Python extension module from Delphi classes, packages or functions. Building Delphi’s VCL library as a specific Python module in a console or editor and launching a complete Windows GUI from a script can be the start of a long journey.
The flood of Open APIs is now so blatant that we take a closer look at some basics and principles. Of course, the best way to understand how APIs work is to try them. While most APIs require access via API keys or have complicated authentication and authorization methods, there are also open APIs with no requirements or licenses whatsoever. This is especially useful for beginners as we can start exploring different APIs right away. It’s also useful for web developers who want easy access to a sample dataset for their app; e.g. most weather apps get their weather forecast data from a weather API instead of building weather stations themselves.
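A key-less open API call can be sketched as follows. Open-Meteo is one real weather API that requires no key; the specific query parameters are illustrative, and the response below is a canned sample so the snippet runs offline (a live call would fetch the URL instead):

```python
# Build a request to a key-less open weather API and parse the JSON reply.
import json
from urllib.parse import urlencode

base = "https://api.open-meteo.com/v1/forecast"
params = {"latitude": 47.37, "longitude": 8.54, "current_weather": "true"}
url = base + "?" + urlencode(params)

# Live version: body = urllib.request.urlopen(url).read()
body = '{"current_weather": {"temperature": 18.3, "windspeed": 7.2}}'

weather = json.loads(body)["current_weather"]
print(url)
print(weather["temperature"])
```

No authentication header, no token exchange: for a beginner, the entire API interaction is one GET request and one JSON parse.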
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
This document discusses challenges with running containers at scale and how artificial intelligence for IT operations (AIOps) can help address those challenges. It defines AIOps and outlines how it utilizes techniques like machine learning and analytics to provide proactive, personalized insights for infrastructure and application monitoring. Specific challenges covered include reactive monitoring of dynamic container environments, metrics explosions, and performing proactive tasks like capacity planning, cluster scheduling, and dynamic configuration optimization. The document provides examples of how AIOps has helped companies optimize infrastructure usage through techniques like exhaustive testing of hardware/software combinations, live traffic load testing, bottleneck identification, batch scheduling, and controlled resource oversubscription while maintaining service level objectives.
This document summarizes research conducted by Preferred Networks to train a ResNet-50 model on the ImageNet dataset using a minibatch size of 32,000 across 1024 Tesla P100 GPUs. They were able to complete 90 training epochs in 15 minutes, achieving a validation accuracy of 74.9%. To enable training with such a large minibatch, they employed techniques like RMSprop warmup, slow start learning rates, and batch normalization without moving averages. The training was conducted on Preferred Networks' in-house cluster MN-1 which has 128 nodes each with 8 GPUs connected via InfiniBand FDR interconnect.
Learning Predictive Modeling with TSA and KaggleYvonne K. Matos
This document summarizes Yvonne Matos' presentation on learning predictive modeling by participating in Kaggle challenges using TSA passenger screening data.
The key points are:
1) Matos started with a small subset of 120 images from one body zone to build initial neural network models and address challenges of large data sizes and compute requirements.
2) Through iterative tuning, her best model achieved good performance identifying non-threat images but had a high false negative rate for threats.
3) Her next steps were to reduce the false negative rate, run models on Google Cloud to handle full data sizes, and prepare the best model for real-world use.
The document describes an approach called Snorkel that can generate training data for machine learning models from unlabeled text documents without requiring manual labeling. It works by encoding domain knowledge into labeling functions or rules and using those rules to assign weak labels to candidate examples. These weak labels are then used to train an underlying machine learning model like logistic regression. The approach is presented as an alternative to manual labeling that scales more easily. Key steps include writing rules, validating rules, running learning algorithms on the weakly labeled data, and iterating to improve the rules. Examples of using Snorkel for relationship extraction tasks are also provided.
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Codemotion
We are going to talk about Prometheus and how to use to monitor micro-services "Cloud-Native" application s. We are going to dive deep into the Prometheus monitoring model, we will see what are the components be hind this system and how they integrate with each others to provide an efficient and modern monitoring sy stem. We will also have a glance on Prometheus native integrations for cloud-native environments such as Kubernetes.
Query Optimization with MySQL 8.0 and MariaDB 10.3: The BasicsJaime Crespo
Query optimization tutorial for Beginners using MySQL 8.0 and MariaDB 10.3 presented at the Open Source Database Percona Live Europe 2018 organized in Frankfurt. The source can be found and errors can be reported at https://github.com/jynus/query-optimization
Material URL moved to: http://jynus.com/dbahire/pleu18
Starting with the premise that "Performance is a Feature", Matt Warren will show you how to measure, what to measure and how to get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
The presentation covers:
Why we should care about performance
Pitfalls to avoid when measuring performance
How the .NET Garbage Collector can hurt performance
Real-world performance lessons from open-source code
The webinar recording can be found here: http://www.postsharp.net/blog/post/webinar-recording-performance-is-a-feature
Nyc open-data-2015-andvanced-sklearn-expandedVivian S. Zhang
Scikit-learn is a machine learning library in Python, that has become a valuable tool for many data science practitioners.
This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning.
Apart from metrics for model evaluation, we will cover how to evaluate model complexity, and how to tune parameters with grid search, randomized parameter search, and what their trade-offs are. We will also cover out of core text feature processing via feature hashing.
---------------------------------------------------------
Andreas is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he worked as a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and maintained it for several years.
Material will be posted here:
https://github.com/amueller/pydata-nyc-advanced-sklearn
Blog:
peekaboo-vision.blogspot.com
Twitter:
https://twitter.com/t3kcit
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
We all know what they say – the bigger the data, the better. But when the data gets really big, how do you mine it and what deep learning framework to use? This talk will survey, with a developer’s perspective, three of the most popular deep learning frameworks—TensorFlow, Keras, and PyTorch—as well as when to use their distributed implementations.
We’ll compare code samples from each framework and discuss their integration with distributed computing engines such as Apache Spark (which can handle massive amounts of data) as well as help you answer questions such as:
As a developer how do I pick the right deep learning framework?
Do I want to develop my own model or should I employ an existing one?
How do I strike a trade-off between productivity and control through low-level APIs?
What language should I choose?
In this session, we will explore how to build a deep learning application with Tensorflow, Keras, or PyTorch in under 30 minutes. After this session, you will walk away with the confidence to evaluate which framework is best for you.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Performance is a feature! - London .NET User GroupMatt Warren
Starting with the premise that "Performance is a Feature", this session will look at how to measure, what to measure and how get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
Towards a Unified Data Analytics Optimizer with Yanlei DiaoDatabricks
Today’s big data analytics systems are best effort only: despite the wide adoption, they still lack the ability to take user monetary constraints and performance goals, and automatically configure an analytic job to achieve those goals. Our work aims to take a step further towards building a new data analytics optimizer that works for arbitrary dataflow programs and determines the job configuration in an automated manner based on user objectives regarding latency, throughput, monetary cost, etc.
At the core of the optimizer are a principled multi-objective optimization framework that enables one to explore the tradeoffs between different objectives, and a deep learning-based modeling approach that can learn a model for each user objective as complex as necessary for the user computing environment. Using both SQL-like and machine learning jobs in Spark, we show that our techniques can learn a model of each objective with high accuracy, and the multi-objective optimizer can automatically recommend new configurations that significantly improve performance from the configurations manually set by engineers.
Summarizing Software API Usage Examples Using Clustering TechniquesNikos Katirtzis
This document presents CLAMS, an approach for automatically mining API usage examples from client code. It clusters API usage sequences, generates summarized snippets from the top clusters, and selects the most representative snippet from each cluster. The methodology can be easily adapted to new programming languages as it relies on abstract syntax trees and API call sequences rather than detailed semantic analysis. The system is evaluated on datasets from several Java libraries and is shown to produce more concise, readable snippets that better match handwritten examples compared to approaches that output API call sequences or less summarized snippets. The clustering approach that allows similar rather than just identical sequences leads to improved results.
Java In-Process Caching - Performance, Progress and PitfallsJens Wilke
How to speed up in-process caching and why you should not use LRU. Comparison of EHCache, Google Guava, Caffeine and cache2k. With benchmarks of throughput and eviction efficiency.
Java In-Process Caching - Performance, Progress and Pitfalls - cruftex
This document discusses Java in-process caching and summarizes benchmarks of various caching libraries. It finds that Caffeine and cache2k have faster read throughput than Google Guava Cache and EHCache3 when the number of threads increases. Cache2k is the fastest overall. Benchmarking eviction quality shows Caffeine and cache2k have more efficient eviction algorithms than LRU. While Clock is O(n) in theory, cache2k optimizes it to have little increase in scan counts even for large caches. Modern caching libraries use improved algorithms over LRU to achieve better performance.
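For intuition about what these benchmarks compare, LRU eviction semantics can be sketched in a few lines of Python. This is illustrative only (class and variable names are my own); the benchmarked Java libraries use far more elaborate algorithms such as Clock and Window-TinyLFU.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: the least-recently-used entry is evicted first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now most recently used
cache.put("c", 3)        # evicts "b", the least recently used
print(list(cache.data))  # ['a', 'c']
```

The weakness the talk points out is visible here: one sequential scan over many keys flushes the whole cache, which is why modern libraries weight recency against frequency.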
Caching and tuning fun for high scalability - Wim Godden
Caching has been a 'hot' topic for a few years. But caching takes more than merely taking data and putting it in a cache: the right caching techniques can improve performance and reduce load significantly. We'll also look at some major pitfalls, showing that caching the wrong way can bring down your site. If you're looking for a clear explanation of various caching techniques and tools like Memcached, Nginx and Varnish, as well as ways to deploy them efficiently, this talk is for you.
This document provides an overview of machine learning concepts and code examples in Python. It discusses the typical 5 steps of machine learning projects: collaboration, data collection, clustering, classification, and conclusion. Code snippets demonstrate each step, including collecting data with Scrapy, clustering with k-means, classification with support vector machines, and evaluating results with a confusion matrix. Dimensionality reduction techniques like principal component analysis are also covered.
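The middle steps of that pipeline can be sketched with scikit-learn. This is a minimal illustration assuming synthetic blob data (the data, model choices and names below are mine, not the document's snippets, which use Scrapy for collection):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# "Collect" data: two well-separated synthetic 2-D blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Clustering step: unsupervised k-means with k=2
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Classification step: supervised SVM on a train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)

# Conclusion step: evaluate with a confusion matrix
cm = confusion_matrix(y_te, clf.predict(X_te))
print(cm)
```

On data this cleanly separated the confusion matrix is diagonal; real projects spend most of their effort getting the collection and preprocessing steps to produce data half this clean.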
Starting with the premise that "Performance is a Feature", this session will look at how to measure, what to measure and how to get the best performance from your .NET code.
We will look at real-world examples from the Roslyn code-base and StackOverflow (the product), including how the .NET Garbage Collector needs to be tamed!
Shifu (www.shifu.ml) is a fast and scalable machine learning platform. This presentation briefly describes how to convert the Logistic Regression and Neural Network models in Encog, Mahout, and Spark.
Similar to Ekon22 tensorflow machinelearning2 (20)
In the last sessions we have seen that P4D (Python 4 Delphi) is powerful enough to offer components, Python packages or libraries in Delphi or Lazarus (FPC). This time we go the other way of usage and integration: how does the Python or web world in the shell benefit from VCL components as GUI controls? We create a Python extension module from Delphi classes, packages or functions. Building Delphi's VCL library as a specific Python module in a console or editor and launching a complete Windows GUI from a script can be the start of a long journey.
The flood of Open APIs is now so blatant that we take a closer look at some basics and principles. Of course, the best way to understand how APIs work is to try them. While most APIs require access via API keys or have complicated authentication and authorization methods, there are also open APIs with no requirements or licenses whatsoever. This is especially useful for beginners as we can start exploring different APIs right away. It’s also useful for web developers who want easy access to a sample dataset for their app; e.g. most weather apps get their weather forecast data from a weather API instead of building weather stations themselves.
Faker is a Python library that generates fake data. Fake data is often used for testing or filling databases with some dummy data. Faker is heavily inspired by PHP's Faker, Perl's Data::Faker, and by Ruby's Faker.
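The idea behind Faker can be shown with a stdlib-only sketch. This is not Faker's API, just an illustration of generating dummy rows from word lists (all names, lists and field choices below are invented):

```python
import random

# Tiny stand-in for what a fake-data library does:
# draw plausible-looking values from fixed word lists.
FIRST = ["Alice", "Bob", "Carol", "Dave"]
LAST = ["Smith", "Jones", "Miller", "Weber"]
DOMAINS = ["example.com", "test.org"]

def fake_person(rng):
    first, last = rng.choice(FIRST), rng.choice(LAST)
    return {
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}@{rng.choice(DOMAINS)}",
    }

rng = random.Random(42)  # seeded so the dummy data is reproducible
rows = [fake_person(rng) for _ in range(3)]
for row in rows:
    print(row)
```

Faker itself ships hundreds of such providers (names, addresses, dates, locales) so you never have to maintain the word lists yourself.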
Many applications and organizations provide avatar features. Finally, synthetic datasets can minimize privacy concerns. Attempts to anonymize data can be ineffective: even if sensitive/identifying variables are removed from the dataset, individuals can often still be re-identified from the remaining attributes.
Python for Delphi (P4D) is a set of free components that wrap up the Python DLL into Delphi and Lazarus (FPC). They let you easily execute Python scripts, create new Python modules and new Python types. You can create Python extensions as DLLs and much more like scripting. P4D provides different levels of functionality:
Low-level access to the Python API
High-level bi-directional interaction with Python
Access to Python objects using Delphi custom variants (VarPyth.pas)
Wrapping of Delphi objects for use in python scripts using RTTI (WrapDelphi.pas)
Creating python extension modules with Delphi classes and functions
Generate Scripts in maXbox from Python Installation
The document describes steps to build and train an image classification model using Lazarus, the neural-api library, and Google Colab. It clones the neural-api GitHub repository, installs dependencies like FPC and Lazarus, builds and trains a simple image classifier on the CIFAR-10 dataset, and exports the trained model weights and training logs. The process demonstrates how to leverage Google Colab's GPUs to train deep learning models using Lazarus and Pascal.
The portable pixmap format (PPM), the portable graymap format (PGM) and the portable bitmap format (PBM) are image file formats designed to be easily exchanged between platforms. They are also sometimes referred to collectively as the portable anymap format (PNM). These formats are a convenient (simple) method of saving image data. The format is not even limited to graphics; its definition allows it to be used for arbitrary three-dimensional matrices or cubes of unsigned integers.
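The simplicity of the format can be shown with a minimal stdlib-only writer for binary PPM (P6): an ASCII header followed by raw RGB byte triples. The file name and pixel values here are illustrative:

```python
def write_ppm(path, width, height, pixels):
    """Write a binary PPM (P6): ASCII header, then one RGB byte triple per pixel."""
    assert len(pixels) == width * height
    with open(path, "wb") as f:
        # Header: magic number, dimensions, maximum colour value
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        for r, g, b in pixels:
            f.write(bytes((r, g, b)))

# 2x2 image: red, green, blue, white
write_ppm("tiny.ppm", 2, 2,
          [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)])
```

PGM (P5) and PBM (P4) differ only in the magic number and in storing one gray byte or one bit per pixel instead of a triple.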
This tutorial takes a trip to the kingdom of object recognition with computer vision knowledge and an image classifier.
Object detection has been witnessing rapid, revolutionary change in some fields of computer vision. Its combination of object classification and object recognition makes it one of the most challenging topics in the domain of machine learning and vision.
How can we visualize data in machine learning with VS Code? This is a C# wrapper for the GraphViz graph generator for dotnet core. Further bindings for Python GraphViz are shown, along with exports to MS Power BI, all in MS Visual Studio Code, Jupyter and dotnet core.
K-CAI NEURAL API is a Keras-based neural network API for machine learning that lets you prototype with many of TensorFlow's possibilities: Python, Free Pascal and Delphi together in Google Colab, Git or the Community Edition.
Software is changing the world. CGC stands for Common Gateway Coding; as the name says, it is a "common" language approach for almost everything. I want to show how a multi-language approach to infrastructure as code, using general-purpose programming languages, lets cloud engineers and code producers unlock the same software engineering techniques commonly used for applications.
Code Review Checklist: How far does a code review go? "Metrics measure the design of code after it has been written; a review proves it, and refactoring improves it."
In this paper a document structure is shown, with tips for a code review.
Some checks fit with your existing tools and simply raise a hand when the quality or security of your codebase is impaired.
OpenLDAP, as a directory service, is a system for storing and retrieving information in a tree-like structure with the following key properties:
Optimized for reading
Distributed storage model
Extensible data storage types
Advanced search capabilities
Consistent replication possibilities
This document discusses closures and functional programming. It begins with an agenda that covers closures as code blocks, their history in languages like Lisp and Scheme, examples of functional programming, and using closures for refactoring. It then discusses a case study on experiences with a polygraph design, including optimizations with closures, packaging, and applying the Demeter principle. Finally, it provides links for further reading on closures.
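The core concept of a closure as a code block capturing enclosing state can be shown in a few lines of Python (this example is mine, not from the case study):

```python
def make_counter(start=0):
    count = start  # free variable captured by the closure below

    def step(inc=1):
        nonlocal count  # mutate the enclosing binding, not a local
        count += inc
        return count

    return step

# Each call to make_counter creates an independent enclosed state.
c1, c2 = make_counter(), make_counter(100)
c1(); c1()
print(c1(), c2())  # prints: 3 101
```

This is also the mechanism behind the refactoring uses the talk mentions: behavior plus private state can be passed around as a single callable instead of a full class.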
This document explains how to redirect console output from a GUI application to the parent command prompt process. It describes using the AttachConsole and FreeConsole functions to attach and detach a process from the console. The GetParentProcessName function is used to get the name of the parent process (e.g. cmd.exe or powershell.exe) to determine if output should be redirected. The code sample shows attaching the console, writing sample output, and detaching when complete.
Introduction to using machine learning in Python and Pascal to do something like training on prime numbers, even though deterministic algorithms to test primality already exist. We look at a dataframe, feature extraction and a few plots, then search for another experiment to predict prime numbers.
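A hedged, stdlib-only sketch of the kind of labelled dataset such an experiment might start from; the particular features and ranges are my own illustration, not the tutorial's:

```python
import math

def is_prime(n):
    """Trial division up to sqrt(n): the deterministic ground-truth label."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def features(n):
    """Toy feature vector: the kind of columns one might put in a dataframe
    before training (last digit, digit sum, residue mod 6)."""
    return [n % 10, sum(int(c) for c in str(n)), n % 6]

# Labelled "dataset": feature vectors plus the prime/non-prime target
data = [(features(n), int(is_prime(n))) for n in range(2, 50)]
print(data[:3])
```

The point of the tutorial is exactly the tension visible here: the label is computed by a deterministic rule, so a learned classifier can at best approximate what the rule already states.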
This tutorial shows the train/test set split with binary classification, clustering and 3D plots, and discusses a probability density function in scikit-learn on synthetic datasets. The dataset is deliberately simple as a reference for understanding.
This document discusses machine learning techniques including linear support vector machines (SVMs), data splitting, model fitting and prediction, and histograms. It summarizes an SVM tutorial for predicting samples and evaluating models using classification reports and confusion matrices. It also covers kernel density estimation, PCA, and comparing different classifiers.
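A minimal scikit-learn sketch of the split-fit-predict-report workflow this summary describes; the Iris data and the exact model settings are my own illustrative choices, not the original script's:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Split the data, fit a linear SVM, predict on the held-out samples
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="linear").fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Evaluate with a confusion matrix and a per-class classification report
print(confusion_matrix(y_te, pred))
print(classification_report(y_te, pred))
```

The classification report adds per-class precision, recall and F1 on top of the raw confusion matrix, which is what makes it useful when class sizes are unbalanced.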
In this article you will learn how to use the TensorFlow Softmax classifier estimator to classify the MNIST dataset in one script.
This paper also introduces the basic idea of an artificial neural network.
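The core idea of a softmax classifier (one weight vector per class, probabilities via the softmax function) can be shown without TensorFlow. As a hedged stand-in for the article's MNIST script, here is multinomial logistic regression on scikit-learn's small 8x8 digits dataset (dataset and settings are my substitution, not the original):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 8x8 digit images flattened to 64 features; scale pixel values to [0, 1]
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X / 16.0, y, random_state=0)

# Multinomial logistic regression is exactly softmax regression:
# a linear score per class, normalized by softmax into probabilities.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 3))
```

A neural network generalizes this by stacking nonlinear hidden layers before the final softmax layer, which is what the TensorFlow version of the script adds.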
The term “machine learning” is used to describe one kind of “artificial intelligence” (AI) where a machine is able to learn and adapt through its own experience. We crawled and collected 30 top overview diagrams which show the topics of methods, algorithms and concepts.
Natural Language Processing (NLP), RAG and its applications .pptx - fkyes25
In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require the integration of vast and up-to-date information. Traditional neural models, though powerful, struggle with encoding all necessary knowledge within their parameters, leading to limitations in generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a novel framework that synergizes retrieval mechanisms with generative models, enhancing performance by dynamically incorporating external knowledge during inference.
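The retrieval half of a RAG pipeline can be sketched in miniature with TF-IDF similarity, assuming scikit-learn; the knowledge base, query and function names below are invented for illustration (real systems use learned dense embeddings and a vector index instead):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base: in RAG, the passage retrieved here would be
# prepended to the prompt of a generative model at inference time.
docs = [
    "TensorFlow is an open source library for numerical computation.",
    "MNIST contains 70,000 images of handwritten digits.",
    "LDAP is a directory service optimized for reading.",
]
vec = TfidfVectorizer().fit(docs)
doc_matrix = vec.transform(docs)

def retrieve(question, k=1):
    """Return the k passages most similar to the question."""
    sims = cosine_similarity(vec.transform([question]), doc_matrix)[0]
    return [docs[i] for i in sims.argsort()[::-1][:k]]

print(retrieve("How many MNIST digit images are there?"))
```

Because the knowledge lives in `docs` rather than in model weights, updating what the system "knows" means re-indexing documents, not retraining the generator; that is the scalability argument the paper makes.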
The Ipsos - AI - Monitor 2024 Report.pdf - Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily lives in the past 3-5 years.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data - Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... - Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
3. CASSANDRA System
What is TensorFlow?
Intro Link to EuroPython
Probabilistic (NB, MNB, PD, BN, BC, PLSA, LDA, ...) - independence assumptions made
Stochastic or distance-based (SVM, matching, VSIM, k-NN, CR, PageRank, k-means) - features used, structures exploited
Model-based (rules, BC, BN, boosting) - social media driven
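Two of the families above can be compared directly; a hedged scikit-learn sketch on the Iris data (the model choices and cross-validation settings are mine, not the slide's):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB        # probabilistic family
from sklearn.neighbors import KNeighborsClassifier  # distance-based family

X, y = load_iris(return_X_y=True)
for name, model in [("probabilistic (NB)", GaussianNB()),
                    ("distance-based (k-NN)", KNeighborsClassifier(5))]:
    # 5-fold cross-validated accuracy for each family's representative
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Naive Bayes makes the independence assumption the taxonomy names explicitly; k-NN makes none, but pays for it with distance computations over the stored training set at prediction time.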
7.
from keras.datasets import mnist
(Figure: a grid of sample handwritten digits 0-9 and a word cloud; MNIST stands for Modified National Institute of Standards and Technology database.)
The MNIST dataset is comprised of 70,000 handwritten numeric digit images and their respective labels 0..9.
2. Main: C:\maXbox\mX46210\DataScience\plot_confusion_matrix_vectormachine.py
3. Second: C:\maXbox\mX46210\DataScience\confusionlist\mnist_softmax21.py
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/4_Utils/tensorboard_basic.ipynb
There are 60,000 training images and 10,000 test images, all of which are 28 pixels by 28 pixels.
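These shapes determine the standard preprocessing before training: flattening each 28x28 image into a 784-vector and one-hot encoding the labels. A NumPy sketch using a tiny random stand-in batch (the real data would come from keras.datasets.mnist):

```python
import numpy as np

# Stand-in batch with MNIST's per-image shape: here 4 random 28x28
# uint8 images instead of the real 60,000 training images.
images = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([3, 1, 4, 1])

X = images.reshape(len(images), 28 * 28) / 255.0  # (4, 784), scaled to [0, 1]
Y = np.eye(10)[labels]                            # (4, 10) one-hot label matrix

print(X.shape, Y.shape)  # (4, 784) (4, 10)
```

The 784-wide input and 10-wide one-hot output are exactly the dimensions of the softmax layer in the MNIST scripts referenced above.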
8.
from keras import backend as K

@ex.automain
def define_and_train(batch_size, epochs,
                     convolution_layers, maxpooling_pool_size, maxpooling_dropout,
                     dense_layers, dense_dropout, final_dropout, _run):
    from keras.models import Sequential  # convolution
    from keras.layers import Dense, Dropout, Flatten, Conv2D
    from keras.utils import to_categorical
    from keras.losses import categorical_crossentropy
    from keras.optimizers import Adadelta
    from keras import backend as K
    from keras.callbacks import ModelCheckpoint, Callback
4. C:\maXbox\mX46210\ntwdblib.dll\UnsharpDetector-master\UnsharpDetector-master\inference_gui.py
9.
MongoDB/Sacred module import class
from __future__ import division, print_function, unicode_literals
from sacred import Experiment
from sacred.observers import MongoObserver
from sacred.utils import apply_backspaces_and_linefeeds
import pymongo, pickle, os
import pydot as pdot
import numpy as np
import tensorflow as tf
5. C:\maXbox\mX46210\DataScience\confusionlist\train_convnet_tf.py
19. Machine Learning Jokes ;-)
1. There are two kinds of data scientists: 1) those who can extrapolate from incomplete data.
2. In data science, 80 percent of time spent is preparing data, 20 percent of time is spent complaining about the need to prepare data.
https://maxbox4.wordpress.com/
http://www.softwareschule.ch/examples/machinelearning.jpg