How do you organise a Jupyter (IPython) notebook research project so that you, as well as others, are able to read, understand and reproduce your work? How big should a notebook be? What should go in one cell? How do the Clean Code principles outlined by Robert C. Martin, aka Uncle Bob, relate to Python and, more specifically, to IPython?
Why is Python an emerging technology?
Python with data science and machine learning is the future.
Python can also be used with electronics.
Python as a scripting language
Data scientists and machine learning practitioners nowadays seem to be churning out models by the dozen, and they continuously experiment to find ways to improve their accuracy. They also use a variety of ML and DL frameworks and languages, and a typical organization may find that this results in a heterogeneous, complicated collection of assets that require different types of runtimes, resources and sometimes even specialized compute to operate efficiently.
But what does it mean for an enterprise to actually take these models to "production"? How does an organization scale out inference engines and make them available for real-time applications without significant latency? Different techniques are needed for batch (offline) inference and instant, online scoring. Data needs to be accessed from various sources, and cleansing and transformation of that data need to be enabled prior to any predictions. In many cases, there may be no substitute for customized data handling with scripting either.
Enterprises also require built-in auditing, authorization and approval processes, while still supporting a "continuous delivery" paradigm whereby a data scientist can deliver insights faster. Not all models are created equal, nor are the consumers of a model, so enterprises require both metering and allocation of compute resources for SLAs.
In this session, we will take a look at how machine learning is operationalized in IBM Data Science Experience (DSX), a Kubernetes-based offering for the Private Cloud, optimized for the HortonWorks Hadoop Data Platform. DSX essentially brings typical software engineering development practices to data science, organizing the dev->test->production flow for machine learning assets in much the same way as typical software deployments. We will also see what it means to deploy models and custom scorers, monitor their accuracy and even roll them back, as well as how API-based techniques enable consuming business processes and applications to remain relatively stable amidst all the chaos.
Speaker
Piotr Mierzejewski, Program Director Development IBM DSX Local, IBM
For real-world applications, a convolutional neural network (CNN) model can take more than 100 MB of space and can be computationally too expensive. Therefore, the state of the art offers multiple methods to reduce this complexity. Ristretto is a plug-in to the Caffe framework that employs several model approximation methods. For this project, a CNN model is first trained on the CIFAR-10 dataset with Caffe, then Ristretto is used to generate multiple approximated versions of the trained model using different schemes. The goal of this project is to compare the models in terms of execution performance, model size and cache utilization in the test (inference) phase. The same steps are repeated with TensorFlow and its quantisation tool. The quantisation schemes of TensorFlow and Ristretto are then compared.
Analysis of data in Python with SciPy and pandas, Ubuntu installation, PyCharm configuration, Series, DataFrame, big data, medical data, merging data, groupby, graphing data, IPython using Wakari.io, and analyzing stock prices of US automakers including Ford and Tesla. As presented at Penguicon 2016.
K-Folds cross-validation is one method that attempts to maximize the use of the available data for training and then testing a model. It is particularly useful for assessing model performance, as it provides a range of accuracy scores across (somewhat) different data sets.
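As a rough illustration of the idea, the sketch below splits sample indices into k folds using only the Python standard library. The function name `k_fold_indices` and its parameters are hypothetical, chosen for this example; they are not taken from the talk or from any particular library.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once so that each fold is a random sample of the data.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(10, 3)):
        print(fold, sorted(train), sorted(test))
```

Every sample lands in exactly one test fold, so training and scoring a model once per fold yields k accuracy scores rather than a single number.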
Scikit Learn Tutorial | Machine Learning with Python | Python for Data Scienc... - Edureka!
(Python Certification Training for Data Science: https://www.edureka.co/python)
This Edureka video on "Scikit-learn Tutorial" introduces you to machine learning in Python. It also takes you through regression and clustering techniques, along with a demo of SVM classification on the famous iris dataset. This video helps you learn the following topics:
1. Machine learning Overview
2. Introduction to Scikit-learn
3. Installation of Scikit-learn
4. Regression and Classification
5. Demo
PyCaret is an open-source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of environment. This talk is a practical demo of using PyCaret in your existing workflows to supercharge your data science team's productivity.
Our fall 12-Week Data Science bootcamp starts on Sept 21st, 2015. Apply now to get a spot!
If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced machine learning. In this meet-up, we will demonstrate how to understand and use XGBoost for Kaggle competitions. Tong is in Canada and will lead a remote session with us through Google Hangout.
---------------------------------------------------------------
Speaker Bio:
Tong is a data scientist at Supstat Inc and also a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the R package for XGBoost, one of the most popular and contest-winning tools on kaggle.com nowadays.
Prerequisites (if any): R, calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda:
Introduction of Xgboost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XgBoost Demo
Reference:
https://github.com/dmlc/xgboost
Over the last year there has been a lot of buzz about Clean Architecture in the Android community, but what is Clean Architecture? How does it work? And should I be using it? Recently at Badoo we decided to rewrite our messenger component.
Over the years this core piece of functionality in our app has become large and unwieldy. We wanted to take a fresh approach to try and prevent this from happening again. We chose to use Clean Architecture to achieve our goal. This talk intends to share our journey from theory to implementation in an application with over 100 million downloads. By the end, you should understand not only what Clean Architecture is, but also how to implement it, and whether you should.
Machine learning (also known as "statistical learning" or "predictive analytics") is leaving research labs and specialist circles and is increasingly being used within companies, and not only startups. Evidence of this is the rise of the open-source toolkit Scikit-learn, which has spread very quickly internationally as one of the new standards for this new way of building software, as well as the availability since July 2014 of Azure ML, the machine learning service of Microsoft Azure. In this session we offer an overview of developing statistical learning software in Python with Scikit-learn. We have invited one of the main contributors to this toolkit, Olivier Grisel, a research engineer in the Inria PARIETAL team at Saclay, to present an overview in an interactive session based on numerous examples and demos. To learn more: http://scikit-learn.org https://team.inria.fr/parietal/ https://twitter.com/ogrisel
Abstract: This PDSG workshop introduces basic concepts of splitting a dataset for training a model in machine learning. Concepts covered are training, test and validation data, serial and random splitting, data imbalance and k-fold cross validation.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
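The difference between serial and random splitting mentioned in the abstract can be sketched in a few lines of standard-library Python. The helper `split_dataset` and its default 60/20/20 proportions are assumptions made for this example, not something the workshop prescribes.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, shuffle=True, seed=0):
    """Split rows into (train, validation, test) lists.

    Hypothetical helper: shuffle=False gives a serial split that keeps
    the original order, shuffle=True gives a random split.
    """
    if train <= 0 or validation < 0 or train + validation >= 1:
        raise ValueError("train and validation must leave room for a test set")
    rows = list(rows)
    if shuffle:
        random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_validation = round(len(rows) * validation)
    return (rows[:n_train],
            rows[n_train:n_train + n_validation],
            rows[n_train + n_validation:])


if __name__ == "__main__":
    print(split_dataset(range(10), shuffle=False))
    print(split_dataset(range(10), shuffle=True))
```

A serial split is only safe when the row order carries no pattern, for example when the data is not sorted by label or date. Otherwise the random variant is the more cautious default.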
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, and critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
Deep Learning With Python Tutorial | Edureka - Edureka!
** Tensorflow Training: https://www.edureka.co/ai-deep-learning-with-tensorflow **
This PPT on "Deep Learning with Python" will provide you with detailed and comprehensive knowledge of Deep Learning and how it emerged. It also covers the various subfields of Data Science, how they are related, and how Deep Learning is revolutionizing the world we live in. This tutorial covers the following topics:
Introduction To AI, ML, and DL
What is Deep Learning
Applications of Deep Learning
What is a Neural Network?
Structure of Perceptron
Demo: Perceptron from scratch
Demo: Creating Deep Neural Nets
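To give a flavour of what a "perceptron from scratch" can look like, here is a minimal sketch in plain Python. It is not the code from the Edureka demo. The function names, the step activation, and the AND-gate training data are all assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Classic perceptron learning rule (hypothetical helper)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # error is -1, 0 or +1; weights move only on a mistake.
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Logical AND is linearly separable, so a single perceptron can learn it.
    samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [0, 0, 0, 1]
    weights, bias = train_perceptron(samples, labels)
    print([predict(weights, bias, x) for x in samples])
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is one motivation for stacking units into deeper networks.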
Deep Learning blog series: https://bit.ly/2xVIMe1
Deep Learning With TensorFlow Playlist: https://goo.gl/cck4hE
DVC - Git-like Data Version Control for Machine Learning projects - Francesco Casalegno
DVC is an open-source tool for versioning datasets, artifacts, and models in Machine Learning projects.
This extremely powerful tool allows you to leverage an intuitive git-like interface to seamlessly:
1. track datasets version updates
2. have reproducible and sharable machine learning pipelines (e.g. model training)
3. compare model performance scores
4. integrate your data and model versioning with git
5. deploy the desired version of your trained models
OSA Con 2022 - What Data Engineering Can Learn from Frontend Engineering - Pe...Altinity Ltd
OSA Con 2022: What Data Engineering Can Learn from Frontend Engineering
Pete Hunt - Elementl
Frontend engineering went through a revolution in the last decade. I'll recap what happened, and how a similar revolution started in data engineering.
How to write maintainable code - Peter Hilton - Codemotion Amsterdam 2017 - Codemotion
The problem that new technology doesn’t fix is unmaintainable code. Clean code with good tests is essential, but not enough. This talk introduces techniques like getting better at naming, explaining code with tests, the few code comments you actually need, README-driven development and writing Minimum Viable Documentation. After the excitement of adopting new technology and software craftsmanship comes the horror of your next software maintenance project. As Jean-Paul Sartre said*, ‘Hell is other people’s code’. Whatever your level, your future happiness depends on maintainable code.
Modern languages’ biggest problem isn’t having enough cool features, it’s unmaintainable code. The core of maintainable code is clean code with good tests, but that by itself is not enough. This talk introduces a range of techniques for writing and improving code for maintainability, including how to get better at naming, explaining code with tests, the few code comments you actually need, README-driven development and how to write Minimum Viable Documentation.
Attendees will see how to combine a number of techniques they have already encountered separately, plus at least one technique they’ve never heard of and can use immediately. Naming and abstraction are too hard to get right every single time, so you need to know when to add small doses of comments and documentation. More importantly, you need to know how to avoid wasting time on bad comments and unnecessary documentation without giving up entirely and not having any at all.
After the excitement of early adoption and the joy of coding among experts comes the horror of your first maintenance project. As Jean-Paul Sartre said*, ‘Hell is other people’s code’. Whether you are a new Scala developer or an experienced team lead, your future developer experience and happiness depend on maintainable code.
ABSTRACT: The ongoing big data revolution has changed the way in which technology is used to empower new business segments like social networking and transform old business segments like traditional retail. However, the DNA that is used to build data processing platforms is evolving quite rapidly. There is a plethora of competing tools, technologies, and “religions” for how to build state-of-the-art data analysis frameworks. In this talk, I will go over five ways to build scalable, high-performance, long-lasting data analysis frameworks the wrong way. Surprisingly, the industry is full of examples of organizations building frameworks in this “wrong” way. Since the “right” way to build a technology framework is dependent on the key business drivers, it is my hope that this talk will spur a discussion on what is the “right” way for Pinterest. The talk will focus on technologies including “data plumbing” (e.g. tools in the Hadoop ecosystem), and statistical modeling methods (e.g. R and Python). In this talk, I’ll try to connect to platform builders, data scientists, and business decision makers.
BIO: Jignesh Patel is a Professor in Computer Sciences at the University of Wisconsin-Madison, where he also earned his Ph.D. He has worked in the area of databases (now fashionably called “big data”) for over two decades. He has won several best paper awards, and industry research awards. He is the recipient of the Wisconsin COW teaching award, and the U. Michigan College of Engineering Education Excellence Award. He has a strong interest in seeing research ideas transition to actual products. His Ph.D. thesis work was acquired by NCR/Teradata in 1997, and he also co-founded Locomatix -- a startup that built a platform to power real-time data-driven mobile services. Locomatix became part of Twitter in 2013. He is an ACM Distinguished Scientist and an IEEE Senior Member. He also serves on the board of Lands’ End, and advises a number of startups.
Prometheus as exposition format for eBPF programs running on Kubernetes - Leonardo Di Donato
The kernel knows more than our programs. Stop bloating our applications with copy-and-paste instrumentation code for metrics. Let's go look under the hood!
Nowadays almost every application exposes its metrics via an HTTP endpoint that Prometheus can read. Nevertheless, this very common pattern by definition only exposes metrics about the specific applications being observed.
This talk, and its companion slides, present the idea, and a reference implementation (https://github.com/bpftools/kube-bpf), of using eBPF programs to collect and automatically expose application and kernel metrics via a Prometheus endpoint.
It walks through the architecture of the proposed reference implementation - a Kubernetes operator with a custom resource for eBPF programs - and finally links to a simple demo showing how to use it to grab and present some metrics without having touched any application running on the demo cluster.
---
Talk given at Cloud_Native Rejekts EU - Barcelona, Spain - on May 18th, 2019
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2J5O3XV.
Howard Chu gives tips and techniques for writing highly efficient and scalable software drawn from decades of experience. The guiding principle is a simple one, and can be applied nearly everywhere. The talk is focused on programming in C. Filmed at qconlondon.com.
Howard Chu founded Symas Corp. with 5 other partners and serves as its CTO. His work has spanned a wide range of computing topics, including most of the GNU utilities, networking protocols and tools, kernel and filesystem drivers, and focused on maximizing the useful work from a system. His current focus is database oriented, covering LDAP, LMDB, and other non-relational database technologies.
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020) - Igalia
By Daniel Ehrenberg.
Have you ever found yourself with a disappointing rounding error due to how Numbers work, or using awkward workarounds to avoid thinking about decimals? It's a shame that JavaScript chose binary floating point as its basic numeric type when it causes all of these problems. What can we do about it?
(c) Node.TLV 2020
https://www.nodetlv.com/
https://www.youtube.com/watch?v=G3Q4vWf8Peo
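The rounding problem described above is not specific to JavaScript: Python's `float` uses the same IEEE 754 binary64 format as JavaScript's Number. The sketch below uses Python's standard `decimal` module as a stand-in for a decimal type. It is not the BigDecimal proposal itself, only an illustration of the same idea.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly equal to the float closest to 0.3.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)  # False

# A decimal type built from strings stores the digits exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

Note that the `Decimal` values are built from strings. Building them from floats, as in `Decimal(0.1)`, would carry the binary rounding error into the decimal value.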
A short introduction to more advanced Python and to programming in general. Intended for users who have already learned basic coding skills but want a rapid tour of the more in-depth capabilities offered by Python, along with some general programming background.
Exercises are available at: https://github.com/chiffa/Intermediate_Python_programming
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SBrandon Liu
Implementing a fixed point int16_t integer matrix vector multiplication kernel for Intel processors with AVX-512 and the Xbyak just-in-time compiler (what Intel MKL jit_cgemm uses)
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript - Igalia
Daniel Ehrenberg talks at Node.TLV 2020 about TC39, how it works, and some interesting proposals for dealing with numbers in JavaScript that are working their way through the process: BigInt and the newer BigDecimal.
In both business and academia, reproducibility is the root of trust in computational results. Software engineering is critical to supporting that work consistently. Basic practices including systematic builds, automatic tests, version control, and documentation will be discussed. These practices and a mix of programming languages enable developers to address the challenges in scientific computing by creating a flexible and high-performance platform. The fundamental idea is to generalize and segregate common constructs so that they can be reinforced throughout the whole cycle of code development, to enhance confidence in the produced results. It is also necessary to engage in software development communities, as a means to improve productivity and give back.
Slides from PyData Berlin, July 2017 meetup
Plotly Dash is a newest addition to a rich ecosystem of tools to build visual data science and BI applications in Python.
Overview of classical Statistical Tests and how to apply them in python to a real world problems in online setting. A/B testing. Confidence intervals. Bayesian and. Frequentist methods. Presented at PyData Dallas 2015.
What is good data visualisation. How do we apply best practises of data visualisation at scale? How do we make sure that all visualisations produced by your analytics team both look good and is effortless.
Is Agile Data Science just two buzzwords put together? I argue that agile is a very practical and applicable methodology, that does work well in the real world for all sorts of Analytics and Data Science workflows.
http://theinnovationenterprise.com/summits/digital-web-analytics-summit-london-2015/schedule
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
2. @KNerush @Volodymyrk
Volodymyr (Vlad) Kazantsev
Head of Data @ Product Madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
4. @KNerush @Volodymyrk
Who are Data Scientists, really?
Coding, Stats, Business
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
6. @KNerush @Volodymyrk
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Martin Fowler, 1999
“WTF! How am I supposed to validate this??”
“Sorry, but how can I calculate 7-day retention?”
8. @KNerush @Volodymyrk
You do it for your own good..
“Re-run all A/B test analyses for the last months, by tomorrow”
Ideas & Questions → Data → Analysis → Insights → Impact
11. @KNerush @Volodymyrk
“Clean Code” ?
“Pleasingly graceful and stylish in appearance or manner” - Bjarne Stroustrup, inventor of C++
“Clean code reads like well written prose” - Grady Booch, creator of UML
“.. each routine turns out to be pretty much what you expected” - Ward Cunningham, inventor of Wiki and XP
12. @KNerush @Volodymyrk
One does not simply start writing clean code..
“First make it work, then make it right, then make it fast and small” - Kent Beck, co-inventor of XP and TDD
“Leave the campground cleaner than you found it” - The Boy Scouts of America, applied to programming by Uncle Bob
Ron Jeffries, author of Extreme Programming Installed:
- Run all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimize classes and methods
14. @KNerush @Volodymyrk
“There are only two hard problems in Computer Science:
cache invalidation and naming things" - Phil Karlton
● long_descriptive_names
○ Avoid: x, i, stuff, do_blah()
● Pronounceable and Searchable
○ revenue_per_payer vs. arpdpu
● Avoid encodings, abbreviations, prefixes, suffixes.. if possible
○ bonus_points_on_iphone vs. cns_crm_dip
● Add meaningful context
○ daily_revenue_per_payer
● Don’t be lazy.
○ Spend time naming and renaming things.
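A small sketch of the naming advice above. The payment records and the function name are hypothetical, invented only to contrast a descriptive, searchable name with an abbreviation such as arpdpu.

```python
# Hypothetical sample data: (player_id, amount) pairs for one day.
daily_payments = [("anna", 4.99), ("boris", 0.99), ("anna", 9.99)]


def daily_revenue_per_payer(payments):
    """Return total revenue divided by the number of distinct paying players."""
    total_revenue = sum(amount for _, amount in payments)
    payer_count = len({player_id for player_id, _ in payments})
    return total_revenue / payer_count if payer_count else 0.0


# The name states the metric, the period and the denominator;
# compare with the hard-to-search alternative: arpdpu = f(x)
revenue_per_payer = daily_revenue_per_payer(daily_payments)
```

The longer name costs a few keystrokes once, and saves every later reader from decoding an abbreviation.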
15. @KNerush @Volodymyrk
“each routine turns out to be pretty much what you
expected” - Ward Cunningham
● Small
● Do one thing
● One Level of Abstraction
● Have only a few arguments (one is best)
○ Less important in Python, with named arguments.
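One possible illustration of "small, do one thing, one level of abstraction". The cohort data and both function names are hypothetical; the retention definition used here (share of players active on day 7) is an assumption made for the sketch.

```python
def is_active_on_day(active_days, day):
    """Do one thing: check whether a player was active on a given day."""
    return day in active_days


def seven_day_retention(players_activity):
    """Share of players active on day 7, given {player_id: set of active day numbers}."""
    if not players_activity:
        return 0.0
    retained = sum(is_active_on_day(days, 7) for days in players_activity.values())
    return retained / len(players_activity)


# Hypothetical cohort: day numbers are counted from each player's install day.
cohort = {"anna": {0, 1, 7}, "boris": {0, 2}, "clara": {0, 7}, "dan": {0}}
retention = seven_day_retention(cohort)
```

Each function takes one argument or two, and a reader can guess what it does from its name alone.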
16. @KNerush @Volodymyrk
● Use good names
● Avoid obvious comments.
● Avoid dead, commented-out code
● Avoid ToDo, licenses, history, markup for documentation and other nonsense
● But there are exceptions..
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
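A minimal before/after sketch of refactoring a comment away. The player dictionary layout and the bonus rule are hypothetical, made up for illustration.

```python
# Before: the comment has to explain what the condition means.
def bonus_before(player):
    # player paid at least once and plays on iPhone
    if player["payments"] > 0 and player["device"] == "iphone":
        return 100
    return 0


# After: the names carry the explanation, so the comment becomes superfluous.
def is_payer(player):
    return player["payments"] > 0


def plays_on_iphone(player):
    return player["device"] == "iphone"


def bonus_points_on_iphone(player):
    return 100 if is_payer(player) and plays_on_iphone(player) else 0
```

Both versions return the same values; only the second one can be read without the comment.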
20. @KNerush @Volodymyrk
“Long functions are where classes are trying to hide” - Robert C. Martin
● Small
● Do one thing
● SOLID, Design Patterns, etc.
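As a rough sketch of a class "hiding" in a long function: values that a long A/B analysis function would compute and pass around can be grouped into one small class. The class name, fields and numbers are hypothetical.

```python
class ABTestResult:
    """Groups the values a long analysis function would otherwise juggle as local variables."""

    def __init__(self, control_conversions, control_users,
                 variant_conversions, variant_users):
        self.control_rate = control_conversions / control_users
        self.variant_rate = variant_conversions / variant_users

    def absolute_uplift(self):
        return self.variant_rate - self.control_rate

    def relative_uplift(self):
        return self.absolute_uplift() / self.control_rate


# Hypothetical experiment counts.
result = ABTestResult(control_conversions=50, control_users=1000,
                      variant_conversions=60, variant_users=1000)
```

Each method stays small and does one thing, which is the same rule the previous slides apply to functions.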
21. @KNerush @Volodymyrk
Code conventions
● The team should produce code in one style, as if one person had written it
● Team conventions take priority over language conventions, which take priority over personal ones
● Automate style formatting
23. @KNerush @Volodymyrk
Example: PEP 8 -- Style Guide for Python Code
https://www.python.org/dev/peps/pep-0008/
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Good:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad:
foo = long_function_name(var_one, var_two,
    var_three, var_four)
25. @KNerush @Volodymyrk
My favourite!
This is not Java or C++
● Functions are first-class objects
● Duck-typing as an interface
● No setters/getters
● Itertools, zip, enumerate
● etc.
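The idioms listed above can be sketched in a few lines. The Player class and all values are hypothetical; only standard-library features are used.

```python
from itertools import accumulate


class Player:
    """No getters/setters: plain attributes, plus a property where logic is needed."""

    def __init__(self, name, revenue):
        self.name = name
        self.revenue = revenue

    @property
    def is_payer(self):
        return self.revenue > 0


players = [Player("anna", 14.98), Player("boris", 0.0), Player("clara", 0.99)]

# Functions are first-class objects: pass one as the sort key.
by_revenue = sorted(players, key=lambda player: player.revenue, reverse=True)

# enumerate instead of manual index bookkeeping.
ranking = [(rank, player.name) for rank, player in enumerate(by_revenue, start=1)]

# zip walks two sequences together; itertools.accumulate gives running totals.
daily_revenue = [10, 20, 30]
revenue_by_day = dict(zip(["mon", "tue", "wed"], accumulate(daily_revenue)))


# Duck typing as an interface: anything with a .revenue attribute works.
def total_revenue(items):
    return sum(item.revenue for item in items)
```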
27. @KNerush @Volodymyrk
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
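A dependency-free sketch of that structure, with one comment per notebook section. The session data is a hypothetical in-memory sample, the "model" is just a mean, and the chart is plain text; a real notebook would load real data and use a plotting library.

```python
# 1. Imports
import statistics

# 2. Get Data (hypothetical sample; a real notebook would read a file or run a query)
raw_sessions = [{"player": "anna", "minutes": "12"},
                {"player": "boris", "minutes": "30"},
                {"player": "clara", "minutes": "18"}]

# 3. Transform Data
session_minutes = [int(row["minutes"]) for row in raw_sessions]

# 4. Modelling (a summary statistic standing in for a model)
mean_minutes = statistics.mean(session_minutes)

# 5. Visualisation (one '#' per 6 minutes of play)
chart_lines = [f'{row["player"]:<6}' + "#" * (minutes // 6)
               for row, minutes in zip(raw_sessions, session_minutes)]
print("\n".join(chart_lines))

# 6. Making sense of the data
summary = f"Average session length: {mean_minutes} minutes"
print(summary)
```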
37. @KNerush @Volodymyrk
Tip 4: each cell should have one logical output
● One “idea - execution - output” triplet per cell
● Import Cell: expected output is no import errors
● CMD+SHIFT+P
38. @KNerush @Volodymyrk
Tip 5: write tests .. in Jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
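A minimal sketch of a test cell using plain assert statements. It deliberately does not rely on pytest-ipynb's own cell conventions, which are not described on the slide; the function under test and its signature are hypothetical.

```python
def seven_day_retention(retained_players, cohort_size):
    """Fraction of a cohort still active on day 7."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return retained_players / cohort_size


# Test cell: the expected output is simply no AssertionError.
def test_seven_day_retention():
    assert seven_day_retention(25, 100) == 0.25
    assert seven_day_retention(0, 10) == 0.0
    try:
        seven_day_retention(1, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an empty cohort")


test_seven_day_retention()
```

Keeping the test in the cell right after the function means Restart & Run All re-checks it on every run.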
40. @KNerush @Volodymyrk
Code Smells .. in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (check ideas) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
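The copy-paste smell and one way out of it, as a short sketch. The revenue lists and names are hypothetical.

```python
# Smell: the same computation pasted into several cells with small edits.
ios_revenue = [4.99, 9.99, 0.99]
android_revenue = [1.99, 1.99]
ios_average = sum(ios_revenue) / len(ios_revenue)
android_average = sum(android_revenue) / len(android_revenue)


# Fix: define the logic once and call it for every segment.
def average_revenue(amounts):
    return sum(amounts) / len(amounts) if amounts else 0.0


revenue_by_platform = {"ios": ios_revenue, "android": android_revenue}
average_by_platform = {platform: average_revenue(amounts)
                       for platform, amounts in revenue_by_platform.items()}
```

Adding a third platform now means adding one dictionary entry instead of pasting another cell.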
43. @KNerush @Volodymyrk
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.