District Data Labs Workshop
Current Workshop: August 23, 2014
Previous Workshops:
- April 5, 2014
Data products are usually software applications that derive their value from data by leveraging the data science pipeline, and that generate data through their operation. They aren't apps with data, nor are they one-time analyses that produce insights; they are operational and interactive. The rise of these applications has directly contributed to the rise of the data scientist and the idea that data scientists are professionals "who are better at statistics than any software engineer and better at software engineering than any statistician."
These applications have largely been built with Python. Python is flexible enough to develop extremely quickly on many different types of servers and has a rich tradition in web applications. Python contributes to every stage of the data science pipeline, including real-time ingestion and the production of APIs, and it is powerful enough to perform machine learning computations. In this class we'll produce a data product with Python, leveraging every stage of the data science pipeline to produce a book recommender.
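To make the recommender stage concrete, a toy user-based collaborative filter might look like the following sketch (the rating matrix, user indices, and `recommend` helper are invented for illustration; the workshop builds its recommender from real data):

```python
import numpy as np

# Toy user-by-book rating matrix (0 = unrated). All values are illustrative.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def recommend(user, k=1):
    """Score unrated books for `user` from similar users' ratings."""
    sims = np.array([cosine_sim(ratings[user], r) for r in ratings])
    sims[user] = 0.0                      # ignore self-similarity
    scores = sims @ ratings               # similarity-weighted ratings
    scores[ratings[user] > 0] = -np.inf   # drop already-rated books
    return np.argsort(scores)[::-1][:k]   # top-k book indices

print(recommend(0))  # book(s) suggested for user 0
```

A real pipeline would add ingestion, storage, and an API layer around this core, but the scoring logic is the same shape.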
An Interactive Visual Analytics Dashboard for the Employment Situation Report - Benjamin Bengfort
The Employment Situation Report is a monthly news release by the Bureau of Labor Statistics that describes the results of the Current Population Survey. Its release is widely anticipated by economists, journalists, and politicians because it is used to forecast the economic condition of the United States by describing ongoing trends, and it has a broad impact on public and corporate economic confidence, leading directly to investment decisions. The report itself is a PDF composed primarily of text and tabular information. Quickly and correctly interpreting the results of the jobs report is vital for quality reporting and decision making, but the report is better suited to extended study than to rapid insight. In this project we explore the use of an interactive dashboard for visual analytics on the released BLS data. Using an application demonstration and a usability study, we show that by visually interacting with the most current employment data, users can rapidly achieve rich insights similar to those reported in the text of the Employment Situation Report.
This presentation about Scikit-learn will help you understand what Scikit-learn is and what you can achieve with it, and it includes a demo of how to use Scikit-learn in Python. Scikit-learn is a powerful, modern machine learning library for Python. It's a great tool for fully and semi-automated advanced data analysis and information extraction. There are many reasons why Scikit-learn is a preferred machine learning tool: it has efficient tools for framing problems, such as deciding whether a task fits a supervised or unsupervised learning model; it ships with many free and open datasets; it has a rich set of built-in routines for learning and prediction; it provides estimators for most common problem types; its models can be persisted with Python's built-in pickle module; and it is supported by a huge open source community and vendor base. Now, let us get started and understand Scikit-learn in detail.
The following topics are covered in this Scikit-learn presentation:
1. What is Scikit-learn?
2. What you can achieve using Scikit-learn
3. Demo
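As a minimal sketch of the workflow such a demo typically walks through, here is a scikit-learn model trained, evaluated, and persisted with Python's standard-library pickle module (the dataset and estimator choices are ours, not necessarily the presentation's):

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a bundled open dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit an estimator and evaluate it on unseen data.
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")

# Persist and restore the fitted model with pickle; scikit-learn
# estimators are picklable by design.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
assert (restored.predict(X_test) == model.predict(X_test)).all()
```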
Simplilearn’s Python Training Course is an all-inclusive program that will introduce you to the Python development language and expose you to the essentials of object-oriented programming, web development with Django and game development. Python has surpassed Java as the top language used to introduce U.S. students to programming and computer science. This course will give you hands-on development experience and prepare you for a career as a professional Python programmer.
What is this course about?
The All-in-One Python course enables you to become a professional Python programmer. Any aspiring programmer can learn Python from the basics and go on to master web development & game development in Python. Gain hands-on experience creating a Flappy Bird game clone & website functionalities in Python.
What are the course objectives?
By the end of this online Python training course, you will be able to:
1. Internalize the concepts & constructs of Python
2. Learn to create your own Python programs
3. Master Python Django & advanced web development in Python
4. Master PyGame & game development in Python
5. Create a Flappy Bird game clone
The Python training course is recommended for:
1. Any aspiring programmer who wants to master Python
2. Any aspiring web developer or game developer who wants to meet their training needs
Learn more at https://www.simplilearn.com/mobile-and-software-development/python-development-training
Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. Scientists, physicians, researchers, and analysts who use these technologies for their important work have the right to trust and understand their models and the answers they generate. This talk is an overview of several techniques for interpreting deep learning and machine learning models and telling stories from their results.
Speaker: Patrick Hall is a Data Scientist and Product Engineer at H2O.ai. He's also an Adjunct Professor at George Washington University in the Department of Decision Sciences. Prior to joining H2O, Patrick spent many years as a Senior Data Scientist at SAS and has worked with many Fortune 500 companies on their data science and machine learning problems. https://www.linkedin.com/in/jpatrickhall
A Beginner's Guide to Machine Learning with Scikit-Learn - Sarah Guido
Given at the PyData NYC 2013 conference (http://vimeo.com/79517341), and will be given at PyTennessee 2014.
Scikit-learn is one of the most well-known machine learning Python modules in existence. But how does it work, and what, for that matter, is machine learning? For those with programming experience but who are new to machine learning, this talk gives a beginner-level overview of how machine learning can be useful, important machine learning concepts, and how to implement them with scikit-learn. We’ll use real world data to look at supervised and unsupervised machine learning algorithms and why scikit-learn is useful for performing these tasks.
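A beginner-level sketch of the two algorithm families the talk contrasts, using scikit-learn's bundled iris data (the choice of estimators here is ours, not necessarily the speaker's):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Supervised: the labels y guide training; judge by cross-validated accuracy.
knn = KNeighborsClassifier(n_neighbors=5)
acc = cross_val_score(knn, X, y, cv=5).mean()
print(f"knn cross-validated accuracy: {acc:.2f}")

# Unsupervised: no labels; KMeans groups samples purely by feature similarity.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", sorted((km.labels_ == k).sum() for k in range(3)))
```

The uniform `fit`/`predict` estimator API is much of why scikit-learn is useful for both kinds of task.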
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai - Sri Ambati
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/vUqC8UPw9SU
Description:
The good news is that building fair, accountable, and transparent machine learning systems is possible. The bad news is that it's harder than many blogs and software package docs would have you believe. The truth is that nearly all interpretable machine learning techniques generate approximate explanations, that the fields of explainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes! This talk aims to make your interpretable machine learning project a success by describing the fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining the following viable techniques for debugging, explaining, and testing machine learning models:
1. Model visualizations, including decision tree surrogate models, individual conditional expectation (ICE) plots, partial dependence plots, and residual analysis
2. Reason code generation techniques like LIME, Shapley explanations, and Treeinterpreter
3. Sensitivity analysis
Plenty of guidance on when, and when not, to use these techniques will also be shared, and the talk will conclude with guidelines for testing the generated explanations themselves for accuracy and stability. Open source examples (with lots of comments and helpful hints) for building interpretable machine learning systems are available to accompany the talk at https://github.com/jphall663/interpretable_machine_learning_with_python.
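One of the techniques listed above, the decision tree surrogate model, can be sketched in a few lines of scikit-learn: fit a complex model, then fit a shallow, readable tree to the complex model's predictions and measure how faithfully it mimics them (the dataset and estimator choices here are illustrative, not from the talk):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# The "black box": a boosted ensemble fit on the true labels.
blackbox = GradientBoostingClassifier(random_state=0).fit(X, y)

# The surrogate: a shallow tree fit on the black box's *predictions*,
# yielding a human-readable approximation of its decision logic.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, blackbox.predict(X))

# Fidelity: how often the surrogate agrees with the black box.
fidelity = surrogate.score(X, blackbox.predict(X))
print(f"surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate, feature_names=list(data.feature_names)))
```

As the talk stresses, the surrogate is only an approximate explanation; checking fidelity before trusting the printed rules is exactly the kind of explanation-testing the guidelines call for.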
Speaker's Bio:
Patrick Hall is a senior director for data science products at H2O.ai, where he focuses mainly on model interpretability. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning. Prior to joining H2O.ai, Patrick held global customer-facing roles and research and development roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick was the 11th person worldwide to become a Cloudera certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.
"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."
In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started.
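QuSandbox and the Model Analytics Studio themselves are not shown here; as a hand-rolled miniature of what AutoML frameworks automate at much larger scale, the following scikit-learn sketch searches over both the algorithm and its hyperparameters inside a single pipeline:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Preprocessing and model live in one searchable pipeline.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# The grid varies the estimator itself as well as its hyperparameters,
# a tiny version of AutoML's algorithm selection + tuning loop.
grid = [
    {"clf": [SVC()], "clf__C": [0.1, 1, 10]},
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
]
search = GridSearchCV(pipe, grid, cv=3).fit(X, y)
print(search.best_params_["clf"], round(search.best_score_, 3))
```

Full AutoML systems extend this idea with automated feature engineering, smarter search strategies, and ensembling.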
This presentation covers how data science concepts connect to building effective machine learning solutions, how to build end-to-end solutions in Azure ML, and how to build, model, and evaluate algorithms in Azure ML.
Artificial Intelligence for Automating Data Analysis - Manuel Martín
The requirements for analysing big volumes of data have increased over the last few decades. The process of selecting, cleaning, modelling and interpreting data is called the KDD process. The decision of how to approach each step in this process has often been made manually by experts. However, experts cannot be aware of all methods, nor is it feasible to try all of them. Researchers have proposed different approaches for automating, or at least advising, the stages of the KDD process. This talk will outline the different types of Intelligent Discovery Assistants as described in the work of Serban et al. “A survey of intelligent assistants for data analysis” and point out some future directions.
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ... - Athens Big Data
Title: An Introduction to Machine Learning with Python and Scikit-Learn
Speaker: Julien Simon (https://linkedin.com/in/juliensimon/)
Date: Thursday, March 14, 2019
Event: https://www.meetup.com/Athens-Big-Data/events/259091496/
In this talk, we will present an overview of Azure Machine Learning, a fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions. We will start with the basics of machine learning and end with a demo that uses real world data.
Unified Approach to Interpret Machine Learning Model: SHAP + LIME - Databricks
For companies that solve real-world problems and generate revenue from data science products, being able to understand why a model makes a certain prediction can be as crucial as achieving high prediction accuracy. However, as data scientists pursue higher accuracy by implementing complex algorithms such as ensemble or deep learning models, the algorithm itself becomes a black box, creating a trade-off between the accuracy and the interpretability of a model's output.
To address this problem, a unified framework, SHAP (SHapley Additive exPlanations), was developed to help users interpret the predictions of complex models. In this session, we will talk about how to apply SHAP to various modeling approaches (GLM, XGBoost, CNN) to explain how each feature contributes to a particular prediction and to extract intuitive insights from it. This talk introduces the concept of a general-purpose model explainer and helps practitioners understand SHAP and its applications.
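The shap package itself provides fast, model-specific approximations; to make the underlying idea concrete, here is a brute-force computation of exact Shapley values for a small linear model, where an "absent" feature is replaced by its training mean (a common but assumption-laden baseline choice; all modeling choices here are illustrative):

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
X = X[:, :4]                  # keep 4 features so all 2^4 coalitions stay cheap
model = LinearRegression().fit(X, y)
background = X.mean(axis=0)   # stand-in input for "absent" features
x = X[0]                      # the instance to explain

def value(subset):
    """Model output with features in `subset` taken from x, rest from background."""
    z = background.copy()
    z[list(subset)] = x[list(subset)]
    return model.predict(z.reshape(1, -1))[0]

n = len(x)
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            # Classic Shapley weight for a coalition of this size.
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi[i] += w * (value(S + (i,)) - value(S))

# Additivity check: attributions sum to prediction minus baseline.
print(phi.round(2), round(value(range(n)) - value(()), 2))
```

This exhaustive computation is exponential in the number of features, which is exactly why SHAP's approximations matter in practice.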
The key challenge in making AI technology more accessible to the broader community is the scarcity of AI experts. Most businesses simply don't have the much-needed resources or skills for modeling and engineering. This is why automated machine learning and deep learning technologies (AutoML and AutoDL) are increasingly valued by academia and industry. The core of AI is model design. Automated machine learning technology lowers the barriers to AI adoption, enabling developers with no AI expertise to independently and easily develop and deploy AI models. Automated machine learning is expected to completely transform the AI industry in the next few years, making AI ubiquitous.
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We... - Sri Ambati
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xc3j20Om3UM
Description:
Data science is indeed one of the sexy jobs of the 21st century. But it is also a lot of hard work. And the hard work is seldom about the math or the algorithms. It is about building relevant machine learning products for the real world. We will go over some of the must-haves as you take your machine learning model out of the sandbox and make it work in the big, bad world outside.
Speaker's Bio:
Krish Swamy is an experienced professional with deep skills in applying analytics and big data capabilities to challenging business problems and driving customer insights. Krish's analytics experience includes marketing and pricing, credit risk, digital analytics and, most recently, big data analytics and data transformation. His key experiences lie in banking and financial services and the digital customer experience domain, with a background in management consulting. Other key skills include influencing organizational change towards a data- and analytics-driven culture and building teams of analysts, statisticians, and data scientists.
Creating ML models is just the start of a long journey. In this presentation, given as a talk in the e2e AI talks series, I discuss the various challenges in the machine learning life cycle.
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b... - Spark Summit
Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g. spark.ml, scikit-learn). A common set of abstractions enables ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.
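A toy, in-memory stand-in for the bookkeeping ModelDB automates might look like this (the `ModelRegistry` class and its API are hypothetical, not ModelDB's actual client library):

```python
import hashlib
import json
import time

class ModelRegistry:
    """Hypothetical minimal registry: record each trained model's
    configuration and metrics, then query across all recorded runs."""

    def __init__(self):
        self.runs = []

    def log(self, name, params, metrics):
        """Record one modeling run and return its derived id."""
        run_id = hashlib.sha1(
            json.dumps([name, params], sort_keys=True).encode()).hexdigest()[:8]
        self.runs.append({"id": run_id, "name": name, "params": params,
                          "metrics": metrics, "ts": time.time()})
        return run_id

    def query(self, predicate):
        """Return runs matching an arbitrary predicate over run metadata."""
        return [r for r in self.runs if predicate(r)]

registry = ModelRegistry()
registry.log("logreg", {"C": 1.0}, {"acc": 0.92})
registry.log("tree", {"max_depth": 3}, {"acc": 0.88})
best = registry.query(lambda r: r["metrics"]["acc"] > 0.9)
print([r["name"] for r in best])  # runs beating 90% accuracy
```

ModelDB goes much further: it hooks into spark.ml and scikit-learn to capture runs automatically, versions whole pipelines, and serves rich queries through a web frontend.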
Meetup sthlm - introduction to Machine Learning with demo cases - Zenodia Charpy
Data science and Machine Learning
Machine Learning vs Artificial Intelligence
Machine Learning Algorithms
How to choose ML algorithm mindmap
Supervised Learning generic flow
Unsupervised Learning generic flow
Example cases for supervised and unsupervised learning
Building a performing Machine Learning model from A to Z - Charles Vestur
A 1-hour read to become highly knowledgeable about Machine learning and the machinery underneath, from scratch!
A presentation introducing all the fundamental concepts of Machine Learning step by step, following a classical approach to building a performing model. Simple examples and illustrations are used throughout the presentation to make the concepts easier to grasp.
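The classical approach described above can be compressed into a few lines: hold out test data, establish a trivial baseline, then check that the candidate model beats it on data it never saw (the dataset and estimators here are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: a trivial baseline -- any real model must beat this.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Step 2: a candidate model, evaluated only on held-out data.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

print("baseline accuracy:", round(baseline.score(X_test, y_test), 2))
print("model accuracy:   ", round(model.score(X_test, y_test), 2))
```

Everything else in a full A-to-Z treatment (feature engineering, validation strategy, tuning) refines this basic train/baseline/evaluate loop.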
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy - LazarinaStoyanova
In this webinar, I will go through the benefits and limitations of Data Studio, offer tips and tricks for turning spreadsheets into cool reports, and share some hot dashboard templates.
"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."
In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started
A presentation covers how data science is connected to build effective machine learning solutions. How to build end to end solutions in Azure ML. How to build, model, and evaluate algorithms in Azure ML.
Artificial Intelligence for Automating Data AnalysisManuel Martín
The requirements for analysing big volumes of data have increased over the last few decades. The process of selecting, cleaning, modelling and interpreting data is called the KDD process. The decision of how to approach each step in this process has often been made manually by experts. However, experts cannot be aware of all methods, nor is it feasible to try all of them. Researchers have proposed different approaches for automating, or at least advising, the stages of the KDD process. This talk will outline the different types of Intelligent Discovery Assistants as described in the work of Serban et al. “A survey of intelligent assistants for data analysis” and point out some future directions.
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...Athens Big Data
Title: An Introduction to Machine Learning with Python and Scikit-Learn
Speaker: Julien Simon (https://linkedin.com/in/juliensimon/)
Date: Thursday, March 14, 2019
Event: https://www.meetup.com/Athens-Big-Data/events/259091496/
In this talk, we will present an overview of Azure Machine Learning, a fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions. We will start with the basics of machine learning and end with a demo that uses real world data.
Unified Approach to Interpret Machine Learning Model: SHAP + LIMEDatabricks
For companies that solve real-world problems and generate revenue from the data science products, being able to understand why a model makes a certain prediction can be as crucial as achieving high prediction accuracy in many applications. However, as data scientists pursuing higher accuracy by implementing complex algorithms such as ensemble or deep learning models, the algorithm itself becomes a blackbox and it creates the trade-off between accuracy and interpretability of a model’s output.
To address this problem, a unified framework SHAP (SHapley Additive exPlanations) was developed to help users interpret the predictions of complex models. In this session, we will talk about how to apply SHAP to various modeling approaches (GLM, XGBoost, CNN) to explain how each feature contributes and extract intuitive insights from a particular prediction. This talk is intended to introduce the concept of general purpose model explainer, as well as help practitioners understand SHAP and its applications.
The key challenge in making AI technology more accessible to the broader community is the scarcity of AI experts. Most businesses simply don’t have the much needed resources or skills for modeling and engineering. This is why automated machine learning and deep learning technologies (AutoML and AutoDL) are increasingly valued by academics and industry. The core of AI is the model design. Automated machine learning technology reduces the barriers to AI application, enabling developers with no AI expertise to independently and easily develop and deploy AI models. Automated machine learning is expected to completely overturn the AI industry in the next few years, making AI ubiquitous.
Helping data scientists escape the seduction of the sandbox - Krish Swamy, We...Sri Ambati
This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/xc3j20Om3UM
Description:
Data science is indeed one of the sexy jobs of the 21st century. But it is also a lot of hard work. And the hard work is seldom about the math or the algorithms. It is about building relevant machine learning products for the real world. We will go over some of the must-haves as you take your machine learning model out of the sandbox and make it work in the big, bad world outside.
Speaker's Bio:
Krish Swamy is an experienced professional with deep skills in applying analytics and BigData capabilities to challenging business problems and driving customer insights. Krish's analytic experience includes marketing and pricing, credit risk, digital analytics and most recently, big data analytics and data transformation. His key experiences lie in banking and financial services, the digital customer experience domain, with a background in management consulting. Other key skills include influencing organizational change towards a data and analytics driven culture, and building teams of analysts, statisticians and data scientists.
Creating ML models is just the start of a long journey. In this presentation, given as a talk in the e2e AI talks series, I discuss the various challenges in the machine learning life cycle.
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b... - Spark Summit
Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g. spark.ml, scikit-learn). A common set of abstractions enable ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.
Meetup sthlm - introduction to Machine Learning with demo cases - Zenodia Charpy
Data science and Machine Learning
Machine Learning vs Artificial Intelligence
Machine Learning Algorithms
How to choose ML algorithm mindmap
Supervised Learning generic flow
Unsupervised Learning generic flow
Example cases for supervised and unsupervised learning
Building a performing Machine Learning model from A to Z - Charles Vestur
A 1-hour read to become highly knowledgeable about Machine learning and the machinery underneath, from scratch!
A presentation introducing all the fundamental concepts of Machine Learning step by step, following a classical approach to building a performing model. Simple examples and illustrations are used throughout the presentation to make the concepts easier to grasp.
Data Studio for SEOs: Reporting Automation Tips - Weekly SEO with Lazarina Stoy - LazarinaStoyanova
In this webinar, I will go through the benefits and limitations of Data Studio, tips and tricks for turning spreadsheets into cool reports, and share some hot dashboard templates
Building machine learning muscle in your team and transitioning them to doing machine learning at scale. We also discuss Spark and other relevant technologies.
This talk was given at the 11th meeting on April 7, 2014 by Karolina Alexiou.
Analysis of big data is useless (and a lot harder to sell) when you can't measure whether the resulting insights are correct. In order to develop sophisticated data analysis methodologies tailored to your particular use case, you need to be able to figure out what works and what doesn't. It is crucial to gather data independently of your analysis (ground truth) and compare it to your results using the correct metrics, while accounting for biases. The sheer volume of data means that you also need a strategy for slicing and dicing the data to isolate the really valuable parts, and a keen eye for visualization so that you can quickly compare methodologies and support the validity of your insights to third parties.
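The comparison against ground truth described above ultimately comes down to computing the right metrics. As a minimal illustration with made-up labels, precision and recall against an independently gathered truth set:

```python
def precision_recall(predicted, truth):
    """Score predicted positives against independently gathered ground truth."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Two of three predictions are correct; two true items were missed entirely.
p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"})
print(p, r)
```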
Data is growing exponentially. What should business managers do to make better business decisions? I explain three key things step by step. Just start today!
BDT has moved from a SAS-based workflow to a cloud-based workflow leveraging tools like BigQuery, Looker, and Apache Airflow. Originally presented at the 2018 Pennsylvania Data Users Conference: https://pasdcconference.org/
A summary of the philosophy and approach taken by the TravelBird Data Science team (and company as a whole) that allows rapid development of new machine learning algorithms, data insights, and integration into production and operations.
Overview of an 11-week machine learning course I developed to train software engineers at Dell on their way to becoming data scientists. The class is an outline of predictive analytics methods using Python. I taught it on 8 separate occasions over 3 years.
The talk is on how to become a data scientist. It was given at the 2nd Annual event of the Pune Developer's Community. It focuses on the skill set required to become a data scientist and, based on who you are, what you can become.
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O... - Bhakthi Liyanage
Windows Azure Machine Learning and Data Analytics platform offers a streamlined experience, from setting up with only a web browser to using drag-and-drop gestures and simple data-flow graphs to set up experiments. Azure Machine Learning Studio features a library of time-saving sample experiments, R and Python packages, and best-in-class algorithms from Microsoft businesses like Xbox and Bing. Learn how the Azure Machine Learning service in the cloud lets you easily build, deploy, and share advanced analytics solutions in your SharePoint platform. Attendees will also gain knowledge of special considerations that should be taken into account when creating analytical models. The demo will walk you through creating an analytic model in Azure ML Studio and consuming the model within SharePoint Online.
Lambda architecture for real time big data - Trieu Nguyen
Lambda Architecture in Real-time Big Data Project
Concepts & Techniques “Thinking with Lambda”
Case study in some real projects
Why is the lambda architecture the correct solution for big data?
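The essence of the lambda architecture can be sketched in a few lines: a batch layer precomputes views over the master dataset, a speed layer covers only the events that arrived since the last batch run, and the serving layer merges both at query time. A toy sketch with hypothetical page-view counts:

```python
# Batch view: recomputed periodically over the whole master dataset.
batch_view = {"page/home": 1200, "page/about": 300}

# Speed layer: incremental counts for events since the last batch run.
speed_layer = {"page/home": 17, "page/contact": 4}

def query(key):
    """Serving layer: merge the batch view with the real-time increments."""
    return batch_view.get(key, 0) + speed_layer.get(key, 0)

print(query("page/home"))     # batch + real-time
print(query("page/contact"))  # only seen by the speed layer so far
```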
This deck is meant to help compliance officers and technical implementors at Virtual Asset Service Providers (VASPs) implement TRISA for travel rule compliance.
The goal of TRISA is to enable compliance with the FATF and FinCEN Travel Rules, as well as Travel Rules implemented by equivalent authorities, without: modifying the core blockchain protocols, incurring increased transaction costs, or modifying virtual currency peer-to-peer transaction flows.
Visual diagnostics for more effective machine learning - Benjamin Bengfort
The model selection process is a search for the best combination of features, algorithm, and hyperparameters that maximize F1, R2, or silhouette scores after cross-validation. This view of machine learning often leads us toward automated processes such as grid searches and random walks. Although this approach allows us to try many combinations, we are often left wondering if we have actually succeeded.
By enhancing model selection with visual diagnostics, data scientists can inject human guidance to steer the search process. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the high dimensional realm in which our models operate. As we continue to tune our models, trying to minimize both bias and variance, these glimpses allow us to be more strategic in our choices. The result is more effective modeling, speedier results, and greater understanding of underlying processes.
Visualization is an integral part of the data science workflow, but visual diagnostics are directly tied to machine learning transformers and models. The Yellowbrick library extends the scikit-learn API providing a Visualizer object, an estimator that learns from data and produces a visualization as a result. In this talk, we will explore feature visualizers, visualizers for classification, clustering, and regression, as well as model analysis visualizers. We'll work through several examples and show how visual diagnostics steer model selection, making machine learning more effective.
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel... - Benjamin Bengfort
This is an overview of the goals and roadmap for the Yellowbrick model visualization library (www.scikit-yb.org). If you're interested in contributing to Yellowbrick or writing visualizers, this is a good place to get started.
In the presentation we discuss the expected workflow of data scientists interacting with the model selection triple and Scikit-Learn. We describe the Yellowbrick API and its relationship to the Scikit-Learn API. We introduce our primary object: the Visualizer, an estimator that learns from data and displays it visually. Finally, we describe the requirements for developing for Yellowbrick, the tools and utilities in place, and how to get started.
Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for your models!
This presentation was given during the opening session of the 2017 Spring DDL Research Labs.
Network analyses are powerful methods for both visual analytics and machine learning but can suffer as their complexity increases. By embedding time as a structural element rather than a property, we will explore how time series and interactive analysis can be improved on Graph structures. Primarily we will look at decomposition in NLP-extracted concept graphs using NetworkX and Graph Tool.
Machine learning is the hacker art of describing the features of instances that we want to make predictions about, then fitting the data that describes those instances to a model form. Applied machine learning has come a long way from its beginnings in academia, and with tools like Scikit-Learn, it's easier than ever to generate operational models for a wide variety of applications. Thanks to the ease and variety of the tools in Scikit-Learn, the primary job of the data scientist is model selection. Model selection involves performing feature engineering, hyperparameter tuning, and algorithm selection. These dimensions of machine learning often lead computer scientists towards automatic model selection via optimization (maximization) of a model's evaluation metric. However, the search space is large, and grid search approaches to machine learning can easily lead to failure and frustration. Human intuition is still essential to machine learning, and visual analysis in concert with automatic methods can allow data scientists to steer model selection towards better fitted models, faster. In this talk, we will discuss interactive visual methods for better understanding, steering, and tuning machine learning models.
Data products derive their value from data and generate new data in return; as a result, machine learning techniques must be applied to their architecture and their development. Machine learning fits models to make predictions on unknown inputs and must be generalizable and adaptable. As such, fitted models cannot exist in isolation; they must be operationalized and user facing so that applications can benefit from the new data, respond to it, and feed it back into the data product. Data product architectures are therefore life cycles, and understanding the data product life cycle will enable architects to develop robust, failure-free workflows and applications. In this talk we will discuss the data product life cycle and explore how to engage a model build, evaluation, and selection phase with an operation and interaction phase. Following the lambda architecture, we will investigate wrapping a central computational store for speed and querying, as well as incorporating a discussion of monitoring, management, and data exploration for hypothesis driven development. From web applications to big data appliances, this architecture serves as a blueprint for handling data services of all sizes!
Entity Resolution is the task of disambiguating manifestations of real world entities through linking and grouping and is often an essential part of the data wrangling process. There are three primary tasks involved in entity resolution: deduplication, record linkage, and canonicalization; each of which serve to improve data quality by reducing irrelevant or repeated data, joining information from disparate records, and providing a single source of information to perform analytics upon. However, due to data quality issues (misspellings or incorrect data), schema variations in different sources, or simply different representations, entity resolution is not a straightforward process and most ER techniques utilize machine learning and other stochastic approaches.
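As a toy illustration of the deduplication task (the first of the three ER tasks above), here is a greedy string-similarity dedup using only the standard library. Real systems use trained matchers and blocking, but the shape of the problem is the same; the names and the 0.85 threshold are made up:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Ratio in [0, 1] based on longest matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def deduplicate(records, threshold=0.85):
    """Greedy dedup: keep a record only if no kept record is near-identical."""
    kept = []
    for rec in records:
        if all(similarity(rec, k) < threshold for k in kept):
            kept.append(rec)
    return kept

records = ["Jon Smith", "John Smith", "Jane Doe", "J. Smith"]
print(deduplicate(records))  # "John Smith" collapses into "Jon Smith"
```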
Scikit-Learn is a powerful machine learning library implemented in Python with numeric and scientific computing powerhouses Numpy, Scipy, and matplotlib for extremely fast analysis of small to medium sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a data scientist's toolkit for machine learning on incoming data sets.
The purpose of this one day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms; rather than as simply a research or investigation methodology.
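The fit/predict pattern the course is built around looks like this in Scikit-Learn, using the bundled iris data as a stand-in for whatever data set the class actually works with:

```python
# Minimal Scikit-Learn workflow: split, fit on training data, score on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)  # any estimator follows the same API
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on the held-out set
```

The same `fit`/`predict`/`score` interface applies across classifiers, regressors, and clusterers, which is what makes the models easy to embed in a larger data product.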
In this one day workshop, we will introduce Spark at a high level context. Spark is fundamentally different than writing MapReduce jobs so no prior Hadoop experience is needed. You will learn how to interact with Spark on the command line and conduct rapid in-memory data analyses. We will then work on writing Spark applications to perform large cluster-based analyses including SQL-like aggregations, machine learning applications, and graph algorithms. The course will be conducted in Python using PySpark.
Using only simple rules for local interactions, groups of agents can form self-organizing super-organisms or “flocks” that show global emergent behavior. When agents are also extended with memory and goals the resulting flock not only demonstrates emergent behavior, but also collective intelligence: the ability for the group to solve problems that might be beyond the ability of the individual alone. Until now, research has focused on the improvement of particle design for global behavior; however, techniques for human-designed particles are task-specific. In this paper we will demonstrate that evolutionary computing techniques can be applied to design particles, not only to optimize the parameters for movement but also the structure of controlling finite state machines that enable collective intelligence. The evolved design not only exhibits emergent, self-organizing behavior but also significantly outperforms a human design in a specific problem domain. The strategy of the evolved design may be very different from what is intuitive to humans and perhaps reflects more accurately how nature designs systems for problem solving. Furthermore, evolutionary design of particles for collective intelligence is more flexible and able to target a wider array of problems either individually or as a whole.
An Overview of Spanner: Google's Globally Distributed Database - Benjamin Bengfort
Spanner is a globally distributed database that provides external consistency between data centers and stores data in a schema based semi-relational data structure. Not only that, Spanner provides a versioned view of the data that allows for instantaneous snapshot isolation across any segment of the data. This versioned isolation allows Spanner to provide globally consistent reads of the database at a particular time allowing for lock-free read-only transactions (and therefore no communications overhead for consensus during these types of reads). Spanner also provides externally consistent reads and writes with a timestamp-based linear execution of transactions and two phase commits. Spanner is the first distributed database that provides global sharding and replication with strong consistency semantics.
Social networks are not new, even though websites like Facebook and Twitter might make you want to believe they are; and trust me- I’m not talking about Myspace! Social networks are extremely interesting models for human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!
Because networks take a relationship-centered view of the world, the data structures that we will analyze model real world behaviors and community. Through a suite of algorithms derived from mathematical Graph theory we are able to compute and predict behavior of individuals and communities through these types of analyses. Clearly this has a number of practical applications from recommendation to law enforcement to election prediction, and more.
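A taste of such graph computations, without any library: degree centrality on a small adjacency-list network (the friendships here are invented):

```python
# Hypothetical friendship network as an adjacency list.
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph) - 1
    return {node: len(neighbors) / n for node, neighbors in graph.items()}

centrality = degree_centrality(friends)
print(max(centrality, key=centrality.get))  # alice is the most central
```

Libraries like NetworkX provide the same measure (plus betweenness, PageRank, community detection, and so on) for graphs far too large to analyze by hand.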
Natural Language Processing (NLP) is often taught at the academic level from the perspective of computational linguists. However, as data scientists, we have a richer view of the world of natural language - unstructured data that by its very nature has important latent information for humans. NLP practitioners have benefitted from machine learning techniques to unlock meaning from large corpora, and in this class we’ll explore how to do that particularly with Python, the Natural Language Toolkit (NLTK), and to a lesser extent, the Gensim Library.
NLTK is an excellent library for machine learning-based NLP, written in Python by experts from both academia and industry. Python allows you to create rich data applications rapidly, iterating on hypotheses. Gensim provides vector-based topic modeling, which is currently absent in both NLTK and Scikit-Learn. The combination of Python + NLTK means that you can easily add language-aware data products to your larger analytical workflows and applications.
A presentation of Hermansky & Morgan's 1994 paper, RASTA Processing of Speech. Learn the dramatic effect of RASTA on critical band analysis when combined with PLP to do speech detection!
Hermansky, Hynek, and Nelson Morgan. "RASTA processing of speech." Speech and Audio Processing, IEEE Transactions on 2.4 (1994): 578-589.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then agile testing, and finally testing in DevOps. We also held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 - Tobias Schneck
As AI technology pushes into IT, I was wondering, as an "infrastructure container Kubernetes guy", how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our beloved cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
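For reference, JMeter's real-time results integration is configured through a Backend Listener using the InfluxDB client; a typical parameter set looks roughly like the following (values are placeholders, and the exact names should be checked against the JMeter real-time results documentation for your version):

```
influxdbUrl   = http://localhost:8086/write?db=jmeter
application   = my_web_app
measurement   = jmeter
summaryOnly   = false
samplersRegex = .*
percentiles   = 90;95;99
testTitle     = Load test
```

Grafana then reads the `jmeter` measurement from InfluxDB as a data source and plots it on a dashboard.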
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
2. Links to various resources
Introduction to Python
http://bit.ly/1gJ73Tt
Github Repository
http://bit.ly/1eLBzki
3. About the Instructor
Benjamin Bengfort
Data Science:
● MS Computer Science from North Dakota State
● PhD Candidate in CS at the University of Maryland
● Data Scientist at Cobrain Company in Bethesda, MD
● Board member of Data Community DC
● Lecturer at Georgetown University
Python Programmer:
● Python developer for 7 years
● Open source contributor
● My work on Github: https://github.com/bbengfort
4. About the Instructor
Benjamin Bengfort
I am available to collaborate and answer questions for all of my students.
Twitter: twitter.com/bbengfort
LinkedIn: linkedin.com/in/bbengfort
Github: github.com/bbengfort
Email: benjamin@bengfort.com
5. About the Teaching Assistant
Keshav Magge
● MS Computer Science from University of Houston
● Lead Data/Software Engineer at Cobrain Company in Bethesda, MD
Python Programmer:
● Python developer for 7 years
● Plone/Zope for 2 years, Django for 5 years
● My work on Github: https://github.com/keshavmagge
6. About the Teaching Assistant
Keshav Magge
Reach out to me to talk about all things python/data or just about life.
Twitter: twitter.com/keshavmagge
LinkedIn: linkedin.com/pub/keshav-magge/12/a2a/324/
Github: github.com/keshavmagge
Email: keshav@keshavmagge.com
8. Hilary Mason
“A data product is a product that is based on the combination of data and algorithms.”
9.
10. Mike Loukides
“A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products.”
14. Data Ingestion
● There is a world of data out there - how to get it? Web crawlers, APIs, sensors? Python and other web scripting languages are custom made for this task.
● The real question is how can we deal with such a giant volume and velocity of data?
● Big Data and Data Science often require ingestion specialists!
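A sketch of what the ingestion step can look like with the standard library. The endpoint and parameters below are placeholders, not a real GoodReads API; the workshop project itself uses urllib and requests:

```python
# Hypothetical ingestion sketch: build a paginated review-feed URL, then fetch.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://example.com/reviews"  # placeholder endpoint

def build_url(book_id, api_key, page=1):
    """Construct the query URL for one page of reviews of a book."""
    params = urlencode({"book_id": book_id, "key": api_key, "page": page})
    return f"{BASE}?{params}"

def fetch(url):
    """Pull down one page of raw data (real code would handle errors/retries)."""
    with urlopen(url) as response:
        return response.read()

print(build_url(2998, "SECRET", page=2))
```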
15. Data Wrangling
● Warehousing the data means storing the data in as raw a form as possible.
● Extract, transform, and load operations move data to operational storage locations.
● Filtering, aggregation, normalization and denormalization all ensure data is in a form it can be computed on.
● Annotated training sets must be created for ML tasks.
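A toy version of the wrangling step using only the standard library (the project proper uses BeautifulSoup and SQLAlchemy; the titles and ratings here are invented):

```python
# Tiny extract-transform-load sketch: raw rows are normalized and loaded
# into SQLite so they can be computed on.
import sqlite3

raw = [("  Dune ", 5), ("Hyperion", 4), ("  Dune ", 3)]  # hypothetical ratings

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (title TEXT, stars INTEGER)")

# Transform: normalize titles before loading so aggregation works.
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [(title.strip(), stars) for title, stars in raw],
)

# The wrangled form supports computation: average rating per title.
for title, avg in conn.execute(
    "SELECT title, AVG(stars) FROM ratings GROUP BY title ORDER BY title"
):
    print(title, avg)
```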
16. Computation and Analyses
● Hypothesis driven computation includes design and development of predictive models.
● Many models have to be trained or constrained into a computational form like a Graph database, and this is time consuming.
● Other data products like indices, relations, classifications, and clusters may be computed.
17. Modeling and Application
This is the part we’re most familiar with. Supervised classification, unsupervised clustering - Bayes, Logistic Regression, Decision Trees, and other models.
This is also where the money is.
18. Reporting and Visualization
● Often overlooked, this part is crucial, even if we have data products.
● Humans recognize patterns better than machines. Human feedback is crucial in Active Learning and remodeling (error detection).
● Mashups and collaborations generate more data - and therefore more value!
20. What we’re going to build today
SCIENCE BOOKCLUB!!
● A book club that chooses what to read via a recommender system.
● Uses GoodReads data to ingest and return feedback on books.
● Statistical model is a non-negative matrix factorization.
● Reporting using Jinja (almost a web app).
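The statistical heart of the book club, non-negative matrix factorization, can be sketched with plain numpy multiplicative updates (Lee and Seung style). The ratings matrix here is random stand-in data:

```python
# Bare-bones NMF: factor a ratings matrix R into nonnegative W @ H,
# where W holds user factors and H holds book factors.
import numpy as np

rng = np.random.default_rng(42)
R = rng.random((6, 4))          # hypothetical user-by-book rating matrix
k = 2                           # number of latent factors
W = rng.random((6, k)) + 0.1
H = rng.random((k, 4)) + 0.1

for _ in range(200):
    # Multiplicative updates keep every entry nonnegative.
    H *= (W.T @ R) / (W.T @ W @ H + 1e-9)
    W *= (R @ H.T) / (W @ H @ H.T + 1e-9)

error = np.linalg.norm(R - W @ H)
print(error)  # predicted score for a user/book pair is (W @ H)[user, book]
```

Filling in the reconstructed matrix where a user has not yet rated a book is exactly how the recommender proposes the next read.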
21. Workflow
1. Setting up a Python skeleton
2. Creating and Running Tests
3. Wading in with a configuration
4. Ingestion with urllib and requests
5. Creating a command line admin with argparse
6. Wrangling with BeautifulSoup and SQLAlchemy
7. Modeling with numpy
8. Reporting with Jinja2
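Step 8 in miniature: a Jinja2 template rendered with made-up recommendations produces the report:

```python
# Render the recommender's output into an HTML fragment with Jinja2.
from jinja2 import Template

template = Template(
    "<h1>{{ club }} picks</h1>\n"
    "<ul>{% for book in books %}<li>{{ book }}</li>{% endfor %}</ul>"
)

html = template.render(club="Science Bookclub", books=["Dune", "Contact"])
print(html)
```

In the project, the template would live in its own file and be loaded through a Jinja2 environment, but the render call is the same.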
22. Octavo Architecture (really clear DSP)
[Diagram: requests.py feeds an Ingestion Module that writes to Raw Data Storage; a Wrangling Module (BeautifulSoup, SQLAlchemy) moves data into Computational Data Storage; a Recommender Module (Numpy) and a Reporting Module (Jinja2, Matplotlib) consume it.]
23. Octavo Architecture (really clear DSP)
[Same architecture diagram as slide 22, starting from requests.py.]
25. How to tackle this course ...
Lean into it - absorb as much as possible, don’t worry about falling behind - it will be in your head! Then afterwards - let’s all digest it together (keep in touch).