Overview of the structured data science domain, OSS machine learning platforms and algorithms. Using the JPMML family of libraries to implement a unified, production-oriented workflow.
Because choosing optimised, task-specific processing steps and ML models is often beyond non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The resulting research area, which targets the progressive automation of machine learning, is called AutoML.
Although it focuses on end users without expert knowledge, AutoML also offers new tools to machine learning experts, for example to:
1. Perform architecture search over deep representations
2. Analyse the importance of hyperparameters.
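The kind of search an AutoML system automates can be sketched with a toy random search over hyperparameters. This is an illustrative example only; the objective function and parameter names (`depth`, `lr`) are invented for the sketch, not taken from the talk.

```python
import random

def evaluate(params):
    """Hypothetical validation score: peaks at depth=6, lr=0.1."""
    return -((params["depth"] - 6) ** 2) - (params["lr"] - 0.1) ** 2

def random_search(n_trials, seed=42):
    """Sample hyperparameter configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "depth": rng.randint(1, 12),               # tree/network depth
            "lr": rng.choice([0.01, 0.03, 0.1, 0.3]),  # learning rate
        }
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search(500)
print(best)
```

Logging the scores of all sampled configurations, rather than only the winner, is what enables the hyperparameter-importance analysis mentioned above.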
Artificial Intelligence Layer: Mahout, MLLib, and other projects - Victor Sanchez Anguix
This is part of an introductory course on Big Data tools for Artificial Intelligence. These final slides gather some of the most important AI layers in Big Data.
A tremendous backlog of predictive modeling problems in the industry and short supply of trained data scientists have spiked interest in automation over the last few years. A new academic field, AutoML, has emerged. However, there is a significant gap between the topics that are academically interesting and automation capabilities that are necessary to solve real-world industrial problems end-to-end. An even greater challenge is enabling a non-expert to build a robust and trustworthy AI solution for their company. In this talk, we’ll discuss what an industry-grade AutoML system consists of and the scientific and engineering challenges of building it.
Many state-of-the-art machine learning applications today are based on artificial neural networks. In this talk we explore several commonly used neural network architectures. We identify the ideas behind their design, describe their topologies, outline their properties, and discuss their use.
You might enjoy this talk if you are interested in:
* Discovering some of the popular neural network types
* Learning about their design and how they work
* Understanding what they are good for
Deep Learning and Automatic Differentiation from Theano to PyTorch - inside-BigData.com
Inquisitive minds want to know what causes the universe to expand, how M-theory binds the smallest of the small particles, or how social dynamics can lead to revolutions. In recent centuries, developments in science and technology have brought us closer to exploring the expanding universe, discovering unknown particles like bosons, and finding out how and why a society interacts and reacts. To explain the fascinating phenomena of nature, natural scientists develop complex 'mechanistic models' of a deterministic or stochastic nature. But the hard questions are how to choose the best model for our data and how to calibrate the model given the data.
Statisticians answer these questions with Approximate Bayesian Computation (ABC), which we cover on the first day of the summer school and combine with High Performance Computing (HPC). The second day focuses on a popular machine learning approach, deep learning, which mimics the deep neural network structure of our brain in order to predict complex phenomena of nature. The summer school takes a route of open discussion and brainstorming sessions in which we explore two cornerstones of today's data science, ABC and deep learning accelerated by HPC, with hands-on examples and exercises.
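The core ABC idea can be sketched as a rejection sampler. The model (Normal likelihood), prior (Uniform), and tolerance below are assumptions chosen for illustration; they are not material from the summer school.

```python
import random

def abc_rejection(observed_mean, n_samples=200, tolerance=0.1, seed=0):
    """Minimal ABC rejection sampler (toy model: data ~ Normal(theta, 1)).

    Prior: theta ~ Uniform(-5, 5). A candidate theta is accepted when the
    simulated summary statistic (the sample mean) falls within `tolerance`
    of the observed one, so no likelihood evaluation is ever needed.
    """
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-5, 5)                       # draw from the prior
        sims = [rng.gauss(theta, 1) for _ in range(50)]  # simulate a dataset
        if abs(sum(sims) / len(sims) - observed_mean) < tolerance:
            accepted.append(theta)                       # close enough: keep it
    return accepted

posterior = abc_rejection(observed_mean=1.5)
print(sum(posterior) / len(posterior))  # posterior mean, near 1.5
```

Each accepted `theta` is an approximate posterior draw; the HPC angle of the school comes from running the many independent simulations in parallel.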
Watch the video: https://wp.me/p3RLHQ-hQQ
Learn more: https://github.com/probprog/CSCS-summer-school-2017
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
If we could only predict the future of the software industry, we could make better investments and decisions. We could waste fewer resources on technology and processes we know will not last, or at least be conscious in our decisions to choose solutions with a limited lifetime. It turns out that for data engineering, we can predict the future, because it has already happened: not in our workplace, but at a few leading companies that are blazing ahead. It has also already happened in the neighbouring field of software engineering, which is two decades ahead of data engineering in process maturity. In this presentation, we will glimpse into the future of data engineering. Data engineering has gone from legacy data warehouses with stored procedures, to big data with Hadoop and data lakes, on to a new form of modern data warehouses and low-code tools, aka "the modern data stack". Where does it go from here? We will look at the points where data leaders differ from the crowd and combine them with observations on how software engineering has evolved, to see that they point towards a new, more industrialised form of data engineering: "data factory engineering".
Dirty data? Clean it up! - Datapalooza Denver 2016 - Dan Lynn
Dan Lynn (AgilData) & Patrick Russell (Craftsy) present on how to do data science in the real world. We discuss data cleansing, ETL, pipelines, hosting, and share several tools used in the industry.
NoSQL Data Modeling Foundations — Introducing Concepts & Principles - ScyllaDB
In this talk, Pascal Desmarets, CEO and Founder of Hackolade discusses the foundations of NoSQL data modeling. He highlights:
- Why is data modeling a key success factor?
- “Sweet spot” use cases where NoSQL shines the most
- Basic principles of Data Modeling for ScyllaDB
How to Deliver Machine Learning To Production - Zdenek Letko - Matěj Jakimov
Video from presentation: https://youtu.be/pDi-npLE3Zo
Wandera, the company recognized by both Gartner and IDC in their Q3/2017 reports on Mobile Threat Defense, offers organizations a solution for Enterprise Mobile Security and Data Management. In this talk, we will introduce the backend which powers the Wandera service, with emphasis on data flow, analysis, and processing. Then we will discuss how our Machine Learning models are trained, evaluated, and verified by humans via QA before being deployed into a production environment. The final part of the presentation will be devoted to lessons learned.
Speaker: Zdenek Letko, Software Engineer at Wandera, https://www.linkedin.com/in/zdenek-letko-26064848/
In data science, the scientific part is often forgotten: workflows, tools, and practices that are popular tend to yield experiments that cannot be repeated. Experiments that are not reliable cannot tell us whether changes improve products or not. What works fine during initial development is inadequate for sustainable development of machine learning products. In this presentation, you will learn:
- Why reproducibility matters for data science.
- The practices and workflows that cause reproducibility problems.
- How to build technical environments and processes that enable reproducibility and iterative development of machine learning products.
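One low-level building block for the reproducible workflows described above is fingerprinting the exact configuration and data behind a result, so later runs can verify they compare like with like. This is a minimal sketch under assumed conventions (JSON-serialisable config and rows), not a technique prescribed by the talk.

```python
import hashlib
import json

def experiment_fingerprint(config, data_rows):
    """Hash the configuration and input data of an experiment run.

    Two runs with the same fingerprint saw identical inputs, so their
    metrics are directly comparable; a changed fingerprint explains
    why results differ.
    """
    h = hashlib.sha256()
    # Canonical JSON so dictionary key order does not change the hash.
    h.update(json.dumps(config, sort_keys=True).encode())
    for row in data_rows:
        h.update(json.dumps(row, sort_keys=True).encode())
    return h.hexdigest()

config = {"model": "logreg", "seed": 42, "lr": 0.1}
data = [{"x": 1.0, "y": 0}, {"x": 2.5, "y": 1}]
fp1 = experiment_fingerprint(config, data)
fp2 = experiment_fingerprint(dict(reversed(config.items())), data)
assert fp1 == fp2  # key order is irrelevant to the fingerprint
```

Storing this digest alongside each recorded metric is a cheap way to detect silently drifting datasets or configurations between experiment runs.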
Lessons learned from designing a QA Automation for analytics databases (big d... - Omid Vahdaty
Have a big data product / database / DBMS? Need to test it? Don't know where to start? Here are some things to consider while you design your QA automation.
Link to Video
https://www.youtube.com/watch?v=MlT4pP7BGFQ
Can we use data to train machine learning models and perform statistical analysis without putting private data at risk? Tools and techniques such as Federated Learning, Differential Privacy, and Homomorphic Encryption enable safer work on data.
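As a flavour of one of these techniques, here is a minimal sketch of the Laplace mechanism from differential privacy; it is illustrative only, with an invented toy dataset, and the talk does not prescribe this implementation. Calibrated noise is added to a count so that any single individual's presence changes the output distribution only slightly.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(values, predicate, epsilon, rng):
    """Epsilon-differentially-private count query.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon is enough for epsilon-differential privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)
ages = [23, 35, 41, 29, 52, 61, 38]  # hypothetical sensitive records
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

Smaller `epsilon` means stronger privacy and noisier answers; the other techniques named above (federated learning, homomorphic encryption) attack the problem from different angles.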
Scalameta is a library for static analysis and processing of Scala source code, which supports both syntactic and semantic analysis. In this presentation, we explain how Scalameta works and how you can use it for custom code analysis. We demonstrate how we have used Scalameta to automate schema management and privacy protection.
This talk provides an engineering perspective on privacy protection. The intended audience is architects, developers, data scientists, and engineering managers that build applications handling user data. We highlight topics that require attention at an early design stage, and go through pitfalls and potentially expensive architectural mistakes. We describe a number of technical patterns for complying with privacy regulations without sacrificing the ability to use data for product features. The content of the talk is based on real world experience from handling privacy protection in large scale data processing environments.
DataOps requires a cultural shift that brings the principles of lean manufacturing and DevOps to data analytics. It breaks down silos between developers, data scientists, and operators, resulting in rapid cycle times and low error rates.
At Spotify in 2013, the concept of DataOps did not exist but the Swedish company needed a way to align the people, processes, and technologies of the data organization to accelerate the development of high-quality analytics. The result was a Swedish-style DataOps, influenced by Scandinavian culture and agile principles, that enabled the company to become a true data-driven leader.
Netflix's success is credited to the pioneering ways in which the company introduced AI and ML into its products, services, and infrastructure. ML is applied to solve a wide range of problems at Netflix.
infoShare AI Roadshow 2018 - Adam Karwan (Groupon) - Jak wykorzystać uczenie ... - Infoshare
Adam Karwan (Groupon): "How to use machine learning to streamline an organisation's business processes. Automatic text categorisation (Deep Learning vs. standard NLP), chatbots, robonomics" is a presentation delivered at the infoShare AI Roadshow meetup in Katowice on 15 November 2018.
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
5. DevOps
● Business implementation
● Single new data records in real-time mode, multiple new data records in streaming and batch modes
● Ease-of-use and robustness typically preferable to raw performance
8. Rules
● "Black box" (aka "Artificial Intelligence") models:
○ Deep and wide neural networks
● "Grey box" (aka "Machine Learning") models:
○ Shallow and narrow neural networks
○ Ensemble models
● "White box" (aka "Statistical") models:
○ Linear and logistic regression
○ Decision tree
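To make the "white box" end of this spectrum concrete, here is a minimal logistic-regression fit in plain Python. It is an illustrative sketch added here, not part of the original slides; the toy data and learning-rate choices are assumptions. The point is that the learned weights can be read off directly, which is exactly what makes such models "white box".

```python
import math

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit y ~ sigmoid(w*x + b) by stochastic gradient descent (one feature)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))  # predicted probability
            w -= lr * (p - y) * x                 # log-loss gradient w.r.t. w
            b -= lr * (p - y)                     # log-loss gradient w.r.t. b
    return w, b

# Toy data: the label is 1 when the feature exceeds roughly 2.
xs = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
# The decision boundary sits where w*x + b == 0, i.e. at x == -b/w.
print(round(-b / w, 2))
```

A deep, wide neural network would fit the same data with thousands of weights whose individual meanings are opaque, illustrating the black-box end of the scale.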
9. Predictive Model Markup Language (PMML)
● A DMG.org effort since 1999
● Representing rules (functions) in terms of standardized data structures:
○ "Conventions over configuration"
○ Backward- and forward-compatibility
○ Vendor extensions
● Platform-, language- and framework-agnostic
● Human- and machine-readable, executable, writable
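To give a feel for these standardized data structures, below is a minimal hand-written PMML document describing a one-split decision tree. It is a sketch for illustration: the element names follow the DMG PMML specification, but the fragment is not taken from the slides and has not been validated against a schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="Hand-written toy decision tree"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="buys" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <TreeModel modelName="toy" functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="buys" usageType="target"/>
    </MiningSchema>
    <!-- Root predicts "no"; the child overrides to "yes" when age > 30 -->
    <Node score="no">
      <True/>
      <Node score="yes">
        <SimplePredicate field="age" operator="greaterThan" value="30"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
```

Because the document is plain XML with conventional structure, the same model can be produced by one framework and scored by another, which is the platform-agnostic property listed above.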
10. JPMML software stack (1/2)
● "The (Java-) API is the product"
○ PMML producer (aka model conversion) vertical
○ PMML consumer (aka model scoring) vertical
○ ML-framework vs. PMML integration testing suite
● Layered library approach
○ Lower layer(s): BSD 3-Clause License
○ Higher layers: Affero GPLv3