The talk addresses the consequences of transforming the target variable on both a conceptual and a mathematical level. Still, the emphasis is on conveying the intuition behind the interplay between your chosen error measure and the transformation of your target variable, so that you get some practical gain from it. Everything will therefore also be demonstrated on a use case in a Jupyter notebook.
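The interplay between error measure and target transformation can be illustrated with a minimal sketch. It is not taken from the talk's notebook: the log-normal data, the seed and all variable names are assumptions chosen for illustration. It compares the best constant prediction under squared error on the raw target (the arithmetic mean) with the back-transformed best constant under squared error on the log target (the geometric mean).

```python
import math
import random

random.seed(0)

# Hypothetical right-skewed target: samples from a log-normal distribution
y = [math.exp(random.gauss(0.0, 1.0)) for _ in range(10_000)]

# Best constant prediction under squared error on the raw target: arithmetic mean
mean_raw = sum(y) / len(y)

# Best constant under squared error on log(y), transformed back with exp:
# this is the geometric mean, which is never larger than the arithmetic mean
geo_mean = math.exp(sum(math.log(v) for v in y) / len(y))


def rmse(pred, values):
    """Root mean squared error of a constant prediction on the raw scale."""
    return math.sqrt(sum((v - pred) ** 2 for v in values) / len(values))


rmse_mean = rmse(mean_raw, y)
rmse_geo = rmse(geo_mean, y)

print(f"arithmetic mean: {mean_raw:.3f}, RMSE on raw scale: {rmse_mean:.3f}")
print(f"geometric mean:  {geo_mean:.3f}, RMSE on raw scale: {rmse_geo:.3f}")
```

In this sketch, minimising squared error on the log scale and transforming back yields a systematically lower prediction, which is worse when the result is judged by RMSE on the original scale.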
The name WALD-stack stems from the four technologies it is composed of, i.e. a cloud-computing Warehouse like Snowflake or Google BigQuery, the open-source data integration engine Airbyte, the open-source full-stack BI platform Lightdash, and the open-source data transformation tool DBT.
Using a Formula 1 Grand Prix dataset, I will give an overview of how these four tools complement each other perfectly for analytics tasks in an ELT approach. You will learn the specific uses of each tool as well as their particular features. My talk is based on a full tutorial, which you can find under https://waldstack.org.
Forget about AI and do Mathematical Modelling instead! | Florian Wilhelm
The term artificial intelligence is ubiquitous these days. It is being used, or is supposed to be used, in almost all areas to automate processes or help us make decisions. There is AI that teaches robots to do backflips, informs passengers about perfect routes, and determines the perfect order quantity for retailers. But what is actually behind it?
In this talk, I would like to take a look behind the scenes of AI and show you what actually happens to solve such exciting use cases intelligently. Using some practical examples from my project work in the field of data science, I will show you the importance of mathematical modelling and understanding the data to solve the use case. The more you understand about it, the less it is artificial intelligence.
Slides from an ideation workshop given to the Melbourne Accelerator Programme, held at the University of Melbourne on 11 April 2013.
See www.getviable.com for more.
What is Strategy - Thinking like a Strategist | Amit Kapoor
What is Strategy? Strategy is a very young concept. Let's explore a little more about strategy and then go down the journey of understanding how to think like a strategist.
Presentation on how industry disruption occurs, the growth of ideas, and creating an entrepreneurial mindset within the workforce. Burcham shares his "Idea Frame" as well as his "Personal Progression Map"
Centre for Entrepreneurship (C4E) of the University of Cyprus and Berklee Institute for Creative Entrepreneurship (ICE) present the:
Why are some designs better than others, and what can you do about it? (The workshop)
If you've ever described a poster as heavy, a website as dense, an app as clumsy or an object as whimsical, you probably already know the answer. Recent psychology research is showing that experiential metaphors are key emotional drivers that impact our perception of the world. Applying these findings to design confirms what designers have learned throughout their careers—good design is subconscious first and rational second. Michael will share stories from this research and the IDEO portfolio then share tools to help you be more consciously subconscious.
This presentation briefly describes how IKEA has adopted Porter's Five Forces and Value Chain Analysis to maintain its competitive edge over rivals in the global furniture market by providing good-quality furniture at a lower price. By combining innovative design, improved functionality, low operating costs and excellent quality at lower prices, IKEA has proved to be a success.
The design thinking transformation in business | Cathy Wang
Presented at Webvisions Barcelona 2015 By Cathy Wang & Nuno Andrew
The definition of design is shifting from being a noun to a verb. We see it moving away from arts and crafts into a methodology of delivering value. Adapting to this shift, designers and changemakers are forming a new way of design thinking.
As designers, not only are we crafting products and services, but we are also learning to see a much bigger system with a deep connection to business factors. How can we influence businesses with design thinking in order to build a solid business platform that delivers meaningful products and services?
Systems thinking is an approach to problem solving. Businesses are an intricate ecosystem, from how the organisation is structured, to people, to commercial planning, to processes. As designers, we practice systems thinking every day. How do we use this knowledge to craft a business? This is business design.
In this session, we want to explore what business design means. How can we use what we know, as designers, to build stronger businesses? As we continue to adapt design methodologies and systems thinking to a business context, what other manifestations will evolve? How can design thinking be leveraged in even the most straight-laced silos of a business such as Human Resources and Finance? How do we give design thinking the space it needs in the face of traditional business practice? And most importantly, how do we use our existing design thinking knowledge to design businesses?
Design Thinking explained with project experiences.
- What is Design Thinking
- What are the steps
- What is SAP Apphaus
- The Next View Design Experience Center Amsterdam
Kickstart your Product Backlog with Innovation Games | Frederic Vandaele
How do you start your Scrum project? How do you initialize your product backlog? You are not alone: in most agile projects, managing the product backlog remains a complex and difficult activity.
Scrum says that it is the Product Owner who manages the product backlog, but it does not tell us how (it's a framework, you know). However, product owners are people from the business. They have little or no experience with Agile and what it means in terms of contribution to the project.
How do you involve a group of users in the creation of the product backlog without them feeling cheated or ignored? How do you prioritize dozens or even hundreds of user stories of varying sizes with a group of users representing different needs and conflicting interests?
The Innovation Games are techniques that can address these issues. The art is to combine these methods so that a common vision emerges in the form of an initial product backlog, which helps the Scrum team start the project on a solid foundation.
Presented at Agile Tour Brussels 2013
Advertising agencies are obsessed with innovation. They also have one of the most unique sets of creative talent of any industry. Yet the creative department is the most suspicious of "innovation" of any group at the agency. Could it be that actually Creative Directors hold the keys to converting ad agencies into what so many desire: innovation partners to clients?
(special thanks to @seelydiaplay for presentation design help)
Slides presented by Prof. Rishikesha Krishnan at CIO Leadership Summit at Hotel Movenpick on April 26, 2013. It gives an overview of the book "8 steps to innovation: Going from jugaad to excellence" by Vinay Dabholkar and Rishikesha Krishnan.
A summary of the basic principles of design thinking, human centered innovation and its application to strategy. Created by Natalie Nixon of Figure 8 Thinking.
Higher resolution on SpeakerDeck:
https://speakerdeck.com/hponka/2022
Social media review 07/2021: The social media boom continues, but where are young people going? Harto Pönkä, 8.7.2022, Innowise
SWOT analysis is a strategic planning technique used to help a person or organization identify strengths, weaknesses, opportunities, and threats related to business competition or any project planning.
The definition of R&D following the Frascati manual | LEYTON
The Frascati Manual is used as a reference book for defining R&D (Research and Development). We dissect the latest revision of the Frascati Manual, published on October 8, 2015.
This presentation will focus on chapter 2 concerning the definition of R&D by giving extracts of the Manual.
Stanford and the Silicon Valley Ecosystem - Tom Byers - 2013 HBCU Innovation ... | EpicenterUSA
Tom Byers presented "How the Silicon Valley Innovation Ecosystem Works: Stanford University's Contributions" on Thursday, October 31, 2013, during the UNCF HBCU Innovation Summit at Stanford University.
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS | HCL Technologies
Though insights from Big Data provide a breakthrough for making better business decisions, Big Data poses its own set of challenges. This paper addresses the Variety problem and suggests a way to handle data processing seamlessly even when the data type or processing algorithm changes. It explores the various MapReduce design patterns and arrives at a unified working solution (a library). The library has the potential to ‘adapt’ itself to any data processing need that can be met by MapReduce, saving many man-hours and enforcing good practices in code.
Building a performing Machine Learning model from A to Z | Charles Vestur
A 1-hour read to become highly knowledgeable about Machine Learning and the machinery underneath, from scratch!
A presentation introducing all the fundamental concepts of Machine Learning step by step, following a classical approach to building a well-performing model. Simple examples and illustrations are used throughout the presentation to make the concepts easier to grasp.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/08/once-for-all-dnns-simplifying-design-of-efficient-models-for-diverse-hardware-a-presentation-from-mit/
For more information about edge AI and vision, please visit:
http://www.edge-ai-vision.com
Christine Cheng, co-chair of the inference benchmark working group at MLPerf and a senior machine learning optimization engineer at Intel, delivers the presentation “MLPerf: An Industry Standard Performance Benchmark Suite for Machine Learning” at the Edge AI and Vision Alliance’s July 2020 Edge AI and Vision Innovation Forum. Cheng explains how MLPerf’s inference benchmark suite for evaluating processor performance works and is evolving.
In this presentation, you will be introduced to the concept of Integer Programming and its application in conference scheduling. We will delve into the fundamentals of Integer Programming and its practical utilization in optimizing the allocation of talks to specific time slots and rooms within a conference program. By the conclusion of the talk, attendees will gain a clearer comprehension of the potential of this powerful tool in creating a conference schedule that is both efficient and effective, ultimately maximizing attendee satisfaction. Whether you are involved in conference organization or simply curious about optimization algorithms, this presentation is tailored to meet your interests.
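As a rough illustration of the idea, the sketch below states a tiny talk-to-slot assignment as a 0-1 integer program and solves it by exhaustive enumeration. The talks, slots and satisfaction scores are hypothetical, and brute force stands in for a real integer programming solver, which the presentation may well use instead.

```python
from itertools import product

# Hypothetical toy instance: one room, three talks, three time slots
talks = ["keynote", "tutorial", "lightning"]
slots = ["09:00", "10:00", "11:00"]

# score[t][s]: assumed attendee satisfaction if talk t is placed in slot s
score = [
    [5, 3, 1],
    [2, 4, 3],
    [1, 2, 4],
]

n_t, n_s = len(talks), len(slots)
best_value, best_x = None, None

# Decision variables x[t][s] in {0, 1}; enumerate every 0-1 assignment
for bits in product((0, 1), repeat=n_t * n_s):
    x = [bits[t * n_s:(t + 1) * n_s] for t in range(n_t)]
    # Constraint 1: every talk is scheduled in exactly one slot
    if any(sum(row) != 1 for row in x):
        continue
    # Constraint 2: every slot hosts at most one talk
    if any(sum(x[t][s] for t in range(n_t)) > 1 for s in range(n_s)):
        continue
    # Objective: total satisfaction of the schedule
    value = sum(score[t][s] * x[t][s] for t in range(n_t) for s in range(n_s))
    if best_value is None or value > best_value:
        best_value, best_x = value, x

schedule = {
    talks[t]: slots[s]
    for t in range(n_t)
    for s in range(n_s)
    if best_x[t][s]
}
print(best_value, schedule)
```

Enumeration only works for toy sizes, since the number of 0-1 assignments doubles with every added variable; realistic conference programs need a dedicated solver.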
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren... | Sri Ambati
Pascal Pfeiffer, Principal Data Scientist, H2O.ai
H2O Open Source GenAI World SF 2023
This talk dives into the expansive ecosystem of Large Language Models (LLMs), offering practitioners an insightful guide to various relevant applications, from natural language understanding to creative content generation. While exploring use cases across different industries, it also honestly addresses the current limitations of LLMs and anticipates future advancements.
Reviewing progress in the machine learning certification journey
Special Addition - Short tech talk on How to Network by Qingyue (Annie) Wang
Content review on AI and ML on Google Cloud by Margaret Maynard-Reid
A focused content review on ML problem framing, model evaluation, and fairness by Sowndarya Venkateswaran.
A discussion on sample questions to aid certification exam preparation.
An interactive Q&A session to clarify doubts and questions.
Previewing next steps and topics, including course completions and material reviews.
[OFW 14] Prediction of Flow Characteristics by Applying Machine Learning of S... | Geon-Hong Kim
This was presented at OFW 14 (OpenFOAM Workshop) in 2019.
It demonstrates an idea for combining ML with CFD.
A geometry is parameterized by employing the cut-cell method of the IBM (Immersed Boundary Method) and is used as an input to the ML framework. CFD is conducted to predict the drag coefficient of an object, which is used as the target of the ML process.
The Energy Management Application enables plant personnel to monitor real-time energy flow and automatically notifies operators, supervisors and cost accountants of energy inefficiencies and non-compliance with configured policies. Application functionality includes recording consumption and demand at main and sub-meters for a wide range of energy types, including power, water, chill, gas, air and steam. For more information, please go through the slides.
iARMS-EMS/PMS is developed by Envision Enterprise Solutions
Envision Enterprise Solutions is one of the fast-growing global technology and innovation companies offering services in IT consulting, software implementation, system integration and development, with a presence in the USA, UAE, Singapore and India. Envision's solutions portfolio ranges from Enterprise Asset Management, Port Automation Solutions, ERP Solutions, Application Services and Enterprise Mobility Solutions to Cloud Solutions, Performance Optimization and Real Time Monitoring Solutions. Envision is partnered with leading technology giants like IBM, SAP, Oracle and Microsoft.
Can Machine Learning Models be Trusted? Explaining Decisions of ML Models | Darek Smyk
The more we involve Artificial Intelligence/Machine Learning in our daily lives, the more we need to be able to trust the decisions that the AI/ML systems make. Providing explanations along with decisions may help establish that trust.
ROS 2 AI Integration Working Group 1: ALMA, SustainML & ROS 2 use case | eProsima
The new ROS 2 AI Integration Working Group is focused on enabling Machine Learning technologies for ROS 2.
In this presentation you'll find:
- ALMA: the Human Centric Algebraic Machine Learning project
- SustainML
- Enabling ML technologies for ROS 2 robots with Vulcanexus
IBM i & digital transformation - Presentation & basic demo
IBM Watson Studio, IBM DSX Local w/ Open Source (Spark) & IBM Technology (OpenPower, CAPI, NVLINK)
"Custom ML Models for Each User", Siamion KarasikFwdays
Based on the ML1 case, an ML-powered Jira plugin by Exadel, we'll give an overview of the technical and ML challenges of training and serving a separate ML model for every user.
Similar to Honey I Shrunk the Target Variable! Common pitfalls when transforming the target variable and how to exploit transformations. (20)
Streamlining Python Development: A Guide to a Modern Project Setup | Florian Wilhelm
Designed for beginners, this presentation demystifies Python project management using Hatch and delves into pyproject.toml for efficient configuration. We'll guide you through organizing directories, implementing unit testing for code reliability, and using mypy for type checking to enhance code quality. The session concludes with insights into ruff, a modern linter for maintaining Python standards, which is replacing black, isort, and flake8. This talk is a comprehensive toolkit for anyone eager to learn and apply the latest practices in Python development.
The talk was given at PyConDE / PyData Berlin 2024. More details here: https://pretalx.com/pyconde-pydata-2024/talk/CBVTEG/
An Interpretable Model for Collaborative Filtering Using an Extended Latent D... | Florian Wilhelm
With the increasing use of AI and ML-based systems, interpretability is becoming an increasingly important issue to ensure user trust and safety. This also applies to the area of recommender systems, where methods based on matrix factorization (MF) are among the most popular methods for collaborative filtering tasks with implicit feedback. Despite their simplicity, the latent factors of users and items lack interpretability in the case of the effective, unconstrained MF-based methods. In this work, we propose an extended latent Dirichlet Allocation model (LDAext) that has interpretable parameters such as user cohorts of item preferences and the affiliation of a user with different cohorts. We prove a theorem on how to transform the factors of an unconstrained MF model into the parameters of LDAext. Using this theoretical connection, we train an MF model on different real-world data sets, transform the latent factors into the parameters of LDAext and test their interpretation in several experiments for plausibility. Our experiments confirm the interpretability of the transformed parameters and thus demonstrate the usefulness of our proposed approach.
Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint L... | Florian Wilhelm
Matrix factorization-based methods are among the most popular methods for collaborative filtering tasks with implicit feedback. The most effective of these methods do not apply sign constraints, such as non-negativity, to their factors. Despite their simplicity, the latent factors for users and items lack interpretability, which is becoming an increasingly important requirement. In this work, we provide a theoretical link between unconstrained and the interpretable non-negative matrix factorization in terms of the personalized ranking induced by these methods. We also introduce a novel, latent Dirichlet allocation-inspired model for recommenders and extend our theoretical link to also allow the interpretation of an unconstrained matrix factorization as an adjoint formulation of our new model. Our experiments indicate that this novel approach represents the unknown processes of implicit user-item interactions in the real world much better than unconstrained matrix factorization while being interpretable.
This talk was presented at the 15th ACM Conference on Recommender Systems in Amsterdam (RecSys 2021). Find more information under https://dl.acm.org/doi/fullHtml/10.1145/3460231.3474266
With the advent of Deep Learning (DL), the field of AI made a giant leap forward, and it is nowadays applied in many industrial use cases. Critical systems in particular, such as autonomous driving, require that DL methods not only produce a prediction but also state the certainty of that prediction in order to assess risks and failure.
In my talk, I will give an introduction to different kinds of uncertainty, i.e. epistemic and aleatoric. To have a baseline for comparison, the classical method of Gaussian Processes for regression problems is presented. I then elaborate on different DL methods for uncertainty quantification like Quantile Regression, Monte-Carlo Dropout, and Deep Ensembles. The talk is concluded with a comparison of these techniques to Gaussian Processes and the current state of the art.
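Of the methods listed, quantile regression is the easiest to sketch in a few lines. The example below is an illustration under assumptions, not material from the talk: it implements the pinball (quantile) loss and shows on made-up data that the constant minimising it for q = 0.9 is the empirical 90% quantile. The function name and the data are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is penalised with weight q, over-prediction with 1 - q
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# Hypothetical observations: the integers 1..99
y = list(range(1, 100))
q = 0.9

# Search for the constant prediction with the lowest pinball loss
best = min(y, key=lambda c: pinball_loss(y, [c] * len(y), q))
print(best)
```

A model trained with this loss at several quantile levels yields prediction intervals instead of a single point estimate, which is one way to express aleatoric uncertainty.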
Performance evaluation of GANs in a semisupervised OCR use case | Florian Wilhelm
Even in the age of big data, labeled data is a scarce resource in many machine learning use cases. Florian Wilhelm evaluates generative adversarial networks (GANs) when used to extract information from vehicle registrations under a varying amount of labeled data, compares the performance with supervised learning techniques, and demonstrates a significant improvement when using unlabeled data.
Bridging the Gap: from Data Science to Production | Florian Wilhelm
A recent but quite common observation in industry is that although there is an overall high adoption of data science, many companies struggle to get it into production. Huge teams of well-paid data scientists often present one fancy model after the other to their managers, but their proofs of concept never turn into something business relevant. The frustration grows on both sides, managers and data scientists.
In my talk I elaborate on the many reasons why data science to production is such a hard nut to crack. I start with a taxonomy of data use cases to make it easier to assess technical requirements. Based on this, my focus lies on overcoming the two-language problem, which is Python/R loved by data scientists vs. the enterprise-established Java/Scala. From my project experience I present three different solutions, namely 1) migrating to a single language, 2) reimplementation and 3) usage of a framework. The advantages and disadvantages of each approach are presented, and general advice based on the introduced taxonomy is given.
Additionally, my talk also addresses organisational problems as well as problems in quality assurance and deployment. Best practices and further references are presented at a high level in order to cover all facets of data science to production.
With my talk I hope to convey the message that breakdowns on the road from data science to production are rather the rule than the exception, so you are not alone. At the end of my talk, you will have a better understanding of why your team and you are struggling and what to do about it.
How mobile.de brings Data Science to Production for a Personalized Web Experi... | Florian Wilhelm
As Germany's biggest online car marketplace, mobile.de provides a personalized web experience. Our Data Team leverages the interactions of our users to infer their preferences. For these tasks we often apply Python and Spark to wrangle massive amounts of data. In this talk, we are going to present our personalization use cases as well as the application of PySpark in production.
Deep Learning-based Recommendations for Germany's Biggest Vehicle Marketplace | Florian Wilhelm
As presented at the Düsseldorf Data Science Meetup on March 12th, the talk covers business as well as technical aspects of recommender systems based on deep learning. It is an extended version of the talk held at Bitkom A.I. Summit 2018 with the same title and covers more technical details in depth.
Deep Learning-based Recommendations for Germany's Biggest Online Vehicle Mark... | Florian Wilhelm
At mobile.de, Germany’s biggest car marketplace, a dedicated data team, supported by the IT project house inovex, is responsible for creating smart data products. One focus is personalised vehicle recommendations to improve the customer experience during browsing as well as finding the perfect offering.
As an introduction, we briefly mention the traditional approaches for recommendation engines, thereby motivating the need for more sophisticated approaches. We then illustrate how Deep Learning can be leveraged to capture the underlying non-linear correlations of features for personalised recommendations. In particular, we’ve customised Google Play’s algorithm for an online marketplace with a fast-changing inventory. Several variants of our adapted approach are evaluated against traditional methods, and scalability aspects are addressed.
We conclude our talk by giving an outlook on the importance of personalised user experiences and the application of Deep Learning and AI at mobile.de.
Declarative Programming is a programming paradigm that focuses on describing what should be computed in a problem domain without describing how it should be done. The talk starts by explaining the differences between a declarative and an imperative approach with the help of examples from everyday life. Having established a clear notion of declarative programming and pointed out some of its advantages, we transfer these concepts to programming in general. For example, the use of control flow statements like loops over-determines the order of computation, which impedes scalable execution and often violates the single level of abstraction principle.
As Germany’s largest online vehicle marketplace, mobile.de uses recommendations at scale to help users find the perfect car. We elaborate on collaborative & content-based filtering as well as a hybrid approach addressing the problem of a fast-changing inventory. We then dive into the technical implementation of the recommendation engine, outlining the various challenges faced and lessons learned.
In the field of machine learning and particularly in supervised learning, correlation is key in order to predict the target variable with the help of feature variables. Rarely do we think about causation and the actual effect of a single feature variable or covariate on the target or response, respectively. Some even go so far as to say that “correlation trumps causation”, as in the book “Big Data: A Revolution That Will Transform How We Live, Work, and Think” by Viktor Mayer-Schönberger and Kenneth Cukier. Following their reasoning, with Big Data there is no need anymore to think about causation since nonparametric models will do just fine using only correlation. For many practical use-cases this point of view seems to be acceptable, but surely not for all. In my talk I will present the theory of causal inference and demonstrate its application on a practical use-case with the help of inverse probability of treatment weighting (IPTW), which is a propensity score method.
Explaining the idea behind automatic relevance determination and bayesian int... · Florian Wilhelm
Even in the era of Big Data there are many real-world problems where the number of input features has about the same order of magnitude as the number of samples. Often many of those input features are irrelevant, and thus inferring the relevant ones is an important problem in order to prevent over-fitting. Automatic Relevance Determination solves this problem by applying Bayesian techniques.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready, which is the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple but effective semantic and latent representations, and to make these available through standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and that of others in the field, creates a baseline for building trustworthy and easy-to-deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Multi-source connectivity as the driver of solar wind variability in the heli... · Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous, plasma streams from coronal holes and slow-speed, highly variable, streams whose source regions are under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Nutraceutical market, scope and growth: Herbal drug technology · Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods like functional foods, drinks, and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is growing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN. · Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Richard's aventures in two entangled wonderlands · Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... · Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5 − 15. These objects show compact half-light radii of R₁/₂ ∼ 50 − 200 pc, stellar masses of M⋆ ∼ 10⁷−10⁸ M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Honey I Shrunk the Target Variable! Common pitfalls when transforming the target variable and how to exploit transformations.
1. Honey, I Shrunk the Target Variable!
Florian Wilhelm
Common pitfalls when transforming the target variable and
how to exploit transformations
Berlin, April 12th 2022
2. Dr. Florian Wilhelm
Head of Data Science @ inovex
Mathematical Modelling
Data Science to Production & MLOps
Personalisation & RecSys
Uncertainty Quantification & Causality
Python Data Stack
Creator of PyScaffold
@FlorianWilhelm
FlorianWilhelm
FlorianWilhem.info
3. inovex is an IT project house
with focus on digital transformation
› Product Discovery · Product Ownership
› Web · UI/UX · Replatforming · Microservices
› Mobile · Apps · Smart Devices · Robotics
› Big Data & Business Intelligence Platforms
› Data Science · Data Products · Search · Deep Learning
› Data Center Automation · DevOps · Cloud · Hosting
› Agile Training · Technology Training · Coaching
Karlsruhe · Pforzheim · Stuttgart · München · Köln · Hamburg
www.inovex.de/en
Using technology to inspire our clients.
And ourselves.
5. Choosing the Right Metric
› (R)MSE is most often used in practice
› Scikit-Learn’s regressors mostly use MSE as default
In which Use-Cases does (R)MSE make sense?
8. How much should I sell my car for?
A model fitted on many sold cars and their features could provide a fair market value.
9. Our Use-Case Setting
1. Take the used-cars database from Kaggle with 370k cars having the features vehicle type, model, registration date, gearbox, powerPS, mileage, fuel type, brand and price
2. Build a model to estimate the price based on these features and treat this as a fair market value
3. Decide what’s a good/fair/bad price based on this fair market value
source-code: https://github.com/FlorianWilhelm/used-cars-log-trans/
10. Question 1:
What’s worse? Selling 10 equal cars with an actual price of 50,000 € each and
1. getting the actual price for 9 but only 40,000 € for the last car, or
2. getting 49,000 € for every car?
● For (R)MSE option 1 is much worse
● For MAE both options are equally good/bad
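The two options can be checked with a few lines of plain Python. This is only a sketch: the helper names `rmse` and `mae` are chosen here for illustration, and the amount received for each car is treated as the "prediction" of its actual price.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [50_000] * 10                  # ten equal cars
option_1 = [50_000] * 9 + [40_000]      # nine at full price, one 10,000 € short
option_2 = [49_000] * 10                # every car 1,000 € short

rmse_1, mae_1 = rmse(actual, option_1), mae(actual, option_1)
rmse_2, mae_2 = rmse(actual, option_2), mae(actual, option_2)

print(f"Option 1: RMSE = {rmse_1:.2f}, MAE = {mae_1:.2f}")
print(f"Option 2: RMSE = {rmse_2:.2f}, MAE = {mae_2:.2f}")
```

Both options lose 10,000 € in total, so MAE cannot tell them apart, while the squaring in RMSE penalises the single large miss of option 1 roughly three times as hard.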
11. Question 2:
Which one is worse? Getting 1,000 € less if your car’s actual value is
1. 100,000 € or
2. 10,000 €?
● For RMSE & MAE this makes no difference
● For RMSPE & MAPE option 2 is much worse
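A similar sketch for absolute versus relative metrics. The helper names are again illustrative assumptions, and each case consists of a single car, so RMSE equals MAE and RMSPE equals MAPE.

```python
import math

def mae(actual, predicted):
    """Mean absolute error (in €)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error (as a fraction of the actual value)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def rmspe(actual, predicted):
    """Root mean squared percentage error (as a fraction of the actual value)."""
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

expensive_actual, expensive_sold = [100_000], [99_000]
cheap_actual, cheap_sold = [10_000], [9_000]

# Absolute metrics see the same 1,000 € miss in both cases.
mae_expensive = mae(expensive_actual, expensive_sold)
mae_cheap = mae(cheap_actual, cheap_sold)

# Relative metrics scale the miss by the car's actual value.
mape_expensive = mape(expensive_actual, expensive_sold)
mape_cheap = mape(cheap_actual, cheap_sold)
rmspe_expensive = rmspe(expensive_actual, expensive_sold)
rmspe_cheap = rmspe(cheap_actual, cheap_sold)

print(f"MAE:   {mae_expensive:.0f} vs {mae_cheap:.0f}")
print(f"MAPE:  {mape_expensive:.2%} vs {mape_cheap:.2%}")
print(f"RMSPE: {rmspe_expensive:.2%} vs {rmspe_cheap:.2%}")
```

The relative error is ten times larger for the cheaper car, which is why option 2 is the worse outcome under MAPE and RMSPE.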
12. Learning 1:
The right metric depends on the use-case and will affect your results!
20. Minimizing (R)MSE with log(price)
What we are going to do:
1. Take log(price) as target variable
2. Minimize (R)MSE to find ŷ
3. Transform ŷ back with exp(ŷ)
22. … the Median?!?
Mathematically, in case of a lognormal residual distribution:
› taking the log, minimizing for RMSE and transforming back with exp will lead to the median.
› if we wanted the mean, we would need to correct the transformed result by adding σ²/2 (half the variance of the log-residuals) before applying exp.
On our data (not perfectly lognormal), the corrected result is a bit higher than the “actual” mean of 6807.
Image: https://www.pinterest.de/pin/494973815284951824/ (uploaded by Jittanisa Sukaphatana)
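The median effect can be reproduced on simulated data. The sketch below does not use the used-cars data; it assumes perfectly lognormal prices with made-up parameters `mu` and `sigma`, and uses the simplest possible model, a single constant, for which the MSE-minimizing fit in log space is just the mean of the logs. The correction σ²/2 used for the mean is the standard lognormal identity E[Y] = exp(μ + σ²/2).

```python
import math
import random
import statistics

random.seed(0)
mu, sigma = 8.0, 0.8                      # assumed parameters of log(price)
prices = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

# 1. take log(price) as target variable
log_prices = [math.log(p) for p in prices]

# 2. minimize (R)MSE: for a constant model this is the mean of the logs
y_hat = statistics.fmean(log_prices)
sigma2 = statistics.pvariance(log_prices)  # variance of the log-residuals

# 3. transform back with exp, without and with the correction term
naive = math.exp(y_hat)
corrected = math.exp(y_hat + sigma2 / 2)

sample_median = statistics.median(prices)
sample_mean = statistics.fmean(prices)

print(f"exp(y_hat)            = {naive:9.1f}   sample median = {sample_median:9.1f}")
print(f"exp(y_hat + sigma2/2) = {corrected:9.1f}   sample mean   = {sample_mean:9.1f}")
```

On real data the residuals are only approximately lognormal, so the corrected value will match the mean less closely than in this idealised simulation.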
23. And there is much more…
Correction terms when applying log to a target variable with lognormal residuals and minimizing (R)MSE:
(R)MSE MAE MAPE RMSPE
Proofs under https://www.inovex.de/de/blog/honey-i-shrunk-the-target-variable/
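The formulas of this slide did not survive in the transcript. As a hedged reconstruction from the standard properties of the lognormal distribution (not copied from the slide), the terms to add to the log-space prediction before applying exp are +σ²/2 for (R)MSE, 0 for MAE, −σ² for MAPE and −3σ²/2 for RMSPE. The sketch below compares these against the empirically optimal constant prediction on simulated lognormal data; the names `CORRECTION_FACTOR`, `corrected_prediction` and `weighted_median` are illustrative.

```python
import math
import random

# Multiples of sigma^2 added in log space before exp (reconstruction, see lead-in).
CORRECTION_FACTOR = {"RMSE": 0.5, "MAE": 0.0, "MAPE": -1.0, "RMSPE": -1.5}

def corrected_prediction(y_hat_log, sigma2, metric):
    """Back-transform a log-space prediction for the given target metric."""
    return math.exp(y_hat_log + CORRECTION_FACTOR[metric] * sigma2)

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value

random.seed(1)
mu, sigma = 8.0, 0.5                      # assumed parameters of log(y)
y = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

# Best constant prediction c on the sample under each metric:
empirical = {
    "RMSE": sum(y) / len(y),                                  # mean
    "MAE": weighted_median(y, [1.0] * len(y)),                # median
    "MAPE": weighted_median(y, [1 / v for v in y]),           # median weighted by 1/y
    "RMSPE": sum(1 / v for v in y) / sum(1 / v ** 2 for v in y),
}

for metric, best_constant in empirical.items():
    theory = corrected_prediction(mu, sigma ** 2, metric)
    print(f"{metric:6s} empirical = {best_constant:8.1f}   exp(mu + k*sigma^2) = {theory:8.1f}")
```

For both percentage metrics the optimum lies below the median, because overestimating a cheap item costs more in relative terms than underestimating an expensive one.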
26. What To Do If Your Metric Is Not Supported?
Imagine you want to optimise for RMSPE and your data has a lognormal residual distribution, but the ML library you are using only supports (R)MSE.
27. One More Time. Instead of doing…
1. Fitting a model using (R)MSE as loss/metric
2. Evaluating our predictions with another metric, e.g. MAD, MAPE, RMSPE
28. … We Do for Our Use-Case…
1. Log transformation
2. Fitting a model using (R)MSE as loss/metric
3. Correction & back-transformation
4. Evaluating our predictions with another metric, e.g. MAD, MAPE, RMSPE
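The four steps can be sketched end to end in plain Python. Everything here is an assumption for illustration: the data are simulated (price depends on a single feature `age` with lognormal noise), the model is a one-feature least-squares line rather than the model used in the talk, the RMSPE correction term −3σ²/2 follows from the lognormal assumption, and the evaluation is done in-sample for brevity.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (minimizes MSE)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope

def rmspe(actual, predicted):
    """Root mean squared percentage error."""
    return math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Simulated stand-in for the used-cars data: price decays with age, lognormal noise.
random.seed(2)
age = [random.uniform(0, 15) for _ in range(20_000)]
price = [math.exp(10.0 - 0.12 * x + random.gauss(0, 0.5)) for x in age]

# 1. Log transformation
log_price = [math.log(p) for p in price]

# 2. Fitting a model using (R)MSE as loss
intercept, slope = fit_line(age, log_price)
residuals = [lp - (intercept + slope * x) for lp, x in zip(log_price, age)]
sigma2 = sum(r * r for r in residuals) / len(residuals)

# 3. Correction & back-transformation (RMSPE: add -3/2 * sigma^2 before exp)
uncorrected = [math.exp(intercept + slope * x) for x in age]
corrected = [math.exp(intercept + slope * x - 1.5 * sigma2) for x in age]

# 4. Evaluating our predictions with the metric we actually care about
rmspe_uncorrected = rmspe(price, uncorrected)
rmspe_corrected = rmspe(price, corrected)
print(f"RMSPE without correction: {rmspe_uncorrected:.3f}")
print(f"RMSPE with correction:    {rmspe_corrected:.3f}")
```

In practice step 4 would be run on held-out data, and σ² would be estimated from the training residuals only.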
29. Let’s Apply This In Our Use-Case
Improvements over the raw target when using a log transformation & correction and evaluating the final prediction under a given metric, e.g. MAPE (negative numbers mean improvement).
In case of the Kaggle competition the transformation was key for winning.
34. Recap: Linear Model
raw features
(non-linear) functions, feature engineering
weights to fit
true latent (unknown) outcome
noise
observations/samples
Normal Distribution
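The formula these labels annotate is not part of the transcript. A plausible reconstruction, with all symbols chosen here rather than taken from the slide, is the usual linear model with observations $y_i$, true latent outcome $y_i^\star$, raw features $\mathbf{x}_i$, (non-linear) feature functions $\phi_j$, weights $w_j$ and normally distributed noise $\epsilon_i$:

```latex
\begin{align}
  y_i &= y_i^\star + \epsilon_i, & \epsilon_i &\sim \mathcal{N}(0, \sigma^2), \\
  y_i^\star &= \sum_{j} w_j \, \phi_j(\mathbf{x}_i).
\end{align}
```

Under this assumption, minimizing (R)MSE over the weights coincides with maximum likelihood estimation, which is the link between the choice of metric and the residual distribution.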