This talk addresses product managers, covering the basics of statistics and analytics and ways to use them effectively in their products.
Video: https://youtu.be/Rsrp040DYKg (orientation is fixed after a few minutes)
April 22, 2017 - Product Folks! Meetup Amman, Jordan
Machine Learning Vital Signs: Metrics and Monitoring of AI in Production
This talk details the tracking of machine learning models in production to ensure model reliability, consistency, and performance into the future. Production models interact with the real world, and it is terrifying that oftentimes nobody has any idea how they are performing on live data. The world changes! Bias and variance can creep into your models over time, and you should know when that happens.
Are you having trouble getting your bug reports fixed? It could be that you’ve yet to master the craft of bug reporting. It’s a common assumption that bug reports are easy to create, but a well-crafted bug report requires more than innate ability.
In this practical workshop, Neil will share his experiences (good and bad!) from ten years of bug reporting, and show how you can supercharge your bug reports:
First presented at the 2015 TestBash Workshop Day: http://www.ministryoftesting.com/training-events/testbash-workshop-day/
Better products faster: let's bring the user into the userstory // TAPOST_201... - Anna Witteman
Why is it that everyone knows the importance of frequent user
testing, yet hardly anyone does it? Because user testing often is time
consuming, complex and expensive. It probably doesn’t fit in your
development process and thus feels like extra work.
To feel reassured you tell yourself to test with users once you have
something working, or at the very end of the process. This is strange,
because everybody knows that changing your product late in the
process will increase costs exponentially.
We created a way of working in which user testing saves time,
improves quality, and doesn't cost a lot of money. Team driven,
pragmatic, and no extra resources needed.
The talk will show how, with only 2 hours every sprint, we focused on
creating better products faster. We would love to share our learnings
and simple DIY tools that let you start user testing with your current teams tomorrow!
Testing for cognitive bias in AI systems - Peter Varhol
The document discusses how machine learning systems can produce biased results based on issues with the training data used, and provides examples of how biases have emerged in commercial AI systems. It then outlines approaches for testing machine learning systems to identify potential biases, including understanding the training data, defining objective success criteria, and testing with diverse edge cases. The challenges of addressing biases that emerge from limitations in the data or human decisions are also examined.
This document provides guidelines for A/B testing, including prioritizing test ideas based on estimated new conversions per day, creating tests by running a power analysis and having incremental tests, analyzing tests by monitoring health metrics, and making decisions carefully based on analysis results. It recommends calculating potential impact, having a data scientist involved, and not launching on neutral results to avoid technical debt.
I love the smell of data in the morning (getting started with data science) ... - Troy Magennis
Data Science 101 for software development. I know it misses the purist view of Data Science, but this is intended to get you started! First presented at Agile 2017 in Florida.
Statistics in the age of data science, issues you can not ignore - Turi, Inc.
This document discusses issues in statistics that data scientists can and cannot ignore when working with large datasets. It begins by outlining the talk and defining key terms in data science. It then explains that model assessment, such as estimating model performance on new data, becomes easier with more data as statistical adjustments are not needed. However, more data and variables are not always better, as noise, collinearity, and overfitting can still occur. Several examples are given where common machine learning algorithms can be fooled into achieving high accuracy on training data even when the target variable is random. The conclusion emphasizes that data science, statistics, and domain expertise each provide unique perspectives, and effective teams need to understand all views.
Bad AI showing sexist or racist correlations makes headlines. Nobody sets out to make a bad system, so why does this happen? I take a look at all the ways bias creeps into AI and where you should put effort to avoid it.
Slides annotated from a talk given at ImpactfulAI meetup 19th June 2019 London
LKNA 2014 Risk and Impediment Analysis and Analytics - Troy Magennis
Software risk impact is more predictable than you might think. This session discusses similarities of uncertainty in various industries and relates this back to how we can measure and analyze impediments and risk for agile software teams.
This document discusses max-diff (maximum difference) analysis, which is a method for collecting preference data. It covers when to use max-diff, experimental design considerations, problems with simple "counting" analysis, using latent class analysis instead, and computing preference shares from max-diff data. Latent class analysis addresses issues with counting analysis by accounting for experimental design, inconsistencies in preferences, and differences between individuals.
Searching for Anomalies, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
This document summarizes a presentation on model evaluation given at the 4th annual Valencian Summer School in Machine Learning. It discusses the importance of evaluating models to understand how well they will perform on new data and identify mistakes. Various evaluation metrics are introduced like accuracy, precision, recall, F1 score, and Phi coefficient. The dangers of evaluating on training data are explained, and techniques like train-test splits and cross-validation are recommended to get less optimistic evaluations. Regression metrics like MAE, MSE, and R-squared error are also covered. Different evaluation techniques for specific problem types like imbalanced classification, time series forecasting, and model selection are discussed.
The document discusses test design and provides tips for becoming a better test designer. It explains that test design involves coming up with a well-thought-out and broad set of tests based on the application and schedule. Both over-testing and under-testing should be avoided. It also emphasizes practicing testing, collaborating with others, learning about the application, and finding new testing ideas to expand one's toolbox. The best test tool is noted as being one's own brain.
What is the story with agile data keynote agile 2018 (Magennis) - Troy Magennis
This document discusses using data to improve agile practices and outcomes. It argues that agile has lost the "data war" by not capturing and utilizing data from teams effectively. It suggests that data needs to be handled safely to avoid embarrassing people and destroying the utility of historical data. Better ways are needed to measure outcomes rather than just output, and to balance predictability with creativity. The document also discusses visualizing and managing dependencies, comparing performance across teams, and using the right metrics depending on a team's characteristics and challenges. The overarching message is that data needs to be used carefully and conversationally to drive the right actions and improve agile practices.
State of the Art in Machine Learning, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
This document discusses various machine learning model validation techniques and ensemble methods such as bagging and boosting. It defines key concepts like overfitting, underfitting, bias-variance tradeoff, and different validation metrics. Cross validation techniques like k-fold and bootstrap are explained as ways to estimate model performance on unseen data. Bagging creates multiple models on resampled data and averages their predictions to reduce variance. Boosting iteratively adjusts weights of misclassified observations to build strong models, but risks overfitting. Gradient boosting and XGBoost are powerful ensemble methods.
AI Models For Fun and Profit by Walmart Director of Artificial Intelligence - Product School
Product Management Event at #ProductCon NY on how to create AI models for fun and for profit by Jason Nichols, Director of Artificial Intelligence at Walmart Intelligent Research Lab.
What you need to know for trustworthy A/B tests - Minho Lee
Slides from a special lecture at 프롬, September 4, 2021.
---
Many people say A/B testing is important.
But what exactly are we trusting when we hand our decisions over to an A/B test?
An A/B test is not a magic tool that produces results just by running it.
This talk looks at what further thinking is needed to get trustworthy experiment results.
Making Machine Learning Work in Practice - StampedeCon 2014 - StampedeCon
At StampedeCon 2014, Kilian Q. Weinberger (Washington University) presented "Making Machine Learning work in Practice."
Here, Kilian will go over common pitfalls and tricks on how to make machine learning work.
This document discusses data quality issues and methods for addressing them. It defines two perspectives on quality - conformance to requirements and fitness for use - and four costs of poor quality: reputation, prevention, detection, and repair. Data editing uses techniques like range tests, deterministic tests, and probabilistic tests to identify potential errors, while imputation handles missing or misreported data, though care must be taken not to over-impute. Fabrication poses a threat, as intentionally entering false data undermines quality. Overall, the document emphasizes improving quality by understanding sources of error rather than over-editing data.
Background of measuring and metric usage is traditional waterfall projects, psychology of measuring, agile response to traditional metrics, and suggested agile metrics.
Top 10 Data Science Practitioner Pitfalls - Sri Ambati
Over-fitting, misread data, NAs, collinear column elimination and other common issues play havoc in the day of a practicing data scientist. In this talk, Mark Landry, one of the world’s leading Kagglers, will review the top 10 common pitfalls and steps to avoid them.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document provides an introduction to Lean UX and UserTesting. It defines UX and Lean UX, discusses the benefits of user testing such as increased revenue and decreased costs, and outlines the UserTesting process including defining objectives, writing tasks, analyzing results, and using metrics and notes. UserTesting allows remote, unmoderated usability testing of digital products through video recordings of testers interacting with designs. The document provides tips for effective user testing through UserTesting.
Learn how to transform from a mild-mannered online organizer into a true data-driven mastermind! What to track, how to test, and methods for creating a data-driven culture at your nonprofit.
The document discusses marketing research methods, including sampling techniques, statistical significance, and customer databases. It covers probability and non-probability sampling methods, how to determine sample size, and factors that affect statistical significance. It also addresses advantages and limitations of customer databases, data analysis techniques, and principles of profiling and segmenting customers.
If you are curious what ML is all about, this is a gentle introduction to Machine Learning and Deep Learning. It covers questions such as why ML / Data Analytics / Deep Learning?, builds an intuitive understanding of how they work, and looks at some models in detail. Finally, I share some useful resources to get started.
Brighton CRO Meetup #1 - Oh Boy These AB tests Sure Look Like Bullshit to Me - Craig Sullivan
An updated deck of a short talk (30m) given at the first Brighton CRO meetup. Contains useful AB testing tools as well as full speaker notes for most of the slides.
Unlock Your Data's Potential By Integrating Qualtrics & Tableau - Qualtrics
Find out what happens when you pair the only enterprise customer experience management platform with the world's most powerful data visualisation software.
The new Qualtrics and Tableau Integration allows you to connect your Tableau desktop to Qualtrics so you can gather and view data in real-time. Join Josh Robbins from Qualtrics and Bob Middleton from Tableau for our webinar where you will learn:
The easiest and most efficient way to get Qualtrics data into Tableau for both ad hoc or continual analysis.
Top tips for engaging your target respondents including keeping surveys mobile friendly, utilizing the survey library (not reinventing the wheel), while making questions easy to understand and much more.
Top tips for creating powerful visualisations.
High impact use cases from customers using the connector.
H2O World - Top 10 Data Science Pitfalls - Mark Landry - Sri Ambati
H2O World 2015 - Mark Landry
Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
User testing is a fantastic method to discover problems. But why is it such a great user research method? How to make sure you recruit the right participants? How to write the right questions and tasks for your usability test? And what is your job as a moderator? This slide deck answers all your questions on usability testing!
Optimizely Workshop: Take Action on Results with Statistics - Optimizely
Optimizely recently released the stats engine, which moves away from the traditional statistics model and into a new framework that is more aligned with modern business operations. In this workshop, we’ll walk you through the core trade-offs in A/B Testing, and how you can use them to decide when to stop running your test.
AI in the Real World: Challenges, and Risks and how to handle them? - Srinath Perera
This document discusses challenges, risks, and how to handle them with AI in the real world. It covers:
- AI can perform tasks like driving a car faster and cheaper than humans, but can't fully explain how.
- Deploying and managing AI models at scale is complex, as is integrating models with user experiences. Bias and lack of transparency are also risks.
- When applying AI, such as in high-risk domains like medicine, it is important to audit models, gradually introduce them with trials, monitor outcomes, and find ways to identify and address errors or unfair impacts. With care and oversight, AI can be developed to help more people than it harms.
Material for the 26 Oct 2015 lecture I held for Aalto University business students. The lecture focuses on the high level topics in analytics and Big Data that are either central to the subject or just highly visible in the media.
The main messages of the lecture are:
- The purpose of analytics and of the data analyst is to solve business problems
- Big Data brings over some very special traits to doing analytics that don't exist when working with smaller datasets. Understanding these traits is a must for successful analytics.
- Deploying analytics is more dependent on humans than on technology
- Data and analytics are nowadays significant assets to many companies. Therefore they need their own strategy and need to be managed just like any other business critical assets.
The document discusses challenges with typical metrics used in software testing. It notes that counts, percentages and trends used are often inaccurate and lack context. Metrics need to be tied to objectives and drive organizational change to be effective. Sampling approaches in testing need to approximate the actual quality, but randomness may not find as many defects as methodical testing. The presentation provides examples of nominal, ordinal, interval and ratio measures and recommends using the appropriate levels of measurement. It also addresses issues with deriving ratios from lower levels of data and challenges in measuring trends over time.
Similar to Data Science Toolkit for Product Managers
In order to create a more open society, we're celebrating the achievements on openness and collaboration and their impact on people's lives.
Aug 15, 2015 - JOSA's OpenJordan event in Amman, Jordan
The document proposes an open government data system for Jordan with the following key points:
- It would make more government data available to the public in open formats like CSV and JSON to enable academic and commercial uses.
- Data on the system would include both raw datasets and summarized data and insights from government agencies. Formats would need to follow open standards.
- Each dataset would include the raw data files, metadata files describing the data, and checksum files to ensure correctness. Metadata would also provide descriptions, collection methods, and potential uses.
- The system would have a centralized agency to manage it, government agencies to upload data, and public users to access and analyze the data through a web interface or API
Introduction to JOSA's data science bootcamp, which includes introduction to data science itself and information for people interested in this domain.
More material publicly available here: http://bit.ly/josa-dsbc
Jan 2, 2016 - JOSA Data Science Bootcamp
Using technology to improve our innovative business ideas, with a focus on IoT and urban development.
May 21, 2017 - Oasis500 Bootcamp session for Urban Development Startups
The document discusses how big data is being applied in financial technology (FinTech). It begins with an agenda and introduction to the speaker, Mahmoud Jalajel. It then discusses how tech companies are leading innovations in FinTech through applications like money transfers. The bulk of the document outlines key concepts in big data including ingestion, ETL processes, software, analytics, and data science. It provides examples of how these are applied in FinTech for areas like predictive modeling, personalization, and fraud detection. Finally, it shares two case studies of startups leveraging big data for applications like automated lending and risk management.
JOSA TechTalk - Lambda architecture and real-time processing - Mahmoud Jalajel
Although Hadoop might be the first thing to come to mind when you think of processing large data sets, it is not always the best solution for your Big Data problems.
Hadoop might be the right choice for batch-processing big data, but when it comes to real-time data processing there are other architectures and tools to consider. This TechTalk shows the need behind solving real-time data problems and explains the Lambda architecture, mentioning Druid as an example, as well as the simpler and less expensive "event sourcing model".
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack - shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
What do a Lego brick and the XZ backdoor have in common? - Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case share much more than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and trainings. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (the source of her nickname, deneb_alpha).
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Building RAG with self-deployed Milvus vector database and Snowpark Container... - Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Generative AI Deep Dive: Advancing from Proof of Concept to Production - Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
2. “While others may deliver deadlines for management,
product managers deliver value for users.”
3. ACKNOWLEDGEMENTS
• Abdallah Al-Khalidi
• Ashraf Samhouri
• Ibrahem Abu Hijleh
• Mohammad Obaidat
• Rawan Abu Khadra
• Rema Malkawi
• Sereen Yaseen
• Yousef Alsayeh
Thank you for sharing your experiences and
doing what product managers do best: nagging!
4. WHO AM ME?
• JOSA member & AIESEC alumnus
• Past Entrepreneur (Currently Undercover)
• Full-Stack Data Scientist:
• Recommender Systems
• Real-time systems
• Other Activities: NLP, Machine Learning, Programming, DevOps,
Hardware, Bash scripting
5. WHY ARE WE HERE?
• Develop data intuition and argue about data
• Build data culture
• Make data-driven decisions
• Leverage data to build better products
7. CLARIFICATION
• Data Science is not Big Data
• Data Science sometimes uses Big Data technologies.
• Data Science is about extracting value from data
• Motivation: How can I use this data to drive more value?
• Big Data is about solving the data problem
• Motivation: “God! We have too much data; our servers are crashing!”
• This talk is about statistics and data science, not big data.
10. MOTIVATING QUESTIONS
• Which is better? Asking users or watching them use
the product?
• How do you get users’ feedback actually fed back
into the product?
• How do you discover user needs?
• How do you do any of these with thousands of users?
11. WHY A DATA CULTURE
• You can’t improve what you can’t measure
• Data is the best equalizer: From top management to
the freshest interns
• Create accountability around data results
• It's all about culture: Build (or enforce) a data culture.
12. PRODUCT LIFECYCLE
Stage | Classical (iteration #1) | Data-Driven (iteration #2)
Secondary Research | Study Market | Analyze Open Datasets
Primary Research | Interviews / User Testing | Usage Data Collection
User Criteria (profiling) | Demographic Segmentation | User Behavioral Profiling
Personas and Scenarios | Assumptions & Interviews | User Clustering
Execute: Get Feedback | Qualitative Feedback | A/B Testing
Improve and Test | Qualitative Feedback | A/B Testing, Anomalies in usage patterns
14. “Three statisticians are out hunting. A bird flies up out of
the bush, and the first statistician aims and fires.
Unfortunately for them, he misses, the bullet going
about a foot below the bird. The second one fires, but
the bullet goes about a foot above the bird.
The third statistician puts down his gun and says:
‘All right! We got him!’”
15. ANTI-PATTERN: AVERAGE
• The average reduces all the information to a single,
often misleading figure.
• average(0, 50, 100) = 50
• average(49, 50, 51) = 50
• average(8, 9, 9, 10, 14, 250) = 50
• average(0, 0, 0, 0, 0, 0, 0, 0, 0, 500) = 50
17. MEDIAN & PERCENTILES
• Percentiles guarantee data order and guard against outliers.
• Median: the value occurring at the center of the ordered values.
• Xth percentile: the number larger than X% of the data.

Dataset | Average | Median | 90th P
0, 50, 100 | 50 | 50 | 50
49, 50, 51 | 50 | 50 | 50
8, 9, 9, 10, 14, 250 | 50 | 9.5 | 14
0, 0, …, 0, 500 | 50 | 0 | 0
18. STANDARD DEVIATION
• Used for normally-distributed datasets.
• Measures how spread out the dataset is.

Dataset | Average | σ | Median | 90th P
0, 50, 100 | 50 | 50 | 50 | 50
49, 50, 51 | 50 | 1 | 50 | 50
8, 9, 9, 10, 14, 250 | 50 | 98 | 9.5 | 14
0, 0, …, 0, 500 | 50 | 158.1 | 0 | 0
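A minimal Python sketch (standard library only, using the datasets from the tables above) reproduces these figures. Note that percentile conventions vary between tools, so the 90th-percentile values for the tiny three-element datasets may come out slightly differently than in the slide's table:

```python
import statistics

# The four example datasets from the slides: every one has an average
# of 50, yet they describe completely different realities.
datasets = [
    [0, 50, 100],
    [49, 50, 51],
    [8, 9, 9, 10, 14, 250],
    [0] * 9 + [500],
]

def percentile(values, pct):
    # Nearest-rank percentile: the smallest value >= pct% of the data.
    ordered = sorted(values)
    rank = max(int(round(pct / 100 * len(ordered))) - 1, 0)
    return ordered[rank]

for values in datasets:
    print(f"mean={statistics.mean(values):6.1f}  "
          f"median={statistics.median(values):5.1f}  "
          f"p90={percentile(values, 90):5d}  "
          f"stdev={statistics.stdev(values):6.1f}")

# The third row prints mean=50.0 but median=9.5, p90=14, stdev=98.0:
# the outlier (250) is invisible if you look at the mean alone.
```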
20. DOESN’T EVERYTHING FOLLOW
THE NORMAL DISTRIBUTION?
• Mean and standard deviation assume that data is
normally distributed (has a bell-curve shape).
• In normally-distributed datasets, the mean, median, and
mode are all the same.
• Most data is thought to be normally distributed,
but it actually is not!
23. APPLICATIONS
• Service levels and server uptime:
• On average, each API call will take 200ms
• 80% of calls under 100ms, 95% of calls under 200ms
• Paying customers:
• On average, each user pays $7
• Segment users, remove outliers and represent them with percentiles:
• 90% of basic users pay $5 or more per month
• 90% of premium users pay $13 or more per month
• One outlier paid us $200 last month. Interesting, let’s investigate!
24. ANTI-PATTERN: ACCURACY
• Accuracy also compresses information into a misleading and usually useless figure.
• Example: If 1% of your email is spam:
• Solution #1: Mark all emails as spam → 1% accurate.
• Solution #2: Mark all emails as non-spam → 99% accurate.
• We care a lot about:
• What kind of error happened?
• Can we tolerate it?
• Are all errors born equal? Assign cost per error type.
26. PRECISION AND RECALL
• Accuracy = (TP + TN) / All
• Treats FP and FN equally
• Precision = TP / (TP + FP)
• 100% when FP=0 (no errors returned)
• Useful for search results, sensitive and important information
• “I’d rather say nothing than tell a lie! or embarrass myself with a wrong answer”
• Recall = TP / (TP + FN)
• 100% when FN=0 (When all correct results are returned)
• Useful for passive interactions like recommender systems and loose-searching (similar items)
• “I won’t hide anything from you, even the useless details”
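A tiny sketch of these formulas applied to the spam example above (scaled to 10,000 emails, an illustrative assumption) shows what accuracy hides and what precision and recall reveal:

```python
def metrics(tp, fp, tn, fn):
    # Confusion-matrix definitions from the slide.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# 10,000 emails, 1% of which (100) are spam; "positive" means spam.

# Mark everything as non-spam: 99% accuracy but 0% recall --
# not a single spam message is ever caught.
print(metrics(tp=0, fp=0, tn=9900, fn=100))   # (0.99, 0.0, 0.0)

# Mark everything as spam: 100% recall but 1% precision --
# every legitimate email is buried.
print(metrics(tp=100, fp=9900, tn=0, fn=0))   # (0.01, 0.01, 1.0)
```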
28. OTHER MEASURES
• Weighted errors
• if result >= truth, consider it correct
• if result < truth, consider it wrong
• Loss functions
• if result > truth, take difference as error
• if result < truth, take ten times difference as error
• Sum all errors and try to build a model with the least amount of error
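As a concrete sketch of the second loss function above (the demand-forecasting framing is an illustrative assumption; the ten-times penalty for under-shooting is the slide's rule):

```python
def asymmetric_loss(predictions, truths):
    # Over-predicting costs the plain difference; under-predicting
    # costs ten times the difference.
    total = 0.0
    for pred, truth in zip(predictions, truths):
        if pred >= truth:
            total += pred - truth
        else:
            total += 10 * (truth - pred)
    return total

# Forecasting demand: over-stocking by 20 units is cheap (+20),
# while running 10 units short is painful (+100).
print(asymmetric_loss([120, 90], [100, 100]))  # 120.0
```

A model trained to minimize this loss learns to err on the side of over-predicting, which is exactly the point: you decide which errors are expensive before you optimize.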
29. APPLICATIONS
• Server caching systems (for speed)
• Search and Recommender Systems
• Product design and error-prioritization
• What kind of error would the user tolerate?
30. A/B TESTING
The road to proper A/B testing is filled with coincidences,
correlations, a comic, and a fancy “null hypothesis”.
31. WHEN A/B TESTS BECOME
HARMFUL
A bad A/B test can lead to:
• Wasting time, money and effort.
• Making the wrong decision and doing the wrong
thing.
• Inconsistent User Experience.
32. How can we avoid A/B side-effects?
By finding the mind reader in the audience!
33. COINCIDENCE
• If you pull a random answer for any question, how
many times will you be correct?
• How to build a system that makes sure you’re not
doing well just by coincidence?
• How harmful can it be to extrapolate a few
samples?
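To make that concrete, here is a small simulation (the 5% conversion rate, 200-user sample, and 2-point threshold are illustrative assumptions). Both variants are identical, yet small samples regularly crown a winner by pure coincidence:

```python
import random

random.seed(42)

TRIALS, USERS, RATE = 1000, 200, 0.05  # A and B convert identically

false_wins = 0
for _ in range(TRIALS):
    a = sum(random.random() < RATE for _ in range(USERS)) / USERS
    b = sum(random.random() < RATE for _ in range(USERS)) / USERS
    if b - a >= 0.02:  # B "beats" A by 2+ points through luck alone
        false_wins += 1

print(f"{false_wins / TRIALS:.0%} of no-effect tests produced a fake winner")
```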
35. CORRELATIONS
1. Alice doesn’t study and gets a full mark, Bob studies hard and fails the
exam → Studying makes you fail.
2. I lost weight and got invited to talk to you → My weight loss caused
you to invite me!
3. Whenever windmills rotate quickly, wind is strong → Windmills cause
wind
4. Sick people smell bad. → Bad odors cause diseases.
5. High altitudes are colder → Altitude causes cold
37. NULL HYPOTHESIS
• By default, everything is random.
• Aim: Disprove null hypothesis (Make it fail the
“random” test)
• Usually disproven with confidence of 95% or 99%
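A sketch of how that plays out, using only the standard library: assume the null hypothesis ("the coin is random"), then check whether the observed data is extreme enough to reject it at 95% confidence. The 60-heads-in-100-tosses example is an illustrative assumption:

```python
from math import comb

def binomial_p_value(successes, n, p=0.5):
    # Two-sided exact binomial test: the total probability of every
    # outcome at least as unlikely as the observed one, assuming the
    # null hypothesis that heads comes up with probability p.
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    return sum(q for q in probs if q <= probs[successes] + 1e-12)

print(round(binomial_p_value(60, 100), 3))  # ~0.057
# 0.057 > 0.05, so 60 heads in 100 tosses still passes the "random"
# test: we cannot reject the null hypothesis with 95% confidence.
```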
39. FAIR COIN?
• Classical question: How many times should you toss a coin before
deciding it’s a Fair Coin?
• Answer: Depends on:
• What kind of error are you willing to tolerate?
• How confident/sure you want to be
• With 1% error and 68.27% confidence: 2,500 tosses
• With 1% error and 99.90% confidence: 27,225 tosses
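Those two figures follow from the normal-approximation margin of error for a proportion, margin = z * sqrt(p(1-p)/n), with the worst case p = 0.5; solving for n gives the toss counts on the slide:

```python
def tosses_needed(margin, z):
    # n = (z * 0.5 / margin)^2, from margin = z * sqrt(0.25 / n)
    return (z * 0.5 / margin) ** 2

print(tosses_needed(margin=0.01, z=1.0))  #  2500.0 tosses (68.27% confidence)
print(tosses_needed(margin=0.01, z=3.3))  # 27225.0 tosses (~99.9% confidence)
```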
47. BASIC RECOMMENDER
• Create a co-occurrence table between Product A
and all other products.
• When the user is viewing Product A, show related items.
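A minimal sketch of such a co-occurrence table (the baskets and product names are made-up illustrative data):

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets.
baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"phone", "case"},
    {"laptop", "bag"},
]

# Count how often each ordered pair of products occurs together.
co_occurrence = Counter()
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_occurrence[(a, b)] += 1

def related_items(product, top_n=3):
    # "Related Items" for a product page: the most frequent co-purchases.
    scores = Counter({b: n for (a, b), n in co_occurrence.items() if a == product})
    return scores.most_common(top_n)

print(related_items("laptop"))  # mouse and bag each co-occur twice
```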
48. SOPHISTICATED
RECOMMENDER
• Time: New (fresh) products appear first
• Location: Products physically near the user appear first
• Context: Where is the user seeing the result? mobile / web / extension / chatbot?
• History: How did the user interact with this category/brand/tag before?
• Price-Sensitivity: How price-sensitive is the user?
• Ephemeral: Are the current searches & browsing habits converging around a certain pattern?
• Related: How did the user react to similar products?
• Quality: Given an internal quality measure, how good is this product?
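One common way to combine signals like these is a weighted score per candidate product. A toy sketch follows; the weights, feature names, and normalization are illustrative assumptions, not the speaker's actual model:

```python
# Each signal from the slide, pre-normalized to [0, 1], combined with
# hand-tuned weights (all values illustrative).
WEIGHTS = {
    "freshness": 0.15,         # Time: newer products first
    "proximity": 0.10,         # Location: physically near the user
    "history_affinity": 0.25,  # History: past interaction with brand/category
    "price_fit": 0.15,         # Price-sensitivity match
    "similar_reaction": 0.20,  # Related: reaction to similar products
    "quality": 0.15,           # Internal quality measure
}

def score(features):
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

candidates = {
    "product_a": {"freshness": 0.9, "history_affinity": 0.8,
                  "price_fit": 0.7, "similar_reaction": 0.6, "quality": 0.5},
    "product_b": {"proximity": 0.9, "history_affinity": 0.3,
                  "price_fit": 0.4, "similar_reaction": 0.2, "quality": 0.9},
}
ranked = sorted(candidates, key=lambda p: score(candidates[p]), reverse=True)
print(ranked)  # ['product_a', 'product_b']: history and freshness win here
```

In production, weights like these would come out of experimentation (including the A/B tests above) rather than hand-tuning.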
50. CONCLUSIONS
• Raw data matters, a lot! Use Tableau or a similar
product.
• Always ask one more question.
• Question data sources.
• Build a culture around data possibilities.
51. WATCH / LISTEN / READ
• Book: “How to Lie with Statistics”: http://a.co/0DIGMwt
• Podcast: “Data Driven Product Management At Yammer”: http://bit.ly/data-driven-yammer
• Choice, happiness and spaghetti sauce | Malcolm Gladwell: https://youtu.be/iIiAAhUeR6Y
• The Bayesian Trap: https://youtu.be/R13BD8qKeTg
• Scientific Studies: Last Week Tonight with John Oliver (HBO): https://youtu.be/0Rnq1NpHdmw
• The Future of Product Management - Janice Fraser: https://youtu.be/f116MblyZbQ