To download please go to: http://www.intelligentmining.com/knowledge-base.html
Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group: http://www.meetup.com/NYC-Predictive-Analytics/ on April 1, 2010 (no joke!) :)
This slide deck discusses predictive analytics models and their applications in a broader context. It gives simple examples of regression and classification.
Predictive Analytics enables organisations to forecast future events, analyse risks and opportunities, and automate decision-making processes by analysing historical data.
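The two model families named above can be sketched in a few lines of plain Python: regression predicts a number, classification predicts a label. The data and the threshold below are invented for illustration and are not taken from the slides.

```python
# Minimal regression and classification sketches on made-up data.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Regression: predict a continuous value (e.g. sales from ad spend).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 8.0]           # roughly y = 2x
a, b = fit_line(xs, ys)
prediction = a * 5.0 + b            # forecast for a new input

# Classification: predict a discrete label (e.g. churn yes/no)
# via a simple threshold on one feature.
def classify(x, threshold=2.5):
    return "high" if x > threshold else "low"

labels = [classify(x) for x in xs]  # ['low', 'low', 'high', 'high']
```

Real decks use a library such as scikit-learn for both tasks; the closed-form fit above is just the smallest self-contained version of the same idea.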
Analytics in offline retail can offer a host of solutions: price optimization, sales and inventory forecasting, aid in supply chain logistics, and leveraging demographics when expanding to new store locations.
Data Science Training | Data Science For Beginners | Data Science With Python... – Simplilearn
This Data Science presentation will help you understand what Data Science is, who a Data Scientist is, what a Data Scientist does, and how Python is used for Data Science. Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge or insights from structured and unstructured data, similar to data mining. This Data Science tutorial will help you build your skills in analytical techniques using Python. With this Data Science video, you’ll learn the essential concepts of Data Science with Python programming and understand how data acquisition, data preparation, data mining, model building and testing, and data visualization are done. This Data Science tutorial is ideal for beginners who aspire to become Data Scientists.
This Data Science presentation will cover the following topics:
1. What is Data Science?
2. Who is a Data Scientist?
3. What does a Data Scientist do?
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data Scientists are being deployed in all kinds of industries, creating a huge demand for skilled professionals. Data scientist is the pinnacle rank in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
You can gain in-depth knowledge of Data Science by taking our Data Science with Python certification training course. With Simplilearn’s Data Science certification training course, you will prepare for a career as a Data Scientist as you master all the concepts and techniques. Those who complete the course will be able to:
1. Gain an in-depth understanding of data science processes, data wrangling, data exploration, data visualization, hypothesis building, and testing. You will also learn the basics of statistics.
2. Install the required Python environment and other auxiliary tools and libraries.
3. Understand the essential concepts of Python programming such as data types, tuples, lists, dicts, basic operators and functions.
4. Perform high-level mathematical computing using the NumPy package and its large library of mathematical functions.
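For orientation, the Python building blocks named in the list above (tuples, lists, dicts, basic functions) look like this in practice. This is a generic illustration, not material from the course.

```python
# Core Python data types used throughout data science work.
point = (3, 4)                       # tuple: immutable, fixed-size record
scores = [88, 92, 79, 95]            # list: mutable, ordered sequence
person = {"name": "Ada", "age": 36}  # dict: key -> value mapping

def mean(values):
    """Average of a sequence -- the kind of helper NumPy vectorizes."""
    return sum(values) / len(values)

scores.append(85)                    # lists are mutable in place
avg = mean(scores)                   # 87.8
x, y = point                         # tuple unpacking
```

NumPy generalizes helpers like `mean` to whole arrays at once, which is what the course's "high-level mathematical computing" refers to.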
Learn more at: https://www.simplilearn.com
This presentation covers data science buzzwords, an introduction to big data, predictive analytics, and model building methods: structured vs. unstructured data, and supervised vs. unsupervised learning.
Generally in recommendation engines, a user's past engagement history with different items is a key input. However, at many points in an enterprise’s business cycle, it is necessary to generate recommendations based on user activity in real time. In this Big Data Cloud meetup on April 3, 2014, we discussed how to decipher real-time click streams into meaningful recommendations.
Pranab Ghosh discussed the real time recommendations feature of Sifarish, which is an open source project built on Hadoop, Storm and Redis.
Sifarish is a recommendation engine that does content based recommendation as well as social collaborative filtering based recommendation.
Recommender systems support the decision making processes of customers with personalized suggestions. These widely used systems influence the daily life of almost everyone across domains like ecommerce, social media, and entertainment. However, the efficient generation of relevant recommendations in large-scale systems is a very complex task. In order to provide personalization, engines and algorithms need to capture users’ varying tastes and find mostly nonlinear dependencies between them and a multitude of items. Enormous data sparsity and ambitious real-time requirements further complicate this challenge. At the same time, deep learning has been proven to solve complex tasks like object or speech recognition where traditional machine learning failed or showed mediocre performance.
Join Marcel Kurovski to explore a use case for vehicle recommendations at mobile.de, Germany’s biggest online vehicle market. Marcel shares a novel regularization technique for the optimization criterion and evaluates it against various baselines. To achieve high scalability, he combines this method with strategies for efficient candidate generation based on user and item embeddings—providing a holistic solution for candidate generation and ranking.
The proposed approach outperforms collaborative filtering and hybrid collaborative-content-based filtering by 73% and 143% for MAP@5. It also scales well for millions of items and users, returning recommendations in tens of milliseconds.
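MAP@5 (mean average precision at 5), the metric cited above, is a standard ranking measure: for each user, precision is taken at every rank where a relevant item appears in the top 5, averaged, then averaged again over users. A sketch with made-up item IDs:

```python
def average_precision_at_k(recommended, relevant, k=5):
    """AP@k: precision at each rank where a relevant item appears,
    averaged over min(|relevant|, k)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank       # precision at this rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_recommended, all_relevant, k=5):
    """Mean of AP@k over all users."""
    aps = [average_precision_at_k(recs, rel, k)
           for recs, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)

# Two users: ranked recommendations vs. items they actually engaged with.
recs = [["a", "b", "c", "d", "e"], ["x", "y", "z", "v", "w"]]
rels = [{"a", "c"}, {"y"}]
score = map_at_k(recs, rels)         # (5/6 + 1/2) / 2 = 2/3
```

Because AP@k rewards relevant items placed higher in the list, it is well suited to the ranking stage the talk describes.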
Event: O'Reilly Artificial Intelligence Conference, New York, 18.04.2019
Speaker: Marcel Kurovski, inovex GmbH
More tech talks: inovex.de/vortraege
More tech articles: inovex.de/blog
Machine Learning has become a must to improve insight, quality and time to market. But it's also been called the 'high interest credit card of technical debt' with challenges in managing both how it's applied and how its results are consumed.
Fairly Measuring Fairness In Machine Learning – HJ van Veen
We look at a case and two research papers on measuring discrimination in machine learning models for extending credit. Presentation given as part of the Sao Paulo Machine Learning Meetup, theme "Ethics in Data Science".
Modeling and Predicting Cyber Hacking Breaches – Venkat Projects
Analyzing cyber incident data sets is an important method for deepening our understanding of the evolution of the threat situation. This is a relatively new research topic, and many studies remain to be done. In this paper, we report a statistical analysis of a breach incident data set corresponding to 12 years (2005–2017) of cyber hacking activities that include malware attacks. We show that, in contrast to the findings reported in the literature, both hacking breach incident inter-arrival times and breach sizes should be modeled by stochastic processes, rather than by distributions because they exhibit autocorrelations. Then, we propose particular stochastic process models to, respectively, fit the inter-arrival times and the breach sizes. We also show that these models can predict the inter-arrival times and the breach sizes. In order to get deeper insights into the evolution of hacking breach incidents, we conduct both qualitative and quantitative trend analyses on the data set. We draw a set of cybersecurity insights, including that the threat of cyber hacks is indeed getting worse in terms of their frequency, but not in terms of the magnitude of their damage.
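The paper's key observation is that inter-arrival times are autocorrelated, so an i.i.d. distribution fit is inadequate and a stochastic process model is needed. That diagnosis can be sketched with a simple lag-1 autocorrelation estimate. This is a generic illustration on synthetic series, not the authors' code or data.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1. Values far from 0 suggest the
    series should be modeled as a stochastic process, not i.i.d. draws."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + 1] - mean)
              for i in range(n - 1))
    return cov / var

# Synthetic inter-arrival times with a persistent trend (autocorrelated) ...
trending = [1.0, 1.2, 1.4, 1.7, 2.0, 2.4, 2.9, 3.5, 4.2, 5.0]
# ... versus an alternating series with strong negative dependence.
alternating = [1.0, 3.0, 1.1, 2.9, 1.0, 3.1, 0.9, 3.0, 1.0, 2.9]

r_trend = lag1_autocorrelation(trending)    # clearly positive
r_alt = lag1_autocorrelation(alternating)   # clearly negative
```

In practice one would inspect several lags (and a test such as Ljung-Box) before choosing between a distribution fit and a time-series model, which is the modeling decision the paper argues for.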
This is a reduced PDF version of the hardcover book, available at http://www.lulu.com/shop/jeffrey-strickland/predictive-analytics-using-r/hardcover/product-22000910.html at a 40% discount. It will soon be available on Amazon.
This presentation introduces big data and explains how to generate actionable insights using analytics techniques. The deck explains general steps involved in a typical analytics project and provides a brief overview of the most commonly used predictive analytics methods and their business applications.
Vijay Adamapure is a Data Science enthusiast with extensive experience in data mining, predictive modeling and machine learning. He has worked on numerous analytics projects ranging from healthcare and business analytics to renewable energy and IoT.
Vijay presented these slides during the Internet of Everything Meetup event 'Predictive Analytics - An Overview' that took place on Jan. 9, 2015 in Mumbai. To join the Meetup group, register here: http://bit.ly/1A7T0A1
Predictive Analytics: Context and Use Cases
Historical context for successful implementation of predictive analytic techniques and examples of implementation of successful use cases.
The Future of Personalized Health Care: Predictive Analytics by @Rock_Health – Rock Health
View the archived webinar here: https://www.youtube.com/watch?v=UJak41hIDWc
How can we use new and existing sources of data to deliver better, personalized care? Predictive analytics underlies what has always been conducted by doctors through their training, experience, and decision-making. Dozens of new digital products have hit the market and $1.9B has flowed into the space since 2011—but what does it take for an algorithm to accurately and reliably impact care?
Purchase the report here: https://gumroad.com/l/gzbzV
Three Approaches to Predictive Analytics in Healthcare – Health Catalyst
Predictive analytics in healthcare must be timely, role-specific, and actionable to be successful. There are three common types of healthcare predictive analytics: risk scores (risk stratification using CMS-HCC or other models), what-if scenarios (simulations of specific outcomes given a certain combination of events), and geo-spatial analytics (mapping a geographical location’s patient disease burden). The common thread in all of these is the element of action, or specifically, the intervention that really matters in healthcare predictive analytics.
In an era of Big Data, organizations are looking to use analytic insight to improve their business. Rapidly changing competitive landscapes and the need to evaluate and adopt new business models are pushing organizations to become more adaptive. How can these imperatives be reflected in the way we build systems? In response, organizations are increasingly buying or building a new class of systems: Decision Management Systems. Decision Management Systems leverage the growing power of predictive analytics to create agile, analytic and adaptive processes and systems.
Strata 2013: Tutorial – How to Create Predictive Models in R using Ensembles – Intuit Inc.
This tutorial, based on a published book by Giovanni Seni, offers a hands-on intro to ensemble models, which combine multiple models into a single predictive system that’s often more accurate than the best of its components. Participants will use data sets and snippets of R code to experiment with the methods to gain a practical understanding of this breakthrough technology.
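The core ensemble idea described above, combining multiple models so the aggregate beats its components, can be sketched by simple prediction averaging. The tutorial itself uses R and real data sets; the toy "models" below are invented to make the effect visible.

```python
# Three deliberately crude "models" for the true relation y = 2x,
# each biased in a different direction.
def model_low(x):  return 1.8 * x         # underestimates
def model_high(x): return 2.3 * x         # overestimates
def model_flat(x): return 2.0 * x - 0.5   # constant offset

def ensemble(x, models):
    """Simple averaging ensemble: mean of the component predictions."""
    return sum(m(x) for m in models) / len(models)

models = [model_low, model_high, model_flat]
truth = lambda x: 2.0 * x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

def mae(predict):
    """Mean absolute error against the true relation."""
    return sum(abs(predict(x) - truth(x)) for x in xs) / len(xs)

errors = [mae(m) for m in models]
ensemble_error = mae(lambda x: ensemble(x, models))
# Because the component biases partly cancel, the averaged model is
# more accurate here than any single component.
```

Averaging works best when the component errors are uncorrelated, which is why methods like bagging and random forests deliberately train components on different data subsets.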
Giovanni Seni is currently a Senior Data Scientist with Intuit, where he leads the Applied Data Sciences team. As an active data mining practitioner in Silicon Valley, he has over 15 years of R&D experience in statistical pattern recognition and data mining applications. He has been a member of the technical staff at large technology companies and a contributor at smaller organizations. He holds five US patents and has published over twenty conference and journal articles. His book with John Elder, “Ensemble Methods in Data Mining – Improving Accuracy through Combining Predictions”, was published in February 2010 by Morgan & Claypool. Giovanni is also an adjunct faculty member in the Computer Engineering Department of Santa Clara University, where he teaches an Introduction to Pattern Recognition and Data Mining class.
Presentation for the Advanced Mathematics class in the Information Technology master's program at CUCEA, Universidad de Guadalajara, in collaboration with Jairo Ramirez.
The LieDM association's support system for institutions organizing distance learning. Presentation given at the international conference "Open Professional Collaboration" in Kaunas, November 5, 2015.
Kenya is currently witnessing substantial growth in the insurance industry, with new products being created at a remarkable rate and the risk underwriting process becoming more and more complex. Claims reserving for the general insurance industry has also developed significantly over the recent years and insurance companies are using numerous methods to project future claims for all their lines of business.
Our aim in this study, therefore, is to produce a comparison of the different claims reserving methods for a general insurer with a given claims experience.
Finally, we assess the relative merits and demerits of each of these methods.
Presentation Regression -Predictive analysis using R and Python on 8 December at GHCI16, Bangalore
http://ghcischedule.anitaborg.org/session/predictive-modeling-using-r-and-python/
We review basic reserving methodologies for reserving general insurance like lag analysis and chain ladder. We then move forward to consider multiple stochastic loss reserving models in detail and show how they uncover more insights than basic reserving models.
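The chain ladder method mentioned above projects future claims from a run-off triangle using volume-weighted development factors. A minimal sketch follows; the triangle is invented for illustration and far smaller than a real one.

```python
# Cumulative claims run-off triangle: rows = accident years,
# columns = development periods; recent years have fewer observed periods.
triangle = [
    [100.0, 150.0, 165.0],   # fully developed
    [110.0, 168.0],          # one period still to come
    [120.0],                 # two periods still to come
]

def development_factors(tri):
    """Volume-weighted link ratios between successive development periods."""
    factors = []
    for j in range(len(tri[0]) - 1):
        rows = [r for r in tri if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors

def complete_triangle(tri):
    """Project each row to ultimate by applying the remaining factors."""
    factors = development_factors(tri)
    full = []
    for row in tri:
        row = list(row)
        for j in range(len(row) - 1, len(factors)):
            row.append(row[-1] * factors[j])
        full.append(row)
    return full

full = complete_triangle(triangle)
ultimates = [row[-1] for row in full]   # projected ultimate claims per year
```

Stochastic reserving models such as Mack's method build on exactly this structure but also quantify the uncertainty around the projected ultimates, which is the extra insight the presentation highlights.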
Performed predictive data analytics on the "Black Friday Sales" dataset, in which the company wants to predict the purchase amount for products, using the RapidMiner tool.
Data scientists come in all shapes and sizes when it comes to their understanding and experience of machine learning. We take a look at what is possible with scikit-learn's capabilities in Python, focusing on data normalization and making clear what others make difficult to understand. This presentation is an easy-to-follow introduction to machine learning in Python.
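Data normalization, the presentation's focus, means rescaling each feature onto a common scale so no single feature dominates a model. A sketch of the two most common schemes, equivalent in spirit to scikit-learn's StandardScaler and MinMaxScaler:

```python
def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def minmax(values):
    """Rescale linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = zscore(heights_cm)   # mean 0, symmetric around the middle value
m = minmax(heights_cm)   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In a real pipeline the scaling parameters must be fit on the training data only and then reused on the test data, which is exactly what scikit-learn's fit/transform split enforces.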
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel... – Benjamin Bengfort
This is an overview of the goals and roadmap for the Yellowbrick model visualization library (www.scikit-yb.org). If you're interested in contributing to Yellowbrick or writing visualizers, this is a good place to get started.
In the presentation we discuss the expected workflow of data scientists interacting with the model selection triple and Scikit-Learn. We describe the Yellowbrick API and its relationship to the Scikit-Learn API. We introduce our primary object: the Visualizer, an estimator that learns from data and displays it visually. Finally, we describe the requirements for developing for Yellowbrick, the tools and utilities in place, and how to get started.
Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for your models!
This presentation was given during the opening session of the 2017 Spring DDL Research Labs.
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks – Senturus
Senturus shares insights and tips on IBM Cognos 10 Framework Manager Metadata Modeling. View the video recording and download this deck: http://www.senturus.com/resources/cognos-framework-manager-metadata-modeling-tips-tricks/.
Topics Include:
• Use determinants, parameter maps and query macros to implement row level security
• Understand the use of determinants and their importance
• Enhance your metadata by leveraging parameter maps and query macros
See a live demonstration of implementing row-level security based on user attributes, dimensional modeling of relational query subjects and use of Model Design Accelerator.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
Understanding and predicting behavior for each individual customer has always been the ultimate dream for all digital companies. Combining machine learning and big data processing has finally made that dream a reality. In this webcast, you'll learn about the behavior based algorithms Insights uses to predict customer behavior.
Listen to the podcast version here: http://bit.ly/1EYkSIH
View the webcast on Youtube: https://youtu.be/sidTdUkacHw
AI/ML Infra Meetup | ML Explainability in Michelangelo – Alluxio, Inc.
AI/ML Infra Meetup
May 23, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Eric Wang (Software Engineer, @Uber)
Uber has numerous deep learning models, most of which are highly complex with many layers and a vast number of features. Understanding how these models work is challenging and demands significant resources to experiment with various training algorithms and feature sets. With ML explainability, the ML team aims to bring transparency to these models, helping to clarify their predictions and behavior. This transparency also assists the operations and legal teams in explaining the reasons behind specific prediction outcomes.
In this talk, Eric Wang will discuss the methods Uber used for explaining deep learning models and how we integrated these methods into the Uber AI Michelangelo ecosystem to support offline explaining.
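One widely used model-agnostic explainability technique, permutation importance, illustrates the general idea behind such offline explanations: permute one feature and measure how much the model's error grows. This is a generic sketch, not necessarily one of the methods Uber uses; the toy model and data are invented.

```python
def permutation_importance(predict, X, y, feature_idx):
    """Error increase when one feature column is permuted (here: rotated
    by one position, a simple deterministic permutation).
    Larger values mean the model relies more on that feature.
    Real implementations shuffle randomly and average over repeats."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    column = [row[feature_idx] for row in X]
    column = column[1:] + column[:1]            # deterministic permutation
    permuted = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                for row, v in zip(X, column)]
    return mse(permuted) - baseline

# A toy model that uses feature 0 heavily and ignores feature 1.
predict = lambda row: 3.0 * row[0]
X = [[1.0, 9.0], [2.0, 1.0], [3.0, 7.0], [4.0, 2.0], [5.0, 5.0]]
y = [3.0, 6.0, 9.0, 12.0, 15.0]

imp0 = permutation_importance(predict, X, y, 0)   # large: feature used
imp1 = permutation_importance(predict, X, y, 1)   # zero: feature ignored
```

The appeal of this family of methods for complex deep models is that they need only prediction access, not the model internals, which makes them practical to run offline at scale.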
Date: Monday, January 3, 2022
Lecture 143 of the #Tawasul_Tatweer (Communication & Development) initiative
Engineer Mohamed El-Rafei Tarbay, head of the Programmers Syndicate in Dakahlia
Titled "IT INDUSTRY": How To Get Into IT With Zero Experience
Monday, January 3, 2022, at 7 PM Cairo time / 8 PM Makkah time
Attendance via Zoom:
https://us02web.zoom.us/meeting/register/tZUpf-GsrD4jH9N9AxO39J013c1D4bqJNTcu
The lecture will also be streamed live on the Egyptian Engineers Association channels.
We hope to succeed in offering what benefits engineers and the engineering profession in the Arab world, God willing.
To contact the initiative's management via the Telegram channel:
https://t.me/EEAKSA
Follow the initiative and the live stream on our various channels:
LinkedIn and e-library:
https://www.linkedin.com/company/eeaksa-egyptian-engineers-association/
Twitter:
https://twitter.com/eeaksa
Facebook:
https://www.facebook.com/EEAKSA
YouTube:
https://www.youtube.com/user/EEAchannal
General lecture registration:
https://forms.gle/vVmw7L187tiATRPw9
Note: free attendance certificates are available for those who fill in the evaluation form at the end of the lecture.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... – Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
State of ICS and IoT Cyber Threat Landscape Report 2024 Preview – Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
PHP Frameworks: I Want to Break Free (IPC Berlin 2024) – Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk encourages a more independent stance on PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes real work: it requires vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only materialise when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
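Link prediction over knowledge graphs, the talk's running example, is commonly done by scoring candidate triples with entity and relation embeddings. A minimal sketch in the style of the TransE scoring function (the tiny hand-made embeddings below are purely illustrative, not learned, and not from the talk):

```python
# TransE-style scoring: a triple (head, relation, tail) is plausible
# when head_vec + relation_vec lands close to tail_vec.
entities = {
    "paris":  [1.0, 0.0],
    "france": [1.0, 1.0],
    "berlin": [0.0, 0.0],
}
relations = {
    "capital_of": [0.0, 1.0],   # roughly translates a capital to its country
}

def score(head, relation, tail):
    """Negative Euclidean distance: higher means more plausible."""
    h, r, t = entities[head], relations[relation], entities[tail]
    return -sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)) ** 0.5

# Rank candidate tails for the query ("paris", "capital_of", ?).
candidates = ["france", "berlin"]
best = max(candidates, key=lambda t: score("paris", "capital_of", t))
```

The talk's point maps onto this sketch directly: the ranking is only "predictable inference" if the relation vectors behave consistently across entities, i.e. if the symbols carry an actual semantics rather than arbitrary labels.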
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Building a Predictive Model
1. BUILDING A PREDICTIVE MODEL
AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE
Alex Lin
Senior Architect
Intelligent Mining
alin@intelligentmining.com
2. Outline
Predictive modeling methodology
k-Nearest Neighbor (kNN) algorithm
Singular value decomposition (SVD) method for dimensionality reduction
Using a synthetic data set to test and improve your model
Experiment and results
3. The Business Problem
Design a product recommender solution that will increase revenue. $$
4. How Do We Increase Revenue?
Increase Revenue:
- Increase Conversion
- Increase Avg. Order Value
  - Increase Unit Price
  - Increase Units / Order
5. Example
Is this recommendation effective?
[Diagram: an example recommendation, annotated with "Increase Unit Price" and "Increase Units / Order"]
7. Predictive Model Framework
Data → Features → ML Algorithm → Prediction Output
What data? What features? Which algorithm? Output: Cross-sell & Up-sell Recommendation
8. What Data to Use?
Explicit data: Ratings, Comments
Implicit data: Order history / Return history, Cart events, Page views, Click-thru, Search log
In today's talk we only use Order history and Cart events.
9. Predictive Model
Data: Order History, Cart Events → Features: what features? → ML Algorithm: which algorithm? → Prediction Output: Cross-sell & Up-sell Recommendation
10. What Features to Use?
We know that a given product tends to get purchased by customers with similar tastes or needs.
Use user engagement data to describe a product.
Example – item 17's user engagement vector over users 1..n: [1, .25, .25, 1, .25, …]
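The engagement-vector idea above can be sketched in code. This is a minimal, hypothetical Python sketch: the function name, the event-tuple layout, and the rule that an order outweighs a cart event are my assumptions; only the 1 / 0.25 weights come from the slide's example.

```python
# Sketch: build per-item user-engagement vectors from raw events.
# The 1 / 0.25 weights mirror the slide's example (order = 1, cart event = .25);
# everything else here is illustrative, not from the original deck.
from collections import defaultdict

EVENT_WEIGHT = {"order": 1.0, "cart": 0.25}

def build_engagement(events):
    """events: iterable of (item_id, user_id, event_type) tuples.
    Returns {item_id: {user_id: weight}}, keeping the strongest signal per pair."""
    vectors = defaultdict(dict)
    for item_id, user_id, event_type in events:
        w = EVENT_WEIGHT[event_type]
        prev = vectors[item_id].get(user_id, 0.0)
        vectors[item_id][user_id] = max(prev, w)  # an order outweighs a cart event
    return dict(vectors)

events = [(17, 1, "order"), (17, 3, "cart"), (17, 4, "cart"),
          (17, 6, "order"), (17, 9, "cart")]
vec17 = build_engagement(events)[17]
# vec17 == {1: 1.0, 3: 0.25, 4: 0.25, 6: 1.0, 9: 0.25}, like the item-17 row above
```

A sparse dict-of-dicts like this is closer to reality than the dense matrix pictured on the next slides, since each item is engaged by only a tiny fraction of users.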
11. Data Representation / Features
When we merge every item's user engagement vector, we get an m x n item-user matrix.
[Example: sparse m x n matrix of engagement values (1, .25), with items as rows and users as columns]
12. Data Normalization
Ensure the magnitudes of the entries in the dataset matrix are appropriate.
[Example: the item-user matrix with entries rescaled to values such as .5, .9, .92, .49, …]
Remove the column average – so frequent buyers don't dominate the model.
13. Data Normalization
Different engagement data points (Order / Cart / Page View) should have different weights.
Common normalization strategies:
- Remove column average
- Remove row average
- Remove global mean
- Z-score
- Fill in the null values
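The column-average step can be sketched as follows. A minimal pure-Python illustration on a tiny dense matrix (function and variable names are mine); note it treats 0.0 as "no engagement", whereas slide 25 argues missing values deserve more careful modeling.

```python
# Sketch: remove each column (user) average so frequent buyers
# don't dominate the model. Illustrative only.

def remove_column_average(matrix):
    """matrix: list of rows (items) over the same columns (users)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    col_avg = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[row[j] - col_avg[j] for j in range(n_cols)] for row in matrix]

data = [
    [1.0, 0.25, 0.0],
    [0.0, 0.25, 1.0],
]
normalized = remove_column_average(data)
# column averages are [0.5, 0.25, 0.5], so each column of the result sums to zero
```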
14. Predictive Model
Data: Order History, Cart Events (with Data Normalization) → Features: user engagement vector → ML Algorithm: which algorithm? → Prediction Output: Cross-sell & Up-sell Recommendation
15. Which Algorithm?
How do we find the items that have similar user engagement data?
[Example item-user matrix with similar rows highlighted]
We can find the items that have similar user engagement vectors with the kNN algorithm.
16. k-Nearest Neighbor (kNN)
Find the k items that have the most similar user engagement vectors.
[Example item-user matrix; items 2, 3 and 1 have the rows most similar to item 4's]
Nearest Neighbors of Item 4 = [2, 3, 1]
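A minimal top-k neighbor search over item rows might look like the sketch below. It uses cosine similarity (one of the measures on the next slide); the brute-force scan is fine for a toy matrix but, as the deck discusses later, does not scale to 100K x 2M.

```python
# Sketch: brute-force kNN over item rows by cosine similarity. Illustrative;
# a real system would use the scalability tricks mentioned later in the deck.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn(target_row, matrix, k):
    """Return the k row indices most similar to matrix[target_row]."""
    target = matrix[target_row]
    scored = [(cosine(target, row), i)
              for i, row in enumerate(matrix) if i != target_row]
    scored.sort(reverse=True)              # most similar first
    return [i for _, i in scored[:k]]

items = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
knn(0, items, 2)  # row 1 is closer to row 0 than row 2 is -> [1, 2]
```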
17. Similarity Measure for kNN
Example – two items' user engagement vectors: item 2 = (1, .5, 1) and item 4 = (1, .5, 1, 1), overlapping on two users.

Jaccard coefficient:
sim(a, b) = |a ∩ b| / (|a| + |b| − |a ∩ b|) = (1 + 1) / ((1 + 1 + 1) + (1 + 1 + 1 + 1) − (1 + 1))

Cosine similarity:
sim(a, b) = cos(a, b) = (a · b) / (‖a‖ ‖b‖) = (1·1 + 0.5·1) / (√(1² + 0.5² + 1²) · √(1² + 0.5² + 1² + 1²))

Pearson correlation:
corr(a, b) = Σᵢ (r_ai − r̄_a)(r_bi − r̄_b) / (√(Σᵢ (r_ai − r̄_a)²) · √(Σᵢ (r_bi − r̄_b)²))
          = (m Σ aᵢbᵢ − Σ aᵢ Σ bᵢ) / (√(m Σ aᵢ² − (Σ aᵢ)²) · √(m Σ bᵢ² − (Σ bᵢ)²))
          = (match_cols · Dotprod(a,b) − sum(a) · sum(b)) / (√(match_cols · sum(a²) − (sum(a))²) · √(match_cols · sum(b²) − (sum(b))²))
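The three measures can be written as plain functions over two equal-length vectors in which 0.0 means "no engagement". A sketch under that assumption (the `pearson_matched` name and the matched-columns convention follow the slide's match_cols form; the exact vector layout for items 2 and 4 is my reconstruction):

```python
# Sketch of the three similarity measures above. Illustrative only:
# 0.0 encodes "no engagement", and Pearson uses only columns where
# BOTH vectors are non-zero, matching the slide's match_cols formula.
import math

def jaccard(a, b):
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pearson_matched(a, b):
    pairs = [(x, y) for x, y in zip(a, b) if x and y]  # matched columns only
    m = len(pairs)
    sa = sum(x for x, _ in pairs)
    sb = sum(y for _, y in pairs)
    dot = sum(x * y for x, y in pairs)
    sa2 = sum(x * x for x, _ in pairs)
    sb2 = sum(y * y for _, y in pairs)
    num = m * dot - sa * sb
    den = math.sqrt(m * sa2 - sa * sa) * math.sqrt(m * sb2 - sb * sb)
    return num / den if den else 0.0

# One possible layout of items 2 and 4 consistent with the slide's numbers:
item2 = [1.0, 0.5, 1.0, 0.0, 0.0]
item4 = [1.0, 1.0, 0.0, 0.5, 1.0]
# jaccard(item2, item4) == 2 / (3 + 4 - 2) == 0.4
# cosine(item2, item4) == 1.5 / (1.5 * sqrt(3.25)) ~= 0.5547
```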
19. Predictive Model Ver. 1: kNN
Data: Order History, Cart Events (with Data Normalization) → Features: user engagement vector → ML Algorithm: k-Nearest Neighbor (kNN) → Prediction Output: Cross-sell & Up-sell Recommendation
20. Cosine Similarity – Code fragment

long i_cnt = 100000;        // number of items: 100K
long u_cnt = 2000000;       // number of users: 2M
double data[i_cnt][u_cnt];  // 100K x 2M dataset matrix (in reality, it needs to be a malloc allocation)
double norm[i_cnt];
long i, j, f;
double dot_product;

// assume data matrix is loaded
......

// calculate vector norm for each user engagement vector
for (i = 0; i < i_cnt; i++) {
    norm[i] = 0;
    for (f = 0; f < u_cnt; f++) {
        norm[i] += data[i][f] * data[i][f];
    }
    norm[i] = sqrt(norm[i]);
}

// cosine similarity calculation
for (i = 0; i < i_cnt; i++) {          // loop thru 100K items
    for (j = 0; j < i_cnt; j++) {      // loop thru 100K items
        dot_product = 0;
        for (f = 0; f < u_cnt; f++) {  // loop thru entire user space: 2M
            dot_product += data[i][f] * data[j][f];
        }
        printf("%ld %ld %lf\n", i, j, dot_product / (norm[i] * norm[j]));
    }
    // find the Top K nearest neighbors here
    ......
}

Two problems with this brute-force approach:
1. 100K rows x 100K rows x 2M features --> scalability problem. Possible remedies: kd-tree, locality-sensitive hashing, MapReduce/Hadoop, multicore/threading, stream processors.
2. data[i] is high-dimensional and sparse, so similarity measures are not reliable --> accuracy problem.

This leads us to SVD dimensionality reduction!
21. Singular Value Decomposition (SVD)
A = U × S × V^T
A: m x n item-user matrix; U: m x r; S: r x r; V^T: r x n
Taking rank k < r gives the low-rank approximation A_k = U_k × S_k × V_k^T
Low-rank approx. item profile: U_k × S_k
Low-rank approx. user profile: S_k × V_k^T
Low-rank approx. item-user matrix: U_k × S_k × S_k × V_k^T
22. Reduced SVD
A_k = U_k × S_k × V_k^T
A_k: 100K x 2M matrix; U_k: 100K x 3; S_k: 3 x 3; V_k^T: 3 x 2M (rank = 3)
S_k = diag(7, 3, 1) – singular values in descending order
Low-rank approx. item profile: U_k × S_k
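The reduced SVD above can be reproduced with NumPy, as a stand-in for the svds/SVDPACKC-style libraries the deck lists later. A small sketch (the 20 x 50 matrix and rank 3 are placeholders for the real 100K x 2M problem):

```python
# Sketch: reduced SVD and the item profile U_k * S_k, on a tiny random
# stand-in for the 100K x 2M item-user matrix. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((20, 50))

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s comes back in descending order
k = 3
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

A_k = Uk @ Sk @ Vtk      # rank-k approximation of A
item_profile = Uk @ Sk   # 20 items x k latent factors
```

A_k is the best rank-k approximation of A in the least-squares sense, which is why truncating at the k largest singular values keeps the most significant latent factors.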
23. SVD Factor Interpretation
[Singular values plot (rank = 512); e.g. S = diag(7, 3, 1), descending singular values]
The larger singular values at the head correspond to the more significant latent factors; the tail holds noise and less significant factors.
24. SVD Dimensionality Reduction
The item profile U_k × S_k describes each item by a handful of latent factors (a rank such as 3 or 10) instead of the full number of users.
Need to find the optimal low rank!
25. Missing Values
There is a difference between "0" and "unknown".
Missing values do NOT appear randomly:
Value = (Preference Factors) + (Availability) − (Purchased elsewhere) − (Navigation inefficiency) − etc.
Approx. Value = (Preference Factors) ± (Noise)
Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set.
26. Singular Value Decomposition (SVD)
Use SVD to reduce dimensionality, so neighborhood formation happens in the reduced user space.
SVD helps the model find the low-rank approximation of the dataset matrix, retaining the critical latent factors while ignoring noise.
The optimal low rank needs to be tuned.
SVD is computationally expensive.
SVD libraries:
- Matlab: [U, S, V] = svds(A, 256);
- SVDPACKC: http://www.netlib.org/svdpack/
- SVDLIBC: http://tedlab.mit.edu/~dr/SVDLIBC/
- GHAPACK: http://www.dcs.shef.ac.uk/~genevieve/ml.html
27. Predictive Model Ver. 2: SVD+kNN
Data: Order History, Cart Events (with Data Normalization and SVD) → Features: user engagement vector → ML Algorithm: k-Nearest Neighbors (kNN) in reduced space → Prediction Output: Cross-sell & Up-sell Recommendation
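The Ver. 2 pipeline – project items into latent-factor space, then find neighbors there – can be sketched end to end. The function name, shapes, and k are illustrative assumptions; NumPy stands in for the SVD libraries the deck mentions.

```python
# Sketch of Ver. 2 (SVD + kNN): neighborhood formation happens on the
# item profile U_k * S_k instead of the full, sparse user space.
import numpy as np

def svd_knn_neighbors(A, rank, k):
    """A: item-user matrix. Returns, for each item, the indices of its k
    nearest items by cosine similarity computed in the rank-reduced space."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    profile = U[:, :rank] * s[:rank]           # item profile U_k * S_k
    norms = np.linalg.norm(profile, axis=1, keepdims=True)
    unit = profile / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                        # item-item cosine similarities
    np.fill_diagonal(sim, -np.inf)             # exclude self-matches
    return np.argsort(-sim, axis=1)[:, :k]     # most similar first

rng = np.random.default_rng(1)
A = rng.random((30, 200))
neighbors = svd_knn_neighbors(A, rank=5, k=20)
# neighbors.shape == (30, 20)
```

Because the similarity pass now runs over rank-dimensional profiles rather than 2M-dimensional sparse rows, it addresses both the scalability and the accuracy problems noted on the cosine-similarity slide.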
28. Synthetic Data Set
Why do we use a synthetic data set?
So we can test our new model in a controlled environment.
29. Synthetic Data Set
A 16-latent-factor synthetic e-commerce data set:
- Dimension: 1,000 (items) by 20,000 (users)
- 16 user preference factors
- 16 item property factors (non-negative)
- Txn Set: n = 55,360, sparsity = 99.72%
- Txn+Cart Set: n = 192,985, sparsity = 99.03%
Download: http://www.IntelligentMining.com/dataset/
Sample rows (user_id, item_id, type):
10 42 0.25
10 997 0.25
10 950 0.25
11 836 0.25
11 225 1
30. Synthetic Data Set
Purchase-likelihood score matrix (1K x 20K) = item property factors (1K x 16) × user preference factors (16 x 20K)
X32 = (a, b, c) · (x, y, z) = a·x + b·y + c·z
X32 = likelihood of Item 3 being purchased by User 2
31. Synthetic Data Set
For each user, sort the items by purchase-likelihood score. Based on the distribution, pre-determine the number of items the user purchases (e.g. # of items = 2). From the top of the sorted list, select items while skipping certain ones to create data sparsity – e.g. User 1 purchased Item 4 and Item 1.
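The generation procedure on the last two slides can be sketched as: build the likelihood matrix from the two factor matrices, rank items per user, then walk the ranking while skipping some candidates. The skip rule, sizes, and names here are my illustrative assumptions, not the deck's exact procedure.

```python
# Sketch: generate synthetic transactions from factor matrices.
# likelihood = item property factors x user preference factors, then pick
# items from the top of each user's ranking, skipping some for sparsity.
import numpy as np

rng = np.random.default_rng(42)
n_items, n_users, n_factors = 100, 50, 16
item_factors = rng.random((n_items, n_factors))   # non-negative item properties
user_factors = rng.random((n_factors, n_users))   # user preferences
likelihood = item_factors @ user_factors          # purchase-likelihood scores

def purchases_for_user(u, n_buy, skip_every=3):
    """Walk user u's items from highest to lowest likelihood, skipping
    every `skip_every`-th candidate to create sparsity (assumed rule)."""
    ranked = np.argsort(-likelihood[:, u])
    bought, seen = [], 0
    for item in ranked:
        seen += 1
        if seen % skip_every == 0:
            continue                              # skipped to thin the data
        bought.append(int(item))
        if len(bought) == n_buy:
            break
    return bought

txn = purchases_for_user(0, n_buy=2)
# two item ids drawn from near the top of user 0's likelihood ranking
```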
32. Experiment Setup
Each model (Random / kNN / SVD+kNN) will generate top-20 recommendations for each item.
Compare model output to the actual top 20 provided by the synthetic data set.
Evaluation metrics:
Precision %: overlap of the top 20 between model output and actual (higher the better):
Precision = |{Found_Top20_items} ∩ {Actual_Top20_items}| / |{Found_Top20_items}|
Quality: average of the actual rankings of the items in the model output (lower the better) – e.g. actual rankings [1, 2, 30, 47, 50, 21] score better than [1, 2, 368, 62, 900, 510].
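The two metrics above can be written as plain functions. A small sketch (names and the toy 5-item example are mine; the deck uses top-20 lists):

```python
# Sketch of the two evaluation metrics from the experiment setup.

def precision(found, actual):
    """Fraction of recommended items that appear in the actual top list."""
    return len(set(found) & set(actual)) / len(found)

def quality(found, actual_rank):
    """Average actual rank of the recommended items (lower is better).
    actual_rank: {item_id: its true rank in the synthetic ground truth}."""
    return sum(actual_rank[i] for i in found) / len(found)

# Toy example with 5 recommendations instead of 20:
found = [3, 7, 9, 12, 40]
actual = [3, 7, 9, 11, 13]
precision(found, actual)  # 3 overlapping items out of 5 -> 0.6
```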
33. Experimental Result
kNN vs. Random (Control): [Charts: Precision % (higher is better) and Quality (lower is better)]
34. Experimental Result
Precision % of SVD+kNN: [Chart: Precision/Recall % (higher is better) vs. SVD rank, showing improvement over plain kNN]
35. Experimental Result
Quality of SVD+kNN: [Chart: Quality (lower is better) vs. SVD rank, showing improvement]
36. Experimental Result
The effect of using Cart data: [Chart: Precision % (higher is better) vs. SVD rank]
38. Outline
Predictive modeling methodology
k-Nearest Neighbor (kNN) algorithm
Singular value decomposition (SVD) method for dimensionality reduction
Using a synthetic data set to test and improve your model
Experiment and results
39. References
J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998.
B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-based Collaborative Filtering Recommendation Algorithms," in Proceedings of the Tenth International World Wide Web Conference (WWW 10), pp. 285-295, 2001.
B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Application of Dimensionality Reduction in Recommender System – A Case Study," in ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
Apache Lucene Mahout: http://lucene.apache.org/mahout/
Cofi: A Java-Based Collaborative Filtering Library: http://www.nongnu.org/cofi/