This document discusses real-time recommendation systems and describes the Sifarish recommendation engine implementation. Sifarish uses Hadoop, Storm, and Redis to process both batch and real-time recommendations. It generates recommendations through content-based analysis, social recommendations based on user behavior, and real-time processing of new user event data through Storm. Sifarish provides features like implicit rating generation, item correlation analysis, time-sensitive recommendations, and business goal injection for generating personalized recommendations at scale.
To download please go to: http://www.intelligentmining.com/knowledge-base.html
Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group: http://www.meetup.com/NYC-Predictive-Analytics/ on April 1, 2010 (no joke!) :)
In this lecture, I will first cover the recent advances in neural recommender systems such as autoencoder-based and MLP-based recommender systems. Then, I will introduce the recent achievement for automatic playlist continuation in music recommendation.
Overview of recommender systems: RFM concepts in brief; item-based and user-based collaborative filtering; content-based recommendation; product association recommender systems; stereotype recommendation with its advantages and limitations; customer lifetime; and the recommender system analysis and solving cycle.
What really are recommendation engines nowadays?
This presentation introduces the foundations of recommendation algorithms and covers common approaches as well as some of the most advanced techniques. Although more focused on efficiency than on theoretical properties, it uses basics of matrix algebra and optimization-based machine learning throughout.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Square (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problematics
4.2 Solutions
4.3 Tools
Modern Perspectives on Recommender Systems and their Applications in Mendeley (Kris Jack)
Presentation given for one of Pearson's Data Research teams. It motivates the use of recommender systems, describes common approaches to building and evaluating them and gives examples of how they are used in Mendeley. Thanks to Maya Hristakeva for creating some of the slides.
Recommender Systems represent one of the most widespread and impactful applications of predictive machine learning models.
Amazon, YouTube, Netflix, Facebook and many other companies generate an important fraction of their revenues thanks to their ability to model and accurately predict users' ratings and preferences.
In this presentation we cover the following points:
→ introduction to recommender systems
→ working with explicit vs implicit feedback
→ content-based vs collaborative filtering approaches
→ user-based and item-item methods
→ machine learning and deep learning models
→ pros & cons of the methods: scalability, accuracy, explainability
GTC 2021: Counterfactual Learning to Rank in E-commerce (GrubhubTech)
Many e-commerce companies have extensive logs of user behavior such as clicks and conversions. However, if supervised learning is naively applied, systems can suffer from poor performance due to bias and feedback loops. Using techniques from counterfactual learning, we can leverage log data in a principled manner to model user behavior and build personalized recommender systems. At Grubhub, a user journey begins with recommendations, and the vast majority of conversions are powered by recommendations. Our recommender policies can drive user behavior to increase orders and/or profit, so the ability to rapidly iterate and experiment is very important. Because of our powerful GPU workflows, we can iterate 200% more rapidly than with counterpart CPU workflows. Developers iterate on ideas in notebooks powered by GPUs, hyperparameter spaces are explored up to 8x faster with multi-GPU Ray clusters, and solutions are shipped from notebooks to production in half the time with nbdev. With our accelerated DS workflows and deep learning on GPUs, we were able to deliver a +12.6% conversion boost in just a few months. In this talk we present modern techniques for industrial recommender systems powered by GPU workflows: first a brief background on counterfactual learning techniques, followed by practical information and data from our industrial application.
By Alex Egg, accepted to Nvidia GTC 2021 Conference
IOTA 2016 Social Recommender System Presentation (Ashish Jagtap)
In today’s age of ever-increasing internet use, around 74% of people are active internet users, of whom 60% contribute to social networking, and most of them are students in the 16-30 age group. If this young generation were targeted specifically towards educational activities, keeping the same social networking environment in the background, it would create interest in students for educational activities and also yield productive results. This can be implemented by creating a combined social and educational portal with recommender systems, so that specific information can be provided to specific students. Use of such technology can reduce the gap between students and information, which can lead to their development and success. However, most existing social recommender systems do not have good scalability and are unable to process huge volumes of data. To address this problem, we can design a social recommender system based on Hadoop and its parallel computing platform.
Dataset: Gather a large dataset of laptops and their features, including processor speed, RAM, storage, and display size, along with their corresponding prices.
Feature engineering: Extracting meaningful features from the dataset, such as brand, model, and year, and transforming them into a format that machine learning algorithms can use.
Model selection: Choosing the most appropriate machine learning algorithm, such as linear regression, decision tree, or random forest, based on the type of data and desired level of accuracy.
Model training: Splitting the dataset into training and testing sets, and using the training data to train the machine learning model.
Model evaluation: Testing the model's performance on the testing data and evaluating its accuracy using metrics such as mean squared error or R-squared.
Hyperparameter tuning: Optimizing the model's hyperparameters, such as learning rate or regularization strength, to achieve the best performance.
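A compact sketch of the pipeline described above using scikit-learn; the file name, column names and the random forest choice are illustrative assumptions, not details from the original project.

```python
# A sketch of the described price-prediction pipeline with scikit-learn;
# the CSV name, columns and model choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("laptops.csv")  # hypothetical dataset
# Feature engineering: one-hot encode categorical columns, pass numerics through.
X = pd.get_dummies(df[["brand", "ram_gb", "storage_gb", "display_in"]])
y = df["price"]

# Train/test split, model training, then evaluation on held-out data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)

pred = model.predict(X_test)
print(mean_squared_error(y_test, pred), r2_score(y_test, pred))
```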
Project Explanation: Book Recommendation System
The goal of this project was to develop a book recommendation system that provides personalized recommendations to users based on their preferences and past reading behavior. The project involved the following key steps:
1. Data Collection: I gathered a comprehensive dataset of books, including information such as titles, authors, genres, and user ratings. This data was obtained from various reliable sources, such as online bookstores or publicly available book datasets.
2. Data Preprocessing: The collected data required cleaning and preprocessing to ensure its quality and consistency. I handled missing values, resolved inconsistencies in book titles or authors, and standardized the data format for further analysis.
3. Exploratory Data Analysis: I performed exploratory data analysis to gain insights into the dataset. This included analyzing book genres, distribution of user ratings, and identifying popular authors or books.
4. Feature Engineering: To capture the preferences and interests of users, I created relevant features from the available data. These features could include book genres, authors, user demographics, or historical reading behavior.
5. Recommendation Model Development: I developed a recommendation model using collaborative filtering techniques or content-based filtering methods. Collaborative filtering utilizes the preferences of similar users to make recommendations, while content-based filtering suggests books based on their attributes and user preferences. I employed popular machine learning algorithms, such as matrix factorization or k-nearest neighbors, to build the recommendation model.
6. Model Evaluation: I evaluated the performance of the recommendation system using metrics such as precision, recall, or mean average precision. I also conducted A/B testing or cross-validation to assess the system's effectiveness and optimize its performance.
7. User Interface Development: I created a user-friendly interface where users could input their preferences and receive personalized book recommendations. The interface provided an intuitive and interactive experience, allowing users to explore recommended books and provide feedback.
8. Deployment and Feedback Loop: The recommendation system was deployed in a production environment, where users could access it and provide feedback on the recommended books. This feedback was incorporated into the system to continually improve its accuracy and relevance over time.
By completing this project, I gained hands-on experience in data collection, preprocessing, exploratory data analysis, and recommendation system development. I demonstrated my ability to leverage machine learning algorithms and user data to build a personalized book recommendation system that enhances user engagement and satisfaction.
Olist Store Analysis
According to the data, Olist E-commerce has about 99,440 orders. With about 89,940 orders delivered, the company has a 90% delivery success rate.
✔ Their average product rating is 4.09 stars, with product categories going as high as 4.67 stars and as low as 2.5 stars. 1-star reviews are in third place in the review score distribution, which likely indicates problems with product quality in some product categories.
✔ It helps in understanding the spending patterns of customers in Sao Paulo city. It also helps Olist identify high-value customers and create targeted marketing campaigns.
Explain Yourself: Why You Get the Recommendations You Do (Databricks)
Machine learning recommender systems have supercharged the online retail environment by directly targeting what the customer wants. While customers are getting better product recommendations than ever before, in the age of GDPR there is growing concern about customer privacy and transparency with ML models. Many are asking: just why am I receiving these recommendations? While the current Implicit Collaborative Filtering (CF) algorithm in spark.ml is great for generating recommendations at scale, it currently lacks any method to explain why a particular customer is getting the recommendations they are getting. In this talk, we demonstrate a way to expand collaborative filtering so that the viewing history of a customer can be directly related to their recommendations. Why were you recommended footwear? Well, 40% of this recommendation came from browsing runners and 20% came from the shorts you recently purchased. It turns out that rethinking the linear algebra in the current spark.ml CF implementation makes this possible. We show how this is done and demonstrate its implementation as a new feature for spark.ml, expanding the API to allow everyone to explain recommendations at scale and create a more transparent ML future.
Authors: Niels Hanson, Kishori Konwar
Webinar - Comparative Analysis of Cloud based Machine Learning Platforms (BigDataCloud)
This webinar discusses cloud based machine learning platforms in detail while identifying suitable business use cases for each of them: Microsoft Azure ML, Amazon Machine Learning, and Databricks Cloud.
Crime Analysis & Prediction System is a system to analyze & detect crime hotspots & predict crime.
It collects data from various data sources - crime data from OpenData sites, US census data, social media, traffic & weather data etc.
It leverages Microsoft's Azure Cloud and on premise technologies for back-end processing & desktop based visualization tools.
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning (BigDataCloud)
A tutorial given at NAACL HLT 2013.
Richard Socher and Christopher Manning
http://nlp.stanford.edu/courses/NAACL2013/
Machine learning is everywhere in today's NLP, but by and large machine learning amounts to numerical optimization of weights for human designed representations and features. The goal of deep learning is to explore how computers can take advantage of data to develop features and representations appropriate for complex interpretation tasks. This tutorial aims to cover the basic motivation, ideas, models and learning algorithms in deep learning for natural language processing. Recently, these methods have been shown to perform very well on various NLP tasks such as language modeling, POS tagging, named entity recognition, sentiment analysis and paraphrase detection, among others. The most attractive quality of these techniques is that they can perform well without any external hand-designed resources or time-intensive feature engineering. Despite these advantages, many researchers in NLP are not familiar with these methods. Our focus is on insight and understanding, using graphical illustrations and simple, intuitive derivations. The goal of the tutorial is to make the inner workings of these techniques transparent, intuitive and their results interpretable, rather than black boxes labeled "magic here".

The first part of the tutorial presents the basics of neural networks, neural word vectors, several simple models based on local windows, and the math and algorithms of training via backpropagation. In this section, applications include language modeling and POS tagging.

In the second section we present recursive neural networks, which can learn structured tree outputs as well as vector representations for phrases and sentences. We cover both equations and applications. We show how training can be achieved by a modified version of the backpropagation algorithm introduced before; these modifications allow the algorithm to work on tree structures. Applications include sentiment analysis and paraphrase detection. We also draw connections to recent work in semantic compositionality in vector spaces. The principal goal, again, is to make these methods appear intuitive and interpretable rather than mathematically confusing. By this point in the tutorial, the audience members should have a clear understanding of how to build a deep learning system for word-, sentence- and document-level tasks.

The last part of the tutorial gives a general overview of the different applications of deep learning in NLP, including bag of words models. We will provide a discussion of NLP-oriented issues in modeling, interpretation, representational power, and optimization.
Why Hadoop is the New Infrastructure for the CMO? (BigDataCloud)
As the big data market matures, discussions about Hadoop are expanding from pure technology to how businesses can use it to innovate and leapfrog competitors. In this session, Karmasphere will outline how technologists can effectively work with their CMOs - the likely drivers of widespread Hadoop adoption - to unlock its business value. The discussion will include: how changes in marketing are driving the adoption of Hadoop big data analytics, the evolving role of the data and business analysts, and a review of real-world big data analytics use cases.
Karmasphere will demonstrate how the Full Fidelity Analytics of Hadoop can empower high-tech, e-commerce, etail and retail banking to quickly and easily analyze complex data types across silos and apply sophisticated analytics to personalize customer engagement and optimize revenue.
Hadoop: A Foundation for Change - Milind Bhandarkar, Chief Scientist, Pivotal (BigDataCloud)
As adoption of Hadoop across enterprises has skyrocketed, a variety of business use cases have emerged. In this talk, Milind highlights a few use cases and talks about emerging use cases that are shaping the future of the Hadoop platform.
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB (BigDataCloud)
"Navigating the Database Universe" was the topic of the Big Data Cloud meetup held on Jan 24th 2013 in Santa Clara, CA. This is the presentation made by Mike Stonebraker & Scott Jarr of VoltDB.
This meetup was sponsored by VoltDB.
Big Data Cloud Meetup - Jan 24 2013 - Zettaset (BigDataCloud)
Security is the greatest challenge for the widespread adoption of Hadoop in enterprises.
This meetup will discuss ways and means of how such challenges are being met with various solutions and/or products in the industry today. Industry security experts will showcase their varied experiences.
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook (BigDataCloud)
At Facebook, we use various types of databases and storage systems to satisfy the needs of different applications. The solutions built around these data store systems have a common set of requirements: they have to be highly scalable, maintenance costs should be low, and they have to perform efficiently. We use a sharded MySQL+memcache solution to support real-time access to tens of petabytes of data, and we use TAO to provide consistency of this web-scale database across geographical distances. We use the Haystack datastore for storing the 3 billion new photos we host every week. We use Apache Hadoop to mine intelligence from 100 petabytes of click logs and combine it with the power of Apache HBase to store all Facebook Messages.
This talk describes the reasons why each of these databases are appropriate for their workloads and the design decisions and tradeoffs that were made while implementing these solutions. We touch upon the consistency, availability and partitioning tolerance of each of these solutions. We touch upon the reasons why some of these systems need ACID semantics and other systems do not. We briefly touch upon some futures of how we plan to do big-data deployments across geographical locations and our requirements for a new breed of pure-memory and pure-SSD based transactional database.
What Does Big Data Mean and Who Will Win (BigDataCloud)
Michael Ralph Stonebraker is a computer scientist specializing in database research. He is currently an adjunct professor at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems. Through a series of academic prototypes and commercial startups, Stonebraker's research and products are central to many relational database systems on the market today. He is also the founder of a number of database companies, including Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, and Paradigm4. He was previously the Chief Technical Officer (CTO) of Informix and a Professor of Computer Science at the University of California, Berkeley. He is also an editor of the book "Readings in Database Systems".
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase (BigDataCloud)
Big Data Analytics is characterized by analysis of data on three vectors: exploding data volume, proliferating data variety (relational, multi-media), and accelerating data velocity. However, other key vectors, such as the costs and skill sets needed for Big Data Analytics, are often overlooked. In this session, we will consider all five vectors by exploring various techniques where traditional but progressive technologies such as column store DBMS and Event Stream Processing are combined with open source frameworks such as Hadoop to exploit the full potential of Big Data Analytics.
Agenda:
- Big Data Analytics in the real world
- Commercial and Open Source techniques
- Bringing together Commercial and Open Source techniques
* Architectures
* Programming APIs
(e.g. embedded and federated MapReduce)
- Conclusions
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how does this fancy AI technology get managed from an infrastructure operations view? Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
2. CONTENTS
Recommendation processing concepts
Hadoop, Storm & Redis based recommendation engine implementation in ‘Sifarish’
Content based recommendation and social recommendation
Key distinguishing features of ‘Sifarish’ compared to Apache Mahout
Real time social recommendations
3. HADOOP AT 30,000 FT
The power of functional programming and parallel processing join hands to create Hadoop
Basically a parallel processing framework running on a cluster of commodity machines
Stateless functional programming, because the processing of each row of data does not depend upon any other row or any state
Divide and conquer parallel processing: data gets partitioned and each partition gets processed by a separate mapper or reducer task
4. STORM AT 30,000 FT
A clustered framework for scalable real time stream processing
Like Hadoop, a parallel processing framework running on a cluster of commodity machines
Instead of processes as in Hadoop, uses a combination of processes and threads for parallelism
Unlike the 2 processing stages in Hadoop (map and reduce), there can be multiple processing stages defined in a Storm topology
Unlike a Hadoop job, a topology once deployed runs continuously
5. REDIS AT 30,000 FT
It’s a wonderful glue for the Big Data ecosystem
Can be thought of as a distributed data structure server
Can be used as a list, queue, cache etc.
Supports master slave replication
There is no sharding support
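Since the pipeline described later moves data through Redis queues and caches, a minimal sketch of that pattern with the redis-py client may help. The queue name and JSON payload here are illustrative assumptions, not Sifarish's actual conventions.

```python
# A minimal sketch of Redis as a queue, assuming the redis-py client.
# The queue name and JSON payload are illustrative, not Sifarish's format.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Producer: push a user engagement event onto a list used as a queue.
event = {"userID": "u1", "sessionID": "s1",
         "eventType": "addToCart", "itemID": "i42"}
r.rpush("eventQueue", json.dumps(event))

# Consumer: block until an event is available, then process it.
_, raw = r.blpop("eventQueue")
print(json.loads(raw))
```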
6. RECOMMENDATION SYSTEMS
• You know recommender systems if you have visited Amazon or Netflix.
• Very computationally intensive, and therefore ideal for Big Data processing.
• In memory based recommendation, the entire data set is used directly, e.g. user behavior based recommendation (a.k.a. social recommendation) or content based recommendation. This is our focus.
• In model based recommendation, a model is built first by training on the data and then predictions are made, e.g. Bayesian, decision tree.
7. CONTENT BASED RECOMMENDATION
Recommendation is based on innate attributes of the items under consideration
Each item is considered to be a point in an n dimensional feature space, where the item has n attributes
Distance between items in the n dimensional space is computed to find similarities between items. Similarity is inversely proportional to distance
Attributes can be numerical, categorical or text
Not effective for cross sell recommendation
Essential for bootstrapping a recommender system
8. CONTENT BASED RECOMMENDATION
Distance between numerical attributes is simply the difference in values
Distance between categorical attributes is 0 if they are the same, 1 otherwise
Distance between text attributes is based on either Jaccard distance or cosine distance
Distances between corresponding attributes are aggregated to find the distance between items
Different weights can be assigned to different attributes in the aggregation, to control the contribution of a particular attribute
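As a rough illustration of the rules above, here is a small sketch of weighted attribute distance aggregation. The attribute schema, the weights, and the simple weighted average are assumptions for illustration, not necessarily Sifarish's exact aggregation.

```python
# A sketch of weighted per-attribute distance aggregation, assuming
# numeric values are pre-scaled to [0, 1]; schema and weights are illustrative.
def jaccard_distance(text_a, text_b):
    a, b = set(text_a.split()), set(text_b.split())
    return 1.0 - len(a & b) / len(a | b)

def item_distance(item_a, item_b, schema, weights):
    total, weight_sum = 0.0, 0.0
    for attr, kind in schema.items():
        if kind == "numerical":
            d = abs(item_a[attr] - item_b[attr])
        elif kind == "categorical":
            d = 0.0 if item_a[attr] == item_b[attr] else 1.0
        else:  # text
            d = jaccard_distance(item_a[attr], item_b[attr])
        total += weights[attr] * d
        weight_sum += weights[attr]
    return total / weight_sum  # similarity is inversely proportional to this

schema = {"price": "numerical", "brand": "categorical", "description": "text"}
weights = {"price": 1.0, "brand": 2.0, "description": 3.0}
```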
9. COLD START
When bootstrapping a business, no user behavior data is available
Content based recommendation is the only option
Distance calculation is performed between the user profile and items
These are two different kinds of entities; attributes from one entity are mapped to attributes of the other entity
The user profile may have been provided explicitly by the user, or derived from user behavior, e.g. pages visited, search terms etc.
10. WARM START
Refers to the case when some limited amount of interaction data is available
The user may have browsed and / or bought some items
We use content based recommendation again, but we find similarities between items of the same type (e.g. product)
Use the SameTypeSimilarity MR to find the distance between all possible pairs of items
11. SOCIAL RECOMMENDATION
• Customers are fully engaged and a significant amount of user behavior data is available
• Recommendation algorithms are based on user behavior data only
• Consider a matrix of users and items, a.k.a. the utility matrix. Items are rows and users are columns. The matrix is sparse
• The cell value could be boolean, e.g. whether the user has purchased an item or shown interest in some way
• The cell value could also be numeric, representing a rating. The rating could be implicit, derived from user behavior data
12. SOCIAL RECOMMENDATION
• The purpose of recommenders is to fill in the blanks in the utility matrix
• If a user has rated A, then enough users must have rated A as well as other items, for the recommendation to be effective
• Effective in cross sell recommendation
• The utility matrix is dynamic, causing drift in the underlying model
• Periodic re-computation is necessary, depending upon the rate of change
13. DISTANCE BASED SOCIAL RECOMMENDATION
• Consider the rows of the utility matrix, which are item vectors. Each vector is n dimensional if there are n users
• We can find distances between pairs of item vectors
14. ITEM CORRELATION
• We can find distances between pairs of item vectors, using the distance algorithms discussed earlier.
• ItemDynamicAttributeSimilarity is the MR used. The distance or correlation algorithm can be configured to Jaccard, Cosine or Pearson.
• This is known as item based correlation. The other, although less preferred, approach is user based correlation.
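For intuition, here is a small sketch of the three correlation choices on a pair of item vectors. Representing vectors as {userID: rating} maps and the exact handling of non-co-rating users are simplifying assumptions, not necessarily what ItemDynamicAttributeSimilarity does internally.

```python
# A sketch of Jaccard, cosine and Pearson correlation between two item
# vectors (rows of the utility matrix), represented as {userID: rating} maps.
import math

def jaccard(a, b):
    # Overlap of the user sets, ignoring rating values.
    users_a, users_b = set(a), set(b)
    return len(users_a & users_b) / len(users_a | users_b)

def cosine(a, b):
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pearson(a, b):
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[u] for u in common) / len(common)
    mean_b = sum(b[u] for u in common) / len(common)
    cov = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    var = math.sqrt(sum((a[u] - mean_a) ** 2 for u in common)) * \
          math.sqrt(sum((b[u] - mean_b) ** 2 for u in common))
    return cov / var if var else 0.0

i1 = {"u1": 4, "u2": 5, "u3": 1}
i2 = {"u2": 4, "u3": 2, "u4": 5}
print(jaccard(i1, i2), cosine(i1, i2), pearson(i1, i2))
```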
16. IMPLICIT RATING ESTIMATE
• Generally users don’t explicitly rate items. Explicit rating also tends to be biased, because users with extreme views tend to rate more.
• The MR ImplicitRatingEstimator converts user engagement data (e.g. browsing a product description page, viewing a product review page, placing an item in the shopping cart etc.) to a rating value.
• This is an optional processing phase, necessary when explicit rating data is not available.
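One way to picture the conversion: take the strongest-intent event a user generated for an item and map it, with its count, onto a fixed rating scale. The event names, intent levels and the 1-5 heuristic below are invented for illustration; in Sifarish this mapping is driven by metadata.

```python
# A sketch of mapping engagement events to an implicit rating.
# Event names, intent levels and the 1-5 heuristic are illustrative
# assumptions; Sifarish configures this mapping through metadata.
INTENT_LEVEL = {"browsedFromSearch": 1, "viewedDetail": 2,
                "addedToCart": 3, "checkout": 4, "purchased": 5}

def implicit_rating(events):
    """events: list of (eventType, count) for one (user, item) pair."""
    # Keep only the strongest-intent event type observed.
    strongest = max(events, key=lambda e: INTENT_LEVEL[e[0]])
    level, count = INTENT_LEVEL[strongest[0]], strongest[1]
    # Repeated weaker signals nudge the rating up by at most one point.
    return min(5, level + (1 if level < 5 and count >= 3 else 0))

print(implicit_rating([("viewedDetail", 4), ("addedToCart", 1)]))  # -> 3
```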
17. RATING PREDICTOR
• Based on the rating by a user u1 for item i1, the rating for an item i2 is predicted using the correlation between i1 and i2.
• The MR job for rating prediction is UtilityPredictor.
• The correlation between items can be multiplicative or additive. The type of correlation to be used can be set through a configuration parameter.
• For multiplicative correlation, the algorithms are Jaccard, Cosine or Pearson, as mentioned earlier.
• The next slide is on additive correlation.
18. ADDITIVE ITEM CORRELATION
• Also known as the Slope One recommender.
• If a set of users have rated two items i1 and i2, we find the average rating difference between i2 and i1.
• If a user has a rating for i2, we can predict the rating for i1 based on the average difference.
• The steps can be repeated, e.g. find the average rating difference between i3 and i1, and if the user has a rating for i3, get another prediction for the rating of i1.
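Slope One is simple enough to sketch in a few lines. This toy version computes the average difference over co-rating users and applies it to a known rating; it omits the co-rater-count weighting used by weighted Slope One.

```python
# A toy Slope One sketch: average rating difference between item pairs,
# applied to a user's known rating to predict an unknown one.
ratings = {  # userID -> {itemID: rating}
    "u1": {"i1": 4, "i2": 3},
    "u2": {"i1": 5, "i2": 4},
    "u3": {"i2": 2},
}

def avg_diff(target, source):
    """Average of (target - source) over users who rated both items."""
    diffs = [r[target] - r[source] for r in ratings.values()
             if target in r and source in r]
    return sum(diffs) / len(diffs) if diffs else None

def predict(user, target, source):
    d = avg_diff(target, source)
    if d is None or source not in ratings[user]:
        return None
    return ratings[user][source] + d

print(predict("u3", "i1", "i2"))  # avg diff (i1 - i2) is 1, so 2 + 1 = 3
```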
19. AGGREGATION OF PREDICTED RATING
• If a user u1 has rated items i1, i2, .. i5, all of them could be correlated to an item i9. All 5 items will contribute towards the prediction of the rating for item i9.
• The MR UtilityAggregator aggregates the predicted ratings.
• We can take either the average or the median of all predicted ratings. The choice can be made through configuration.
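Putting the last two slides together, a rough sketch: each item the user has rated contributes rating times correlation as a candidate prediction for the target item, and the candidates are aggregated by average or median. The absence of further normalization here is a simplification; treat this as illustrative rather than the exact UtilityPredictor/UtilityAggregator arithmetic.

```python
# A sketch of multiplicative prediction plus aggregation: each item the
# user rated contributes rating * correlation towards the target item.
# The average/median choice mirrors the configuration option; the lack
# of further normalization is a simplifying assumption.
import statistics

def predict_rating(user_ratings, correlations, target, method="average"):
    """user_ratings: {itemID: rating}; correlations: {(item, target): corr}."""
    candidates = [rating * correlations[(item, target)]
                  for item, rating in user_ratings.items()
                  if (item, target) in correlations]
    if not candidates:
        return None
    return statistics.median(candidates) if method == "median" \
        else statistics.mean(candidates)

user_ratings = {"i1": 4, "i2": 2, "i3": 5}
correlations = {("i1", "i9"): 0.8, ("i2", "i9"): 0.4, ("i3", "i9"): 0.9}
print(predict_rating(user_ratings, correlations, "i9"))  # mean of 3.2, 0.8, 4.5
```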
20. BUSINESS GOAL INJECTION
• This is an optional processing phase, where items are associated with scores indicative of the business interest in recommending an item (e.g. preferring items with excess inventory).
• The final recommendation score is a weighted average of the predicted rating and the business goal score. The relative weights are configurable.
• The MR for this processing is BusinessGoalInjector.
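The blend itself is a one-liner. Assuming the predicted rating and the business score are on the same scale, with the weight as the configurable knob:

```python
# A sketch of business goal injection: a configurable weighted average of
# the predicted rating and a business score, both assumed on the same scale.
def final_score(predicted_rating, business_score, business_weight=0.3):
    return (1 - business_weight) * predicted_rating \
        + business_weight * business_score

print(final_score(4.0, 5.0))  # 0.7 * 4.0 + 0.3 * 5.0 = 4.3
```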
21. GROUP BY USER
• This is an optional task that groups the recommended items produced by the processing steps discussed so far by user ID.
• The MR class TextSorter performs this task.
22. TIME SENSITIVE RECOMMENDATION
• A timestamp is associated with the rating data. Each cell in the rating matrix has an associated timestamp.
• When processing, past rating data beyond a specified time window is discarded.
• The time window can be specified as a configuration parameter.
23. USER SEGMENTATION
• When the user population is not homogeneous, it is better to segment the users by clustering or other means.
• A separate utility matrix should be built for each segment.
• Ratings should be predicted for each segment separately, by running the MR pipeline for each segment.
24. KEY DISTINGUISHING FEATURES OF SIFARISH
• Implicit rating generation from explicit user engagement events, for social recommendation
• Semantic matching using an RDF model for knowledge representation, for content based recommendation
• Support for time window and location attributes, for content based recommendation
• Time sensitive social recommendation
• Business goal infused social recommendation
• Real time social recommendation
• Serendipity and novelty in social recommendation (planned)
26. REAL TIME RECOMMENDATION PROCESSING FLOW
• 1 - Copy historical event click stream data to HDFS
• 2 - Copy the output of multiple MRs, i.e. the item correlation matrix, to the Redis cache. This needs to be done whenever the correlation matrix is recomputed
• 3 - Copy the event mapping metadata to the Redis cache. This is a one-time operation
• 4 - Write real time event click stream data to a Redis queue
• 5 - Storm consumes the event mapping metadata from the Redis cache when the Storm topology starts up
27. REAL TIME RECOMMENDATION PROCESSING FLOW
• 6 - Storm consumes the item correlation matrix from the Redis cache
• 7 - Storm consumes event click stream data from the Redis queue
• 8 - Storm writes the recommended items for a user to a Redis queue or cache
• 9 - The application server consumes recommended items from the Redis queue or cache
28. REAL TIME RECOMMENDATION PROCESSING
• Only recent user engagement data is used. Recency is defined per session, by time window or by event count.
• However, historical user engagement events are used to compute the item correlation matrix using Hadoop.
• Historical user engagement event data is converted to implicit ratings by a Hadoop MR, which is consumed by several more Hadoop MRs to generate the item correlation matrix.
• The item correlation matrix is saved in Redis as a map for later consumption by Storm.
29. REAL TIME RECOMMENDATION PROCESSING
• Storm ingests real time user engagement click stream data from a Redis queue and uses the item correlation matrix generated by Hadoop to make real time recommendations.
• Storm writes the recommended items to another Redis queue or cache.
• In the next several slides we will go through some details of the steps involved.
30. GENERATE IMPLICIT RATING
• As mentioned earlier, this is generated by the Hadoop MR ImplicitRatingEstimator.
• It uses preprocessed click stream data consisting of (userID, sessionID, eventType, timestamp).
• There are different event types indicative of the user’s level of intent or interest for an item, e.g. purchased item, in checkout, placed in shopping cart, browsed from search results etc.
• Events with the strongest intent level are extracted from the click stream, along with the counts for such events. This information is mapped to an implicit rating based on some heuristics.
31. CONVERTING IMPLICIT RATING TO A COMPACT FORM
• The implicit rating generated in the previous step is of the format (userID, itemID, rating).
• However, the item correlation generating MR ItemDynamicAttributeSimilarity expects data in a compact format, as (itemID1, userID1:rating1, userID2:rating2, ..).
• The format conversion is done through the Hadoop MR CompactRatingFormatter. It’s essentially a group by operation.
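Outside MapReduce, the conversion is just a group-by on itemID; a plain in-memory stand-in for what CompactRatingFormatter does:

```python
# A sketch of the (userID, itemID, rating) -> compact row conversion,
# essentially a group-by on itemID; an in-memory stand-in for the MR job.
from collections import defaultdict

triples = [("u1", "i1", 4), ("u2", "i1", 5), ("u1", "i2", 3)]

by_item = defaultdict(list)
for user, item, rating in triples:
    by_item[item].append(f"{user}:{rating}")

for item, pairs in by_item.items():
    print(item + "," + ",".join(pairs))
# i1,u1:4,u2:5
# i2,u1:3
```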
32. ITEM CORRELATION
• The MR ItemDynamicAttributeSimilarity generates item correlation with the output format (itemID1, itemID2, corr1).
• There are many configuration parameters involved, the important one being the correlation algorithm, the choices being Jaccard, Cosine and Pearson.
• For real time processing, the correlation data needs to be in a sparse matrix form.
• The MR CorrelationMatrixBuilder does the necessary transformation, with the output being of the format (itemID1, itemID2:corr2, itemID3:corr3, ….).
33. CACHING ITEM CORRELATION
• The item correlation matrix is loaded into a Redis map using a python script. The map key is the item ID and the value is the list of correlated item IDs along with the corresponding correlation coefficients.
• A Storm bolt reads the correlated items and coefficients from Redis when it receives a new user engagement tuple from the Redis queue. The Storm bolt also caches the correlation values in an in-memory Google Guava cache.
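A minimal stand-in for such a loader script, using a redis-py hash; the hash name and the value encoding are assumptions, not the actual script's format.

```python
# A sketch of loading the correlation matrix into a Redis hash with redis-py.
# The hash name "itemCorrMatrix" and the value encoding are assumptions.
import redis

r = redis.Redis(host="localhost", port=6379)

corr_rows = {
    "i1": [("i2", 0.8), ("i3", 0.4)],
    "i2": [("i1", 0.8), ("i9", 0.6)],
}
for item, row in corr_rows.items():
    # Encode each row as "otherItem:corr,otherItem:corr,..."
    value = ",".join(f"{other}:{corr}" for other, corr in row)
    r.hset("itemCorrMatrix", item, value)

# Storm-side lookup on a cache miss: fetch one row by item ID.
print(r.hget("itemCorrMatrix", "i1"))  # b'i2:0.8,i3:0.4'
```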
34. CACHING USER EVENT TO RATING MAPPING METADATA
• This mapping metadata is used by the Storm bolt to convert real time user engagement event data to implicit ratings.
• The metadata JSON file content is loaded into a Redis cache by a python script.
• This is a one-time operation. However, it needs to be reloaded if the metadata is changed.
35. STORM PROCESSING
• A Storm Redis spout consumes user event data from a Redis queue.
• The event data is distributed across multiple Storm bolt instances. The data is partitioned by userID (field grouped, in Storm terminology).
• The bolt, on receipt of the event data, estimates a rating based on the recent user engagement event click stream data.
• It also looks up the corresponding row of the item correlation matrix from the in-memory Google Guava cache, using the itemID as the key.
• The Guava cache loads from the Redis cache in case of a cache miss.
36. STORM PROCESSING
• Predicted ratings are calculated for items correlated with the item in the user event, using the estimated rating and the item correlation row vector.
• The predicted rating vector is aggregated with the cumulative predicted rating vector.
• The cumulative predicted rating vectors are sorted by rating value, and the top n items along with the associated predicted ratings are written to a Redis queue.
• Output is written to the Redis queue in the format (userID1, itemID1:rating1, itemID2:rating2).
• Optionally, the recommendation output can be written to a Redis cache, with the userID as the key and the recommended items as the value.
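The per-event logic reads naturally as a small function. This sketch condenses the bolt's core steps in Python (the real bolt is Java with a Guava cache); the plain additive accumulation and the top-3 cutoff are illustrative assumptions.

```python
# A condensed sketch of the bolt's per-event logic: estimate a rating,
# multiply through the correlation row, accumulate per user, emit top n.
# Additive accumulation and the top-3 cutoff are illustrative assumptions.
from collections import defaultdict

cumulative = defaultdict(lambda: defaultdict(float))  # userID -> {item: score}

def on_event(user, item, estimated_rating, corr_row, top_n=3):
    """corr_row: {correlatedItemID: correlation} for `item`."""
    for other, corr in corr_row.items():
        cumulative[user][other] += estimated_rating * corr
    ranked = sorted(cumulative[user].items(), key=lambda kv: -kv[1])[:top_n]
    return user + "," + ",".join(f"{i}:{s:.2f}" for i, s in ranked)

print(on_event("u1", "i1", 4.0, {"i2": 0.8, "i3": 0.4}))
# u1,i2:3.20,i3:1.60
```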
37. EVENT CLICK STREAM
• The Storm bolt maintains a window of recent user engagement event click stream data in an in-memory cache.
• Expiry of click stream data from the window can be managed in several ways, driven by configuration.
• If data is expired by session, the window is cleared whenever a new session is encountered for a user.
• If data is expired by time span, any event data older than the span is discarded from the window.
• If data is expired by a maximum count, older data is discarded from the window when the window size exceeds the limit.
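The three expiry policies can be sketched over a simple per-user deque; the policy names and the event tuple shape are assumptions.

```python
# A sketch of the three window expiry policies over a per-user deque of
# (sessionID, timestamp, event) tuples; policy names are illustrative.
import time
from collections import deque

window = deque()

def add_event(event, policy, max_age=300, max_count=50):
    session, ts, _ = event
    if policy == "session" and window and window[-1][0] != session:
        window.clear()                      # new session: drop old events
    window.append(event)
    if policy == "time":
        while window and window[0][1] < ts - max_age:
            window.popleft()                # too old: discard
    elif policy == "count":
        while len(window) > max_count:
            window.popleft()                # over the size limit: discard

add_event(("s1", time.time(), "viewedDetail"), policy="time")
```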