Just Count the Love-Hate Squares

This document proposes a method for recommender systems that counts different configurations ("squares") in the user-item bipartite rating network to predict whether a user will rate an item highly. It involves counting the number of each configuration for every user-item pair to generate features, then training a machine learning classifier on these features. The method was applied to the KDD Cup 2011 Yahoo! Music Dataset competition and achieved competitive results, with enhancements like normalizing against random networks and separating counts based on item hierarchy. Interestingly, configurations involving "hate" edges were most predictive of a user's potential love for an item.

Just Count the Love-Hate Squares:
a Rating Network Based Method for
Recommender Systems
KDD Cup 2011
August 21, 2011

Joseph Kong, Kyle Teague, Justin Kessler

Approved for public release by Northrop Grumman Information Systems, ISHQ-2011-0042

Link Prediction in Bipartite Rating Network

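[Figure: bipartite rating network between users A and B and items 1-4, shown in two panels. First panel: edges labeled with the observed scores (80, 20, 100, 90, 50), plus one unobserved edge marked "?". Second panel: the same network with each score replaced by its sign (+, -, +, +, -).]
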
•  Solid edges represent the observed rating pattern

•  Score >= 80: I-love-it ("+"); score < 80: I-hate-it ("-")

•  Goal: predict whether the unobserved link is highly rated
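
The sign rule on this slide can be sketched in a few lines. This is only an illustration: the names `LOVE_THRESHOLD` and `edge_sign` are hypothetical, and the five scores are the ones shown on the figure's edges.

```python
# Hypothetical names; the threshold of 80 comes from the slide.
LOVE_THRESHOLD = 80

def edge_sign(score, threshold=LOVE_THRESHOLD):
    """Map a 0-100 rating to a signed edge: '+' (love) or '-' (hate)."""
    return "+" if score >= threshold else "-"

# Scores that appear on the observed edges in the figure.
observed_scores = [80, 20, 100, 90, 50]
signs = [edge_sign(s) for s in observed_scores]
print(signs)  # ['+', '-', '+', '+', '-']
```

Note that a score of exactly 80 counts as "+", since the rule is score >= 80.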

Motivation: Happy Hour with Brock and Donald

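[Figure: two panels. First panel: Me and Brock are both joined by "-" edges to three songs; Brock has a "+" edge to Song 1; my edge to Song 1 is "?". Second panel: Me and Donald are both joined by "+" edges to three songs; Donald has a "+" edge to Song 2; my edge to Song 2 is "?".]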

•  Happy hour chat: with Brock, there are 3 songs that we
both hate; with Donald, we find 3 songs we both love.

•  Now, Brock loves Song 1 and Donald loves Song 2.

•  Am I more likely to love Song 1 or Song 2?

•  Main idea: the presence of certain types of squares may be
highly indicative of love/hate; so, just count them!

The Square Counting Method: How to Count

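[Left Fig.: the eight square configurations, numbered 0-7. In each square the "?" edge is the target user-item pair; the other three edges carry the signs below, listed in the order they appear in the figure.]

Config No.   Edge 1   Edge 2   Edge 3
0            -        -        -
1            +        -        -
2            -        +        -
3            +        +        -
4            -        -        +
5            +        -        +
6            -        +        +
7            +        +        +

Configuration No. denoted in middle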

•  Given a target user-item pair (u_tg, i_tg): count the number of squares of each
configuration and form the feature vector.

•  For example, in the right Fig., the path (u_tg-i_1-u_1-i_tg), which has a
sign sequence of {-,+,-}, corresponds to configuration No. 2
(see left Fig.); thus, the count for configuration No. 2 is 1.
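
The counting step can be sketched as follows. This is a minimal illustration, not the authors' C++/OpenMP implementation. The function name `count_squares`, the `ratings` dictionary layout, and the bit assignment are assumptions. The middle edge (i_1-u_1) takes bit value 2, which matches the slide's example. Giving the edge from the other user to the target item bit value 1, and the target user's own edge bit value 4, is a guess based on the deck's later reading of configuration No. 1.

```python
from collections import defaultdict

def count_squares(ratings, target_user, target_item, threshold=80):
    """Count squares per configuration for one (user, item) pair.

    ratings: dict mapping (user, item) -> score (hypothetical layout).
    Returns a list of 8 counts, indexed by configuration number.
    Assumed encoding: index = s3 + 2*s2 + 4*s1, where a '+' edge is 1 and
      s1 = sign of target_user -> intermediate item,
      s2 = sign of intermediate item -> other user,
      s3 = sign of other user -> target_item.
    """
    by_user = defaultdict(dict)
    by_item = defaultdict(dict)
    for (user, item), score in ratings.items():
        sign = 1 if score >= threshold else 0
        by_user[user][item] = sign
        by_item[item][user] = sign

    counts = [0] * 8
    # Walk every path target_user - item - other_user - target_item.
    for item, s1 in by_user.get(target_user, {}).items():
        if item == target_item:
            continue
        for other_user, s2 in by_item[item].items():
            if other_user == target_user:
                continue
            s3 = by_user[other_user].get(target_item)
            if s3 is None:
                continue  # other_user never rated the target item
            counts[s3 + 2 * s2 + 4 * s1] += 1
    return counts

# The slide's example: path u_tg - i_1 - u_1 - i_tg with signs {-, +, -}.
example = {("u_tg", "i_1"): 20, ("u_1", "i_1"): 90, ("u_1", "i_tg"): 50}
print(count_squares(example, "u_tg", "i_tg"))  # [0, 0, 1, 0, 0, 0, 0, 0]
```

The resulting 8-element list is the feature vector for the pair. For a sign sequence like {-,+,-}, swapping the two outer bits gives the same index, so the slide's example does not pin down that part of the encoding.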

The Square Counting Method: Machine Learning

•  Counts for different square configurations form the features.

•  Construct the validation set with user-item pairs with known ratings.

•  Machine learning framework:

1.  Perform square counting on the rating network for each user-item pair in the
validation set and generate the validation instance-feature matrix.

2.  Train a machine-learned classifier on the validation instance-feature matrix.

3.  Repeat square counting on the rating network for the test set and generate the
test instance-feature matrix.

4.  Apply the machine-learned classifier to each instance in the test
instance-feature matrix.
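
Steps 2 and 4 can be illustrated with a small classifier. The last slide reports logistic regression coefficients, so logistic regression is used here. The plain gradient-descent trainer, the function names, and the toy feature matrix are all assumptions made for the sake of a self-contained sketch.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with batch gradient descent (illustrative only)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the user rates the item highly."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy validation matrix: one row per user-item pair, one column per
# square configuration (0-7). The values are made up.
X_val = [
    [0, 3, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0, 0, 0, 0],
    [3, 0, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0],
]
y_val = [1, 1, 0, 0]  # 1 = highly rated, 0 = not

w, b = train_logistic(X_val, y_val)
test_row = [0, 1, 0, 0, 0, 0, 0, 0]
print(predict_proba(w, b, test_row) > 0.5)  # True
```

In practice the rows would come from square counting on the real rating network, and an off-the-shelf classifier would likely replace the hand-written trainer.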

KDD Cup Track 2-Yahoo! Music Dataset

•  Goal is to develop algorithms that separate the items a user rated
highly (score >= 80) from the items the user did not.

•  For each user in the test set, 6 songs were given; out of the 6
songs, 3 were highly rated by the user and 3 were
not (the task is to distinguish them).

•  Winners are determined by the error rate on a hold-out test set.

Statistic          Count
Users              249,012
Items              296,111
Ratings            62,551,438
Training Ratings   61,944,406
Test Ratings       607,032
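
One natural way to use the 3-of-6 structure of the task is to rank each user's six candidate songs by classifier score and label the top three as highly rated. The slides do not say whether the authors did this, so the sketch below is an assumption, as are the function names and the example scores.

```python
def label_top_half(scores):
    """Label the higher-scoring half of a user's candidates 1, the rest 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(scores)
    for i in order[: len(scores) // 2]:
        labels[i] = 1
    return labels

def error_rate(predicted, actual):
    """Fraction of user-item pairs that were labeled incorrectly."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

# Made-up classifier scores for one user's six test songs.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
predicted = label_top_half(scores)
print(predicted)  # [1, 0, 1, 0, 1, 0]

actual = [1, 0, 0, 1, 1, 0]  # hypothetical ground truth
print(error_rate(predicted, actual))  # 2 of 6 wrong
```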

Summary of Results-KDD Cup Track 2

•  Enhancements
–  Normalizing square counts against random network model
–  Separate counts based on item hierarchy
–  Further edge categorization
–  Removing very popular items
–  Using bias-removed scores

•  Square counting
–  Generate feature-instance matrix
–  Implemented in C++/OpenMP
–  ~5 hr on 8-core workstation (2 GB RAM)

•  Machine learning: ~1 hr
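
The slides do not say which random network model was used for normalization. As an illustration only, the sketch below assumes a very simple null model: the squares stay where they are, and each edge is "+" independently with the global love fraction `p_love`. It also assumes the configuration number encodes the three edge signs as bits, so the number of "+" edges is the number of set bits. The function name and inputs are hypothetical.

```python
def normalize_counts(counts, p_love):
    """Divide observed square counts by their expectation under a
    sign-shuffling null model (an assumed model, not the authors')."""
    total = sum(counts)
    normalized = []
    for config, observed in enumerate(counts):
        plus_edges = bin(config).count("1")  # '+' edges in this configuration
        expected = total * p_love ** plus_edges * (1 - p_love) ** (3 - plus_edges)
        normalized.append(observed / expected if expected > 0 else 0.0)
    return normalized

# With p_love = 0.5 every configuration is equally likely under the null,
# so uniform counts normalize to 1.0 everywhere.
print(normalize_counts([1, 1, 1, 1, 1, 1, 1, 1], 0.5))
```

A value above 1 then means the configuration occurs more often around this user-item pair than chance alone would suggest.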

Hate is a Powerful Signal in Predicting Love

•  Logistic regression coefficients (in units of 10^-3) for each love-hate
square configuration in predicting a user's highly rated items

•  Interesting observation: the most powerful configs for predicting
a user's love for an item come from hate edges: config. No.
1 & 4 (2nd in top row; 1st in bottom row).

•  Config. No. 1 (2nd in top row) means: Item X is recommended
to you because you hate items Y and Z!
