Deep learning approaches for hate speech detection. In this work we used two deep learning approaches, DCNN and MLP, as two separate classifiers on four publicly available datasets.
Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, however, machine learning approaches have relied on user-defined heuristics to extract features encoding structural information about a graph. In this talk I will discuss methods that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. I will provide a conceptual review of key advancements in this area of representation learning on graphs, including random-walk-based algorithms and graph convolutional networks.
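As a rough illustration of the random-walk family mentioned in this abstract, the sketch below generates uniform random walks over a small adjacency-list graph. In DeepWalk-style methods, such walks are treated like sentences and passed to a skip-gram model to learn node embeddings; that embedding step is omitted here. The toy graph, the function name `random_walks` and its parameters are hypothetical and not taken from the talk.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node of a graph.

    adj maps each node to a list of its neighbours (hypothetical format).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy undirected graph (made-up data): two triangles joined by the edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

Each walk is a list of node names in which consecutive nodes are connected by an edge; nodes that co-occur in many walks would end up close together in the learned embedding space.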
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys... Bartlomiej Twardowski
Modeling Contextual Information in Session-Aware Recommender Systems with Neural Networks, RecSys 2016 Boston, Bartłomiej Twardowski
Presentation for a paper:
http://dl.acm.org/citation.cfm?id=2959162
Abstract:
Preparing recommendations for unknown users, or recommendations that correctly respond to the short-term needs of a particular user, is one of the fundamental problems in e-commerce. Most common Recommender Systems assume that user identification must be explicit. In this paper a Session-Aware Recommender System approach is presented where no straightforward user information is required. The recommendation process is based only on user activity within a single session, defined as a sequence of events. This information is incorporated in the recommendation process by explicit context modeling with factorization methods and a novel approach with a Recurrent Neural Network (RNN). Compared to the session modeling approach, the RNN directly models the dependency of the user's observed sequential behavior through its recurrent structure. The evaluation discusses results based on sessions from a real-life system with ephemeral items (identified only by the set of their attributes) for the task of top-n best recommendations.
Maximizing the Diversity of Exposure in a Social Network Cigdem Aslay
Social-media platforms have created new ways for citizens to stay informed and participate in public debates. However, to enable a healthy environment for information sharing, social deliberation, and opinion formation, citizens need to be exposed to sufficiently diverse viewpoints that challenge their assumptions, instead of being trapped inside filter bubbles.
In this paper, we take a step in this direction and propose a novel approach to maximize the diversity of exposure in a social network. We formulate the problem in the context of information propagation, as a task of recommending a small number of news articles to selected users.
We propose a realistic setting where we take into account content and user leanings, and the probability of further sharing an article. This setting allows us to capture the balance between maximizing the spread of information and ensuring the exposure of users to diverse viewpoints.
The resulting problem can be cast as maximizing a monotone and submodular function subject to a matroid constraint on the allocation of articles to users. It is a challenging generalization of the influence maximization problem. Yet, we are able to devise scalable approximation algorithms by introducing a novel extension to the notion of random reverse-reachable sets. We experimentally demonstrate the efficiency and scalability of our algorithm on several real-world datasets.
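To make the problem shape concrete, here is a minimal sketch of the classical greedy algorithm for maximizing a monotone submodular function under a partition matroid constraint (at most a fixed number of articles per user). This is not the authors' algorithm, which relies on an extension of random reverse-reachable sets; the greedy rule is the textbook baseline, known to give a 1/2-approximation under a matroid constraint. The objective below is a simple coverage function used as a stand-in, and all names and data are hypothetical.

```python
def greedy_partition_matroid(candidates, capacity, gain):
    """Greedily pick (user, article) pairs, at most capacity[user] per user."""
    chosen = []
    used = {user: 0 for user in capacity}
    remaining = set(candidates)
    while remaining:
        # Pairs whose user still has room keep the solution independent.
        feasible = sorted(p for p in remaining if used[p[0]] < capacity[p[0]])
        if not feasible:
            break
        best = max(feasible, key=lambda p: gain(chosen, p))
        if gain(chosen, best) <= 0:  # nothing left improves the objective
            break
        chosen.append(best)
        used[best[0]] += 1
        remaining.remove(best)
    return chosen

# Stand-in objective (an assumption): each assignment exposes a set of
# (reader, viewpoint) pairs, and f(S) counts the distinct exposures covered.
covers = {
    ("u1", "left"): {("u1", "L"), ("u2", "L")},
    ("u1", "right"): {("u1", "R"), ("u2", "R"), ("u3", "R")},
    ("u2", "left"): {("u2", "L"), ("u3", "L")},
    ("u2", "right"): {("u2", "R")},
}

def coverage_gain(chosen, pair):
    """Marginal gain of adding pair to the already chosen assignments."""
    covered = set().union(*(covers[p] for p in chosen)) if chosen else set()
    return len(covers[pair] - covered)

capacity = {"u1": 1, "u2": 1}  # hypothetical budget: one article per user
selection = greedy_partition_matroid(covers, capacity, coverage_gain)
```

On this toy instance the greedy rule first assigns the "right" article to `u1` (three new exposures) and then the "left" article to `u2` (two new exposures), since the "right" article would add nothing new for `u2`.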
Deep Learning in Recommender Systems - RecSys Summer School 2017 Balázs Hidasi
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during RecSys Summer School 2017 in Bolzano, Italy.
Knowledge Graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability of effectively encoding a diversity of semantic relations and connectivity patterns. In this work, we propose entity2rec, a novel approach to learning user-item relatedness from knowledge graphs for top-N item recommendation. We start from a knowledge graph modeling user-item and item-item relations and we learn property-specific vector representations of users and items applying neural language models on the network. These representations are used to create property-specific user-item relatedness features, which are in turn fed into learning to rank algorithms to learn a global relatedness model that optimizes top-N item recommendations. We evaluate the proposed approach in terms of ranking quality on the MovieLens 1M dataset, outperforming a number of state-of-the-art recommender systems, and we assess the importance of property-specific relatedness scores on the overall ranking quality.
Spotify uses a range of Machine Learning models to power its music recommendation features including the Discover page and Radio. Due to the iterative nature of training these models they suffer from IO overhead of Hadoop and are a natural fit to the Spark programming paradigm. In this talk I will present both the right way as well as the wrong way to implement collaborative filtering models with Spark. Additionally, I will deep dive into how Matrix Factorization is implemented in the MLlib library.
Algorithmic Music Recommendations at Spotify Chris Johnson
In this presentation I introduce various Machine Learning methods that we utilize for music recommendations and discovery at Spotify. Specifically, I focus on Implicit Matrix Factorization for Collaborative Filtering, how to implement a small scale version using python, numpy, and scipy, as well as how to scale up to 20 Million users and 24 Million songs using Hadoop and Spark.
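A small-scale version of implicit matrix factorization might look like the sketch below. It assumes the common confidence-weighted formulation (binary preference `p = 1` where a play count is positive, confidence `c = 1 + alpha * count`) solved with alternating least squares on a dense matrix; the slides themselves may differ in details. The function name, hyperparameters and toy play counts are all hypothetical.

```python
import numpy as np

def implicit_als(counts, factors=2, alpha=40.0, reg=0.1, iterations=10, seed=0):
    """Confidence-weighted ALS for implicit feedback on a small dense matrix."""
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    pref = (counts > 0).astype(float)   # binary preference p_ui
    conf = 1.0 + alpha * counts         # confidence c_ui
    X = rng.normal(scale=0.1, size=(n_users, factors))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, factors))  # item factors
    ridge = reg * np.eye(factors)

    def solve_side(fixed, conf_rows, pref_rows):
        """Solve one regularized least-squares problem per row."""
        out = np.empty((conf_rows.shape[0], factors))
        for i in range(conf_rows.shape[0]):
            c = conf_rows[i]
            A = (fixed.T * c) @ fixed + ridge   # F^T C F + reg * I
            b = fixed.T @ (c * pref_rows[i])    # F^T C p
            out[i] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve_side(Y, conf, pref)       # update users with items fixed
        Y = solve_side(X, conf.T, pref.T)   # update items with users fixed
    return X, Y

# Toy play counts (made-up data): rows are users, columns are songs.
plays = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 2, 6],
    [0, 0, 4, 0],
], dtype=float)
user_factors, item_factors = implicit_als(plays)
scores = user_factors @ item_factors.T  # predicted preference per (user, song)
```

The per-row solve is written for clarity rather than speed. At the scale mentioned in the talk, the same updates would operate on sparse data and be distributed, which this sketch does not attempt.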
There is increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model could take hours. This is a problem when the model needs to be more up-to-date. For example, when recommending TV programs while they are being transmitted the model should take into consideration users who watch a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but methods of online machine learning from streams are commonly believed to be more restricted and hence less accurate than batch-trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system for uniting batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
This is the first lecture on Applied Machine Learning. The course focuses on the emerging and modern aspects of this subject, such as Deep Learning, Recurrent and Recursive Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Hidden Markov Models (HMM). It deals with several application areas, such as Natural Language Processing and Image Understanding. This presentation provides the landscape.
Erik Bernhardsson is the CTO at Better, a small startup in NYC working with mortgages. Before Better, he spent five years at Spotify managing teams working with machine learning and data analytics, in particular music recommendations.
Abstract Summary:
Nearest Neighbor Methods And Vector Models: Vector models are being used in a lot of different fields: natural language processing, recommender systems, computer vision, and other things. They are fast and convenient and are often state of the art in terms of accuracy. One of the challenges with vector models is that as the number of dimensions increases, finding similar items gets challenging. Erik developed a library called “Annoy” that uses a forest of random trees to do fast approximate nearest neighbor queries in high-dimensional spaces. We will cover some specific applications of vector models and how Annoy works.
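The idea of a forest of random trees can be sketched in plain Python as follows. This is an illustration of the general technique, not Annoy's actual implementation or API: each tree recursively splits the points with a hyperplane placed halfway between two randomly chosen points, and a query collects the leaf it falls into in every tree, then ranks that small candidate pool by exact distance. All function names, the leaf size and the random data are hypothetical.

```python
import random

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_tree(points, ids, rng, leaf_size=4):
    """Recursively split ids with a hyperplane between two random points."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = _dot(normal, [(x + y) / 2 for x, y in zip(a, b)])
    left = [i for i in ids if _dot(normal, points[i]) <= offset]
    right = [i for i in ids if _dot(normal, points[i]) > offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return ("leaf", ids)
    return ("node", normal, offset,
            build_tree(points, left, rng, leaf_size),
            build_tree(points, right, rng, leaf_size))

def leaf_candidates(tree, query):
    """Descend one tree and return the ids stored in the leaf that is reached."""
    while tree[0] == "node":
        _, normal, offset, left, right = tree
        tree = left if _dot(normal, query) <= offset else right
    return tree[1]

def query_forest(points, forest, query, k=1):
    """Pool the candidates from every tree, then rank them by exact distance."""
    pool = set()
    for tree in forest:
        pool.update(leaf_candidates(tree, query))

    def sq_dist(i):
        return sum((x - y) ** 2 for x, y in zip(points[i], query))

    return sorted(pool, key=sq_dist)[:k]

rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]  # made-up data
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(10)]
nearest = query_forest(points, forest, points[17], k=3)
```

Querying with an indexed point always returns that point first, because it lands in the same leaf it was stored in. For other queries the result is approximate, and adding more trees trades query time for recall.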
With the explosive growth of online information, recommender systems have become an effective tool to overcome information overload and promote sales. In recent years, deep learning's revolutionary advances in speech recognition, image analysis and natural language processing have gained significant attention. Meanwhile, recent studies also demonstrate its efficacy in coping with information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to its state-of-the-art performance. In this talk, I will present recent developments in deep-learning-based recommender models and highlight some future challenges and open issues of this research field.
Embed, Encode, Attend, Predict – applying the 4 step NLP recipe for text clas... Sujit Pal
Slides for a talk at PyData Seattle 2017 about Matthew Honnibal's 4-step recipe for Deep Learning NLP pipelines. Describes the stages in the pipeline and gives 3 examples: document classification, document similarity and sentence similarity. Examples include Keras custom layers for different types of attention.
Representation Learning on Graphs with Complex Structures
Invited talk, Deep Learning for Graphs and Structured Data Embedding Workshop
WWW2019, San Francisco, May 13, 2019
In this presentation we discuss the hypothesis of MaxEnt models, describe the role of feature functions and their applications to Natural Language Processing (NLP). The training of the classifier is discussed in a later presentation.
Workload-aware materialization for efficient variable elimination on Bayesian... Cigdem Aslay
Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method, which can lead to significant efficiency gains when processing inference queries using the Variable Elimination algorithm. In particular, we address the problem of choosing a set of intermediate results to precompute and materialize, so as to maximize the expected efficiency gain over a given query workload. For the problem we consider, we provide an optimal polynomial-time algorithm and discuss alternative methods. We validate our technique using real-world Bayesian networks. Our experimental results confirm that a modest amount of materialization can lead to significant improvements in the running time of queries, with an average gain of 70%, and reaching up to a gain of 99%, for a uniform workload of queries. Moreover, in comparison with existing junction tree methods that also rely on materialization, our approach achieves competitive efficiency during inference using significantly lighter materialization.
We cover the following topics in the event:
1. Introduction to neural networks (what, why and how)
2. Types of neural networks (for different types of problems)
3. Explanation of neural network algorithms (forward and back propagation)
4. Demo of neural networks (image classification: bird, aeroplane, person, etc.)
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen... Hima Patel
It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of data directly influences the quality of a model. In this session, we will discuss the importance and the role of exploratory data analysis (EDA) and data visualisation techniques in finding data quality issues and in data preparation, as relevant to building ML pipelines. We will also discuss the latest advances in these fields and bring out areas that need innovation. Finally, we will discuss the challenges posed by industry workloads and the gaps to be addressed to make data-centric AI real in industry settings.
Science has escaped the lab and is roaming free in the world. People use software to understand the world. What tools are needed to support that work?
Relationships Matter: Using Connected Data for Better Machine Learning Neo4j
Relationships are highly predictive of behavior, yet most data science models overlook this information because it's difficult to extract network structure for use in machine learning (ML).
With graphs, relationships are embedded in the data itself, making it practical to add these predictive capabilities to your existing practices.
That’s why we’re presenting and demoing the use of graph-native ML to make breakthrough predictions. This will cover:
- Different approaches to graph feature engineering, from queries and algorithms to embeddings
- How ML techniques leverage everything from classical network science to deep learning and graph convolutional neural networks
- How to generate representations of your graph using graph embeddings, create ML models for link prediction or node classification, and apply these models to add missing information to an existing graph/incoming data
- Why no-code visualization and prototyping is important
Introduction to streaming data, the difference between batch processing and stream processing, research issues in streaming data processing, performance evaluation metrics, and tools for stream processing.
Presented at http://dsatlconf.com/
Demonstrating how you can run Machine Learning inside the database and be several orders of magnitude more efficient. We also talk about how you can build Machine Learning models with Probabilistic Rules inside the database.
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Topic: Generating Synthetic Data with Generative Adversarial Networks: Opportunities and Challenges
Limited data access continues to be a barrier to data-driven product development. In this talk, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge.
We identify key challenges of existing GAN approaches with respect to fidelity (e.g., capturing complex multidimensional correlations, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity).
To address fidelity challenges, we discuss our experiences designing a custom workflow called DoppelGANger and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DoppelGANger achieves up to 43% better fidelity than baseline models.
With respect to privacy, we identify fundamental challenges with both classical notions of privacy as well as recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges.
Building High Available and Scalable Machine Learning Applications Yalçın Yenigün
These slides contain high-level information about some machine learning algorithms, cross-validation and feature extraction techniques. They also cover high-level techniques for building highly available and scalable ML products.
Data Driven: The Ancestry.com Journey to Self-Service Analytics William Yetman
Presented as a breakout session at the 2014 Tableau Conference. Tag team effort with me and Adam Davis who leads Ancestry's Tools and Visualization Team. The demos that Adam did at the conference are missing from the presentation. They went really well and rounded out the breakout session.
Detecting Misleading Headlines in Online News: Hands-on Experiences on Attent... Kunwoo Park
These slides were used for the tutorial at the Deep Learning Summer School, held at IBS, Daejeon. Based on the recent effort on detecting misleading headlines through deep neural networks (Yoon et al., AAAI 2019), they explain how RNNs and the attention mechanism work for text. Moreover, implementations based on TensorFlow 1.x are introduced.
Positivity Bias in Customer Satisfaction Ratings Kunwoo Park
This slide is for my presentation at The Web Conference 2018 (also known as WWW). You can check the paper at the following link: https://dl.acm.org/authorize.cfm?key=N655133
Persistent Sharing of Fitness App Status on Twitter Kunwoo Park
Material presented at Naver Labs on July 25, 2016. It introduces, in Korean, the following paper presented at CSCW '16.
Title: Persistent Sharing of Fitness App Status on Twitter
Author: Kunwoo Park, Ingmar Weber, Meeyoung Cha, Chul Lee
Introduction to Research Using Social Data - A Study on the Continued Use of Fitness Apps Kunwoo Park
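Presented at the Everyday Data meetup hosted by Hanbit Media on December 18, 2015. As an example of research using social data, I shared a study on the continued use of fitness apps. The paper introduced is available at the following link: http://kunwpark.kr/wp-content/uploads/2015/12/cscw16_park.pdf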
MS thesis defense - Gender swapping and its effects in MMORPGs Kunwoo Park
This slide is for my MS thesis defense.
For more information, please feel free to contact me.
Abstract:
Massively Multiplayer Online Role-Playing Games (MMORPGs) provide lifelike virtual environments, where players can behave freely while escaping from reality. Players can conduct a variety of activities including combat, trade, and chat with other players, as in the real world. Thanks to the development of the Web and the Internet, numerous players have enjoyed MMORPGs, which offers a big opportunity for conducting large-scale research to understand human behaviors and social networks. Although the online world is similar to the real world, players can construct their identities independently of their real life. They can freely choose the appearance of their avatars, and can even choose the opposite gender. This leads to an interesting phenomenon, "gender swapping", which refers to players choosing avatars with genders opposite to their natural ones. This phenomenon was first observed several decades ago, and there is a line of research investigating who engages in it and why they swap genders in MMORPGs. However, due to limited data access, those studies have been done on a relatively small scale by conducting online surveys, which leaves the risk of sampling error.
In this thesis, an attempt is made to understand gender swapping using the entire data of Fairyland Online, a globally serviced MMORPG. The results not only show which kinds of people participate in this phenomenon, but also report the behavioral patterns observed in players of this game during social interactions, both when playing as in-game avatars of their own real gender and when gender-swapped. Lastly, we show differences in the structural patterns of the social network across the four possible gender combinations formed by two real genders and two in-game genders. This thesis also discusses the effect of gender role and self-image in virtual social situations, and the potential for this study to improve MMORPG quality and increase our understanding of social networks.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
[CS570] Machine Learning Team Project (I know what items really are)
1. 2013. 06. 13(Thu)
Team 11. Junghyun Kwon
Kunwoo Park
Jongin Lee
Seungkyu Nam
I know
what items really are
2. • Problem
• Challenges
• Related works
• Motivation
• Approaches
• Experiment setup
• Feature extraction
• Result
• Discussion
Contents
3. • Purpose of Track 1 of KDD Cup 2012
• Predict which users (or items) a Weibo user might follow.
• Recommendation system [1]
• Saves valuable time otherwise spent sifting through less relevant stories
• Increases customer satisfaction
Problem
Twitter.com
4. • 90% of the world's data was generated in the last three years
• 1.0 × 10^16 bytes every day
• Sensors, mobile, SNS, online transactions
• 10 billion tweets every day
• 30 billion FB msgs every day (*)
• …
Problem
Source: http://goo.gl/9xXaG
*: BLOTER.NET 12.01.26
5. • Problem
• Too much data to find the informative features
• 80 million training instances, large user and item metadata
• Few accepted results compared to many rejected results
• Data processing takes too much time
• SVM on all data: 16 days
• Lack of computing resources
• Our goal
• Train on as much of the large, complex Weibo data as possible on a single machine
• Find effective features with a simpler (and faster) approach
Challenges
6. • Online learning [2],[3]
• Learns one instance at a time
• Ex. Product search
• Pro – minimizes some performance criterion
• Con – much incorrect label feedback
• Map-Reduce [4]
• Parallel, distributed model for processing large data
• Pro – good for large amounts of input, intermediate, and output data
• Con – bad for data that requires synchronization
Related works
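As a rough illustration of "learns one instance at a time", the sketch below updates a logistic regression model with one stochastic gradient step per example. It is not the model used in this project; the class name `OnlineLogisticRegression`, the learning rate, and the toy stream are all assumptions made for the example.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Hypothetical helper: logistic regression updated one instance at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, y):
        # One stochastic gradient step on the log loss for a single (x, y) pair.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


# Toy stream of (features, label) pairs; the values are illustrative only.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
model = OnlineLogisticRegression(n_features=2)
for x, y in stream:
    model.update(x, y)  # the model never needs the whole data set in memory
```

Because each update touches a single instance, memory use stays constant regardless of how many user-item pairs are streamed through.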
9. Motivation
[Figure labels: one metadata block (user keywords, year of birth, gender, number of tweets, tag-ids) and a second block (category, keywords, user keywords, year of birth, gender, number of tweets, tag-ids), shown with Feature 1 through Feature 5. Caption: "Our training data!"]
10. • Extract features between users and items using the metadata of the user and the item.
• Train the model with a Support Vector Machine (SVM)
• Libsvm in R
Initial Approach
Failure!
Long computation time: 16 days to train the SVM
Lack of computational resources: single machine
12. 1. Training data (73,209,277 user-item pairs)
- after applying target ID: 38,332,489 user-item pairs
2. Test data (public, 2,617,106 user-item pairs)
3. Features used
- User's number of tweets
- User's number of tags
- Age similarity
- Item's number of tweets
- Item's number of tags
- Gender similarity
- Network similarity
- Item's number of followers
- Keyword similarity
4. Construct a separate model for each feature
5. Evaluation metrics: F1 score, MAP@3
6. Baseline: random prediction
Experiment Setup
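The two evaluation metrics named in this setup can be sketched as follows. The slides do not spell out the exact formulas, so this uses one common definition of average precision at k (normalised by min(k, number of accepted items)); the function names are hypothetical, and the ranked lists are assumed to contain no duplicates.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 = accepted and 0 = rejected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision_at_k(ranked_items, accepted, k=3):
    """AP@k for one user: ranked_items is a recommendation list, accepted a set."""
    if not accepted:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in accepted:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(k, len(accepted))


def map_at_k(rankings, accepted_sets, k=3):
    """Mean of AP@k over all users."""
    aps = [average_precision_at_k(r, a, k) for r, a in zip(rankings, accepted_sets)]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, ranking `["a", "b", "c"]` against accepted items `{"a", "c"}` scores hits at ranks 1 and 3, so AP@3 is (1/1 + 2/3) / 2.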
13. • Age similarity = zscore( ||user_age – item_age|| )
• Gender similarity:
$$\text{Gender similarity} = \begin{cases} 1 & \text{if same gender} \\ -1 & \text{if different gender} \\ 0 & \text{if unknown gender} \end{cases}$$
• Z-scored number of tweets from user
• Z-scored number of tweets from item
• Z-scored number of tags from user
• Z-scored number of tags from item
• Z-scored number of followers of item
Feature Extraction
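A minimal sketch of the scalar features on this slide, using only the standard library. Several details are assumptions rather than facts from the deck: the z-score uses the population standard deviation, an unknown gender is encoded as `0`, and ages are compared via birth years.

```python
import statistics


def zscores(values):
    """Standardise a list of numbers to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def gender_similarity(user_gender, item_gender, unknown=0):
    """1 if same gender, -1 if different, 0 if either is unknown."""
    if user_gender == unknown or item_gender == unknown:
        return 0
    return 1 if user_gender == item_gender else -1


def age_similarity(user_birth_years, item_birth_years):
    """Z-scored absolute age gap for each user-item pair."""
    gaps = [abs(u - i) for u, i in zip(user_birth_years, item_birth_years)]
    return zscores(gaps)
```

The same `zscores` helper would apply to the count features (tweets, tags, followers), which puts features with very different ranges on a comparable scale.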
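14. • Keyword similarity (cosine similarity):
$$\text{Keyword similarity} = \frac{\text{user\_keyword} \cdot \text{item\_keyword}}{\lVert \text{user\_keyword} \rVert \, \lVert \text{item\_keyword} \rVert}$$
1. Drop low document-frequency (DF) keywords, under 20%. (255,141 → 2,507)
2. Using PCA, reduce the dimension (2,507 → 1,191) by choosing k as follows: for k = 1, ..., N (N = total number of principal components), stop when
$$\text{error} = 1 - \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \le 0.05$$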
Feature Extraction
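The cosine similarity and the PCA stopping rule above can be sketched as below. This assumes the eigenvalues of the PCA have already been computed elsewhere and are passed in as a plain list; `choose_num_components` is a hypothetical name, and returning the first k that satisfies the error bound is one reading of the slide's loop.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length keyword weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty keyword vectors
    return dot / (norm_u * norm_v)


def choose_num_components(eigenvalues, max_error=0.05):
    """Smallest k whose leading eigenvalues leave at most max_error unexplained."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    kept = 0.0
    for k, lam in enumerate(ordered, start=1):
        kept += lam
        if 1.0 - kept / total <= max_error:
            return k
    return len(ordered)
```

With a 0.05 threshold, the retained components explain at least 95% of the variance, which matches the error definition on the slide.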
16. • Homophily
• Similar people get together!
Age similarity, gender similarity
Background of choosing features
17. • Friend recommendation in Facebook
Background of choosing features
Common Friends
Works!!!!!
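The deck does not define how network similarity was computed. One guess that is consistent with the "Common Friends" idea is to count, or normalise, the accounts that two users both follow. The sketch below is that guess only; the `follows` dictionary and both function names are hypothetical.

```python
def common_friends(follows, a, b):
    """Number of accounts followed by both a and b."""
    return len(follows.get(a, set()) & follows.get(b, set()))


def jaccard_similarity(follows, a, b):
    """Shared followees divided by all followees of a or b."""
    fa = follows.get(a, set())
    fb = follows.get(b, set())
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


# Toy follow graph: user id -> set of followed ids (illustrative values only).
follows = {
    "u1": {"x", "y", "z"},
    "u2": {"y", "z", "w"},
    "u3": set(),
}
```

The Jaccard form keeps heavy followers from dominating the raw count, which may matter when follower counts vary widely.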
18. Results
• All models outperformed the random predictor
• Network similarity showed the highest F1 score
• The model using all features showed the best performance
• The Top-5 model covers more accepted items than the model using all features
• Interestingly, predictions using only two features, age similarity and network similarity, gave results similar to the Top-5 model.
19. • Contribution
• Successfully trained on a large data set with a light classifier
• Found many features by analyzing metadata
• We saw the unseen
• Limitation
• Our models showed fairly good prediction results, but they are not at the level of the KDD Cup winners
• Possible solution: ensemble learning
• to build the best model from multiple weak classifiers (predictors)
Discussion
20. • Power of feature scaling
• Importance of learning rate
• Difficulty of handling Big Data
• Data reduction techniques are essential for handling high-dimensional data.
What we learned
22. [1] Phelan, Owen, Kevin McCarthy, and Barry Smyth. "Using twitter to recommend real-time topical news." Proceedings of the third ACM conference on Recommender systems. ACM, 2009.
[2] Littlestone, Nick. "Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm." Machine learning 2.4 (1988): 285-318.
[3] Mairal, Julien, et al. "Online learning for matrix factorization and sparse coding." The Journal of Machine Learning Research 11 (2010): 19-60.
[4] Tang, Jie, et al. "Social influence analysis in large-scale networks." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
References