This was a great presentation from Blake Shaw at foursquare on using foursquare data to look at the world around us. Very cogent intersection of local & big data.
Two further methods for obtaining post-quantum security are discussed, namely code-based and isogeny-based cryptography. Topic 1: Revocable Identity-based Encryption from Codes with Rank Metric (will be presented by Dr. Reza Azarderakhsh) Authors: Donghoon Chang; Amit Kumar Chauhan; Sandeep Kumar; Somitra Kumar Sanadhya Topic 2: An Exposure Model for Supersingular Isogeny Diffie-Hellman Key Exchange Authors: Brian Koziel; Reza Azarderakhsh; David Jao
(Source: RSA Conference USA 2018)
Understanding High-dimensional Networks for Continuous Variables Using ECLHPCC Systems
Â
Syed Rahman & Kshitij Khare, University of Florida, present at the 2016 HPCC Systems Engineering Summit Community Day.
The availability of high dimensional data (or âbig dataâ) has touched almost every field of science and industry. Such data, where the number of variables (features) is often much higher than the number of samples, is now more pervasive than it has ever been. Discovering meaningful relationships between the variables in such data is one of the major challenges that modern day data scientists have to contend with.
The covariance matrix of the variables is the most fundamental quantity that can help us understand the complex multivariate relationships in the data. In addition to estimating the inverse covariance matrix, CSCS can be used to detect the edges in a directed acyclic graph, as opposed to the edges an undirected graph, which CONCORD (presented at the 2015 summit) was used for.
Similar to the CONCORD algorithm, the CSCS algorithm works by minimizing a convex objective function through a cyclic coordinate minimization approach. In addition, it is theoretically guaranteed to converge to a global minimum of the objective function. One of the main advantage of CSCS is that each row can be calculated independently of the other rows, and thus we are able to harness the power of distributed computing.
Syed Rahman
Syed Rahman is a PhD student in the Statistics department at the University of Florida working under the supervision of Dr. Kshitij Khare. He is interested in high-dimensional covariance estimation. In 2015, Syed programmed the CONCORD algorithm in ECL and presented this at the HPCC Systems Engineering Summit.
Kshitij Khare
Kshitij Khare is an Associate Professor of Statistics at the University of Florida. He earned his Ph.D. in Statistics from Stanford University in 2009. He has a variety of interests, which include covariance/network estimation in high-dimensional datasets, and Bayesian inference using Markov chain Monte Carlo methods. One of Dr. Khare's major research focus is development of novel statistical methods and algorithms for "big data" or high-dimensional data.
Towards typesafe deep learning in scalaTongfei Chen
Â
Deep learning is predominantly done in Python, a type-unsafe language. Try changing this unfortunate fact by doing it in Scala instead! With expressive types, compile-time typechecking, and awesome DSLs, we present Nexus, a typesafe, expressive and succinct deep learning framework.
Slides for the presentation at ENBIS 2018 of "Deep k-Means: Jointly Clustering with k-Means and Learning Representations" by Thibaut Thonet. Joint work with Maziar Moradi Fard and Eric Gaussier.
On Optimization of Network-coded Scalable Multimedia Service MulticastingAndrea Tassi
Â
In the near future, the delivery of multimedia multicast services over next-generation networks is likely to become one of the main pillars of future cellular networks. In this extended abstract, we address the issue of efficiently multicasting layered video services by defining a novel optimization paradigm that is based on an Unequal Error Protection implementation of Random Linear Network Coding, and aims to ensure target service coverages by using a limited amount of radio resources.
IVR - Chapter 2 - Basics of filtering I: Spatial filters (25Mb) Charles Deledalle
Â
Moving averages. Finite differences and edge detectors. Gradient, Sobel and Laplacian. Linear translations invariant filters, cross-correlation and convolution. Adaptive and non-linear filters. Median filters. Morphological filters. Local versus global filters. Sigma filter. Bilateral filter. Patches and non-local means. Applications to image denoising.
Parallel Evaluation of Multi-Semi-JoinsJonny Daenen
Â
Presentation given on VLDB 2016: 42nd International Conference on Very Large Data Bases.
Paper: http://dx.doi.org/10.14778/2977797.2977800
ArXiv: https://arxiv.org/abs/1605.05219
Poster: https://zenodo.org/record/61653 (doi 10.5281/zenodo.61653)
Gumbo Software: https://github.com/JonnyDaenen/Gumbo
Abstract
While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algorithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We use this operator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide additional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimizations are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our optimizations, all with a significant improvement over Pig and Hive.
Distributed Architecture of Subspace Clustering and RelatedPei-Che Chang
Â
Distributed Architecture of Subspace Clustering and Related
Sparse Subspace Clustering
Low-Rank Representation
Least Squares Regression
Multiview Subspace Clustering
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
đ Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Two further methods for obtaining post-quantum security are discussed, namely code-based and isogeny-based cryptography. Topic 1: Revocable Identity-based Encryption from Codes with Rank Metric (will be presented by Dr. Reza Azarderakhsh) Authors: Donghoon Chang; Amit Kumar Chauhan; Sandeep Kumar; Somitra Kumar Sanadhya Topic 2: An Exposure Model for Supersingular Isogeny Diffie-Hellman Key Exchange Authors: Brian Koziel; Reza Azarderakhsh; David Jao
(Source: RSA Conference USA 2018)
Understanding High-dimensional Networks for Continuous Variables Using ECLHPCC Systems
Â
Syed Rahman & Kshitij Khare, University of Florida, present at the 2016 HPCC Systems Engineering Summit Community Day.
The availability of high dimensional data (or âbig dataâ) has touched almost every field of science and industry. Such data, where the number of variables (features) is often much higher than the number of samples, is now more pervasive than it has ever been. Discovering meaningful relationships between the variables in such data is one of the major challenges that modern day data scientists have to contend with.
The covariance matrix of the variables is the most fundamental quantity that can help us understand the complex multivariate relationships in the data. In addition to estimating the inverse covariance matrix, CSCS can be used to detect the edges in a directed acyclic graph, as opposed to the edges an undirected graph, which CONCORD (presented at the 2015 summit) was used for.
Similar to the CONCORD algorithm, the CSCS algorithm works by minimizing a convex objective function through a cyclic coordinate minimization approach. In addition, it is theoretically guaranteed to converge to a global minimum of the objective function. One of the main advantage of CSCS is that each row can be calculated independently of the other rows, and thus we are able to harness the power of distributed computing.
Syed Rahman
Syed Rahman is a PhD student in the Statistics department at the University of Florida working under the supervision of Dr. Kshitij Khare. He is interested in high-dimensional covariance estimation. In 2015, Syed programmed the CONCORD algorithm in ECL and presented this at the HPCC Systems Engineering Summit.
Kshitij Khare
Kshitij Khare is an Associate Professor of Statistics at the University of Florida. He earned his Ph.D. in Statistics from Stanford University in 2009. He has a variety of interests, which include covariance/network estimation in high-dimensional datasets, and Bayesian inference using Markov chain Monte Carlo methods. One of Dr. Khare's major research focus is development of novel statistical methods and algorithms for "big data" or high-dimensional data.
Towards typesafe deep learning in scalaTongfei Chen
Â
Deep learning is predominantly done in Python, a type-unsafe language. Try changing this unfortunate fact by doing it in Scala instead! With expressive types, compile-time typechecking, and awesome DSLs, we present Nexus, a typesafe, expressive and succinct deep learning framework.
Slides for the presentation at ENBIS 2018 of "Deep k-Means: Jointly Clustering with k-Means and Learning Representations" by Thibaut Thonet. Joint work with Maziar Moradi Fard and Eric Gaussier.
On Optimization of Network-coded Scalable Multimedia Service MulticastingAndrea Tassi
Â
In the near future, the delivery of multimedia multicast services over next-generation networks is likely to become one of the main pillars of future cellular networks. In this extended abstract, we address the issue of efficiently multicasting layered video services by defining a novel optimization paradigm that is based on an Unequal Error Protection implementation of Random Linear Network Coding, and aims to ensure target service coverages by using a limited amount of radio resources.
IVR - Chapter 2 - Basics of filtering I: Spatial filters (25Mb) Charles Deledalle
Â
Moving averages. Finite differences and edge detectors. Gradient, Sobel and Laplacian. Linear translations invariant filters, cross-correlation and convolution. Adaptive and non-linear filters. Median filters. Morphological filters. Local versus global filters. Sigma filter. Bilateral filter. Patches and non-local means. Applications to image denoising.
Parallel Evaluation of Multi-Semi-JoinsJonny Daenen
Â
Presentation given on VLDB 2016: 42nd International Conference on Very Large Data Bases.
Paper: http://dx.doi.org/10.14778/2977797.2977800
ArXiv: https://arxiv.org/abs/1605.05219
Poster: https://zenodo.org/record/61653 (doi 10.5281/zenodo.61653)
Gumbo Software: https://github.com/JonnyDaenen/Gumbo
Abstract
While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algorithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We use this operator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide additional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimizations are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our optimizations, all with a significant improvement over Pig and Hive.
Distributed Architecture of Subspace Clustering and RelatedPei-Che Chang
Â
Distributed Architecture of Subspace Clustering and Related
Sparse Subspace Clustering
Low-Rank Representation
Least Squares Regression
Multiview Subspace Clustering
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
đ Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
Â
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Â
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Â
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder â active learning and UiPath LLMs for do...UiPathCommunity
Â
đĽ Speed, accuracy, and scaling â discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Miningâ˘:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing â with little to no training required
Get an exclusive demo of the new family of UiPath LLMs â GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
đ¨âđŤ Andras Palfi, Senior Product Manager, UiPath
đŠâđŤ Lenka Dulovicova, Product Program Manager, UiPath
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Â
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
Â
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
Â
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Â
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navyâs DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOâs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
DevOps and Testing slides at DASA ConnectKari Kakkonen
Â
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Â
Building better applications for business users with SAP Fiori.
⢠What is SAP Fiori and why it matters to you
⢠How a better user experience drives measurable business benefits
⢠How to get started with SAP Fiori today
⢠How SAP Fiori elements accelerates application development
⢠How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
⢠How SAP Fiori paves the way for using AI in SAP apps
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Â
Monitoring and observability arenât traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current companyâs observability stack.
While the dev and ops silo continues to crumbleâŚ.many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
Â
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
1. Machine Learning with
Large Networks of
People and Places
Blake Shaw, PhD
Data Scientist @ Foursquare
@metablake
2.
3. What is foursquare?
An app that helps you
explore your city and
connect with friends
A platform for location
based services and
data
4. What is foursquare?
People use foursquare to:
⢠share with friends
⢠discover new places
⢠get tips
⢠get deals
⢠earn points and badges
⢠keep track of visits
12. People connect places over time
â˘Places people go after the Museum of Modern
Art (MOMA):
⢠MOMA Design Store, Metropolitan Museum of Art,
Rockefeller Center, The Modern, Abby Aldrich
Rockefeller, Sculpture Garden, Whitney Museum of
American Art, FAO Schwarz
â˘Places people go after the Statue of Liberty:
⢠Ellis Island Immigration Museum, Battery Park, Liberty
Island, National September 11 Memorial, New York
Stock Exchange, Empire State Building
13. Predicting where people will go next
â˘Cultural
 places
 (landmarks
 etc.)
â˘Bus
 stops,
 subways,
 train
 staHons
â˘Airports
â˘College
 Places
â˘Nightlife
AMer
 âbarsâ:
 american
 restaurants,
Â
nightclubs,
 pubs,
 lounges,
 cafes,
 hotels,
Â
pizza
 places
AMer
 âcoďŹee
 shopsâ:
 oďŹces,
 cafes,
Â
grocery
 stores,
 dept.
 stores,
 malls
15. Collaborative ďŹltering
â˘Item-Item similarity
⢠Find items which are similar to items that a user has
already liked
â˘User-User similarity
⢠Find items from users similar to the current user
â˘Low-rank matrix factorization
⢠First ďŹnd latent low-dimensional coordinates of users
and items, then ďŹnd the nearest items in this space to
a user
[Koren, Bell â08]
16. Collaborative ďŹltering
â˘Item-Item similarity
⢠Pro: can easily update w/ new data for a user
⢠Pro: explainable e.g âpeople who like Joeâs pizza, also
like Lombardiâsâ
⢠Con: not as performant as richer global models
â˘User-User similarity
⢠Pro: can leverage social signals here as well... similar
can mean people you are friends with, whom youâve
colocated with, whom you follow, etc...
17. Finding similar items
â˘Large sparse k-nearest neighbor problem
â˘Items can be places, people, brands
â˘Different distance metrics
⢠Need to exploit sparsity otherwise
intractable
18. Finding similar items
â˘Metrics we ďŹnd work best for recommending:
⢠Places: cosine similarity
⢠Friends: intersection
⢠Brands: Jaccard similarity
sim(xi, xj) =
xixj
kxikkxj k
sim(A, B) = |A B|
sim(A, B) = |AB|
|A[B|
19. Computing venue similarity
X 2 RnâĽd
each
 entry
 is
 the
 log(#
 of
Â
checkins
 at
 place
 i
 by
 user
 j)
one
 row
 for
 every
 30m
Â
venues...
K 2 RnâĽn
Kij = sim(xi, xj)
=
xixj
kxikkxjk
20. Computing venue similarity
K 2 RnâĽn
Kij = sim(xi, xj)
=
xixj
kxikkxjk
⢠Naive solution for
computing :
â˘Requires ~4.5m
machines to compute
in < 24 hours!!! and
3.6PB to store!
K
O(n2
d)
21. Venue similarity w/ map reduce
visited venues map
reduce
vi, vj
vi, vj
vi, vj
user
emit âallâ pairs of visited venues
for each user
Sum up each userâs score
contribution to this pair of venues
score
score
key
key
score scorescore ...
...
ďŹnal score
22. The Social Graph
â˘15m person social network w/ lots of
different interaction types:
â˘friends
â˘follows
â˘dones
â˘comments
â˘colocation
27. The Social Graph
How can we better visualize this network?
A 2 BnâĽn L 2 RnâĽd
28. Graph embedding
â˘Spring Embedding - Simulate physical system
by iterating Hookeâs law
â˘Spectral Embedding - Decompose adjacency
matrix with an SVD and use eigenvectors
with highest eigenvalues for coordinates
â˘Laplacian eigenmaps[Belkin, Niyogi â02] - form graph
laplacian from adjacency matrix, ,
apply SVD to and use eigenvectors with
smallest non-zero eigenvalues for coordinates
A
L = D A
L
29. Preserving structure
A connectivity algorithm such as k-nearest
neighbors should be able to recover the edges
from the coordinates such that
Embedding
Connectivity G(K)
Edges Points
G(K)
G(K) = A
30. Structure Preserving Embedding
â˘SDP to learn an embedding from
â˘Linear constraints on preserve the global
topology of the input graph
â˘Convex objective favors low-rank close
to the spectral solution, ensuring low-
dimensional embedding
â˘Use eigenvectors of with largest
eigenvalues as coordinates for each node
K A
K
K
K
[Shaw, Jebara â09]
31. Structure Preserving Embedding
A 2 BnâĽn
K 2 RnâĽn
L 2 RnâĽd
max
KâĽK
tr(KA)
Dij > (1 Aij) max
m
(AimDim) â§i,j
where K = {K ⤠0, tr(K) ⼠1,
P
ij Kij = 0}
SDP SVD
[Shaw, Jebara â09]
Dij = Kii + Kjj 2Kij
32. Large-scale SPE
constraint to be written as tr(ClK
tr(ClK) = Kjj 2Kij + 2Kik Kkk.
i
j
k
i j
k
tr(ClK) < 0
tr(ClK) > 0
f(L) = âĽtr(LâĽ
LA)
X
l S
max(tr(ClLâĽ
L), 0).
SDP SVD
a-
s,
d
s-
o-
g-
x
st
m
A,
st
re
.
h
of
at
ct
a
n
ng
neighbors, b-matching, or maximum weight spanning
tree) which accepts as input a kernel K specifying an
embedding and returns an adjacency matrix, we call an
embedding structure preserving if the application of G
to K exactly reproduces the input graph: G(K) = A.
Linear constraints on K enforce that K preserves the
topology of the input adjacency matrix
DeďŹne distance and weight in terms of K:
Dij = Kii + Kjj 2Kij
Wij = Dij = Kii Kjj + 2Kij
G(K) = arg max
ËA
ij
Wij
ËAij s.t. ËA â T
k-nearest neighbors
Dij > (1 Aij) max
m
(AimDim)
-balls blah blah
Dij(Aij
1
2
) ⼠(Aij
1
2
)
maximum weight subgraph method blah blah
When the connectivity algorithm G(K) is a maximum
A
SGD
LSPE-SGD
orm
A,
llest
here
ces.
hich
et of
that
ruct
s a
s an
ning
bors
ns a
then
alue
ectly
a triplet (i, j, k) such that Aij = 1 and Aik = 0.
This set of all triplets clearly subsumes the dis-
tance constraints above, and allows each individual
constraint to be written as tr(ClK) > 0 where
tr(ClK) = Kjj 2Kij + 2Kik Kkk. Temporarily
dropping the centering and scaling constraints, we
can now formulate the SDP above as maximizing the
following objective function over L:
f(L) = âĽtr(Lâ¤
LA)
X
lâĽS
max(tr(ClLâ¤
L), 0).
We will maximize f(L) via projected stochastic sub-
gradient decent. DeďŹne the subgradient in terms of a
single randomly chosen triplet:
(f(L), Cl) =
(
2L(âĽA Cl) if tr(ClLâ¤
L) > 0
0 otherwise
and for each randomly chosen triplet constraint Cl, if
tr(ClLâ¤
L) > 0 then update L according to:
Lt+1 = Lt + (f(Lt), Cl)
where the step-size = 1â
t
. After each step, we
can use projection to enforce that tr(Lâ¤
L) ⼠1 andP
ij(Lâ¤
L)ij = 0, by subtracting the mean from L and
dividing each entry of L by its Frobenius norm.
Algorithm 1 Large-Scale Struct
bedding
Require: A ⧠Bn n
, dimension
parameter âĽ, and maximum i
1: Initialize L0 â rand(d, n)
(or optionally initialize to sp
Laplacian eigenmaps solution)
2: t â 0
3: repeat
4: t â 1â¤
t+1
5: i â rand(1 . . . n)
c-
ot
m-
D
in
an
st
ng
y-
x,
li-
h:
yields exponential number of constraints of form:
Structure preserving constraints can also beneďŹt
dimensionality reduction algorithms. These methods
similarly ďŹnd compact coordinates that preserve
certain properties of the input data. Many of these
manifold learning techniques preserve local distances
but not graph topology. We show that adding explicit
topological constraints to these existing algorithms is
crucial for preventing folding and collapsing problems
that occur in dimensionality reduction.
a
n
ue
y
and for each randomly chosen triplet constraint Cl, if
tr(ClLâ¤
L) > 0 then update L according to:
Lt+1 = Lt + (f(Lt), Cl)
where the step-size = 1â
t
. After each step, we
can use projection to enforce that tr(Lâ¤
L) ⼠1 andP
ij(Lâ¤
L)ij = 0, by subtracting the mean from L and
dividing each entry of L by its Frobenius norm.
Structure Preserving Embedding
optimized via SGD
From only connectivity information describing which
nodes in a graph are connected, can we learn a set of
low-dimensional coordinates for each node such that
these coordinates can easily be used to reconstruct
the original structure of the network?
As ďŹrst proposed, SPE learns a matrix K via a
semideďŹnite program (SDP) and then decomposes
K = LâĽ
L by performing singular value decom-
position. We propose optimizing L directly using
stochastic gradient descent (SGD).
f(L) = âĽtr(LâĽ
LA)
X
l S
max(tr
We will maximize f(L) via projec
gradient decent. DeďŹne the subgra
single randomly chosen triplet:
(f(L), Cl) =
(
2L(âĽA Cl) if tr
0 otherwise
and for each randomly chosen trip
tr(ClLâĽ
L) > 0 then update L acco
Lt+1 = Lt + (f(Lt),
where the step-size = 1â¤
t
. A
can use projection to enforce thatP
ij(LâĽ
L)ij = 0, by subtracting th
dividing each entry of L by its Fro
j
Tony Jebara
Computer Science Dept.
Columbia University
New York, NY 10027
liza-
ngs,
cted
dis-
f so-
log-
trix
hest
orm
A,
llest
here
es.
hich
SPE for greedy nearest-neighbor constraints solves the
following SDP:
max
K K
tr(KA)
Dij > (1 Aij) max
m
(AimDim) â i,j
where K = {K ⤠0, tr(K) ⼠1,
P
ij Kij = 0}
Structure preserving constraints can be written
as a set of matrices S = {C1, C2, ...Cm}, where
each Cl is a constraint matrix corresponding to
a triplet (i, j, k) such that Aij = 1 and Aik = 0.
This set of all triplets clearly subsumes the SPE
distance constraints, and allows each individual
constraint to be written as tr(ClK) > 0 where
tr(ClK) = Kjj 2Kij + 2Kik Kkk.
Temporarily dropping the centering and scaling
constraints, we can now formulate the SDP above as
maximizing the following objective function over L:
reserving Embedding
Tony Jebara
Computer Science Dept.
Columbia University
New York, NY 10027
liza-
ings,
cted
dis-
f so-
olog-
SPE for greedy nearest-neighbor constraints solves the
following SDP:
max
K K
tr(KA)
Dij > (1 Aij) max
m
(AimDim) â i,j
where K = {K ⤠0, tr(K) ⼠1,
P
ij Kij = 0}
Structure preserving constraints can be written
where the step-size = 1â¤
t
.
can use projection to enforce thP
ij(LâĽ
L)ij = 0, by subtracting
dividing each entry of L by its F
Imp
Red
hing
beca
impo
node
(f(L), Cl) =
(
2L(âĽA Cl) if tr(ClLâĽ
L) > 0
0 otherwise
Lt+1 = Lt + (f(Lt), Cl)
[Shaw, Jebara â11]
34. Notes for previous slide:
Each node in this network is a person, each edge represents friendship on
foursquare. The size of each node is proportional to how many friends that
person has. We can see the existence of dense clusters of users, on the right,
the top, and on the left. There is a large component in the middle. There are
clear hubs.
We can now use this low-dimensional representation of this high-dimensional
network, to better track what happens when a new coffee shop opens in the east
village.
As expected, it spreads ...like a virus, across this social substrate. We see as
each person checks in to la colombe, their friends light up. People who have
discovered the place are shown in blue. The current checkin is highlighted in
orange in orange.
Itâs amazing to see how la colombe spreads. Many people have been talking
about how ideas, tweets, and memes spread across the internet. For the ďŹrst
time we can track how new places opening in the real world spread in a similar
way.
35. The Social Graph
â˘What does this low-dimensional structure
mean?
â˘Homophily
â˘Location, Demographics, etc.
37. InďŹuence on foursquare
â˘Tip network
⢠sample of 2.5m people âdoingâ tips from other
people and brands
⢠avg. path length 5.15, diameter 22.3
â˘How can ďŹnd the authoritative people in this
network?
38. Measuring inďŹuence w/ PageRank
⢠Iterative approach
â˘start with random values and iterate
â˘works great w/ map-reduce
PR(i) = (1 d) + d
X
j2{Aij =1}
PR(j)
P
k Aik
[Page et al â99]
39. Measuring inďŹuence w/ PageRank
⢠Equivalent to ďŹnding the principal
eigenvector of the normalized adj. matrix
A 2 BnâĽn Pij =
Aij
P
j Aij
PR(i) / vi where Pv = 1v
[Page et al â99]
40. InďŹuence on foursquare
â˘Most inďŹuential brands:
⢠History Channel, Bravo TV, National Post,
Eater.com, MTV, Ask Men, WSJ, Zagat, NY
Magazine, visitPA, Thrillist, Louis Vuitton
â˘Most inďŹuential users
⢠Lockhart S, Jeremy B, Naveen S
43. Putting it all together
Userâs check-in history
Friendsâ check-in history,
similarity
Nearby relevant
venues
SimilarVenues
MOAR Signals
< 200 ms
44. Our data stack
⢠MongoDB
⢠Amazon S3, Elastic Mapreduce
⢠Hadoop
⢠Hive
⢠Flume
⢠R and Matlab
45. Open questions
â˘What are the underlying properties and
dynamics of these networks?
â˘How can we predict new connections?
â˘How do we measure inďŹuence?
â˘Can we infer real-world social networks?
46. Conclusion
â˘Unique networks formed by people interacting
with each other and with places in the real
world
â˘Massive scale -- today we are working with
millions of people and places here at
foursquare, but there are over a billion devices
in the world constantly emitting this signal of
userid, lat, long, timestamp
47. Join us!
foursquare is hiring!
110+ people and growing
foursquare.com/jobs
Blake Shaw
@metablake
blake@foursquare.com