In this talk we shall introduce the main ideas of TruSIS (Trust in Social Internetworking System), a Marie Curie Fellowship financed by the European Union and hosted at VU University, Department of Computer Science, Business and Web group. The goal of TruSIS is to study the behaviour of users who affiliate to multiple social networking sites and are active in them (e.g., users may publish personal profiles on sites like MySpace and post videos on sites like YouTube). We call this scenario a SIS (Social Internetworking System).
As a first research contribution, we implemented a crawler to gather data about users and link their profiles on multiple social networking websites. For this purpose we used the Google Social Graph API, a powerful API released by Google in 2008. We obtained a sample of about 1.3 million user accounts and 36 million connections between them.
Parameters from social network theory (such as average clustering coefficient and network modularity) were used to study the structural properties of the gathered sample and how these properties depend on user behaviour.
A second contribution concerns the computation of the distance between two users in a SIS on the basis of their social ties. We used a popular parameter from social network theory known as the Katz coefficient and provide a computationally efficient approach to computing it, which relies on a well-known tool from linear algebra, the Sherman-Morrison formula.
Finally, we shall describe our work on extending the notion of trust from single social networks to a SIS. We describe the main research challenges tied to the definition of trust and how they relate to Semantic Web technologies.
2. Social Networking
• Explosive growth in number of sites and users
– Facebook: 350 mil users (bigger than the US), would be the third biggest country (Feb 2010)
– Used for advertising, public life, etc.
• Social Networking APIs to gather data on users, their relationships and activities
– Leskovec & Horvitz (WWW'08) analyzed 240 mil MSN contacts
– Kwak et al. (WWW'10) analyzed the whole of Twitter
3. Social Network Analysis
• Study collective human behaviour on a large scale, e.g.
– How is node degree distributed?
– Does the small-world phenomenon emerge?
– Are nodes clustered into groups?
– What are the different user information sharing tasks?
– How do they connect with different communities?
4. Social Internetworking
• Users affiliate to multiple social spaces
– e.g. UK adults have ~1.6 online profiles, and 39% of those with one profile have at least two other profiles
• Platform(s) for data portability among social networks
5. Social Internetworking System
• Provide mechanisms to:
– help users find reliable users
– disclose malicious users/spammers
– stimulate the level of user participation
– deal with trust in linked data
– deal with different contexts and policies for accessing, publishing and re-distributing data
6. What do we aim for …
• model to represent Social Internetworking components & their relationships
• understand Social Internetworking structural properties and see how they differ from those of traditional social networks
• model to compute trust & reputation based on linked data
7. Some requirements …
• Trust: tied to user's performance, i.e., beneficial contributions to other users
• Users are involved in a range of activities, e.g., tagging, posting comments, rating
• A range of heterogeneous entities, e.g. users, resources, comments, ratings and their interactions (vs. single role nodes in graphs)
• Edges need to support n-ary relationships
• Multi-dimensional network
8. SIS Pilot 1
• Social Web Crawler
– Google Social Graph API
– XFN and FOAF markups; me edges, i.e., accounts located in different social networks referring to the same individual
• BFS of Social Web
– 1 305 112 user accounts
– 36 278 838 connections between user accounts
[Figure: chart with legend Flickr, Twitter, LiveJournal, Others]
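The breadth-first crawl listed on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TruSIS crawler: the toy dictionary `CONTACTS` and the functions `fetch_edges` and `bfs_crawl` are hypothetical stand-ins for calls to a remote service such as the Google Social Graph API.

```python
from collections import deque

# Hypothetical toy data: account -> list of (neighbour, edge_type).
# "me" edges link accounts of the same person on different sites;
# "contact" edges are ordinary friendship/follow links.
CONTACTS = {
    "flickr:alice": [("twitter:alice", "me"), ("flickr:bob", "contact")],
    "twitter:alice": [("flickr:alice", "me"), ("twitter:carol", "contact")],
    "flickr:bob": [("livejournal:bob", "me")],
    "twitter:carol": [],
    "livejournal:bob": [],
}


def fetch_edges(account):
    """Stand-in for a remote lookup (assumption): outgoing edges of one account."""
    return CONTACTS.get(account, [])


def bfs_crawl(seed, max_accounts=1000):
    """Visit accounts breadth-first from `seed`, collecting accounts and edges."""
    visited = {seed}
    edges = []
    queue = deque([seed])
    while queue:
        account = queue.popleft()
        for neighbour, edge_type in fetch_edges(account):
            edges.append((account, neighbour, edge_type))
            if neighbour not in visited and len(visited) < max_accounts:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited, edges


accounts, edges = bfs_crawl("flickr:alice")
me_edges = [edge for edge in edges if edge[2] == "me"]
```

The `max_accounts` cap is one simple way to bound a crawl; a real crawler would also need rate limiting and error handling, which are omitted here.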
9. Goal of the Pilot
The pilot has three main goals:
• relationship between structural properties of a SIS and human behaviour
• how can we take advantage of global knowledge harnessed in a SIS
• how these results contribute to the TruSIS trust definition
10. Goal of the Pilot
Goal 1: We found that some structural properties of a SIS can be explained in terms of user behaviours.
Example: node degree distribution shows a power law, indicating that few users are quite active (e.g., they rate many objects, post many comments, and so on) while the vast majority is almost inactive.
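To make the notion of a node degree distribution concrete, here is a minimal sketch that counts how many nodes have each degree in an undirected graph. The function name `degree_distribution` and the toy edge list are illustrative assumptions; the data are not taken from the crawled sample.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes having exactly k neighbours."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return Counter(len(ns) for ns in neighbours.values())


# Toy graph: one hub ("h") connected to many low-degree users.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
dist = degree_distribution(edges)
```

In a power-law network, plotting these counts against the degree on log-log axes gives an approximately straight line: a few hubs and a long tail of low-degree nodes.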
11. Goal of the Pilot
Goal 2: We found that knowledge in a SIS is useful to solve cold start problems.
For instance, assume a user u joins a social network like Flickr and has no contacts.
Idea: Find users of the SIS who are close to “u” and are affiliated to Flickr (bootstrap users). Suggest them to u.
Problem: When are two users close in a SIS? This reduces to a known problem: “when are two nodes in a graph close?”
12. Goal of the Pilot
• Goal 3: Connectivity properties are at the basis of many algorithms to compute trust in social networks (Golbeck 2006, Ziegler 2005, Leskovec, Huttenlocher & Kleinberg, 2010).
• We plan to use closeness to propagate trust values.
13. Pilot 1: Contact Graph Analysis
• Average Clustering Coefficient (ACC) to assess the tendency of nodes to form cliques
• High compared to other graphs
– reflects the high chance that two users are “friends” as there is a third person who is also their “friend”
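The average clustering coefficient mentioned above can be computed as sketched below. This is a minimal illustration on a toy graph, assuming the common convention that nodes with fewer than two neighbours contribute a coefficient of 0; the function name and data are hypothetical.

```python
from itertools import combinations


def average_clustering(adj):
    """adj: dict node -> set of neighbours (undirected, no self-loops)."""
    total = 0.0
    for node, ns in adj.items():
        k = len(ns)
        if k < 2:
            continue  # convention assumed here: coefficient 0 for degree < 2
        # Count the edges that actually exist among the neighbours of `node`.
        links = sum(1 for a, b in combinations(ns, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
acc = average_clustering(adj)
```

For each node, the local coefficient is the fraction of pairs of its neighbours that are themselves connected, which matches the "friend of a friend" reading on the slide.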
14. Pilot 1: Contact Graph Analysis
• edge distribution in CG
– A power law emerged, exponent about 1.65
• distribution of me edges
– exponent about 3.39
• Why?
– multiple identities in multiple social spaces but no connections between them
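One common way to estimate a power-law exponent from a list of degrees is the continuous maximum-likelihood estimator, sketched below. This is an assumption for illustration only: the slides do not say how the exponents 1.65 and 3.39 were obtained, and the continuous formula is only an approximation for small integer degrees. The function name and the parameter `d_min` are hypothetical.

```python
import math


def power_law_exponent(degrees, d_min=1.0):
    """Continuous maximum-likelihood estimate: 1 + n / sum(ln(d / d_min)).

    Only degrees >= d_min are used; d_min is the assumed start of the tail.
    """
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree strictly above d_min")
    return 1.0 + len(tail) / log_sum


# Toy degree sample (not from the crawled data).
sample_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
alpha = power_law_exponent(sample_degrees, d_min=1.0)
```

A larger exponent means the distribution decays faster, i.e. nodes with many such edges are rarer.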
15. Pilot 1: Contact Graph Analysis
• High Network Modularity
– nodes appear clustered in groups
• Can we export knowledge of the user from one network to another (in terms of trust & reputation)?
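Network modularity for a given partition into groups can be computed as in the sketch below, using Newman's definition Q = sum over communities of (L_c / m - (d_c / 2m)^2), where L_c is the number of edges inside community c, d_c the sum of its node degrees and m the total number of edges. The function name, the toy graph and the partition are illustrative assumptions; the slides do not state how the partition of the crawled sample was obtained.

```python
def modularity(edges, community):
    """edges: list of undirected (a, b) pairs; community: dict node -> label."""
    m = len(edges)
    intra = {}   # label -> number of edges with both endpoints in the community
    degree = {}  # label -> sum of the degrees of the community's nodes
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            intra[ca] = intra.get(ca, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, community)
```

Values of Q well above 0 indicate that more edges fall inside the groups than would be expected if edges were placed at random with the same degrees.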
16. Calculating Closeness
• aggregating information from different social networks to determine how ‘close’ users are
• degree of closeness of two users: Katz coefficient (Katz, 1953)
– # of users is big
• algorithm where the SIS is partitioned into small communities, combined with the Sherman-Morrison formula
• Experimental trials show that:
– We achieve significant time savings
– The approximation error is quite small
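To illustrate how the Sherman-Morrison formula can help with Katz coefficients, the sketch below builds the Katz matrix K = (I - beta*A)^-1 - I incrementally: each new undirected edge changes I - beta*A by two rank-one terms, so the inverse can be updated without recomputing it from scratch. This is not the community-partitioning algorithm from the talk, only a small exact illustration of the underlying formula. The function names are hypothetical, and `beta` is assumed to be smaller than the reciprocal of the largest eigenvalue of A so that the Katz series converges.

```python
def sherman_morrison(m_inv, u, v):
    """Inverse of (M + u v^T), given m_inv = M^-1, via the Sherman-Morrison formula."""
    n = len(m_inv)
    mu = [sum(m_inv[i][k] * u[k] for k in range(n)) for i in range(n)]  # M^-1 u
    vm = [sum(v[k] * m_inv[k][j] for k in range(n)) for j in range(n)]  # v^T M^-1
    denom = 1.0 + sum(v[k] * mu[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("update makes the matrix singular")
    return [[m_inv[i][j] - mu[i] * vm[j] / denom for j in range(n)] for i in range(n)]


def unit(n, i, scale=1.0):
    """Vector of length n with `scale` at position i and 0 elsewhere."""
    vec = [0.0] * n
    vec[i] = scale
    return vec


def katz_matrix(n, edges, beta):
    """Katz scores K = (I - beta*A)^-1 - I, built one undirected edge at a time."""
    # With no edges A = 0, so the inverse of (I - beta*A) is the identity.
    inv = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        # Adding edge (a, b) subtracts beta at positions (a, b) and (b, a).
        inv = sherman_morrison(inv, unit(n, a, -beta), unit(n, b))
        inv = sherman_morrison(inv, unit(n, b, -beta), unit(n, a))
    return [[inv[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]


# Toy path graph 0 - 1 - 2.
K = katz_matrix(3, [(0, 1), (1, 2)], beta=0.1)
```

Here `K[i][j]` sums over all walks from i to j, each weighted by `beta` raised to the walk length, so directly connected users score higher than users two hops apart.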
18. In other words …
• Trust is defined in the context of:
– Reputation (of user) in a social network
– Impact (of user) in a social network
– Authority (of user or organizations)
• Trust as a binary relationship between users (e.g. A trusts B) based on user activities:
– frequency, quality and type of users' contributions
– etc.
19. For example: Reputation
• users post resources & rate resources posted by others
• To compute reputation we assume that:
– a user has high reputation if the user authors high quality resources
– a resource has high quality if it gets a high average rating & is posted by users with high reputation
• mutual reinforcement principle
20. Trust in SIS
• $n$ = # of users, $m$ = # of resources authored
• $r(i)$ = reputation of user $i$
• $q(j)$ = quality of resource $j$
• $e(j)$ = average rating of resource $j$
• $A_{ij} = 1$ if user $i$ posted resource $j$, $A_{ij} = 0$ otherwise
• $r = Aq$ and $q = A^{T}r + e$, hence $r = (I - AA^{T})^{-1}Ae$
• compute dominant eigenvector of a symmetric matrix
• easy to compute even if $A$ gets large
($A^{T}$ = transpose of $A$ and $I$ = $n \times n$ identity matrix)
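The mutual reinforcement between reputation and quality can be sketched as a fixed-point iteration. Note one deliberate deviation, which is an assumption of this sketch and not part of the slide: a damping factor `alpha` is introduced, so the iterated system is r = alpha*A*q and q = A^T*r + e. Without damping, I - AA^T can be singular (for instance when a user has posted exactly one resource). The function name and the toy data are hypothetical.

```python
def reputation(A, e, alpha=0.25, iterations=100):
    """Iterate r = alpha * A q and q = A^T r + e until approximately stable.

    A[i][j] = 1 if user i posted resource j, else 0; e[j] = average rating of j.
    alpha is a damping factor assumed here (not on the slide); it should be
    smaller than 1 / (largest eigenvalue of A A^T) for the iteration to converge.
    """
    n, m = len(A), len(e)
    q = list(e)      # start from the ratings alone
    r = [0.0] * n
    for _ in range(iterations):
        r = [alpha * sum(A[i][j] * q[j] for j in range(m)) for i in range(n)]
        q = [sum(A[i][j] * r[i] for i in range(n)) + e[j] for j in range(m)]
    return r, q


# Toy data: user 0 posted resources 0 and 1, user 1 posted resource 2.
A = [[1, 1, 0],
     [0, 0, 1]]
e = [4.0, 2.0, 5.0]
r, q = reputation(A, e)
```

With these assumptions the iteration converges to the closed form r = (I - alpha*AA^T)^-1 * alpha*A*e, the damped analogue of the formula on the slide.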
21. What do we try to explore …
• The role of the Semantic Web (SW) in the definition, identification and reasoning with trust, reputation, impact and authority (e.g., Linked Open Data)
• The role of trust, reputation, impact and authority in event models, e.g. SEM, and user models, e.g. FOAF
22. Among others, we still need to …
• Gather a larger amount of data to analyze further the structural properties of SIS
• Test the effectiveness of the approach for trust, reputation, impact and authority computing
• Test with real users in the social space of Agora (Social Event-based History browsing) and in PrestoPrime (Social Semantic Tagging)
• Ontology-based model of trust and reputation in different domains (with LOD)
23. The team
• DIMET, University of Reggio Calabria, Italy
– Pasquale De Meo
– Domenico Ursino
• External collaborator
– University of Torino
– Federica Cena