Social	
  Informa-on	
  Access	
  
-­‐-­‐	
  a	
  Personal	
  Update
2014/6/18
Agenda	
  
2
Collaborative Exploratory Search
Privacy Concerns in Social-based People Search
Some Reflections
Conclusions
Social Information Access
Scholars in Academic Social Networks
Informa-on	
  Access	
  
—  Information	
  Access:	
  an	
  interactive	
  process	
  starts	
  with	
  a	
  
user	
  noticing	
  his/her	
  needs	
  and	
  ends	
  with	
  the	
  user	
  
obtaining	
  the	
  necessary	
  information	
  
—  Iterative,	
  multiple	
  stages,	
  many	
  back	
  loops	
  
User	
  Generated	
  Content	
  
Social	
  Networks	
  
Social	
  Informa-on	
  Access	
  
—  Social	
  Information	
  Access:	
  information	
  
access	
  using	
  “community	
  wisdom”	
  	
  
—  distilled	
  from	
  the	
  actions	
  in	
  real/virtual	
  
community	
  	
  
—  Collaboration	
  in	
  explicit	
  or	
  implicit	
  manner	
  
—  Social	
  information	
  access	
  technologies	
  
capitalize	
  on	
  the	
  natural	
  tendency	
  of	
  
people	
  to	
  follow	
  direct	
  and	
  indirect	
  cues	
  
of	
  others’	
  activities	
  
—  going	
  to	
  a	
  restaurant	
  that	
  seems	
  to	
  attract	
  many	
  
customers,	
  or	
  
—  asking	
  others	
  what	
  movies	
  to	
  watch.	
  
Space	
  of	
  Social	
  Informa-on	
  Access	
  
—  [Brusilovsky2012]’s	
  	
  taxonomy	
  for	
  social	
  info	
  access	
  
6
More	
  Social	
  Informa-on	
  Access	
  
—  Collaboration	
  can	
  be	
  explicit,	
  not	
  just	
  implicit	
  
—  Explicit	
  Collaboration:	
  users	
  work	
  as	
  a	
  team	
  to	
  complete	
  the	
  same	
  
task	
  
—  Issues:	
  How	
  to	
  model	
  collaboration?	
  
7
Implicit Collaboration
Explicit Collaboration
More	
  Social	
  Informa-on	
  Access	
  
—  Target	
  can	
  be	
  people,	
  and	
  people’s	
  social	
  connections	
  	
  
are	
  important	
  
—  Relationship	
  is	
  as	
  important	
  as	
  the	
  documents	
  generated	
  by	
  the	
  people	
  
—  Issues:	
  What	
  are	
  the	
  impacts	
  of	
  privacy	
  concerns?	
  
8
More	
  Social	
  Informa-on	
  Access	
  
—  User	
  generated	
  content	
  can	
  be	
  generic,	
  or	
  academic	
  
—  E-­‐Science	
  and	
  CyberScholarship	
  are	
  increasingly	
  popular	
  
—  Issues:	
  What	
  are	
  scientists	
  doing	
  online?	
  
9
Popular Social Networks Academic Online Social Networks
Explicit	
  
Collaboration:	
  
Collaborative	
  
Exploratory	
  Search
Collaborate with Zhen Yue, Shuguang Han
Collabora-ve	
  Exploratory	
  Search	
  
11
—  Daily	
  collaborative	
  Web	
  
search	
  increased	
  
dramatically	
  
—  0.9%	
  in	
  2006	
  -­‐>	
  11%	
  in	
  2012	
  
(Morris,	
  2013)	
  
Complex	
  interactions	
  -­‐>	
  
difficult	
  to	
  study	
  
collaborative	
  search	
  
processes	
  
—  Many	
  studies	
  on	
  individual	
  search	
  process	
  	
  
—  Well-­‐established	
  models	
  such	
  as	
  Kuhlthau’s	
  model	
  (Kuhlthau,	
  1991)	
  
and	
  Marchionini’s	
  model	
  (Marchionini,	
  1995)	
  
—  Many	
  studies	
  on	
  analyzing	
  the	
  transition	
  patterns	
  of	
  actions	
  (Chen	
  
&	
  Cooper,	
  2002),	
  search	
  tactics	
  (Xie	
  &	
  Joo,	
  2010)	
  or	
  search	
  strategies	
  
(Belkin,	
  1995)	
  in	
  the	
  process	
  	
  
—  A	
  few	
  studies	
  on	
  collaborative	
  search	
  process	
  
—  Several	
  studies	
  look	
  into	
  the	
  application	
  of	
  individual	
  search	
  
models	
  in	
  the	
  collaborative	
  environment	
  (Hyldegard,	
  2006;	
  Shah	
  &	
  
González-­‐ibáñez,	
  2010).	
  
—  An	
  investigation	
  on	
  before,	
  during	
  and	
  after	
  search	
  stages	
  in	
  social	
  
search	
  (Evans	
  &	
  Chi,	
  2008).	
  
Status	
  on	
  Search	
  Processes	
  
12
Research	
  Ques-ons	
  
—  RQ1.	
  How	
  to	
  model	
  the	
  search	
  states	
  in	
  the	
  
collaborative	
  exploratory	
  search	
  process?	
  	
  What	
  are	
  the	
  
characteristics	
  of	
  search	
  states	
  in	
  the	
  collaborative	
  
exploratory	
  search	
  process?	
  
—  Search	
  states:	
  basic	
  units	
  of	
  a	
  search	
  process	
  
—  RQ2.	
  What	
  are	
  the	
  characteristics	
  of	
  query	
  behaviors	
  in	
  
the	
  collaborative	
  exploratory	
  search	
  process?	
  
—  RQ3.	
  What	
  are	
  the	
  characteristics	
  of	
  communications	
  
between	
  team	
  members	
  in	
  the	
  collaborative	
  
exploratory	
  search	
  process?	
  
13
Holistic view of collaborative
search processes
Two key components of the
collaborative search process
CollabSearch:	
  a	
  Collaborative	
  Search	
  System	
  	
  
http://crystal.exp.sis.pitt.edu:8080/CollaborativeSearch/
q  Search functions 
- Web Search
- Save/edit/rate/tag Web pages/snippets
- Space for search task description
q  Collaboration functions
- Chat
- Share search queries
- Share saved Web pages/snippets
System:	
  CollabSearch	
  
14
Experiment	
  Design	
  
—  Two	
  Experiment	
  Conditions	
  
—  COL:	
  Collaborative	
  search	
  	
  
–  Two	
  participants	
  worked	
  as	
  a	
  team	
  on	
  the	
  same	
  search	
  task	
  simultaneously.	
  	
  
—  IND:	
  Individual	
  search	
  	
  
–  One	
  participant	
  worked	
  on	
  the	
  search	
  task	
  individually.	
  	
  
—  Participants	
  	
  
—  36	
  participants	
  (18	
  pairs)	
  in	
  COL	
  condition	
  
—  18	
  participants	
  in	
  the	
  IND	
  condition	
  
—  Two	
  Exploratory	
  Search	
  Tasks	
  (30mins/task)	
  
—  Information-­‐gathering	
  task	
  	
  
–  collecting	
  information	
  for	
  writing	
  a	
  report	
  on	
  the	
  impact	
  of	
  social	
  network	
  
(Shah	
  &	
  Marchionini,	
  2010).	
  	
  
—  Decision-­‐making	
  task	
  	
  
–  collecting	
  information	
  for	
  planning	
  a	
  trip	
  to	
  Finland	
  (Paul	
  &	
  Morris,	
  2010).	
  	
  
—  The	
  orders	
  of	
  the	
  two	
  tasks	
  were	
  rotated	
  
15
HMM	
  Model	
  of	
  Search	
  Process	
  
16
Approaches	
  for	
  Study	
  Search	
  Processes	
  
—  Analyze	
  qualitative	
  constructs	
  in	
  the	
  search	
  process	
  	
  
—  Kuhlthau,	
  1991;	
  Ellis,	
  1993;	
  Marchionini,	
  1995	
  
—  Search	
  pattern	
  analysis	
  based	
  on	
  logged	
  behavior	
  
—  Directly	
  use	
  logged	
  actions	
  (Holscher	
  &	
  Strube,	
  2000)	
  
—  Ignore	
  user	
  intentions	
  such	
  as	
  search	
  tactics	
  
—  Manually	
  code	
  search	
  tactics/strategies	
  on	
  log	
  data	
  (Xie&Joo,2010)	
  
—  Time-­‐consuming	
  and	
  need	
  a	
  theoretical	
  model	
  to	
  generate	
  codes	
  
—  Our	
  approach:	
  Hidden	
  Markov	
  Model	
  
—  Model	
  search	
  tactics	
  as	
  hidden	
  states	
  
—  An	
  automatic	
  approach	
  for	
  analyzing	
  time	
  sequential	
  events	
  
17
—  A	
  two-­‐level	
  view	
  of	
  search	
  process	
  
—  Higher	
  level:	
  search	
  states	
  as	
  hidden	
  search	
  tactics	
  or	
  strategies	
  
—  Lower	
  level:	
  observable	
  actions	
  
—  Parameters	
  in	
  HMM	
  
—  Number	
  of	
  hidden	
  states	
  N	
  (N	
  ≠	
  M)	
  
—  Transition	
  probabilities	
  among	
  any	
  two	
  hidden	
  states	
  
—  Emission	
  probability	
  from	
  each	
  state	
  to	
  each	
  action	
  
A Hidden Markov Model for Search States
Observable actions
Hidden search tactics or
strategies
Model	
  Search	
  States	
  using	
  HMM	
  
18
Transition	
  	
  
probabilitie
s	
  
Emission	
  	
  
probability	
  
	
  
!1	
   !2	
   !3	
   !%	
  
&1	
   &2	
   &3	
   &%	
  
Categorizing	
  Observable	
  Ac-ons	
  
Method

Search

Scan

Select

Capture

Communicate
Object

Query

Topic statement

Item in search result

Chat messages

List of saved items

Single saved item
Source

Self

Partner

Shared/mix
Inspired by Belkin’s ISSs model (Belkin, 1995) but with modifications to
accommodate the context in collaborative web search.
19
Actions Descriptions
Search	
  –	
  query	
  –	
  self	
  (Q)
A	
  user	
  issues	
  a	
  query
Select-­‐	
  item-­‐self	
  (V)
A	
  user	
  clicks	
  on	
  a	
  result	
  in	
  the	
  returned	
  result	
  list
Capture-­‐item-­‐self	
  (S)
A	
  user	
  saves	
  a	
  snippet	
  or	
  bookmarks	
  a	
  webpage
Scan-­‐list	
  of	
  saved	
  item	
  –	
  mixed	
  (Wm) A	
   user	
   checks	
   the	
   workspace	
   without	
   clicking	
   on	
  
any	
  particular	
  item.
Select	
  –	
  single	
  saved	
  item	
  –self	
  (Ws) A	
  user	
  clicks	
  on	
  an	
  item	
  in	
  the	
  workspace	
  saved	
  by	
  
him/herself
Select	
  –	
  single	
  saved	
  item	
  –	
  partner	
  (Wp) A	
  user	
  clicks	
  on	
  an	
  item	
  in	
  the	
  workspace	
  saved	
  by	
  
the	
  partner
Scan-­‐topic	
  -­‐shared	
  (T) A	
  user	
  clicks	
  on	
  the	
  topic	
  statement	
  for	
  view	
  
Communicate-­‐	
  messages-­‐self	
  (Cs) A	
  user	
  sends	
  a	
  message	
  to	
  the	
  other	
  user	
  
Communicate-­‐message-­‐partner	
  (Cp) A	
  user	
  receives	
  a	
  message	
  from	
  the	
  other	
  user
20
Ac-ons	
  Observed	
  in	
  this	
  Study	
  
Parameters	
  Selec-on	
  and	
  Es-ma-on	
  	
  
—  Determine	
  number	
  of	
  hidden	
  states	
  N	
  
—  Bayesian	
  Information	
  Criterion	
  (BIC)	
  
—  Parameter	
  estimations	
  
—  Baum-­‐Welch	
  algorithm:	
  maximize	
  data	
  likelihood	
  
—  Random	
  assign	
  a	
  start	
  probability	
  π	
  for	
  each	
  state,	
  	
  
—  Estimate	
  the	
  transition	
  probabilities	
  and	
  emission	
  probability	
  through	
  a	
  
machine	
  learning	
  process	
  	
  
21
6000
6500
7000
7500
8000
2 3 4 5 6 7
26000	
  
27500	
  
29000	
  
30500	
  
32000	
  
4	
   5	
   6	
   7	
   8	
   9	
   10	
  
IND: N=4
 COL: N=6
BIC=-2×logL+log(S)
×NP
log-likelihood (logL) 
Number of parameters
(NP) 
sample size(S)
HMM	
  Output	
  for	
  Individual	
  Search	
  
Query View Save
Workspace
self
Topic
HQ 0.99
HV 0.91
HS 0.96
HD 0.57 0.42
Hidden States and Emission Probabilities in IND (values<0.05 are omitted)
22
Search-related hidden states: HQ, HV, HS
HQ: hidden states of querying
HV: hidden states of viewing a search result
HS: hidden states of saving a search result
Sensemaking-related hidden states: HD
HD: hidden states of defining current search problem
Compare	
  to	
  Exis-ng	
  Search	
  Models	
  
0.39
0.12 HQ
0.33
HV
HS
0.50
HD
0.07
0.85 0.59
0.49 0.26
0.31
0.12 HQ
0.32
HV
HS
0.56
HD
0.13*
0.86
0.38
0.53
0.34
0.39 0.23
Sub-processes in the ISP model HMM
Define Problem HD
Select Source, Formulate Query,
Execute Query
HQ
Examine Results HV
Extract Information HS
Reflect/Iterate/Stop HD
Mapping from sub-process in Marchionini’s ISP
model to the hidden states
transition probabilities of hidden
states in IND
The default transition in Marchinoinini’s model can
be mapped to into HDàHQàHVàHSàHD,
which is also the pattern of the highest probability
in HMM results.
23
HMM	
  Outputs	
  in	
  Collabora-ve	
  Search	
  
Query View Save
Workspace
mix
Workspace
self
Workspace
partner
Topic Chat
send
Chat
receive
HQ 0.82 0.13
HV 0.87 0.1
HS 0.88
HD 0.36 0.36 0.21
HW 0.37 0.44 0.12
HC 0.44 0.47
Hidden States and Emission Probability in COL
Search-related hidden states: HQ, HV, HS
Sensemaking-related hidden states: HD, HW, HC
HD: hidden states of defining current search problem
HW: hidden states of implicit communication
HC: hidden states of explicit communication
(Paul & Morris, 2010): chat-centric sensemaking (HC) and workspace-
centric sensemaking (HW)
(Evans & Chi, 2008): search and sensemaking are tightly coupled

 24
Detec-ng	
  Task	
  Differences	
  using	
  HMM	
  
0.11
HQ
0.26
HV
HS
0.45
HD
0.06
0.86
0.56
0.31
0.08
0.46
0.30
HC HW
0.56
0.09 0.14
0.16
0.16
0.14
0.15
HQ
0.22
HV
HS
0.33**
HD
0.12
0.84
0.28**
0.35**
0.20**
0.42
0.24
0.13
HC
0.89
HW
0.48
0.06 0.30**
0.07**
0.33*
0.13**
Comparison of Transition Probabilities of Hidden states
in COL for the two tasks (red arrows indicate significant
difference: *p<0.05, **p<0.01)
Cross-category transitions:
From search to sensemaking
From sensemaking to search

Cross-category transitions is more
common in collaborative search
than in individual search

Cross-category transition is more
common in decision-making task
than in information gathering task.

25
Information-gathering
 Decision-making
Query	
  Behaviors	
  and	
  Communica-ons	
  
26
27
Search condition
(individual or
collaborative)
Task type 
(information-gathering
or decision-making)
Query vocabulary features
(number of queries, query vocabulary
richness, query diversity)
Query reformulation patterns
(New, Generalization, Specification,
Reconstruction)
Query performance
(Precision, recall, Successful query rate,
user satisfaction, cognitive load)
Research	
  Design	
  
Independent Variables
 Dependent Variables
Communication Timing
Communication Content
28
•  Query	
  Vocabulary	
  Richness	
  (QVR)	
  
•  Query	
  Diversity	
  (QD)	
  	
  
•  Levenshtein	
  distance	
  (Shah	
  &	
  González-­‐Ibáñez,	
  2011)	
  	
  	
  
•  calculate	
  the	
  difference	
  between	
  a	
  pair	
  of	
  queries.	
  
•  Query	
  Result	
  Similarity	
  (QRS)	
  	
  
Query	
  Vocabulary	
  Features	
  
(Kromer, Snasel, & Platos, 2008)
Query	
  Reformula-on	
  PaPerns	
  
29
Type Definition
New
(N)
Qi is the first query or does not share any
common terms with Qi-1
Generalization
(Ge)
Qi shares common terms with Qi-1 ; and Qi
contains fewer terms than Qi-1
Specialization
(Sp)
Qi shares common terms with Qi-1 ; and Qi
contains more terms than Qi-1
Reconstruction
(Rc)
Qi shares common terms with Qi-1 ; and Qi
has the same length as Qi-1
•  Qi-1 and Qi are two consecutive queries in the same search session
•  The patterns are defined based on (He et al. 2002; Jansen et al. 2009)
Query	
  performance	
  
30
Objective	
  
Measurements	
  
Precision	
  &	
  
Recall	
  
Successful	
  
Query	
  Rate	
  
Subjective	
  
Measurements	
  
User	
  
Satisfaction	
  
Cognitive	
  Load	
  
Communica-on	
  Timing	
  Analysis	
  
—  Before	
  search:	
  communications	
  before	
  the	
  first	
  search	
  
action	
  
—  During	
  search:	
  communications	
  between	
  the	
  first	
  and	
  last	
  
search	
  action	
  
—  After	
  search:	
  communications	
  after	
  the	
  last	
  search	
  action	
  
—  (Search	
  actions:	
  issuing	
  a	
  query,	
  viewing	
  or	
  saving	
  a	
  search	
  result)	
  
31
	
  Chat	
   	
  Chat	
   	
  Chat	
   	
  Chat	
  
Before	
  
Search
During	
  
Search
During	
  
Search
A<er	
  
Search
The first
search
action
The last
search
action
Search
action
Link to rational
Collabora-on	
  Benefit	
  I:	
  Rich	
  
Vocabularies	
  and	
  Diverse	
  Queries	
  
32
T2Task
IND
COL
0
0.5
1
1.5
2
2.5
3
T1 T2
QVR
Task
IND
COL
Query vocabulary richness Query diversity
0
5
10
15
20
25
30
T1 T2
QD
Task
IND
COL
.00
.02
.04
.06
.08
.10
T1 T2
QRS
Task
IND
COL
Chat	
  time	
   QVR	
  
Total	
  	
   ↑(p=0.045)	
  
Before	
  search	
   -­‐	
  
During	
  search	
   -­‐	
  
After	
  search	
   -­‐	
  
ü Participants in the collaborative search
were able to employ wider range of
vocabularies for the queries and the queries
were more diverse.
ü A positive correlation was found between
the total chat time and the query vocabulary
richness.
*Differences
showed in the
graph are
significant
Collabora-on	
  Cost	
  I:	
  Low	
  Recall	
  and	
  
Low	
  Successful	
  Query	
  Rate	
  
33
Recall Successful Query Rate
Chat	
  time	
   Recall	
  
Total	
  	
   ↓(p=0.001)	
  
Before	
  search	
   ↓(p=0.022)	
  
During	
  search	
   -­‐	
  
After	
  search	
   ↓(p<0.001)	
  
ü Collaboration takes times and efforts.
Participants had less time to devote to
search, and they were more stringent on
the what documents to save.
ü A negative correlation was found between
the communication and recall.
Collabora-on	
  Benefit	
  II:	
  People	
  More	
  
Sa-sfied	
  and	
  less	
  stressed	
  
34
User Satisfaction Cognitive Load
*Differences
showed in the
graph are
significant
Chat	
  time	
   Satisfaction	
   CogLoad	
  
Task	
  social	
   ↑(p=0.017)	
   ↓(p=0.022)	
  
Task	
  
coordination	
  
-­‐	
  
-­‐	
  
Task	
  content	
  	
   -­‐	
   ↑(p=0.008)	
  
Non-­‐task	
   -­‐	
   -­‐	
  
ü Participants in collaborative search are
more satisfied with the performance and
have lower cognitive load.
ü A positive correlation was found between
the task social communication and
satisfaction (negative for Cogload).
Collabora-on	
  Affects	
  the	
  PaPerns	
  of	
  
Query	
  Reformula-on	
  
35
0.1
0.2
0.3
0.4
0.5
IND COL
New
Generalization
Specification
Reconstruction
0.1
0.2
0.3
0.4
0.5
Task	
  1 Task	
  2
New
Generalization
Specification
Reconstruction
ü Higher percentages of New and Specialization and lower percentage of
Reconstruction in the collaborative search. 
ü Participants in collaborative search were able to explore the divided
subtopics in depth while the participant in the individual search owns the
entire search topic and the scope maybe the first priority.
Communica-on	
  PaPerns	
  
36
Proportion of each communication content type within stage (Left: T1; Right: T2)
ü Communication is common in any of three stages. 
ü The communication content varies in the three stages. 
The before search stage communication is more focused on the task coordination. 
The during search stage communication is more focused on the task content. 
Task social communication is more common in the before search and after search
stage than in the during search stage.
Implica-ons	
  
37
What	
  We	
  Learned	
  
—  Collaborative	
  search	
  process	
  have	
  patterns	
  
—  More	
  collaboration-­‐oriented	
  actions	
  as	
  the	
  collaboration	
  level	
  
increase	
  
—  Transitions	
  within	
  search-­‐oriented	
  actions	
  and	
  within	
  
collaboration-­‐oriented	
  actions	
  are	
  more	
  frequent	
  than	
  
between	
  them	
  in	
  all	
  three	
  conditions.	
  	
  
—  Explicit	
  	
  and	
  implicit	
  communication	
  has	
  potential	
  
benefit	
  on	
  helping	
  using	
  generating	
  query	
  ideas.	
  
38
Implica-ons	
  for	
  Collabora-ve	
  Search	
  
Research	
  
—  Search	
  activities	
  and	
  sensemaking	
  activities	
  are	
  tightly	
  
coupled	
  in	
  the	
  collaborative	
  search.	
  
—  The	
  studies	
  of	
  collaborative	
  search	
  should	
  not	
  just	
  
concentrate	
  on	
  the	
  effectiveness	
  of	
  search,	
  but	
  also	
  on	
  
the	
  users’	
  perception	
  of	
  their	
  search	
  experiences,	
  
particularly	
  their	
  satisfaction	
  and	
  cognitive	
  load.	
  
—  The	
  wider	
  range	
  of	
  query	
  vocabulary	
  in	
  collaborative	
  
search	
  did	
  not	
  necessarily	
  lead	
  to	
  a	
  more	
  effective	
  
search	
  outcomes.	
  
39
Implica-ons	
  for	
  Collabora-ve	
  
Search	
  System	
  Design	
  
—  It’s	
  important	
  to	
  design	
  interface-­‐mediated	
  support	
  for	
  
the	
  coordination	
  among	
  team	
  members	
  as	
  the	
  
coordination	
  through	
  communication	
  is	
  costly.	
  
—  Provide	
  targeted	
  algorithm-­‐mediated	
  query	
  
suggestions	
  based	
  on	
  the	
  findings	
  of	
  how	
  users	
  
reformulate	
  queries	
  in	
  the	
  collaborative	
  search.	
  	
  
—  Designers	
  need	
  to	
  make	
  a	
  balance	
  between	
  the	
  support	
  
for	
  fulfilling	
  the	
  search	
  task	
  and	
  the	
  support	
  for	
  social	
  
interactions	
  among	
  team	
  members.	
  
40
Implica-ons	
  for	
  Other	
  Researchers	
  
—  Researchers	
  in	
  CSCW	
  and	
  CSCL	
  can	
  draw	
  insights	
  on	
  
—  Sense-­‐making	
  intertwined	
  with	
  the	
  activities	
  that	
  directly	
  aim	
  for	
  
fulfilling	
  the	
  task	
  requirements.	
  	
  
—  Consider	
  the	
  social	
  gain	
  and	
  emotional	
  support	
  when	
  evaluating	
  the	
  
team	
  effectiveness.	
  	
  
—  The	
  findings	
  on	
  the	
  differences	
  between	
  information-­‐gathering	
  and	
  
decision-­‐making	
  collaboration	
  tasks.	
  
	
  
—  Researchers,	
  try	
  HMM	
  if	
  you	
  
—  are	
  analyzing	
  complex	
  interactive	
  process	
  in	
  a	
  time	
  sequence	
  
—  don’t	
  have	
  a	
  theoretical	
  model	
  to	
  start	
  with	
  
—  want	
  to	
  see	
  the	
  patterns	
  of	
  your	
  data	
  before	
  applying	
  time-­‐consuming	
  
qualitative	
  annotation	
  
—  are	
  interested	
  in	
  the	
  hidden	
  strategies	
  or	
  tactics	
  underneath	
  the	
  
observable	
  actions	
  of	
  users	
  
41
Privacy	
  Concerns	
  in	
  
Social	
  Match-­‐based	
  
People	
  Search
Collaborate with Shuguang Han, Zhen Yue
43
Document	
  Retrieval	
  
44
People	
  Retrieval	
  
Social	
  Match	
  
is	
  Important	
  in	
  
People	
  Search	
  
45
because a tighter social similarity
make it easier for people to
connect

Then

Need the users’ social networks
to return the potential candidates
who have either direct or indirect
connections with the given
users.
But	
  Privacy	
  is	
  a	
  BIG	
  Concern	
  
46
—  users	
  in	
  many	
  social	
  network	
  
services	
  often	
  either	
  opt	
  out	
  
from	
  certain	
  social	
  networks	
  
or	
  provide	
  incomplete	
  or	
  even	
  
fake	
  information	
  on	
  those	
  
networks.	
  	
  
—  many	
  data	
  mining	
  algorithms	
  
may	
  not	
  work	
  or	
  even	
  harm	
  
the	
  user	
  experience	
  when	
  
equipped	
  with	
  such	
  
incomplete	
  and	
  noisy	
  social	
  
information	
  
People	
  Search	
  Use	
  Co-­‐Author	
  Network	
  
47
Which	
  has	
  the	
  advantage	
  	
  
of	
  lacking	
  privacy	
  concerns	
  	
  
	
  
But	
  this	
  limits	
  
	
  
the	
  type	
  of	
  people	
  search	
  
being	
  studied,	
  
	
  
So	
  should	
  study	
  
	
  
other	
  social	
  networks	
  
which	
  has	
  privacy	
  concerns	
  
	
  
	
  	
  
Our	
  Goals	
  
—  interested	
  in	
  the	
  privacy	
  related	
  issues	
  in	
  people	
  search	
  
and	
  the	
  impacts	
  of	
  these	
  issues	
  on	
  the	
  performance	
  of	
  
people	
  search	
  systems.	
  	
  
—  users	
  in	
  many	
  social	
  network	
  services	
  are	
  able	
  to	
  keep	
  both	
  their	
  
profile	
  and	
  social	
  connections	
  private	
  
—  we	
  focus	
  on	
  the	
  privacy	
  issues	
  of	
  sharing	
  social	
  connections	
  
—  simulating	
  the	
  privacy-­‐concerned	
  social	
  network	
  by	
  
using	
  the	
  public	
  available	
  coauthor	
  networks.	
  
—  Critical	
  need	
  for	
  privacy-­‐concerned	
  social	
  network	
  as	
  test	
  bed	
  
—  Difficult	
  to	
  finding	
  an	
  open	
  privacy-­‐concerned	
  social	
  network	
  or	
  very	
  
expensive	
  to	
  such	
  a	
  network	
  from	
  scratch	
  for	
  research	
  purpose,	
  	
  
48
Key	
  Assump-ons	
  
—  a	
  coauthor	
  network	
  is	
  the	
  same	
  or	
  similar	
  with	
  a	
  
privacy-­‐concerned	
  social	
  network,	
  because	
  studies	
  
show	
  
—  many	
  real-­‐world	
  social	
  networks	
  (including	
  coauthor	
  networks	
  and	
  
many	
  other	
  privacy-­‐concerned	
  networks	
  such	
  as	
  Facebook	
  social	
  
networks)	
  share	
  the	
  same	
  patterns	
  
—  All	
  small-­‐world	
  networks	
  and	
  their	
  degree	
  distributions	
  are	
  highly	
  skewed.	
  	
  
—  assortative	
  patterns	
  (the	
  preferences	
  of	
  connecting	
  people	
  who	
  share	
  
the	
  similar	
  features)	
  of	
  social	
  networks	
  are	
  all	
  assortatively	
  mixed,	
  	
  
—  whereas	
  the	
  technological	
  and	
  biological	
  seems	
  to	
  be	
  disassortative.	
  
—  so	
  studying	
  academic	
  coauthor	
  networks,	
  which	
  are	
  
publically	
  available,	
  can	
  be	
  the	
  surrogate	
  for	
  studying	
  
privacy-­‐	
  preserving	
  social	
  networks	
  	
  
49
Research	
  Focuses	
  
—  Identify	
  two	
  types	
  of	
  features,	
  both	
  used	
  in	
  people	
  
search	
  
—  Global	
  network	
  features:	
  the	
  features	
  that	
  are	
  propagated	
  through	
  the	
  
whole	
  networks	
  	
  
—  measured	
  by	
  the	
  PageRank	
  value	
  running	
  on	
  the	
  whole	
  social	
  networks	
  
—  Local	
  network	
  features:	
  the	
  features	
  that	
  are	
  directly	
  related	
  to	
  the	
  
ego-­‐network	
  of	
  the	
  querying	
  user	
  	
  
—  measured	
  by	
  the	
  proportion	
  of	
  common	
  social	
  connections	
  
50
Research	
  Ques-ons	
  
—  RQ1:	
  How	
  to	
  properly	
  simulate	
  different	
  types	
  of	
  
privacy-­‐preserving	
  social	
  networks?	
  	
  
—  RQ2:	
  How	
  does	
  each	
  privacy-­‐preserving	
  network	
  affect	
  
the	
  global	
  and	
  local	
  network	
  features?	
  	
  
—  RQ3:	
  How	
  does	
  the	
  obtained	
  global	
  and	
  local	
  network	
  
features	
  further	
  affect	
  the	
  people	
  search	
  performance?	
  
That	
  is,	
  what	
  are	
  the	
  impacts	
  of	
  these	
  features	
  derived	
  
from	
  privacy-­‐preserving	
  networks	
  on	
  the	
  search	
  process	
  
of	
  finding	
  the	
  best	
  candidates	
  when	
  comparing	
  with	
  the	
  
use	
  of	
  full	
  network	
  information?	
  	
  
51
Data	
  
—  Academic	
  publication	
  collection	
  
—  containing	
  219,677	
  conference	
  papers	
  from	
  the	
  ACM	
  Digital	
  Library.	
  	
  
—  between	
  1990	
  and	
  2013.	
  	
  
—  Only	
  public	
  available	
  information	
  of	
  a	
  paper:	
  the	
  title,	
  abstract	
  and	
  authors	
  
—  No	
  further	
  author	
  disambiugation	
  besides	
  ACM	
  Digital	
  Library	
  author	
  ID	
  
—  In	
  total,	
  the	
  collection	
  contains	
  253,390	
  unique	
  authors	
  and	
  953,685	
  coauthor	
  
connection	
  instances.	
  
—  Users’	
  people	
  search	
  activities:	
  Han	
  et	
  al.	
  [5]	
  evaluation	
  of	
  a	
  people	
  
search	
  system.	
  	
  
—  four	
  different	
  people	
  search	
  tasks,	
  each	
  aimed	
  to	
  search	
  for	
  5	
  candidates.	
  	
  
—  A	
  baseline	
  plain	
  content-­‐based	
  people	
  search	
  system	
  	
  
—  An	
  experimental	
  system	
  that	
  enhances	
  people	
  search	
  with	
  three	
  interactive	
  facets:	
  
content	
  relevance,	
  social	
  similarity	
  between	
  the	
  user	
  and	
  a	
  candidate	
  (the	
  local	
  
network	
  feature)	
  and	
  the	
  authority	
  of	
  a	
  candidate	
  (the	
  global	
  network	
  feature).	
  	
  
—  The	
  experiment	
  system	
  allowed	
  the	
  querying	
  users	
  to	
  tune	
  the	
  value	
  associated	
  with	
  each	
  facet	
  
in	
  order	
  to	
  generate	
  a	
  better	
  candidate	
  search	
  results.	
  	
  
—  24	
  participants	
  were	
  recruited	
  for	
  the	
  user	
  study.	
  	
  
—  At	
  the	
  beginning	
  of	
  the	
  user	
  study,	
  each	
  participant	
  was	
  asked	
  to	
  provide	
  their	
  publications	
  and	
  
their	
  close	
  social	
  connections	
  (such	
  as	
  advisors).	
  	
  
—  In	
  the	
  post-­‐task	
  questionnaire,	
  the	
  participants	
  were	
  asked	
  to	
  rate	
  the	
  relevance	
  of	
  each	
  marked	
  
candidate	
  in	
  a	
  Five-­‐point	
  Likert	
  scale	
  (1	
  as	
  non-­‐relevant	
  and	
  5	
  as	
  the	
  highly	
  relevant).	
  
52
Formulas	
  
53
pi is the probability of a
given user has privacy
concern, di is the degree of
association of a user and
dmax. Is the maximized
degree in the network, λ
helps to establish different
selection strategies 
Mean Absolute Error (MAE) between new authority
and ground-truth authority over all of the authors
Results	
  
54
Results	
  
55
Results	
  
56
Results	
  
57
Insights	
  
—  Both	
  the	
  local	
  and	
  global	
  network	
  features	
  are	
  important	
  for	
  the	
  
performance	
  of	
  people	
  search	
  (compare	
  to	
  not	
  using	
  social	
  
network).	
  	
  
—  Comparing	
  to	
  the	
  global	
  network	
  feature,	
  the	
  local	
  network	
  feature	
  is	
  more	
  
important.	
  	
  
—  Privacy-­‐concerns	
  reflected	
  in	
  local	
  and	
  global	
  network	
  features	
  can	
  
significantly	
  influences	
  on	
  the	
  performance	
  of	
  people	
  search	
  
—  The	
  privacy	
  concerns	
  from	
  the	
  high-­‐degree	
  candidates	
  in	
  the	
  network	
  will	
  have	
  
more	
  impacts	
  on	
  global	
  features.	
  	
  
—  The	
  local	
  network	
  feature	
  is	
  related	
  to	
  both	
  the	
  querying	
  users	
  and	
  the	
  candidates	
  
in	
  the	
  networks.	
  	
  
—  the	
  privacy	
  concerns	
  from	
  both	
  of	
  them	
  have	
  significant	
  impact	
  on	
  the	
  people	
  search	
  
performance.	
  	
  
—  The	
  privacy	
  concerns	
  from	
  high-­‐degree	
  candidates	
  have	
  bigger	
  influences	
  on	
  the	
  people	
  
search	
  than	
  that	
  of	
  the	
  lower-­‐degree	
  candidates,	
  especially	
  when	
  those	
  high-­‐degree	
  
candidates	
  are	
  related	
  to	
  the	
  querying	
  user.	
  	
  
—  We	
  also	
  find	
  that	
  if	
  the	
  querying	
  users	
  provide	
  more	
  social	
  connections,	
  the	
  search	
  
performance	
  would	
  increase	
  steadily.	
  
58
Scholars	
  on	
  Academic	
  
Social	
  Network	
  
Services
Collaborate with Wei Jeng and Jiepu Jiang
Formal scholarly communication describes activities or scholarly
outcomes that can be viable over time to an extended audience.
This availability over long periods of time, also known as
permanent access, traditionally referred to publications in books
or peer-reviewed journals.
Informal scholarly communication is made scholarly outcomes
“available to a restricted audience only” (as cited in Borgman,
2007, p. 49), such as self-publishing, Listerv, mails, or a “coffee
break” in a conferences where scholars can exchange
information.
Academic	
  Social	
  Networking	
  Service	
  
(ASNS)	
  
—  The	
  term	
  academic	
  social	
  networking	
  service	
  as	
  a	
  broad	
  
term	
  that	
  refers	
  to	
  an	
  online	
  service,	
  tool,	
  or	
  platform	
  
that	
  can	
  help	
  scholars	
  to	
  build	
  their	
  professional	
  
networks	
  with	
  other	
  researchers	
  and	
  facilitate	
  their	
  
various	
  activities	
  when	
  conducting	
  research.	
  	
  
—  Some	
  well-­‐known	
  examples	
  of	
  ASNSs	
  include	
  	
  
—  ResearchGate.net	
  (http://www.researchgate.net/)	
  
—  Academia.edu	
  (http://www.academia.edu/)	
  	
  
—  Mendeley.com	
  (http://www.mendeley.com/)	
  
Features	
  on	
  ASNSs	
  
—  ASNSs	
  allow	
  users	
  to	
  	
  
—  create	
  profiles	
  with	
  academic	
  properties	
  
—  upload	
  theirs	
  publications	
  
—  create	
  online	
  groups	
  	
  
—  Some	
  ASNS,	
  such	
  as	
  Mendeley	
  and	
  Zotero,	
  even	
  offer	
  
software	
  applications,	
  such	
  as	
  bibliographic	
  tools	
  to	
  
support	
  scholars	
  in	
  managing	
  their	
  documents	
  and	
  
citations.	
  
If academic social network sites are providing an
alternative channel to support informal scholarly
communication,
then it is important to study:
What the implications we can
learn from analyzing academic
users’ actual usages on those
sites.
65
Our	
  Research	
  Ques-ons	
  
— RQ1:	
  Who	
  are	
  the	
  users	
  of	
  an	
  academic	
  social	
  
networking	
  service	
  (ASNS)	
  that	
  supports	
  open	
  
groups?	
  	
  
— RQ2:	
  In	
  what	
  ways,	
  and	
  how	
  often,	
  do	
  such	
  
group	
  participants	
  use	
  an	
  ASNS?	
  	
  
— RQ3:	
  What	
  motivates	
  ASNS	
  users	
  to	
  utilize	
  
social	
  or	
  research	
  features	
  on	
  an	
  ASNS?	
  	
  
Research	
  Site:	
  Mendeley	
  
—  Launched	
  in	
  2008,	
  Mendeley	
  (http://
www.mendeley.com/)	
  is	
  one	
  of	
  the	
  most	
  popular	
  ASNSs	
  
and	
  has	
  more	
  than	
  two	
  million	
  users.	
  
—  Mendeley	
  allows	
  users	
  to	
  build	
  their	
  own	
  digital	
  
research	
  library	
  by	
  importing	
  PDF	
  files	
  from	
  their	
  local	
  
devices.	
  
—  There	
  are	
  three	
  common	
  ways	
  to	
  use	
  social	
  features	
  on	
  
Mendeley:	
  maintain	
  a	
  profile,	
  manage	
  existing	
  contacts,	
  
and	
  make	
  more	
  connections.	
  
The	
  profile	
  page	
  
The	
  group	
  page	
  
Our	
  Samples:	
  Mendeley	
  Group	
  
—  Mendeley	
  allows	
  users	
  to	
  start	
  groups	
  to	
  share	
  what	
  
they	
  are	
  interested	
  in	
  and	
  what	
  they	
  are	
  reading	
  about.	
  	
  
—  Two	
  types	
  of	
  groups	
  supported	
  on	
  the	
  site:	
  	
  
—  private	
  groups	
  that	
  are	
  only	
  visible	
  to	
  the	
  members;	
  
—  public	
  groups	
  that	
  are	
  publicly	
  visible	
  and	
  can	
  be	
  searched	
  in	
  
Mendeley’s	
  group	
  list.	
  
—  We	
  adopted	
  a	
  representative	
  sampling	
  method	
  to	
  
identify	
  Mendeley’s	
  large	
  open	
  group	
  users.	
  	
  
Method	
  and	
  data	
  collec-ng	
  
—  We	
  chose	
  a	
  cross-­‐sectional	
  survey	
  as	
  the	
  research	
  
method	
  to	
  answer	
  these	
  research	
  questions	
  and	
  
developed	
  a	
  questionnaire	
  with	
  30	
  questions.	
  	
  
—  The	
  questionnaire	
  was	
  distributed	
  to	
  97	
  open	
  groups	
  in	
  
Mendeley,	
  one	
  of	
  the	
  most	
  popular	
  ASNSs.	
  
The	
  instrument-­‐I	
  
—  Basic	
  Information	
  	
  
—  The	
  Extent	
  of	
  Use:	
  Questions	
  that	
  aim	
  to	
  determine	
  the	
  
extent	
  of	
  participants’	
  account	
  activities	
  on	
  Mendeley	
  
—  Common	
  Ways	
  to	
  Use:	
  	
  
—  as	
  a	
  document	
  management	
  tool,	
  	
  
—  a	
  reference	
  manager,	
  	
  
—  a	
  scholarly	
  search	
  engine,	
  	
  
—  an	
  online	
  portfolio,	
  	
  
—  a	
  friend	
  management	
  tool,	
  and	
  	
  
—  a	
  socialization	
  tool	
  
The	
  instrument-­‐II	
  
—  The	
  Extent	
  of	
  Group	
  Use	
  
—  Motivation:	
  
—  Information	
  	
  
—  Networking	
  
—  Visibility	
  
—  Altruistic	
  
Result:	
  Overview	
  
—  We	
  received	
  188	
  responses	
  via	
  the	
  questionnaire,	
  but	
  
only	
  146	
  users	
  completed	
  the	
  entire	
  questionnaire.	
  	
  
—  The	
  average	
  age	
  of	
  the	
  participants	
  was	
  35.04	
  years	
  
(SD=10.81),	
  and	
  64%	
  of	
  them	
  were	
  male	
  (N=94).	
  
—  We	
  obtained	
  responses	
  from	
  users	
  in	
  20	
  disciplines	
  in	
  
Mendeley.	
  	
  
—  The	
  top	
  three	
  disciplines	
  represented	
  were	
  computer	
  
and	
  information	
  science	
  (N=43),	
  biological	
  science	
  
(N=24),	
  and	
  social	
  science	
  (N=17).	
  	
  
The	
  Par-cipants	
  
— Distribution	
  of	
  academic	
  disciplines	
  
—  Early	
  adaptor:	
  Biomedicine	
  users	
  	
  
—  Newcomers:	
  Social	
  Sciences	
  
—  Lack	
  of	
  humanities,	
  literature,	
  philosophy,	
  and	
  design	
  users	
  
— Distribution	
  of	
  academic	
  positions	
  
Frequency	
  of	
  Account	
  Ac-vi-es	
  	
  
ASNS	
  users	
  are	
  not	
  as	
  active	
  as	
  general	
  SNS	
  users:	
  
—  53%	
  of	
  respondents	
  visited	
  their	
  accounts	
  on	
  a	
  weekly	
  
basis,	
  while	
  36%	
  of	
  them	
  accessed	
  the	
  site	
  at	
  least	
  once	
  
per	
  month.	
  
—  Also	
  more	
  than	
  half	
  (53%)	
  of	
  the	
  participants	
  reported	
  
that	
  they	
  were	
  checking	
  the	
  news	
  feeds	
  only	
  on	
  a	
  
monthly	
  basis.	
  
Ways	
  to	
  Use	
  Mendeley	
  
—  Participants	
  primarily	
  used	
  Mendeley	
  as	
  	
  
—  a	
  document	
  management	
  tool	
  
—  citation	
  management	
  software	
  
—  The	
  portion	
  of	
  those	
  using	
  Mendeley	
  as	
  a	
  social	
  
networking	
  site	
  was	
  relatively	
  low:	
  Only	
  11%	
  of	
  
respondents	
  used	
  Mendeley	
  to	
  manage	
  their	
  existing	
  
academic	
  friends	
  and	
  to	
  expand	
  their	
  professional	
  
networks.	
  	
  
—  These	
  results	
  indicate	
  that	
  most	
  of	
  our	
  participants	
  
use	
  Mendeley	
  for	
  its	
  research	
  features,	
  rather	
  than	
  
social	
  features.	
  	
  
Mo-va-ons	
  for	
  Joining	
  Groups	
  
Mo-va-on	
   Motivation	
  Items	
  	
   M	
  
Information	

 Keep	
  up	
  with	
  a	
  user’s	
  research	
  domain	

 4.30
Get	
  research-­‐related	
  questions	
  answered	

 3.41
Follow	
  topics	
  that	
  community	
  is	
  paying	
  attention	
  
to	

3.98
Networking	

 Connect	
  with	
  people	
  who	
  have	
  similar	
  
research	
  interests	
  	

3.91
Expand	
  current	
  social	
  network	

 3.23
Meet	
  more	
  academic	
  people	

 3.27
Keep	
  in	
  touch	
  with	
  people	
  one	
  already	
  knows	

 3.08
Visibility
	

Gain	
  professional	
  visibility	

 3.48
Be	
  present	
  in	
  current	
  discussions	

 3.27
Altruistic	
   Contribute	
  to	
  the	
  reading	
  list	
   3.62
Table: Users’ motivation in terms of joining a Mendeley group created by others.
Factors	
  that	
  may	
  influence	
  on	
  the	
  outcome	
  of	
  
group	
  joining	
  mo-ves	
  
—  Results	
  suggest	
  that	
  people	
  are	
  most	
  motivated	
  by	
  
visibility	
  and	
  altruism	
  when	
  considering	
  whether	
  to	
  
join	
  or	
  follow	
  more	
  groups.	
  
—  Even	
  if	
  users	
  frequently	
  and	
  regularly	
  engaged	
  in	
  
research-­‐based	
  activities	
  on	
  Mendeley	
  (such	
  as	
  a	
  
document	
  management	
  or	
  citation	
  management	
  tool)	
  ,	
  
it	
  would	
  not	
  make	
  any	
  difference	
  in	
  terms	
  of	
  their	
  
intentions	
  of	
  joining	
  groups.	
  
Insights	
  Based	
  on	
  the	
  Findings	
  
— What	
  is	
  Mendeley	
  exactly?	
  	
  
— Discipline	
  Distribution	
  and	
  Development	
  on	
  
Mendeley	
  
— Academic	
  Networking	
  or	
  Social	
  Networking?	
  
— The	
  incentives	
  of	
  group	
  joining	
  
Mendeley:	
  A	
  Plaorm	
  for	
  Higher	
  
Educa-on	
  Users	
  
—  Our	
  findings	
  confirmed	
  that	
  the	
  majority	
  of	
  Mendeley	
  
users	
  were	
  from	
  the	
  higher	
  education	
  environment.	
  
More	
  specifically,	
  junior	
  researchers	
  (i.e.,	
  doctoral	
  
students,	
  post-­‐doctoral	
  fellows,	
  and	
  graduate	
  students).	
  	
  
—  For	
  those	
  researchers	
  who	
  would	
  like	
  to	
  study	
  junior	
  
scholars’	
  information	
  behaviors	
  or	
  run	
  a	
  survey	
  on	
  a	
  
wide	
  range	
  of	
  online	
  scholars,	
  we	
  believe	
  that	
  an	
  ASNS	
  
such	
  as	
  Mendeley	
  are	
  the	
  right	
  platforms	
  to	
  use	
  to	
  
reach	
  those	
  types	
  of	
  participants.	
  	
  
Discipline	
  Distribu-on	
  and	
  
Development	
  on	
  Mendeley	
  
—  Our	
  results	
  show	
  that	
  the	
  discipline	
  development	
  in	
  
Mendeley	
  is	
  uneven.	
  	
  
—  Early	
  users	
  in	
  Mendeley	
  groups	
  mostly	
  came	
  from	
  the	
  
fields	
  of	
  computer	
  &	
  information	
  science	
  and	
  
biomedicine,	
  whereas	
  more	
  recent	
  users	
  are	
  mostly	
  
from	
  the	
  fields	
  of	
  social	
  science,	
  education	
  and	
  
psychology.	
  	
  
—  We	
  do	
  not	
  see	
  many	
  group	
  users	
  from	
  the	
  humanities	
  
and	
  other	
  related	
  fields.	
  	
  
Academic	
  Networking	
  vs.	
  Social	
  
Networking	
  
—  “Academic”	
  but	
  not	
  “Social”?	
  
—  Users	
  of	
  Mendeley	
  seem	
  to	
  mainly	
  concentrate	
  on	
  the	
  
utilities	
  directly	
  related	
  to	
  their	
  research	
  work,	
  while	
  
mostly	
  ignoring	
  its	
  social	
  features,	
  such	
  as	
  “friend	
  
making”.	
  
—  Warning	
  for	
  ASNS	
  developers:	
  Do	
  not	
  simply	
  adopt	
  
“Facebook-­‐like”	
  or	
  “LinkedIn-­‐like”	
  social	
  elements	
  
when	
  designing	
  an	
  ASNS	
  platform	
  for	
  academic	
  users.	
  
Join	
  a	
  Group?	
  Show	
  Me	
  the	
  Incen-ve.	
  
—  A	
  gap	
  between	
  motivation	
  and	
  incentive:	
  The	
  altruistic	
  
motivation	
  was	
  one	
  of	
  the	
  most	
  critical	
  reasons	
  
associated	
  with	
  their	
  group	
  engagement,	
  yet	
  none	
  of	
  
current	
  features	
  of	
  Mendeley	
  reward	
  scholars	
  for	
  their	
  
altruistic	
  activities.	
  
—  Possible	
  incentive	
  mechanisms	
  to	
  encourage	
  
interactions	
  :	
  
—  providing	
  of	
  affective	
  feedback	
  by	
  group	
  owners	
  or	
  members,	
  to	
  users	
  
who	
  contributed	
  
—  establishing	
  level-­‐based	
  honor	
  system	
  or	
  badging	
  system	
  
Limita-ons	
  
—  We	
  sampled	
  only	
  open	
  and	
  large	
  groups	
  on	
  Mendeley	
  
instead	
  of	
  a	
  random	
  sample.	
  	
  
—  The	
  discipline	
  and	
  position	
  distribution	
  of	
  the	
  
participants	
  may	
  be	
  biased	
  towards	
  users	
  who	
  are	
  the	
  
use	
  group	
  of	
  social	
  feature	
  and	
  highly	
  engaged	
  in	
  group	
  
activities.	
  
—  If	
  researchers	
  would	
  like	
  to	
  investigate	
  the	
  wider	
  
landscape	
  of	
  ASNS	
  users,	
  larger-­‐scaled	
  and	
  random	
  
sampling	
  approaches	
  are	
  needed.	
  
Closing	
  Remarks
87
Challenges	
  
Challenges	
  
Challenges	
  
Challenges	
  
—  Know	
  the	
  boundary	
  of	
  
Social	
  Information	
  
Access	
  
—  How	
  to	
  identify	
  which	
  tasks	
  
are	
  good	
  for	
  social	
  
information	
  access?	
  
—  How	
  to	
  effectively	
  integrate	
  
social	
  networking,	
  direct	
  
messaging,	
  and	
  social	
  
recommendations	
  with	
  
current	
  search	
  facilities.	
  
Related	
  Publica-ons	
  
—  Z.	
  Yue,	
  S.	
  Han,	
  D.	
  He.	
  J.	
  Jiang.	
  Influences	
  on	
  Query	
  Reformulation	
  
in	
  Collaborative	
  Web	
  Search.	
  IEEE	
  Compute,	
  2014	
  (3):46-­‐53.	
  
—  Z.	
  Yue,	
  S.	
  Han,	
  D.	
  He.	
  Modeling	
  Search	
  Processes	
  using	
  Hidden	
  
States	
  in	
  Collaborative	
  Exploratory	
  Web	
  Search.	
  The	
  17th	
  ACM	
  
Conference	
  on	
  Computer	
  Supported	
  Cooperative	
  Work	
  and	
  Social	
  
Computing	
  (CSCW	
  2014).	
  
—  Z.	
  Yue,	
  S.	
  Han,	
  D.	
  He.	
  An	
  Investigation	
  on	
  the	
  Query	
  Behavior	
  in	
  
Task-­‐based	
  Collaborative	
  Exploratory	
  Web	
  Search.	
  The	
  76th	
  
Annual	
  Meeting	
  of	
  the	
  Association	
  for	
  Information	
  Science	
  and	
  
Technology.	
  (ASIST	
  2013).	
  
—  Jeng,	
  W.,	
  He,	
  D.,	
  &	
  Jiang,	
  J.	
  (In	
  press).	
  User	
  Participation	
  in	
  an	
  
Academic	
  Social	
  Networking	
  Service:	
  A	
  Survey	
  of	
  Open	
  Group	
  
Users	
  on	
  Mendeley.	
  Journal	
  of	
  the	
  Association	
  for	
  Information	
  
Science	
  and	
  Technology.	
  
92
93
Really Tough Questions 
Please!!!
Acknowledgement	
  	
  
—  The	
  work	
  presented	
  here	
  were	
  conducted	
  by	
  faculty	
  and	
  
students	
  in	
  Information	
  Retrieval,	
  Integration	
  and	
  
Synthesis	
  Lab	
  at	
  School	
  of	
  Information	
  Sciences	
  
—  Other	
  people	
  participated	
  in	
  these	
  works	
  are	
  
—  Prof.	
  Peter	
  Brusilovsky,	
  Prof	
  Dan	
  Wu	
  etc.	
  
—  These	
  work	
  are	
  partially	
  supported	
  by	
  the	
  National	
  
Science	
  Foundation	
  

Social Information Access: A Personal Update

  • 1.
         Social  Informa-on  Access   -­‐-­‐  a  Personal  Update 2014/6/18
  • 2.
    Agenda   2 Collaborative ExploratorySearch Privacy Concerns in Social-based People Search Some Reflections Conclusions Social Information Access Scholars in Academic Social Networks
  • 3.
    Informa-on  Access   — Information  Access:  an  interactive  process  starts  with  a   user  noticing  his/her  needs  and  ends  with  the  user   obtaining  the  necessary  information   —  Iterative,  multiple  stages,  many  back  loops  
  • 4.
    User  Generated  Content   Social  Networks  
  • 5.
    Social  Informa-on  Access   —  Social  Information  Access:  information   access  using  “community  wisdom”     —  distilled  from  the  actions  in  real/virtual   community     —  Collaboration  in  explicit  or  implicit  manner   —  Social  information  access  technologies   capitalize  on  the  natural  tendency  of   people  to  follow  direct  and  indirect  cues   of  others’  activities   —  going  to  a  restaurant  that  seems  to  attract  many   customers,  or   —  asking  others  what  movies  to  watch.  
  • 6.
    Space  of  Social  Informa-on  Access   —  [Brusilovsky2012]’s    taxonomy  for  social  info  access   6
  • 7.
    More  Social  Informa-on  Access   —  Collaboration  can  be  explicit,  not  just  implicit   —  Explicit  Collaboration:  users  work  as  a  team  to  complete  the  same   task   —  Issues:  How  to  model  collaboration?   7 Implicit Collaboration Explicit Collaboration
  • 8.
    More  Social  Informa-on  Access   —  Target  can  be  people,  and  people’s  social  connections     are  important   —  Relationship  is  as  important  as  the  documents  generated  by  the  people   —  Issues:  What  are  the  impacts  of  privacy  concerns?   8
  • 9.
    More  Social  Informa-on  Access   —  User  generated  content  can  be  generic,  or  academic   —  E-­‐Science  and  CyberScholarship  are  increasingly  popular   —  Issues:  What  are  scientists  doing  online?   9 Popular Social Networks Academic Online Social Networks
  • 10.
    Explicit   Collaboration:   Collaborative   Exploratory  Search Collaborate with Zhen Yue, Shuguang Han
  • 11.
    Collabora-ve  Exploratory  Search   11 —  Daily  collaborative  Web   search  increased   dramatically   —  0.9%  in  2006  -­‐>  11%  in  2012   (Morris,  2013)   Complex  interactions  -­‐>   difficult  to  study   collaborative  search   processes  
  • 12.
    —  Many  studies  on  individual  search  process     —  Well-­‐established  models  such  as  Kuhlthau’s  model  (Kuhlthau,  1991)   and  Marchionini’s  model  (Marchionini,  1995)   —  Many  studies  on  analyzing  the  transition  patterns  of  actions  (Chen   &  Cooper,  2002),  search  tactics  (Xie  &  Joo,  2010)  or  search  strategies   (Belkin,  1995)  in  the  process     —  A  few  studies  on  collaborative  search  process   —  Several  studies  look  into  the  application  of  individual  search   models  in  the  collaborative  environment  (Hyldegard,  2006;  Shah  &   González-­‐ibáñez,  2010).   —  An  investigation  on  before,  during  and  after  search  stages  in  social   search  (Evans  &  Chi,  2008).   Status  on  Search  Processes   12
  • 13.
    Research  Ques-ons   — RQ1.  How  to  model  the  search  states  in  the   collaborative  exploratory  search  process?    What  are  the   characteristics  of  search  states  in  the  collaborative   exploratory  search  process?   —  Search  states:  basic  units  of  a  search  process   —  RQ2.  What  are  the  characteristics  of  query  behaviors  in   the  collaborative  exploratory  search  process?   —  RQ3.  What  are  the  characteristics  of  communications   between  team  members  in  the  collaborative   exploratory  search  process?   13 Holistic view of collaborative search processes Two key components of the collaborative search process
  • 14.
    CollabSearch:  a  Collaborative  Search  System     http://crystal.exp.sis.pitt.edu:8080/CollaborativeSearch/ q  Search functions - Web Search - Save/edit/rate/tag Web pages/snippets - Space for search task description q  Collaboration functions - Chat - Share search queries - Share saved Web pages/snippets System:  CollabSearch   14
  • 15.
    Experiment  Design   — Two  Experiment  Conditions   —  COL:  Collaborative  search     –  Two  participants  worked  as  a  team  on  the  same  search  task  simultaneously.     —  IND:  Individual  search     –  One  participant  worked  on  the  search  task  individually.     —  Participants     —  36  participants  (18  pairs)  in  COL  condition   —  18  participants  in  the  IND  condition   —  Two  Exploratory  Search  Tasks  (30mins/task)   —  Information-­‐gathering  task     –  collecting  information  for  writing  a  report  on  the  impact  of  social  network   (Shah  &  Marchionini,  2010).     —  Decision-­‐making  task     –  collecting  information  for  planning  a  trip  to  Finland  (Paul  &  Morris,  2010).     —  The  orders  of  the  two  tasks  were  rotated   15
  • 16.
    HMM  Model  of  Search  Process   16
  • 17.
    Approaches  for  Study  Search  Processes   —  Analyze  qualitative  constructs  in  the  search  process     —  Kuhlthau,  1991;  Ellis,  1993;  Marchionini,  1995   —  Search  pattern  analysis  based  on  logged  behavior   —  Directly  use  logged  actions  (Holscher  &  Strube,  2000)   —  Ignore  user  intentions  such  as  search  tactics   —  Manually  code  search  tactics/strategies  on  log  data  (Xie&Joo,2010)   —  Time-­‐consuming  and  need  a  theoretical  model  to  generate  codes   —  Our  approach:  Hidden  Markov  Model   —  Model  search  tactics  as  hidden  states   —  An  automatic  approach  for  analyzing  time  sequential  events   17
  • 18.
    —  A  two-­‐level  view  of  search  process   —  Higher  level:  search  states  as  hidden  search  tactics  or  strategies   —  Lower  level:  observable  actions   —  Parameters  in  HMM   —  Number  of  hidden  states  N  (N  ≠  M)   —  Transition  probabilities  among  any  two  hidden  states   —  Emission  probability  from  each  state  to  each  action   A Hidden Markov Model for Search States Observable actions Hidden search tactics or strategies Model  Search  States  using  HMM   18 Transition     probabilitie s   Emission     probability     !1   !2   !3   !%   &1   &2   &3   &%  
  • 19.
    Categorizing  Observable  Ac-ons   Method Search Scan Select Capture Communicate Object Query Topic statement Item in search result Chat messages List of saved items Single saved item Source Self Partner Shared/mix Inspired by Belkin’s ISSs model (Belkin, 1995) but with modifications to accommodate the context in collaborative web search. 19
  • 20.
    Actions Descriptions Search  –  query  –  self  (Q) A  user  issues  a  query Select-­‐  item-­‐self  (V) A  user  clicks  on  a  result  in  the  returned  result  list Capture-­‐item-­‐self  (S) A  user  saves  a  snippet  or  bookmarks  a  webpage Scan-­‐list  of  saved  item  –  mixed  (Wm) A   user   checks   the   workspace   without   clicking   on   any  particular  item. Select  –  single  saved  item  –self  (Ws) A  user  clicks  on  an  item  in  the  workspace  saved  by   him/herself Select  –  single  saved  item  –  partner  (Wp) A  user  clicks  on  an  item  in  the  workspace  saved  by   the  partner Scan-­‐topic  -­‐shared  (T) A  user  clicks  on  the  topic  statement  for  view   Communicate-­‐  messages-­‐self  (Cs) A  user  sends  a  message  to  the  other  user   Communicate-­‐message-­‐partner  (Cp) A  user  receives  a  message  from  the  other  user 20 Ac-ons  Observed  in  this  Study  
  • 21.
    Parameters  Selec-on  and  Es-ma-on     —  Determine  number  of  hidden  states  N   —  Bayesian  Information  Criterion  (BIC)   —  Parameter  estimations   —  Baum-­‐Welch  algorithm:  maximize  data  likelihood   —  Random  assign  a  start  probability  π  for  each  state,     —  Estimate  the  transition  probabilities  and  emission  probability  through  a   machine  learning  process     21 6000 6500 7000 7500 8000 2 3 4 5 6 7 26000   27500   29000   30500   32000   4   5   6   7   8   9   10   IND: N=4 COL: N=6 BIC=-2×logL+log(S) ×NP log-likelihood (logL) Number of parameters (NP) sample size(S)
  • 22.
    HMM  Output  for  Individual  Search   Query View Save Workspace self Topic HQ 0.99 HV 0.91 HS 0.96 HD 0.57 0.42 Hidden States and Emission Probabilities in IND (values<0.05 are omitted) 22 Search-related hidden states: HQ, HV, HS HQ: hidden states of querying HV: hidden states of viewing a search result HS: hidden states of saving a search result Sensemaking-related hidden states: HD HD: hidden states of defining current search problem
  • 23.
    Compare  to  Exis-ng  Search  Models   0.39 0.12 HQ 0.33 HV HS 0.50 HD 0.07 0.85 0.59 0.49 0.26 0.31 0.12 HQ 0.32 HV HS 0.56 HD 0.13* 0.86 0.38 0.53 0.34 0.39 0.23 Sub-processes in the ISP model HMM Define Problem HD Select Source, Formulate Query, Execute Query HQ Examine Results HV Extract Information HS Reflect/Iterate/Stop HD Mapping from sub-process in Marchionini’s ISP model to the hidden states transition probabilities of hidden states in IND The default transition in Marchinoinini’s model can be mapped to into HDàHQàHVàHSàHD, which is also the pattern of the highest probability in HMM results. 23
  • 24.
    HMM  Outputs  in  Collabora-ve  Search   Query View Save Workspace mix Workspace self Workspace partner Topic Chat send Chat receive HQ 0.82 0.13 HV 0.87 0.1 HS 0.88 HD 0.36 0.36 0.21 HW 0.37 0.44 0.12 HC 0.44 0.47 Hidden States and Emission Probability in COL Search-related hidden states: HQ, HV, HS Sensemaking-related hidden states: HD, HW, HC HD: hidden states of defining current search problem HW: hidden states of implicit communication HC: hidden states of explicit communication (Paul & Morris, 2010): chat-centric sensemaking (HC) and workspace- centric sensemaking (HW) (Evans & Chi, 2008): search and sensemaking are tightly coupled 24
  • 25.
    Detec-ng  Task  Differences  using  HMM   0.11 HQ 0.26 HV HS 0.45 HD 0.06 0.86 0.56 0.31 0.08 0.46 0.30 HC HW 0.56 0.09 0.14 0.16 0.16 0.14 0.15 HQ 0.22 HV HS 0.33** HD 0.12 0.84 0.28** 0.35** 0.20** 0.42 0.24 0.13 HC 0.89 HW 0.48 0.06 0.30** 0.07** 0.33* 0.13** Comparison of Transition Probabilities of Hidden states in COL for the two tasks (red arrows indicate significant difference: *p<0.05, **p<0.01) Cross-category transitions: From search to sensemaking From sensemaking to search Cross-category transitions is more common in collaborative search than in individual search Cross-category transition is more common in decision-making task than in information gathering task. 25 Information-gathering Decision-making
  • 26.
    Query  Behaviors  and  Communica-ons   26
  • 27.
    27 Search condition (individual or collaborative) Tasktype (information-gathering or decision-making) Query vocabulary features (number of queries, query vocabulary richness, query diversity) Query reformulation patterns (New, Generalization, Specification, Reconstruction) Query performance (Precision, recall, Successful query rate, user satisfaction, cognitive load) Research  Design   Independent Variables Dependent Variables Communication Timing Communication Content
  • 28.
    28 •  Query  Vocabulary  Richness  (QVR)   •  Query  Diversity  (QD)     •  Levenshtein  distance  (Shah  &  González-­‐Ibáñez,  2011)       •  calculate  the  difference  between  a  pair  of  queries.   •  Query  Result  Similarity  (QRS)     Query  Vocabulary  Features   (Kromer, Snasel, & Platos, 2008)
  • 29.
    Query  Reformula-on  PaPerns   29 Type Definition New (N) Qi is the first query or does not share any common terms with Qi-1 Generalization (Ge) Qi shares common terms with Qi-1 ; and Qi contains fewer terms than Qi-1 Specialization (Sp) Qi shares common terms with Qi-1 ; and Qi contains more terms than Qi-1 Reconstruction (Rc) Qi shares common terms with Qi-1 ; and Qi has the same length as Qi-1 •  Qi-1 and Qi are two consecutive queries in the same search session •  The patterns are defined based on (He et al. 2002; Jansen et al. 2009)
  • 30.
    Query  performance   30 Objective   Measurements   Precision  &   Recall   Successful   Query  Rate   Subjective   Measurements   User   Satisfaction   Cognitive  Load  
  • 31.
    Communica-on  Timing  Analysis   —  Before  search:  communications  before  the  first  search   action   —  During  search:  communications  between  the  first  and  last   search  action   —  After  search:  communications  after  the  last  search  action   —  (Search  actions:  issuing  a  query,  viewing  or  saving  a  search  result)   31  Chat    Chat    Chat    Chat   Before   Search During   Search During   Search A<er   Search The first search action The last search action Search action Link to rational
  • 32.
    Collabora-on  Benefit  I:  Rich   Vocabularies  and  Diverse  Queries   32 T2Task IND COL 0 0.5 1 1.5 2 2.5 3 T1 T2 QVR Task IND COL Query vocabulary richness Query diversity 0 5 10 15 20 25 30 T1 T2 QD Task IND COL .00 .02 .04 .06 .08 .10 T1 T2 QRS Task IND COL Chat  time   QVR   Total     ↑(p=0.045)   Before  search   -­‐   During  search   -­‐   After  search   -­‐   ü Participants in the collaborative search were able to employ wider range of vocabularies for the queries and the queries were more diverse. ü A positive correlation was found between the total chat time and the query vocabulary richness. *Differences showed in the graph are significant
  • 33.
    Collabora-on  Cost  I:  Low  Recall  and   Low  Successful  Query  Rate   33 Recall Successful Query Rate Chat  time   Recall   Total     ↓(p=0.001)   Before  search   ↓(p=0.022)   During  search   -­‐   After  search   ↓(p<0.001)   ü Collaboration takes times and efforts. Participants had less time to devote to search, and they were more stringent on the what documents to save. ü A negative correlation was found between the communication and recall.
  • 34.
    Collabora-on  Benefit  II:  People  More   Sa-sfied  and  less  stressed   34 User Satisfaction Cognitive Load *Differences showed in the graph are significant Chat  time   Satisfaction   CogLoad   Task  social   ↑(p=0.017)   ↓(p=0.022)   Task   coordination   -­‐   -­‐   Task  content     -­‐   ↑(p=0.008)   Non-­‐task   -­‐   -­‐   ü Participants in collaborative search are more satisfied with the performance and have lower cognitive load. ü A positive correlation was found between the task social communication and satisfaction (negative for Cogload).
  • 35.
    Collabora-on  Affects  the  PaPerns  of   Query  Reformula-on   35 0.1 0.2 0.3 0.4 0.5 IND COL New Generalization Specification Reconstruction 0.1 0.2 0.3 0.4 0.5 Task  1 Task  2 New Generalization Specification Reconstruction ü Higher percentages of New and Specialization and lower percentage of Reconstruction in the collaborative search. ü Participants in collaborative search were able to explore the divided subtopics in depth while the participant in the individual search owns the entire search topic and the scope maybe the first priority.
  • 36.
    Communica-on  PaPerns   36 Proportionof each communication content type within stage (Left: T1; Right: T2) ü Communication is common in any of three stages. ü The communication content varies in the three stages. The before search stage communication is more focused on the task coordination. The during search stage communication is more focused on the task content. Task social communication is more common in the before search and after search stage than in the during search stage.
  • 37.
  • 38.
    What  We  Learned   —  Collaborative  search  process  have  patterns   —  More  collaboration-­‐oriented  actions  as  the  collaboration  level   increase   —  Transitions  within  search-­‐oriented  actions  and  within   collaboration-­‐oriented  actions  are  more  frequent  than   between  them  in  all  three  conditions.     —  Explicit    and  implicit  communication  has  potential   benefit  on  helping  using  generating  query  ideas.   38
  • 39.
    Implica-ons  for  Collabora-ve  Search   Research   —  Search  activities  and  sensemaking  activities  are  tightly   coupled  in  the  collaborative  search.   —  The  studies  of  collaborative  search  should  not  just   concentrate  on  the  effectiveness  of  search,  but  also  on   the  users’  perception  of  their  search  experiences,   particularly  their  satisfaction  and  cognitive  load.   —  The  wider  range  of  query  vocabulary  in  collaborative   search  did  not  necessarily  lead  to  a  more  effective   search  outcomes.   39
  • 40.
    Implica-ons  for  Collabora-ve   Search  System  Design   —  It’s  important  to  design  interface-­‐mediated  support  for   the  coordination  among  team  members  as  the   coordination  through  communication  is  costly.   —  Provide  targeted  algorithm-­‐mediated  query   suggestions  based  on  the  findings  of  how  users   reformulate  queries  in  the  collaborative  search.     —  Designers  need  to  make  a  balance  between  the  support   for  fulfilling  the  search  task  and  the  support  for  social   interactions  among  team  members.   40
  • 41.
    Implica-ons  for  Other  Researchers   —  Researchers  in  CSCW  and  CSCL  can  draw  insights  on   —  Sense-­‐making  intertwined  with  the  activities  that  directly  aim  for   fulfilling  the  task  requirements.     —  Consider  the  social  gain  and  emotional  support  when  evaluating  the   team  effectiveness.     —  The  findings  on  the  differences  between  information-­‐gathering  and   decision-­‐making  collaboration  tasks.     —  Researchers,  try  HMM  if  you   —  are  analyzing  complex  interactive  process  in  a  time  sequence   —  don’t  have  a  theoretical  model  to  start  with   —  want  to  see  the  patterns  of  your  data  before  applying  time-­‐consuming   qualitative  annotation   —  are  interested  in  the  hidden  strategies  or  tactics  underneath  the   observable  actions  of  users   41
  • 42.
    Privacy  Concerns  in   Social  Match-­‐based   People  Search Collaborate with Shuguang Han, Zhen Yue
  • 43.
  • 44.
  • 45.
    Social  Match   is  Important  in   People  Search   45 because a tighter social similarity make it easier for people to connect Then Need the users’ social networks to return the potential candidates who have either direct or indirect connections with the given users.
  • 46.
    But  Privacy  is  a  BIG  Concern   46 —  users  in  many  social  network   services  often  either  opt  out   from  certain  social  networks   or  provide  incomplete  or  even   fake  information  on  those   networks.     —  many  data  mining  algorithms   may  not  work  or  even  harm   the  user  experience  when   equipped  with  such   incomplete  and  noisy  social   information  
  • 47.
    People  Search  Use  Co-­‐Author  Network   47 Which  has  the  advantage     of  lacking  privacy  concerns       But  this  limits     the  type  of  people  search   being  studied,     So  should  study     other  social  networks   which  has  privacy  concerns        
  • 48.
    Our  Goals   — interested  in  the  privacy  related  issues  in  people  search   and  the  impacts  of  these  issues  on  the  performance  of   people  search  systems.     —  users  in  many  social  network  services  are  able  to  keep  both  their   profile  and  social  connections  private   —  we  focus  on  the  privacy  issues  of  sharing  social  connections   —  simulating  the  privacy-­‐concerned  social  network  by   using  the  public  available  coauthor  networks.   —  Critical  need  for  privacy-­‐concerned  social  network  as  test  bed   —  Difficult  to  finding  an  open  privacy-­‐concerned  social  network  or  very   expensive  to  such  a  network  from  scratch  for  research  purpose,     48
  • 49.
    Key  Assump-ons   — a  coauthor  network  is  the  same  or  similar  with  a   privacy-­‐concerned  social  network,  because  studies   show   —  many  real-­‐world  social  networks  (including  coauthor  networks  and   many  other  privacy-­‐concerned  networks  such  as  Facebook  social   networks)  share  the  same  patterns   —  All  small-­‐world  networks  and  their  degree  distributions  are  highly  skewed.     —  assortative  patterns  (the  preferences  of  connecting  people  who  share   the  similar  features)  of  social  networks  are  all  assortatively  mixed,     —  whereas  the  technological  and  biological  seems  to  be  disassortative.   —  so  studying  academic  coauthor  networks,  which  are   publically  available,  can  be  the  surrogate  for  studying   privacy-­‐  preserving  social  networks     49
  • 50.
    Research  Focuses   — Identify  two  types  of  features,  both  used  in  people   search   —  Global  network  features:  the  features  that  are  propagated  through  the   whole  networks     —  measured  by  the  PageRank  value  running  on  the  whole  social  networks   —  Local  network  features:  the  features  that  are  directly  related  to  the   ego-­‐network  of  the  querying  user     —  measured  by  the  proportion  of  common  social  connections   50
  • 51.
    Research  Ques-ons   — RQ1:  How  to  properly  simulate  different  types  of   privacy-­‐preserving  social  networks?     —  RQ2:  How  does  each  privacy-­‐preserving  network  affect   the  global  and  local  network  features?     —  RQ3:  How  does  the  obtained  global  and  local  network   features  further  affect  the  people  search  performance?   That  is,  what  are  the  impacts  of  these  features  derived   from  privacy-­‐preserving  networks  on  the  search  process   of  finding  the  best  candidates  when  comparing  with  the   use  of  full  network  information?     51
  • 52.
    Data   —  Academic  publication  collection   —  containing  219,677  conference  papers  from  the  ACM  Digital  Library.     —  between  1990  and  2013.     —  Only  public  available  information  of  a  paper:  the  title,  abstract  and  authors   —  No  further  author  disambiugation  besides  ACM  Digital  Library  author  ID   —  In  total,  the  collection  contains  253,390  unique  authors  and  953,685  coauthor   connection  instances.   —  Users’  people  search  activities:  Han  et  al.  [5]  evaluation  of  a  people   search  system.     —  four  different  people  search  tasks,  each  aimed  to  search  for  5  candidates.     —  A  baseline  plain  content-­‐based  people  search  system     —  An  experimental  system  that  enhances  people  search  with  three  interactive  facets:   content  relevance,  social  similarity  between  the  user  and  a  candidate  (the  local   network  feature)  and  the  authority  of  a  candidate  (the  global  network  feature).     —  The  experiment  system  allowed  the  querying  users  to  tune  the  value  associated  with  each  facet   in  order  to  generate  a  better  candidate  search  results.     —  24  participants  were  recruited  for  the  user  study.     —  At  the  beginning  of  the  user  study,  each  participant  was  asked  to  provide  their  publications  and   their  close  social  connections  (such  as  advisors).     —  In  the  post-­‐task  questionnaire,  the  participants  were  asked  to  rate  the  relevance  of  each  marked   candidate  in  a  Five-­‐point  Likert  scale  (1  as  non-­‐relevant  and  5  as  the  highly  relevant).   52
  • 53.
    Formulas   53 pi isthe probability of a given user has privacy concern, di is the degree of association of a user and dmax. Is the maximized degree in the network, λ helps to establish different selection strategies Mean Absolute Error (MAE) between new authority and ground-truth authority over all of the authors
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
    Insights   —  Both  the  local  and  global  network  features  are  important  for  the   performance  of  people  search  (compare  to  not  using  social   network).     —  Comparing  to  the  global  network  feature,  the  local  network  feature  is  more   important.     —  Privacy-­‐concerns  reflected  in  local  and  global  network  features  can   significantly  influences  on  the  performance  of  people  search   —  The  privacy  concerns  from  the  high-­‐degree  candidates  in  the  network  will  have   more  impacts  on  global  features.     —  The  local  network  feature  is  related  to  both  the  querying  users  and  the  candidates   in  the  networks.     —  the  privacy  concerns  from  both  of  them  have  significant  impact  on  the  people  search   performance.     —  The  privacy  concerns  from  high-­‐degree  candidates  have  bigger  influences  on  the  people   search  than  that  of  the  lower-­‐degree  candidates,  especially  when  those  high-­‐degree   candidates  are  related  to  the  querying  user.     —  We  also  find  that  if  the  querying  users  provide  more  social  connections,  the  search   performance  would  increase  steadily.   58
  • 59.
    Scholars  on  Academic   Social  Network   Services Collaborate with Wei Jeng and Jiepu Jiang
  • 60.
    Formal scholarly communicationdescribes activities or scholarly outcomes that can be viable over time to an extended audience. This availability over long periods of time, also known as permanent access, traditionally referred to publications in books or peer-reviewed journals.
  • 61.
    Informal scholarly communicationis made scholarly outcomes “available to a restricted audience only” (as cited in Borgman, 2007, p. 49), such as self-publishing, Listerv, mails, or a “coffee break” in a conferences where scholars can exchange information.
  • 63.
    Academic  Social  Networking  Service   (ASNS)   —  The  term  academic  social  networking  service  as  a  broad   term  that  refers  to  an  online  service,  tool,  or  platform   that  can  help  scholars  to  build  their  professional   networks  with  other  researchers  and  facilitate  their   various  activities  when  conducting  research.     —  Some  well-­‐known  examples  of  ASNSs  include     —  ResearchGate.net  (http://www.researchgate.net/)   —  Academia.edu  (http://www.academia.edu/)     —  Mendeley.com  (http://www.mendeley.com/)  
  • 64.
    Features  on  ASNSs   —  ASNSs  allow  users  to     —  create  profiles  with  academic  properties   —  upload  theirs  publications   —  create  online  groups     —  Some  ASNS,  such  as  Mendeley  and  Zotero,  even  offer   software  applications,  such  as  bibliographic  tools  to   support  scholars  in  managing  their  documents  and   citations.  
  • 65.
    If academic socialnetwork sites are providing an alternative channel to support informal scholarly communication, then it is important to study: What the implications we can learn from analyzing academic users’ actual usages on those sites. 65
  • 66.
    Our  Research  Ques-ons   — RQ1:  Who  are  the  users  of  an  academic  social   networking  service  (ASNS)  that  supports  open   groups?     — RQ2:  In  what  ways,  and  how  often,  do  such   group  participants  use  an  ASNS?     — RQ3:  What  motivates  ASNS  users  to  utilize   social  or  research  features  on  an  ASNS?    
  • 67.
    Research  Site:  Mendeley   —  Launched  in  2008,  Mendeley  (http:// www.mendeley.com/)  is  one  of  the  most  popular  ASNSs   and  has  more  than  two  million  users.   —  Mendeley  allows  users  to  build  their  own  digital   research  library  by  importing  PDF  files  from  their  local   devices.   —  There  are  three  common  ways  to  use  social  features  on   Mendeley:  maintain  a  profile,  manage  existing  contacts,   and  make  more  connections.  
  • 68.
  • 69.
  • 70.
    Our  Samples:  Mendeley  Group   —  Mendeley  allows  users  to  start  groups  to  share  what   they  are  interested  in  and  what  they  are  reading  about.     —  Two  types  of  groups  supported  on  the  site:     —  private  groups  that  are  only  visible  to  the  members;   —  public  groups  that  are  publicly  visible  and  can  be  searched  in   Mendeley’s  group  list.   —  We  adopted  a  representative  sampling  method  to   identify  Mendeley’s  large  open  group  users.    
  • 71.
    Method  and  data  collec-ng   —  We  chose  a  cross-­‐sectional  survey  as  the  research   method  to  answer  these  research  questions  and   developed  a  questionnaire  with  30  questions.     —  The  questionnaire  was  distributed  to  97  open  groups  in   Mendeley,  one  of  the  most  popular  ASNSs.  
  • 72.
    The  instrument-­‐I   — Basic  Information     —  The  Extent  of  Use:  Questions  that  aim  to  determine  the   extent  of  participants’  account  activities  on  Mendeley   —  Common  Ways  to  Use:     —  as  a  document  management  tool,     —  a  reference  manager,     —  a  scholarly  search  engine,     —  an  online  portfolio,     —  a  friend  management  tool,  and     —  a  socialization  tool  
  • 73.
    The  instrument-­‐II   — The  Extent  of  Group  Use   —  Motivation:   —  Information     —  Networking   —  Visibility   —  Altruistic  
  • 74.
    Result:  Overview   — We  received  188  responses  via  the  questionnaire,  but   only  146  users  completed  the  entire  questionnaire.     —  The  average  age  of  the  participants  was  35.04  years   (SD=10.81),  and  64%  of  them  were  male  (N=94).   —  We  obtained  responses  from  users  in  20  disciplines  in   Mendeley.     —  The  top  three  disciplines  represented  were  computer   and  information  science  (N=43),  biological  science   (N=24),  and  social  science  (N=17).    
  • 75.
    The  Par-cipants   — Distribution  of  academic  disciplines   —  Early  adaptor:  Biomedicine  users     —  Newcomers:  Social  Sciences   —  Lack  of  humanities,  literature,  philosophy,  and  design  users   — Distribution  of  academic  positions  
  • 76.
    Frequency  of  Account  Ac-vi-es     ASNS  users  are  not  as  active  as  general  SNS  users:   —  53%  of  respondents  visited  their  accounts  on  a  weekly   basis,  while  36%  of  them  accessed  the  site  at  least  once   per  month.   —  Also  more  than  half  (53%)  of  the  participants  reported   that  they  were  checking  the  news  feeds  only  on  a   monthly  basis.  
  • 77.
    Ways  to  Use  Mendeley   —  Participants  primarily  used  Mendeley  as     —  a  document  management  tool   —  citation  management  software   —  The  portion  of  those  using  Mendeley  as  a  social   networking  site  was  relatively  low:  Only  11%  of   respondents  used  Mendeley  to  manage  their  existing   academic  friends  and  to  expand  their  professional   networks.     —  These  results  indicate  that  most  of  our  participants   use  Mendeley  for  its  research  features,  rather  than   social  features.    
  • 78.
    Mo-va-ons  for  Joining  Groups   Mo-va-on   Motivation  Items     M   Information Keep  up  with  a  user’s  research  domain 4.30 Get  research-­‐related  questions  answered 3.41 Follow  topics  that  community  is  paying  attention   to 3.98 Networking Connect  with  people  who  have  similar   research  interests   3.91 Expand  current  social  network 3.23 Meet  more  academic  people 3.27 Keep  in  touch  with  people  one  already  knows 3.08 Visibility Gain  professional  visibility 3.48 Be  present  in  current  discussions 3.27 Altruistic   Contribute  to  the  reading  list   3.62 Table: Users’ motivation in terms of joining a Mendeley group created by others.
  • 79.
    Factors  that  may  influence  on  the  outcome  of   group  joining  mo-ves   —  Results  suggest  that  people  are  most  motivated  by   visibility  and  altruism  when  considering  whether  to   join  or  follow  more  groups.   —  Even  if  users  frequently  and  regularly  engaged  in   research-­‐based  activities  on  Mendeley  (such  as  a   document  management  or  citation  management  tool)  ,   it  would  not  make  any  difference  in  terms  of  their   intentions  of  joining  groups.  
  • 80.
    Insights  Based  on  the  Findings   — What  is  Mendeley  exactly?     — Discipline  Distribution  and  Development  on   Mendeley   — Academic  Networking  or  Social  Networking?   — The  incentives  of  group  joining  
  • 81.
    Mendeley:  A  Plaorm  for  Higher   Educa-on  Users   —  Our  findings  confirmed  that  the  majority  of  Mendeley   users  were  from  the  higher  education  environment.   More  specifically,  junior  researchers  (i.e.,  doctoral   students,  post-­‐doctoral  fellows,  and  graduate  students).     —  For  those  researchers  who  would  like  to  study  junior   scholars’  information  behaviors  or  run  a  survey  on  a   wide  range  of  online  scholars,  we  believe  that  an  ASNS   such  as  Mendeley  are  the  right  platforms  to  use  to   reach  those  types  of  participants.    
  • 82.
    Discipline  Distribu-on  and   Development  on  Mendeley   —  Our  results  show  that  the  discipline  development  in   Mendeley  is  uneven.     —  Early  users  in  Mendeley  groups  mostly  came  from  the   fields  of  computer  &  information  science  and   biomedicine,  whereas  more  recent  users  are  mostly   from  the  fields  of  social  science,  education  and   psychology.     —  We  do  not  see  many  group  users  from  the  humanities   and  other  related  fields.    
  • 83.
    Academic  Networking  vs.  Social   Networking   —  “Academic”  but  not  “Social”?   —  Users  of  Mendeley  seem  to  mainly  concentrate  on  the   utilities  directly  related  to  their  research  work,  while   mostly  ignoring  its  social  features,  such  as  “friend   making”.   —  Warning  for  ASNS  developers:  Do  not  simply  adopt   “Facebook-­‐like”  or  “LinkedIn-­‐like”  social  elements   when  designing  an  ASNS  platform  for  academic  users.  
  • 84.
    Join  a  Group?  Show  Me  the  Incen-ve.   —  A  gap  between  motivation  and  incentive:  The  altruistic   motivation  was  one  of  the  most  critical  reasons   associated  with  their  group  engagement,  yet  none  of   current  features  of  Mendeley  reward  scholars  for  their   altruistic  activities.   —  Possible  incentive  mechanisms  to  encourage   interactions  :   —  providing  of  affective  feedback  by  group  owners  or  members,  to  users   who  contributed   —  establishing  level-­‐based  honor  system  or  badging  system  
  • 85.
    Limita-ons   —  We  sampled  only  open  and  large  groups  on  Mendeley   instead  of  a  random  sample.     —  The  discipline  and  position  distribution  of  the   participants  may  be  biased  towards  users  who  are  the   use  group  of  social  feature  and  highly  engaged  in  group   activities.   —  If  researchers  would  like  to  investigate  the  wider   landscape  of  ASNS  users,  larger-­‐scaled  and  random   sampling  approaches  are  needed.  
  • 86.
  • 87.
  • 88.
  • 89.
  • 90.
  • 91.
    Challenges   —  Know  the  boundary  of   Social  Information   Access   —  How  to  identify  which  tasks   are  good  for  social   information  access?   —  How  to  effectively  integrate   social  networking,  direct   messaging,  and  social   recommendations  with   current  search  facilities.  
  • 92.
    Related  Publica-ons   — Z.  Yue,  S.  Han,  D.  He.  J.  Jiang.  Influences  on  Query  Reformulation   in  Collaborative  Web  Search.  IEEE  Compute,  2014  (3):46-­‐53.   —  Z.  Yue,  S.  Han,  D.  He.  Modeling  Search  Processes  using  Hidden   States  in  Collaborative  Exploratory  Web  Search.  The  17th  ACM   Conference  on  Computer  Supported  Cooperative  Work  and  Social   Computing  (CSCW  2014).   —  Z.  Yue,  S.  Han,  D.  He.  An  Investigation  on  the  Query  Behavior  in   Task-­‐based  Collaborative  Exploratory  Web  Search.  The  76th   Annual  Meeting  of  the  Association  for  Information  Science  and   Technology.  (ASIST  2013).   —  Jeng,  W.,  He,  D.,  &  Jiang,  J.  (In  press).  User  Participation  in  an   Academic  Social  Networking  Service:  A  Survey  of  Open  Group   Users  on  Mendeley.  Journal  of  the  Association  for  Information   Science  and  Technology.   92
  • 93.
  • 94.
    Acknowledgement     — The  work  presented  here  were  conducted  by  faculty  and   students  in  Information  Retrieval,  Integration  and   Synthesis  Lab  at  School  of  Information  Sciences   —  Other  people  participated  in  these  works  are   —  Prof.  Peter  Brusilovsky,  Prof  Dan  Wu  etc.   —  These  work  are  partially  supported  by  the  National   Science  Foundation