Crowd Mining 
Tova Milo
Data Everywhere 
The amount and diversity of data being generated and collected is exploding:
Web pages, sensor data, satellite pictures, DNA sequences, …
2
From Data to Knowledge 
Buried in this flood of data are the keys to:
- New economic opportunities
- Discoveries in medicine, science and the humanities
- Improving productivity & efficiency
However, raw data alone is not sufficient! We can only make sense of our world by turning this data into knowledge and insight.
3
The research frontier 
• Knowledge representation
• Knowledge collection, transformation, integration, sharing
• Knowledge discovery
We will focus today on human knowledge. Think of humanity and its collective mind expanding…
4
Background: Crowd (Data) Sourcing
The engagement of crowds of Web users for data procurement
11/6/2014
5
General goals of crowdsourcing 
• Main goal
– "Outsourcing" a task (data collection & processing) to a crowd of users
• Kinds of tasks
– Tasks that can be performed by a computer, but inefficiently
– Tasks that can be performed by a computer, but inaccurately
– Tasks that can't be performed by a computer
So, are we done?
6
MoDaS (Mob Data Sourcing)
• Crowdsourcing has tons of potential, yet its success so far has been limited…
– Difficulty of managing huge volumes of data and users of questionable quality and reliability
• Every single initiative had to battle, almost from scratch, the same non-trivial challenges
• The ad hoc solutions, even when successful, are application-specific and rarely sharable
=> Solid scientific foundations for Web-scale data sourcing, with a focus on knowledge management using the crowd
MoDaS, EU-FP7 ERC
7
Challenges: a (very) brief overview
• What questions to ask? [SIGMOD13, VLDB13, ICDT14, SIGMOD14]
• How to define & determine correctness of answers? [ICDE11, WWW12]
• Who to ask? How many people? How to best use the resources? [ICDE12, VLDB13, ICDT13, ICDE13]
Related topics: Data Mining · Data Cleaning · Probabilistic Data · Optimizations and Incremental Computation
8
A simple example – crowd data sourcing (Qurk)
The goal: find the names of all the women in the people table.

Picture  | name
(photo)  | Lucy
(photo)  | Don
(photo)  | Ken
…        | …

SELECT name
FROM people p
WHERE isFemale(p)

isFemale(%name, %photo)
Question: "Is %name a female?", %photo
Answers: "Yes" / "No"
9
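The crowd-powered predicate above can be sketched in ordinary code: the query's filter posts a question to humans instead of computing an answer locally. A minimal Python sketch (not Qurk's actual implementation; `ask_crowd` and its canned answers are hypothetical stand-ins for a real crowdsourcing platform):

```python
# Sketch of a crowd-powered selection, in the spirit of Qurk's isFemale UDF.
# ask_crowd is a hypothetical stand-in for posting a question to a platform.
def ask_crowd(question):
    # A real system would publish the task and aggregate worker answers;
    # here we use canned answers purely for illustration.
    canned = {"Is Lucy a female?": "Yes",
              "Is Don a female?": "No",
              "Is Ken a female?": "No"}
    return canned[question]

def is_female(name):
    return ask_crowd(f"Is {name} a female?") == "Yes"

# The SELECT ... WHERE isFemale(p) query, as a plain filter:
people = ["Lucy", "Don", "Ken"]
women = [p for p in people if is_female(p)]
print(women)  # ['Lucy']
```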
A simple example – crowd data sourcing

Picture  | name
(photo)  | Lucy
(photo)  | Don
(photo)  | Ken
…        | …
10
Crowd Mining: Crowdsourcing in an open world
• Human knowledge forms an open world
• Assume we want to find out what is interesting and important in some domain area: folk medicine, people's habits, …
• What questions to ask?
11
Back to classic databases... 
• 
Significant data patterns are identified using data miningtechniques. 
• 
A useful type of pattern: association rules 
– 
E.g., stomach ache chamomile 
• 
Queries are dynamically constructed in the learning process 
• 
Is it possible to mine the crowd? 
Crowd Mining 12
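The classic support/confidence computation behind association rules can be sketched over a toy transaction database (the transactions below are invented for illustration):

```python
# Toy transaction DB in the spirit of the folk-medicine example.
transactions = [
    {"sore throat", "garlic", "oregano"},
    {"sore throat", "fever", "garlic", "ginger"},
    {"heartburn", "water", "baking soda", "lemon"},
    {"nausea", "ginger"},
]

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction also containing rhs."""
    return support(lhs | rhs) / support(lhs)

# The rule "sore throat -> garlic":
print(support({"sore throat", "garlic"}))        # 0.5
print(confidence({"sore throat"}, {"garlic"}))   # 1.0
```

A miner would enumerate candidate rules (Apriori-style) and keep those whose support and confidence exceed given thresholds.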
Turning to the crowd 
Let us model the history of every user as a personal database:
• Every case = a transaction consisting of items
• Not recorded anywhere – a hidden DB
– It is hard for people to recall many details about many transactions!
– But… they can often provide summaries, in the form of personal rules:
"To treat a sore throat I often use garlic"
Example hidden transactions:
Treated a sore throat with garlic and oregano leaves…
Treated a sore throat and low fever with garlic and ginger…
Treated a heartburn with water, baking soda and lemon…
Treated nausea with ginger; the patient experienced sleepiness…
…
13
Two types of questions 
• Open questions: free recollection (mostly simple, prominent patterns)
"Tell me about an illness and how you treat it" → "I typically treat nausea with ginger infusion"
• Closed questions: concrete questions (may be more complex)
"When a patient has both headaches and fever, how often do you use a willow tree bark infusion?"
We use the two types interleaved.
14
Contributions (at a very high level) 
• Formal model for crowd mining: allowed questions and the interpretation of answers; personal rules and their overall significance
• A framework of the generic components required for mining the crowd
• Significance and error estimations [and how these change if we ask more questions…]
• Crowd-mining algorithms
• [Implementation & benchmark, on both synthetic & real data]
15
The model: User support and confidence 
• A set of users U
• Each user u ∈ U has a (hidden!) transaction database Du
• Each rule X → Y is associated with a user support and a user confidence (the support and confidence of the rule within Du)
16
Model for closed and open questions 
• Closed questions: X →? Y
– Answer: (approximate) user support and confidence
• Open questions: ? →? ?
– Answer: an arbitrary rule with its user support and confidence
"I typically have a headache once a week. In 90% of the times, coffee helps."
17
Significant rules 
• Significant rules: rules where the mean user support and confidence are above some specified thresholds Θs, Θc
• Goal: identify the significant rules while asking the smallest possible number of questions to the crowd
18
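Under this model, checking a rule's significance amounts to averaging per-user support and confidence over the sampled users and comparing against Θs and Θc. A minimal sketch (the per-user databases below are invented; in reality they are hidden and only summarized through answers):

```python
# Invented per-user transaction databases (in reality these are hidden).
user_dbs = {
    "u1": [{"headache", "coffee"}, {"headache", "coffee"}, {"fever"}],
    "u2": [{"headache"}, {"headache", "coffee"}],
}

def user_support(db, itemset):
    return sum(itemset <= t for t in db) / len(db)

def user_confidence(db, lhs, rhs):
    s = user_support(db, lhs)
    return user_support(db, lhs | rhs) / s if s else 0.0

def is_significant(lhs, rhs, theta_s, theta_c):
    """Mean user support/confidence of lhs -> rhs vs. thresholds Θs, Θc."""
    supps = [user_support(db, lhs | rhs) for db in user_dbs.values()]
    confs = [user_confidence(db, lhs, rhs) for db in user_dbs.values()]
    mean_s = sum(supps) / len(supps)
    mean_c = sum(confs) / len(confs)
    return mean_s >= theta_s and mean_c >= theta_c

print(is_significant({"headache"}, {"coffee"}, 0.4, 0.6))  # True
```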
Framework components 
• Generic framework for crowd-mining
• One particular choice of implementation for each black box
19
Estimating the mean distribution
• Treating the current answers as a random sample of a hidden distribution, we can approximate the distribution of the hidden mean as N(μ, Σ/K), where
– μ – the sample average
– Σ – the sample covariance
– K – the number of collected samples
• In a similar manner we estimate the hidden distribution
20
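With made-up (support, confidence) answers, the sample statistics above can be computed as follows; by the central limit theorem the hidden mean is then approximately N(μ, Σ/K):

```python
import numpy as np

# Each row is one user's answer about a rule: (user support, user confidence).
# The numbers are invented for illustration.
answers = np.array([[0.6, 0.90],
                    [0.5, 0.80],
                    [0.7, 0.95],
                    [0.4, 0.70]])

K = len(answers)                        # number of collected samples
mu = answers.mean(axis=0)               # sample average
Sigma = np.cov(answers, rowvar=False)   # sample covariance (2x2)

# Approximate distribution of the hidden mean: N(mu, Sigma / K).
print(mu)         # [0.55   0.8375]
print(Sigma / K)
```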
Rule significance and error probability
• Define Mr as the probability mass above both thresholds for rule r
• r is significant if Mr is greater than 0.5
• The error probability is the remaining mass
• Estimate how the error will change if another question is asked
• Choose the rule with the largest error reduction
21
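Mr can be estimated by Monte Carlo sampling from the approximate distribution of the hidden mean; the numbers below (mean, covariance, thresholds) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Approximate distribution of the hidden mean (support, confidence): N(mu, cov).
mu = np.array([0.55, 0.84])
cov = np.array([[0.004, 0.001],
                [0.001, 0.003]])
theta_s, theta_c = 0.5, 0.8   # thresholds Θs, Θc

# Mr = probability mass with both support and confidence above the thresholds.
samples = rng.multivariate_normal(mu, cov, size=100_000)
M_r = np.mean((samples[:, 0] >= theta_s) & (samples[:, 1] >= theta_c))

significant = M_r > 0.5
error_prob = min(M_r, 1 - M_r)   # the remaining mass on the "wrong" side
print(significant)
```

Re-running this estimate under a hypothetical extra answer gives the expected error reduction used to pick the next question.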
Completing the picture (first attempt…) 
• Which rules should be considered?
Similarly to classic data mining (e.g., Apriori): start with small rules, then expand to rules similar to significant rules
• Should we ask an open or closed question?
Similarly to sequential sampling: use some fixed ratio of open/closed questions to balance the tradeoff between precision and recall
22
Semantic knowledge can save work 
Given a taxonomy of is-a relationships among items (e.g., espresso is a coffee):
frequent({headache, espresso}) ⇒ frequent({headache, coffee})
Advantages:
• Allows inference on itemset frequencies
• Allows avoiding semantically equivalent itemsets: {espresso}, {espresso, coffee}, {espresso, beverage}…
23
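The upward inheritance of frequency along is-a edges can be sketched directly (the tiny taxonomy below is illustrative):

```python
# Toy is-a taxonomy: child -> parent.
parent = {"espresso": "coffee", "coffee": "beverage"}

def ancestors(item):
    """All items reachable by following is-a edges upward."""
    out = set()
    while item in parent:
        item = parent[item]
        out.add(item)
    return out

def implied_frequent(itemset):
    """Itemsets whose frequency follows by generalizing one item upward:
    frequent({headache, espresso}) implies frequent({headache, coffee})."""
    implied = set()
    for item in itemset:
        for a in ancestors(item):
            implied.add(frozenset((itemset - {item}) | {a}))
    return implied

result = implied_frequent(frozenset({"headache", "espresso"}))
print(result)
```

A crowd-miner can use such inferred frequencies to skip questions whose answers are already determined, and to avoid asking about equivalent itemsets like {espresso} vs. {espresso, coffee}.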
Completing the picture (second attempt…) 
How do we measure the efficiency of crowd mining algorithms?
• Two distinguished cost factors:
– Crowd complexity: # of crowd queries used by the algorithm
– Computational complexity: the complexity of computing the crowd queries and processing the answers
[A crowd-complexity lower bound is a trivial computational-complexity lower bound]
• There is a tradeoff between the complexity measures
– Naïve question selection -> more crowd questions
24
Complexity boundaries 
• Notations:
– |Ψ| – the taxonomy size
– |I(Ψ)| – the number of itemsets (modulo equivalences)
– |S(Ψ)| – the number of possible solutions
– Maximal Frequent Itemsets (MFI), Minimal Infrequent Itemsets (MII)
25
Now, back to the bigger picture… 
The user's question in natural language:
"I'm looking for activities to do in a child-friendly attraction in New York, and a good restaurant nearby"
Some of the answers:
"You can go bike riding in Central Park and eat at Maoz Vegetarian. Tips: Rent bikes at the boathouse"
"You can go visit the Bronx Zoo and eat at Pine Restaurant. Tips: Order antipasti at Pine. Skip dessert and go for ice cream across the street"
26
Pros and Cons of Existing Solutions 
• Web search returns valuable data
– Requires further reading and filtering (not all restaurants are appropriate after a sweaty activity)
– Can only retrieve data from existing records
• A forum is more likely to produce answers that match the precise information need
– The # of obtained answers is typically small
– Still requires reading, aggregating, identifying consensus…
• Our new, alternative approach: crowd mining!
27
Additional examples 
A dietician may wish to study the culinary preferences in some population, focusing on food dishes that are rich in fiber.
A medical researcher may wish to study the usage of some ingredients in self-treatments of bodily symptoms, which may be related to a particular disease.
To answer these questions, one has to combine:
• General, ontological knowledge
– E.g., the geographical locations of NYC attractions
• And personal, perhaps unrecorded knowledge about people's habits and preferences
– E.g., which are the most popular combinations of attractions and restaurants matching Ann's query
28
Example of personal transactions:

ID | Transaction Details
1  | <Football> doAt <Central_Park>
2  | <Biking> doAt <Central_Park>. <BaseBall> doAt <Central_Park>. <Rent_Bikes> doAt <Boathouse>
3  | <Falafel> eatAt <Maoz_Veg.>
4  | <Antipasti> eatAt <Pine>
5  | <Visit> doAt <Bronx_Zoo>. <Antipasti> eatAt <Pine>
Crowd mining query language (based on SPARQL and DMQL)
• A SPARQL-like WHERE clause is evaluated against the ontology
– $ indicates a variable
– * indicates a path of length 0 or more
• The SATISFYING clause gives fact-sets to be mined from the crowd
– MORE indicates that other frequently co-occurring facts can also be returned
– SUPPORT indicates the threshold

SELECT FACT-SETS
WHERE
{ $w subclassOf* Attraction.
  $x instanceOf $w.
  $x inside NYC.
  $x hasLabel "child friendly".
  $y subclassOf* Activity.
  $z instanceOf Restaurant.
  $z nearBy $x. }
SATISFYING
{ $y+ doAt $x.
  [] eatAt $z.
  MORE }
WITH SUPPORT = 0.4

"Find popular combinations of an activity in a child-friendly attraction in NYC, and a good restaurant nearby (and any other relevant advice)."
30
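A toy sketch of how such a WHERE clause binds variables against a triple store. This simplified matcher ignores the subclassOf* path semantics and uses an invented mini-ontology, so it illustrates only the variable-binding mechanics, not the actual query engine:

```python
# Invented mini-ontology as (subject, predicate, object) triples.
triples = [
    ("Park", "subclassOf", "Attraction"),
    ("Central_Park", "instanceOf", "Park"),
    ("Central_Park", "inside", "NYC"),
    ("Maoz_Vegetarian", "instanceOf", "Restaurant"),
    ("Maoz_Vegetarian", "nearBy", "Central_Park"),
]

def match_pattern(pattern, binding):
    """Yield bindings extending `binding` so that `pattern` matches a triple.
    Terms starting with "$" are variables; anything else must match exactly."""
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("$"):
                if b.get(term, value) != value:
                    ok = False
                    break
                b[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield b

def where(patterns):
    """Join all patterns: keep only bindings consistent with every pattern."""
    bindings = [{}]
    for pat in patterns:
        bindings = [b2 for b in bindings for b2 in match_pattern(pat, b)]
    return bindings

result = where([("$x", "instanceOf", "$w"),
                ("$w", "subclassOf", "Attraction"),
                ("$x", "inside", "NYC")])
print(result)  # [{'$x': 'Central_Park', '$w': 'Park'}]
```

The SATISFYING clause would then take each such binding and mine the crowd for frequent fact-sets instantiating it.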
(Open and Closed) Questions to the crowd 
Closed questions:
"How often do you eat at Maoz Vegetarian?"  { ( [] eatAt <Maoz_Vegeterian> ) }  → Supp = 0.1
"How often do you swim in Central Park?"  { ( <Swim> doAt <Central_Park> ) }  → Supp = 0.3
Open questions:
"What do you do in Central Park?"  { ( $y doAt <Central_Park> ) }  → $y = Football, supp = 0.6
"What else do you do when biking in Central Park?"  { ( <Biking> doAt <Central_Park> ), ( $y doAt <Central_Park> ) }  → $y = Football, supp = 0.6
31
Example of extracted association rules 
• <Ball_Games, doAt, Central_Park> (Supp: 0.4, Conf: 1)
→ <[ ], eatAt, Maoz_Vegeterian> (Supp: 0.2, Conf: 0.2)
• <Visit, doAt, Bronx_Zoo> (Supp: 0.2, Conf: 1)
→ <[ ], eatAt, Pine_Restaurant> (Supp: 0.2, Conf: 0.2)
32
Can we trust the crowd?
Common solution: ask multiple times.
We may get different answers:
– Legitimate diversity
– Wrong answers / lies
33
Can we trust the crowd?
34
Things are non-trivial…
• Different experts for different areas
• "Difficult" questions vs. "simple" questions
• Data is added and updated all the time
• Optimal use of resources (both machines and humans)
Solutions based on:
– Statistical mathematical models
– Declarative specifications
– Provenance
35
Summary 
The crowd is an incredible resource!
Many challenges:
• (Very) interactive computation
• A huge amount of data
• Varying quality and trust
"Computers are useless, they can only give you answers" – Pablo Picasso
But, as it seems, they can also ask us questions!
36
Thanks
Antoine Amarilli, Yael Amsterdamer, Moria Ben Ner, Rubi Boim, Susan Davidson, Ohad Greenshpan, Benoit Gross, Yael Grossman, Ana Itin, Ezra Levin, Ilia Lotosh, Slava Novgorodov, Neoklis Polyzotis, Sudeepa Roy, Pierre Senellart, Amit Somech, Wang-Chiew Tan…
MoDaS, EU-FP7 ERC
37
