2. Open science
Digital scientific infrastructures have changed the way science is done.
[Figure labels: Open Science; Tools; Open Reproducible Research; Open access; Open Data; Limited access, data, tools; Digital scientific infrastructures. Source: strategyzer.com]
3. Current challenge for the Science 2.0 platforms
Scientific platforms help users with science-related activities
Source: The Center for Open Science
Infrastructures differ in their impact and operational dynamics. In particular,
creation infrastructures are more sensitive to community building, because the
continuity of the developed creations depends on it. According to a European
Commission survey (2015), a lack of engagement increases resistance to adopting
open science practices.
Digital scientific output challenges
Traditional scientific outputs, e.g. journal publications
Digital scientific outputs, e.g. scientific-oriented programming code, repositories, etc.
Yields to… communities of creation
Source: strategyzer.com
4. What is engagement?
Engagement factors [1] (platform, community): behavioral, cognitive, emotional
Types of studies: qualitative, quantitative
Cognitive and emotional engagement: qualitative studies
Physical engagement: log behavior on the platform
Networks have helped us understand human behavior in the past, so why not here?
[1] Fredricks, J., Blumenfeld, P.C., Paris, A.H.: School engagement: potential
of the concept, state of the evidence. Rev. Educ. Res. 74, 59–109 (2004)
[Figure: Engagement (?) at the intersection of Psychology, Philosophy, Sociology, and Computer Science]
6. Aims
Identify the user life journeys on the infrastructure in which engagement rose
above the average of the infrastructure's population
Build a predictive model for new users in which the current behavior
indicates the next likely behavior state, in order to optimize
engagement levels
8. Profiling using density clustering method
Big data implications for profiling user behavior
In addition to density clustering methods for the detection of profiles in continuous data,
we will use the hierarchical form to expose embedded types of behavior
[Figure: hierarchy of users living X number of days, with embedded types: user type A, user type A mixed with B, user type B. Open question: where to cut?]
1. Aristeidou, M., Scanlon, E., Sharples, M.: Profiles of engagement in online communities of citizen science
participation. Computers in Human Behavior 74, 246–256 (2017). https://doi.org/10.1016/j.chb.2017.04.044
2. Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y., Herawan, T.: Big Data Clustering: A Review. In: Murgante, B. et al.
(eds) Computational Science and Its Applications – ICCSA 2014. Lecture Notes in Computer Science, vol. 8583.
Springer, Cham (2014)
11. Case study
nanoHUB
> 5,700 resources
> 500 simulation tools
> 300,000 tutorial/lecture users
> 16,000 simulation users
> 1.4 M visitors
In 172 countries
nanoHUB represents a novel form of virtual, collective science production
14. 1. Cluster tendency Validation
Validation of clustering tendency using Hopkins statistic
[Figure: natural dispersion among communities' data. Source: Yerpo, Population distribution]
The engagement behavior dataset* showed a strong clustering tendency
[Figure: 2-D data view with a scale marked 0.01, 0.30, 0.50, 0.70, 0.99, 1.00]
* Dataset: > 40,000 registered users in the range 2010 to 2016
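The slide does not say how the Hopkins statistic was computed, so the following is only a minimal sketch under stated assumptions. It uses plain nearest-neighbor distances (some variants raise them to the power of the dimension) and the convention in which values near 0.5 suggest random data and values near 1 suggest clustering; other texts invert this scale. The function names and the toy data are hypothetical and are not taken from the nanoHUB study.

```python
import math
import random


def nearest_distance(point, candidates):
    """Euclidean distance from `point` to its closest candidate."""
    return min(math.dist(point, c) for c in candidates)


def hopkins(data, sample_size, rng):
    """Hopkins statistic: around 0.5 for uniform data, near 1 for clustered data."""
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # w: distances from sampled real points to their nearest other real point.
    w_total = 0.0
    for i in rng.sample(range(len(data)), sample_size):
        w_total += nearest_distance(data[i], data[:i] + data[i + 1:])

    # u: distances from uniform random probes to the nearest real point.
    u_total = 0.0
    for _ in range(sample_size):
        probe = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        u_total += nearest_distance(probe, data)

    return u_total / (u_total + w_total)


rng = random.Random(0)
# Hypothetical toy data: two tight groups, not the nanoHUB dataset.
clustered = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
clustered += [(rng.gauss(10, 0.1), rng.gauss(10, 0.1)) for _ in range(50)]
h_clustered = hopkins(clustered, 20, rng)
print(round(h_clustered, 2))
```

On strongly grouped data like this toy set, the probes fall far from the data while real points sit close to their neighbors, so the statistic approaches 1.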
15. PCA results
Goal: Identify the impact of each tested metric from the nanoHUB dataset.
Metrics / Percentage of variance explained
Standard deviation of absence days between logins
Discussion
1. The value of the periodicity variable did not make sense, because it
implied that user engagement is defined by download behavior alone.
2. The session-creation value was zero because far fewer users create
simulations than download. Therefore, the influence of download
activity was based on an unbalanced proportion.
• Conclusion: The current variables are not sufficiently significant,
given the unbalanced dataset.
16. TSE representation of populations
User’s life representation
[Figures: user behaviors across populations created by lifespan; distribution of users by lifespan]
The current high-dimensional representation of user behavior shows
irregular shapes, which supports the use of density-based
clustering algorithms
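To illustrate why irregular shapes favor density-based methods, here is a minimal DBSCAN-style sketch in plain Python. It is an assumption that DBSCAN stands in for the density-based family here; the deck does not name the algorithm actually used. The ring, the central group, the outlier, and the parameters `eps` and `min_pts` are hypothetical toy values.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE when it is isolated."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical toy data: a ring around a small central group, plus one outlier.
ring = [(5 * math.cos(2 * math.pi * k / 60), 5 * math.sin(2 * math.pi * k / 60))
        for k in range(60)]
center = [(0.5 * math.cos(2 * math.pi * k / 10), 0.5 * math.sin(2 * math.pi * k / 10))
          for k in range(10)]
labels = dbscan(ring + center + [(20.0, 20.0)], eps=0.8, min_pts=3)
print(sorted(set(labels)))
```

A centroid-based method would struggle to separate the ring from the group it encloses, while the density-based pass keeps them apart and flags the isolated point as noise.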
17. Engagement Behavior has a path
[Figure: similarity matrix example for populations 1, 2, and 3; color scale from not similar to strongly similar]
Populations 2 and 3 were compared with a two-sample KS
test and showed no significant difference
Populations with younger lifespans behave
significantly differently, but
older populations become similar
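A pairwise similarity matrix of this kind can be sketched as follows. This is an illustration only: it computes the two-sample KS statistic by hand and compares it with the usual large-sample critical value at roughly the 5% level (coefficient 1.358). The deck does not state the significance level or the tooling used, and the three populations below are hypothetical numbers chosen so that populations 2 and 3 overlap.

```python
import math


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def similar(a, b, c_alpha=1.358):
    """True when the test does not reject equality (about the 5% level)."""
    n, m = len(a), len(b)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


# Hypothetical per-user activity values for three lifespan populations.
populations = {
    1: [float(x) for x in range(50)],
    2: [x + 100.0 for x in range(50)],
    3: [x + 100.5 for x in range(50)],
}
matrix = {(p, q): similar(populations[p], populations[q])
          for p in populations for q in populations}
print(matrix[(1, 2)], matrix[(2, 3)])
```

Each cell answers one pairwise question only, so the matrix by itself does not say which populations can safely be merged as a group.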
18. Lifespan populations
MERGING ALL POPULATIONS THAT ARE SIMILAR TO ONE ANOTHER
[Figure: similarity matrix; color scale from not similar to strongly similar]
Statistical similarity is not transitive:
A = B and B = C does not imply A = C
MERGING MULTIPLE LIFESPANS
GRAPH-BASED SOLUTION: MAXIMAL CLIQUES
[Figure: graph with nodes A, B, C, D, E]
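The maximal-clique idea can be sketched with the basic Bron-Kerbosch recursion (no pivoting). Treat this as an illustration rather than the implementation used in the study: the edges below are hypothetical and only mimic the situation where A is similar to B and B to C, but A is not similar to C.

```python
def maximal_cliques(adjacency):
    """Return every maximal clique of an undirected graph as a frozenset."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(clique))  # nothing can extend this clique
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical similarity graph: an edge means "not significantly different".
adjacency = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}
cliques = maximal_cliques(adjacency)
print(sorted(sorted(c) for c in cliques))
```

Only groups in which every pair is similar end up merged, so A and C stay in different groups even though both are similar to B.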
19. Contributions
Evidence of non-spherical behavioral
engagement
The effects of unbalanced data over time on
profiling analysis
Evidence of similarity among highly engaged
populations
20. Next steps
Pattern recognition over time
[Figure: snapshot populations on Day 1, Day 2, and Day 4; behavior of User 1 and User 2 over time across states A, B, and C]
Network of shared knowledge interests
[Figure: users A and B linked by a common interest represented by resource X]
Critical behavior for longer engagement
21. Thanks
Special thank you to …
Dr. Gerhard Klimeck
Dr. Michael Zentner
Dr. Sabine Brunswicker
Dr. Satyam Mukherjee
Babak Ravandi
National Science Foundation
CONACYT Mexico Government
24. Continuous login behavior
Goal: Determine whether the concept of habit is a significant
variable for further analysis.
Q: How many users who log in every day are expected to be
seen after x days?
A: 7 is the largest number of consecutive days reached by
more than 1 user.
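One way to answer this kind of question from login logs is to compute each user's longest run of consecutive login days and then count the users who reach at least x days. The sketch below assumes the logs have already been reduced to one list of dates per user; the user names and dates are hypothetical.

```python
from datetime import date, timedelta


def longest_streak(login_days):
    """Length of the longest run of consecutive calendar days with a login."""
    best = run = 0
    previous = None
    for day in sorted(set(login_days)):
        if previous is not None and day - previous == timedelta(days=1):
            run += 1
        else:
            run = 1  # a gap (or the first day) starts a new run
        best = max(best, run)
        previous = day
    return best


def users_with_streak(logins_by_user, min_days):
    """Number of users whose longest streak is at least min_days."""
    return sum(1 for days in logins_by_user.values()
               if longest_streak(days) >= min_days)


# Hypothetical login logs, one list of dates per user.
logins_by_user = {
    "user_1": [date(2016, 1, d) for d in range(1, 8)],
    "user_2": [date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 4)],
    "user_3": [date(2016, 1, 10)],
}
for x in (1, 2, 7, 8):
    print(x, users_with_streak(logins_by_user, x))
```

Scanning x upward until fewer than two users remain gives the kind of threshold reported in the answer above.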
25. Correlation matrix
Correlation matrix of variables of interest
Goal: Select the variables with the most influence across the set of variables
Results: Not conclusive yet
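For reference, a correlation matrix of this kind can be built from pairwise Pearson coefficients. The sketch below assumes Pearson correlation, which the slide does not specify, and the variable names and values are hypothetical placeholders rather than nanoHUB metrics.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every ordered pair of variable names to its Pearson coefficient."""
    return {(p, q): pearson(columns[p], columns[q])
            for p in columns for q in columns}


# Hypothetical per-user variables.
columns = {
    "logins": [1, 2, 3, 4, 5],
    "downloads": [2, 4, 6, 8, 10],
    "absence_days": [5, 4, 3, 2, 1],
}
corr = correlation_matrix(columns)
for (p, q), value in corr.items():
    print(p, q, round(value, 2))
```

Pairs with coefficients close to 1 or -1 carry largely redundant information, which is one common reason to drop a variable from a set like this.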
26. Non-parametric method selection
Criteria | Clustering type: Hierarchical density-based | Non-hierarchical
No a priori # of clusters | YES | NO
Convex clusters | YES | NO
Different cluster sizes | YES | Some
Scalable | NO | YES
Convergence | YES | NO
Nested clusters | YES | NO
Unique cluster | YES | NO
Noise detection | YES | Algorithm dependent
Parameter robustness | NO | Algorithm dependent