An experimental comparison of globally-optimal data de-identification algorithms (arx-deidentifier)
Collaboration and data sharing have become core elements of biomedical research. At the same time, there is a growing understanding of privacy threats related to data sharing, especially when sensitive data from distributed sources become available for linkage. Statistical disclosure control comprises well-known data anonymization techniques that protect data by introducing fuzziness. To protect datasets from different types of threats, different privacy criteria are commonly implemented. Data anonymization is an important measure, but it is computationally complex and can significantly reduce the expressiveness of data. To attenuate these problems, a number of algorithms have been proposed that aim at increasing data quality or improving efficiency. Previous evaluations of such algorithms lack a systematic approach, as they focus on specific algorithms, specific privacy criteria, and specific runtime environments. It is therefore difficult for decision makers to determine which algorithm is best suited to their requirements. As a first step towards a comprehensive and systematic evaluation of anonymization algorithms, we report on our ongoing efforts to provide an open-source benchmark. In this contribution, we focus on optimal algorithms utilizing global recoding with full-domain generalization. We present a systematic evaluation of domain-specific algorithms and generic search methods for a broad set of privacy criteria, including k-anonymity, l-diversity, t-closeness and d-presence, on multiple real-world datasets. Our results show that there is no single solution fitting all needs, and that generic search methods can outperform highly specialized algorithms.
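Of the privacy criteria mentioned, k-anonymity is the simplest to state: after generalization, every combination of quasi-identifier values must occur at least k times. As a toy illustration (not ARX's actual API; the records and attribute names are invented), a minimal check in Python might look like:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether every combination of quasi-identifier values
    appears at least k times in the dataset."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy dataset with already-generalized quasi-identifiers
# (age bracket, ZIP-code prefix); values are made up.
records = [
    {"age": "20-30", "zip": "537**", "diagnosis": "flu"},
    {"age": "20-30", "zip": "537**", "diagnosis": "cold"},
    {"age": "40-50", "zip": "538**", "diagnosis": "flu"},
    {"age": "40-50", "zip": "538**", "diagnosis": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True: each group has 2 rows
print(is_k_anonymous(records, ["age", "zip"], 3))  # False: groups are too small
```

An optimal full-domain generalization algorithm would search over generalization levels for each quasi-identifier until a check like this succeeds with minimal information loss.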
ARX - a comprehensive tool for anonymizing / de-identifying biomedical data (arx-deidentifier)
Website with further information: http://arx.deidentifier.org
Description of this talk:
Collaboration and data sharing have become core elements of biomedical research. Especially when sensitive data from distributed sources are linked, privacy threats have to be considered. Statistical disclosure control allows the protection of sensitive data by introducing fuzziness. Reduction of data quality, however, needs to be balanced against gains in protection. Therefore, tools are needed which provide a good overview of the anonymization process to those responsible for data sharing. These tools require graphical interfaces and the use of intuitive and replicable methods. In addition, extensive testing, documentation and openness to reviews by the community are important. Existing publicly available software is limited in functionality, and often active support is lacking. We present the data anonymization tool ARX, which has been developed in close cooperation between the Chair for Biomedical Informatics, the Chair for IT Security and the Chair for Database Systems at Technische Universität München (TUM), Germany. ARX enables the de-identification of structured data (i.e., tabular data) and implements a wide variety of privacy methods in a highly efficient manner. It is extensible, well documented and actively supported. ARX provides an intuitive cross-platform graphical interface and offers a public API for integration with other software systems.
The AIRCC's International Journal of Computer Science and Information Technology (IJCSIT) is devoted to fields of Computer Science and Information Systems. The IJCSIT is a peer-reviewed scientific journal published in electronic form as well as print form. The mission of this journal is to publish original contributions in its field in order to propagate knowledge amongst its readers and to be a reference publication.
Mediated participatory design for contextually aware in-vehicle experiences (Stavros Tasoudis)
Automotive UI 2016, the 8th International Conference on Automotive User Interfaces and Interactive Vehicular Applications; work-in-progress presentation by Stavros Tasoudis.
The Geography of Distance Education Research - Bibliographic Characteristics ... (alanwylie)
Keynote presentation by Olaf Zawacki-Richter, University of Oldenburg, Germany, Center for Lifelong Learning, Faculty of Educational and Social Sciences, for the DEHub/ODLAA Education 2011 to 2021 - Global challenges and perspectives of blended and distance learning (14 to 18 February 2011).
Computational methods for intelligent matchmaking for knowledge work (Jari Jussila)
Computational methods for intelligent matchmaking for knowledge work - Case CMAD. Poster presented at CMADFI, 23 January 2017. Jayesh Prakash Gupta, Jari Jussila, Ekaterina Olshannikova, Karan Menon, Jukka Huhtamäki, Thomas Olsson, Prof. Ravi Vatrapu & Prof. Hannu Kärkkäinen.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing by which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of small RNAs:
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNAs):
miRNA
Length: 23-25 nt
Trans-acting
Binds its target mRNA with mismatches
Translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds its target mRNA with a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
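The sequence-specific step above, in which the RISC-loaded strand pairs with a complementary site on the mRNA, can be sketched in a few lines of Python. This is a toy illustration of complementary base matching, not a bioinformatics tool; the sequences and the mismatch tolerance are invented. Allowing zero mismatches mimics siRNA-like perfect pairing; allowing a few mimics miRNA-like imperfect pairing:

```python
def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A-U, G-C pairing)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))

def matches_target(small_rna, mrna, max_mismatches=0):
    """Scan the mRNA for a site complementary to the small RNA, tolerating
    up to max_mismatches mismatched bases. Returns the site index or -1."""
    site = reverse_complement(small_rna)
    n = len(site)
    for i in range(len(mrna) - n + 1):
        mismatches = sum(1 for a, b in zip(site, mrna[i:i + n]) if a != b)
        if mismatches <= max_mismatches:
            return i
    return -1

mrna = "AAGCUUAAA"   # hypothetical mRNA fragment
guide = "UAGC"       # hypothetical guide strand; its target site is "GCUA"
print(matches_target(guide, mrna))     # -1: no perfect site (siRNA-like pairing fails)
print(matches_target(guide, mrna, 1))  # 2: site found with one mismatch (miRNA-like)
```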
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex which triggers mRNA degradation in response to siRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease which cleaves the target mRNA.
DICER: an endonuclease of the RNase III family
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN:
1. PAZ (Piwi/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H-like activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they play a key role in regulating gene expression.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R_1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
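To put the quoted imaging depths in physical units: an AB magnitude is a logarithmic flux scale defined by m = −2.5 log10(f_ν / 3631 Jy). A quick conversion (the formula is the standard AB definition; the 31.4 and 30.3 mag values are taken from the abstract above) shows that the stacked image reaches roughly nanojansky flux densities:

```python
import math

def ab_mag_to_njy(m_ab):
    """Flux density in nanojanskys for a given AB magnitude.
    AB zero point: 3631 Jy = 3631e9 nJy."""
    return 3631e9 * 10 ** (-m_ab / 2.5)

def njy_to_ab_mag(f_njy):
    """Inverse conversion: AB magnitude for a flux density in nJy."""
    return -2.5 * math.log10(f_njy / 3631e9)

print(round(ab_mag_to_njy(31.4), 2))  # ~1.0 nJy (stacked-image depth)
print(round(ab_mag_to_njy(30.3), 2))  # ~2.75 nJy (shallowest single filter)
```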
Multi-source connectivity as the driver of solar wind variability in the heli... (Sérgio Sacani)
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
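SCOP's class → fold → superfamily → family hierarchy can be modeled as nested mappings. A minimal sketch follows; the entry names are illustrative examples, not an authoritative extract of the database:

```python
# Illustrative miniature of SCOP's hierarchy: class -> fold -> superfamily -> family.
# The entries below are examples only, not pulled from the real database.
scop = {
    "All alpha proteins": {
        "Globin-like": {
            "Globin-like": ["Globins", "Truncated hemoglobin"],
        },
    },
}

def families_in_fold(db, scop_class, fold):
    """Collect every family grouped under the given fold,
    across all of its superfamilies."""
    families = []
    for members in db[scop_class][fold].values():
        families.extend(members)
    return families

print(families_in_fold(scop, "All alpha proteins", "Globin-like"))
```

Walking the nesting top-down mirrors how SCOP groups structures: folds share major structural features, superfamilies imply probable common ancestry, and families show clear sequence similarity.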
Insect taxonomy: importance, systematics and classification
Relevance Clues: Developing an Experimental Design to Examine the Criteria Behind Relevance Judgments
1. RELEVANCE CLUES
Developing an experimental design to examine the criteria
behind relevance judgments
Christiane Behnert
Hamburg University of Applied Sciences, Germany
Department of Information
ISI 2017 – 15th International Symposium on Information Science
14th March, Humboldt-Universität zu Berlin, Germany
4. FACULTY OF DESIGN, MEDIA & INFORMATION
Department of Information
1. Research goal
2. Related research
1. Relevance criteria
2. Document representations
3. Experimental research design
1. Reasons
2. Requirements
4. Next steps
5. References
OUTLINE
What are the clues and criteria
by which users judge
a search result’s relevance?
1. RESEARCH GOAL
2.1. Relevance criteria [Mizzaro, 1997; Saracevic, 2016]
Topicality
Recency
Clarity
Availability
Novelty
…
2. RELATED RESEARCH
2.2. Document representations
Pre-access vs. post-access criteria [Watson, 2014]
– Surrogates vs. full-text documents
Popularity data in academia [Plassmeier et al., 2015]
– Number of citations (e.g., Google Scholar)
– Number of downloads (e.g., ACM Digital library)
– Circulation counts
– Number of copies in a library
– …
2. RELATED RESEARCH
3.1. Reasons
Knowledge about actual behaviour [Kelly, 2009; Kelly & Cresenzi, 2016]
Causal conclusions [Sedlmeier & Renkewitz, 2007]
Stimulus effect
(independent variable → dependent variable)
Covariance
Temporal precedence
Exclusion of an alternative explanation
(confounding variables)
3. EXPERIMENTAL RESEARCH DESIGN
3.2. Requirements (1/2)
3. EXPERIMENTAL RESEARCH DESIGN
3.2. Requirements (2/2) [Sedlmeier & Renkewitz, 2007]
Manipulation (potential relevance clues)
Changes in variable expressions
Control (possible confounding variables)
Randomisation
Within-subjects design
Reduces sample size
Multifactorial design
Combination of variables
3. EXPERIMENTAL RESEARCH DESIGN
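The combination of a multifactorial, within-subjects design with randomisation can be sketched in a few lines. The factor names below are invented relevance clues for illustration; the point is that every participant sees every condition of the full factorial design, each in an individually shuffled order to control for order effects:

```python
import itertools
import random

def within_subjects_plan(participants, factors, seed=42):
    """Full-factorial condition list, shuffled independently per participant."""
    conditions = list(itertools.product(*factors.values()))
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    return {p: rng.sample(conditions, len(conditions)) for p in participants}

# Hypothetical relevance clues as two-level factors (2x2 design).
factors = {"citation_count": ["shown", "hidden"], "recency": ["recent", "old"]}
plan = within_subjects_plan(["P1", "P2", "P3"], factors)
for participant, order in plan.items():
    print(participant, order)  # each participant: all 4 conditions, own order
```

Because each participant contributes data in every condition, the design needs fewer participants than a between-subjects setup, matching the "reduces sample size" point above.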
User model
Hypotheses
Experiments
Results
Adjust user model
4. NEXT RESEARCH STEPS
Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with users. Foundations and
Trends® in Information Retrieval, 3(1—2).
Kelly, D., & Cresenzi, A. (2016). From design to analysis: Conducting controlled laboratory experiments with
users. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in
Information Retrieval - SIGIR ’16 (1207–1210). New York, New York, USA: ACM Press.
Mizzaro, S. (1997). Relevance: The whole history. Journal of the American Society for Information Science,
48(9), 810–832.
Plassmeier, K., Borst, T., Behnert, C., & Lewandowski, D. (2015). Evaluating popularity data for relevance
ranking in library information systems. In Proceedings of the 78th ASIS&T Annual Meeting (Vol. 51). Retrieved
from https://www.asist.org/files/meetings/am15/proceedings/submissions/posters/270poster.pdf
Saracevic, T. (2016). The notion of relevance in Information Science: Everybody knows what relevance is. But,
what is it really? (G. Marchionini, Ed.), Synthesis Lectures on Information Concepts, Retrieval, and Services; 50.
Morgan & Claypool.
Sedlmeier, P., & Renkewitz, F. (2007). Forschungsmethoden und Statistik in der Psychologie. München: Pearson Studium, 123–180.
Watson, C. (2014). An exploratory study of secondary students’ judgments of the relevance and reliability of
information. Journal of the Association for Information Science and Technology, 65(7), 1385–1408.
5. REFERENCES
13. Thank you very much!
I appreciate your feedback!
Christiane Behnert, M.A.
Hamburg University of Applied Sciences, Germany
christiane.behnert@haw-hamburg.de
http://searchstudies.org/christiane-behnert/