Ophir Frieder, who holds the Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt L.C.H.S. Chair in Computer Science and Information Processing at Georgetown University, gave this talk on Searching in Harsh Environments as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Ophir rebuts the myth that "Google has solved search" and discusses the challenges of searching for complex objects, through hidden collections, and in harsh environments. For more see: http://informatics.mit.edu/blg
Reproducibility from an informatics perspective -- Micah Altman
Scientific reproducibility is most often viewed through a methodological or statistical lens and, increasingly, through a computational lens. Over the last several years, I've taken part in collaborations that approach reproducibility from the perspective of informatics: as a flow of information across a lifecycle that spans collection, analysis, publication, and reuse.
These slides sketch this approach, and were presented at a recent workshop on reproducibility at the National Academy of Sciences, and at one of our Program on Information Science brown bag talks. See: informatics.mit.edu
Managing Confidential Information – Trends and Approaches -- Micah Altman
Personal information is ubiquitous, and it is becoming increasingly easy to link information to individuals. Laws, regulations and policies governing information privacy are complex, but most intervene either by restricting access or by anonymizing data at the time of publication.
Trends in information collection and management -- cloud storage, "big" data, and debates about the right to limit access to published but personal information -- complicate data management, and make traditional approaches to managing confidential data decreasingly effective.
This session, presented as part of the Program on Information Science seminar series, examines trends in information privacy and discusses emerging approaches and research around managing confidential research information throughout its lifecycle.
Making Decisions in a World Awash in Data: We’re going to need a different bo... -- Micah Altman
In his abstract, Scriffignano summarizes as follows:
I explore some of the ways in which the massive availability of data is changing and the types of questions we must ask in the context of making business decisions. Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise. At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly-available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways. Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best. A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g. cyber crime, cyber terrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available. Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.” Drawing on unstructured, highly dynamic sources of data, fascinating inference can be derived if we ask the right questions (and maybe use a bit of different math!). This session will cover three main themes: The new normal (how the data around us continues to change), how are we reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves). Ultimately, what we learn is governed as much by the data available as by the questions we ask. This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.
This talk provides an overview of the changing landscape of information privacy, with a focus on the possible consequences of these changes for researchers and research institutions.
Personal information continues to become more available, increasingly easy to link to individuals, and increasingly important for research. New laws, regulations and policies governing information privacy continue to emerge, increasing the complexity of management. Trends in information collection and management — cloud storage, “big” data, and debates about the right to limit access to published but personal information — complicate data management, and make traditional approaches to managing confidential data decreasingly effective.
Information Science Brown Bag talks, hosted by the Program on Information Science, consist of regular discussions and brainstorming sessions on all aspects of information science and on uses of information science and technology to assess and solve institutional, social and research problems. These are informal talks. Discussions are often inspired by real-world problems being faced by the lead discussant.
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski... -- Micah Altman
Julia Flanders, who is the Director of the Digital Scholarship Group in the Northeastern University Library and a Professor of Practice in Northeastern's English Department, gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.
For more see: http://informatics.mit.edu/event/brown-bag-jobs-roles-skills-tools-working-digital-academy-julia-flanders
Big Data & Privacy -- Response to White House OSTP -- Micah Altman
Big data has huge implications for privacy, as summarized in our commentary below:
Both the government and third parties have the potential to collect extensive (sometimes exhaustive), fine grained, continuous, and identifiable records of a person’s location, movement history, associations and interactions with others, behavior, speech, communications, physical and medical conditions, commercial transactions, etc. Such “big data” has the ability to be used in a wide variety of ways, both positive and negative. Examples of potential applications include improving government and organizational transparency and accountability, advancing research and scientific knowledge, enabling businesses to better serve their customers, allowing systematic commercial and non-commercial manipulation, fostering pervasive discrimination, and surveilling public and private spheres.
On January 23, 2014, President Obama asked John Podesta to develop, in 90 days, a 'comprehensive review' of big data and privacy.
This led to a series of workshops: on big data and technology at MIT, on social, cultural & ethical dimensions at NYU, and a third planned to discuss legal issues at Berkeley. A number of colleagues from our Privacy Tools for Research project and from the BigData@CSAIL projects have contributed to these workshops and raised many thoughtful issues (and the workshop sessions are online and well worth watching).
My colleagues at the Berkman Center, David O'Brien, Alexandra Woods, Salil Vadhan and I have submitted responses to these questions that outline a broad, comprehensive, and systematic framework for analyzing these types of questions and taxonomize a variety of modern technological, statistical, and cryptographic approaches to simultaneously providing privacy and utility. This comment is made on behalf of the Privacy Tools for Research Project, of which we are a part, and has benefitted from extensive commentary by the other project collaborators.
Comments to FTC on Mobile Data Privacy -- Micah Altman
The FTC has been hosting a series of seminars on consumer privacy, on which it has requested comments. The most recent seminar explored privacy issues related to mobile device tracking. As the seminar summary points out ...
In most cases, this tracking is invisible to consumers and occurs with no consumer interaction. As a result, the use of these technologies raises a number of potential privacy concerns and questions.
The presentations raised an interesting and important combination of questions about how to promote business and economic innovation while protecting individual privacy. I have submitted a comment on these changes with some proposed recommendations.
To summarize (quoting from the submitted comment):
Knowledge of an individual’s location history and associations with others has the potential to be used in a wide variety of harmful ways. ... [Furthermore], since all physical activity has a unique spatial and temporal context, location history provides a linchpin for integrating multiple sources of data that may describe an individual. Moreover, locational traces are difficult or impossible to render non-identifiable using traditional masking methods.
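The quoted claim that locational traces are difficult to mask can be illustrated with a toy simulation. This sketch uses entirely hypothetical data (not drawn from the comment itself): it shows how even a handful of coarsened location points quickly becomes unique within a population, which is what makes such traces a linchpin for record linkage.

```python
import random
from collections import Counter

random.seed(0)

# Toy population: each person visits 10 of 50 coarse cell-tower areas.
# All parameters here are illustrative assumptions, not empirical values.
NUM_PEOPLE, NUM_CELLS, POINTS = 10_000, 50, 10
traces = [tuple(sorted(random.sample(range(NUM_CELLS), POINTS)))
          for _ in range(NUM_PEOPLE)]

def unique_fraction(k):
    """Fraction of people whose first k coarsened location points
    already single them out within the whole population."""
    prefixes = Counter(t[:k] for t in traces)
    return sum(1 for t in traces if prefixes[t[:k]] == 1) / NUM_PEOPLE

for k in (1, 2, 3, 4, 5):
    print(f"{k} points known -> {unique_fraction(k):.1%} of people unique")
```

Even with only 50 coarse locations, uniqueness rises rapidly as more points are known; empirical studies of real mobility data have reported similarly rapid uniqueness, which is why simple generalization or masking rarely renders such traces non-identifiable.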
"Reproducibility from the Informatics Perspective" -- Micah Altman
Dr. Altman will provide expert comment on the need for informatics modeling as part of the National Academies workshop: Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results
This workshop addresses statistical challenges in assessing and fostering the reproducibility of scientific results by examining three issues from a statistical perspective: the extent of reproducibility, the causes of reproducibility failures, and potential remedies.
Brown Bag Talk with Micah Altman: Sources of Big Data for Social Sciences
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu)
This talk reviews emerging big data sources for social scientific analysis and explores the challenges these present. Many of these sources pose distinct challenges for acquisition, processing, analysis, inference, sharing, and preservation.
Dr. Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology. Dr. Altman is also a Non-Resident Senior Fellow at The Brookings Institution. Prior to arriving at MIT, Dr. Altman served at Harvard University for fifteen years as the Associate Director of the Harvard-MIT Data Center, Archival Director of the Henry A. Murray Archive, and Senior Research Scientist in the Institute for Quantitative Social Sciences.
Dr. Altman conducts research in social science, information science and research methods -- focusing on the intersections of information, technology, privacy, and politics; and on the dissemination, preservation, reliability and governance of scientific knowledge.
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ... -- Micah Altman
In his talk for the MIT Libraries Program on Information Science, Steve Griffin discusses how research libraries can play a key and expanded role in enabling digital scholarship and creating the supporting activities that sustain it.
This presentation was provided by John Wilbanks of Sage Bionetworks, during the NISO Symposium, Privacy Implications of Research Data held on September 11, 2016 in conjunction with International Data Week in Denver, Colorado
July IAP: Confidential Information - Storage, Sharing, & Publication - with M... -- Micah Altman
This class focuses on the tools and good practices for storing confidential data, sharing data for collaboration, and publishing data or derivative results for broad use. Topics covered in this class include: an overview of information security standards and frameworks; information security core practices (credentials, authentication, authorization, and auditing); information partitioning and secure linking; file, disk, and network encryption tools and practices; cloud storage practices for confidential information; data “de-identification” tools and practices; statistical disclosure limitation approaches and tools; and data use agreements.
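One of the topics listed above, secure linking, can be sketched briefly: a keyed pseudonym lets collaborators join datasets without exchanging raw identifiers. This is a minimal illustration using Python's standard `hmac` module; all identifiers and values below are hypothetical and are not material from the class itself.

```python
# Hypothetical sketch of keyed pseudonymization for secure linking.
import hmac
import hashlib

# The key must be kept secret and stored apart from the shared data.
LINKAGE_KEY = b"keep-this-secret-and-off-the-shared-server"

def pseudonym(identifier: str) -> str:
    """Replace a direct identifier with a keyed pseudonym (HMAC-SHA256).
    The same identifier always maps to the same pseudonym, so two
    datasets can be joined on the pseudonym; without the key, the
    pseudonym cannot be recomputed from (or reversed to) the identifier."""
    return hmac.new(LINKAGE_KEY, identifier.encode(), hashlib.sha256).hexdigest()

# Two hypothetical study files share a patient ID but never expose it:
visits = {pseudonym("patient-1017"): "2016-03-02"}
labs   = {pseudonym("patient-1017"): "HbA1c 5.9%"}
assert visits.keys() == labs.keys()  # linkable without the raw ID
```

Note that pseudonymization alone is not de-identification: the remaining attributes may still identify individuals, which is why the class pairs this topic with statistical disclosure limitation.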
Cottbus Brandenburg University of Technology Lecture Series on Smart Regions -- Critically Assembling Data, Processes & Things: Toward an Open Smart City -- June 5, 2018
This lecture will critically examine smart cities from a data-based socio-technological assemblage approach. This is a theoretical and methodological framework that allows for an empirical examination of how smart cities are socially and technically constructed, and for studying them as discursive regimes and as large technological infrastructural systems.
The lecture will refer to the research outcomes of the ERC funded Programmable City Project led by Rob Kitchin at Maynooth University and will feature examples of empirical research conducted in Dublin and other Irish cities.
In addition, the lecture will discuss the research outcomes of the Canadian Open Smart Cities project funded by the Government of Canada GeoConnections Program. Examples will be drawn from five case studies, namely the cities of Edmonton, Guelph, Ottawa and Montreal, and the Ontario Smart Grid, as well as a number of international best practices. The recent Infrastructure Canada Smart City Challenge and the controversial Sidewalk Lab Waterfront Toronto project will also be discussed.
It will be argued that no two smart cities are alike, although technological solutionist and networked urbanist approaches dominate, and it is suggested that these kinds of smart cities may not live up to the promise of being better places to live.
In this lecture, the ideals of an Open Smart City are offered instead: a city in which residents, civil society, academics, and the private sector collaborate with public officials to mobilize data and technologies, when warranted, in an ethical, accountable and transparent way, in order to govern the city as a fair, viable and livable commons that balances economic development, social progress and environmental responsibility. Although an Open Smart City does not yet exist, it will be argued that it is possible.
Matching Uses and Protections for Government Data Releases: Presentation at t... -- Micah Altman
In the work included below, and presented at the Simons Institute, we describe work in progress that aims to align emerging methods of data protection with research uses.
Infrastructure and practices for data citation have made substantial progress over the last decade. This increases the potential rewards for data publication and reproducible science; however, overall incentives remain relatively weak.
Author's note: This summarizes a presentation given at the *National Academies of Sciences* as part of the [*Data Citation Workshop: Developing Policy and Practice*](http://sites.nationalacademies.org/pga/brdi/index.htm).
The appearance, a few decades ago, of the Internet and global connectivity gave rise to an entirely new phenomenon: the accumulation of enormous quantities of data held in digital banks, whose volume doubles every few days and, in prospect, every few hours. This is the reality of Big Data, which is much talked about and debated, often in enthusiastic tones. But Big Data also brings problems of use and interpretation, and risks of distortion. If this matters for data with economic value, the accumulation of information, and how it is treated, has equally significant implications for the formation of knowledge.
To meet these challenges, the relationship between ethics and science, critical analysis of how data are produced and presented, and the involvement of all the social actors concerned are crucial.
September 12, 2019 | Torino, Polo del '900
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP 2019 -- Micah Altman
Libraries enable patrons to access a wide range of information, but much of this access is now directly managed by publishers. This has led to a significant gap across library values, patrons' perceptions of privacy, and effective privacy protection for access to digital resources.
In the work included below, and presented at NERCOMP 2019, we review privacy principles based on ALA, IFLA, and NISO policies. We then organize and compare the high-level privacy protections required by the ALA checklist, NISO, and the GDPR. This framework of principles and controls is then used to score the privacy policies and practices of major vendors of research library content. We evaluate each element of the vendors' privacy policies, and use instrumented browsers to identify the types of tracking mechanisms used by different vendors. We use this set of privacy scores to support analyses of change over time, and of potential gaps between patron expectations and privacy policies and practices.
Conference of Irish Geographies 2018
The Earth as Our Home
Automating Homelessness, May 12, 2018
The research for these studies is funded by a European Research Council Advanced Investigator award ERC-2012-AdG-323636-SOFTCITY.
Data science remains a high-touch activity, especially in life, physical, and social sciences. Data management and manipulation tasks consume too much bandwidth: Specialized tools and technologies are difficult to use together, issues of scale persist despite the Cambrian explosion of big data systems, and public data sources (including the scientific literature itself) suffer curation and quality problems.
Together, these problems motivate a research agenda around “human-data interaction:” understanding and optimizing how people use and share quantitative information.
I’ll describe some of our ongoing work in this area at the University of Washington eScience Institute.
In the context of the Myria project, we're building a big data "polystore" system that can hide the idiosyncrasies of specialized systems behind a common interface without sacrificing performance. In scientific data curation, we are automatically correcting metadata errors in public data repositories with cooperative machine learning approaches. In the Viziometrics project, we are mining patterns of visual information in the scientific literature using machine vision, machine learning, and graph analytics. In the VizDeck and Voyager projects, we are developing automatic visualization recommendation techniques. In graph analytics, we are working on parallelizing best-of-breed graph clustering algorithms to handle multi-billion-edge graphs.
The common thread in these projects is the goal of democratizing data science techniques, especially in the sciences.
Writing Analytics for Epistemic Features of Student Writing #icls2016 talk -- Simon Knight
Talk presented at #ICLS2016 in Singapore. I discuss levels of description as sites of epistemic cognition, focusing on writing and the use of textual features to associate rubric scores with epistemic cognition.
My thanks to my collaborators (listed on the paper) particularly Laura Allen, who also generously let me adapt the later slides on NLP studies of writing.
Abstract: Literacy, encompassing the ability to produce written outputs from the reading of multiple sources, is a key learning goal. Selecting information, and evaluating and integrating claims from potentially competing documents, is a complex literacy task. Prior research exploring differing behaviours and their association with constructs such as epistemic cognition has used ‘multiple document processing’ (MDP) tasks. Using this model, 270 paired participants wrote a review of a document. Reports were assessed using a rubric associated with features of complex literacy behaviours. This paper focuses on the conceptual and empirical associations between those rubric marks and textual features of the reports on a set of natural language processing (NLP) indicators. Findings indicate the potential of NLP indicators for providing feedback on the writing of such outputs, demonstrating clear relationships both across rubric facets and between rubric facets and specific NLP indicators.
This discussion, convened by the Dubai Future Foundation, focuses on identifying the significance of the concept of well-being for social science and policy, and the opportunities to measure it at scale.
Program on Information Science Brown Bag: David Weinberger on Libraries as Pla... -- Micah Altman
David Weinberger, who is a Shorenstein Fellow at Harvard University and former co-director of the Harvard Library Lab, presented a talk on Libraries as Platforms: Enabling Libraries to Become Community Centers of Meaning, as part of the Program on Information Science Brown Bag Series.
"Reproducibility from the Informatics Perspective"Micah Altman
Dr. Altman will provide expert comment on the need for informatics modeling as part of the National Academies workshop: Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results
This workshop focuses on the topic of addressing statistical challenges in assessing and fostering the reproducibility of scientific results by examining three issues from a statistical perspective: the extent of reproducibility, the causes of reproducibility failures, and potential remedies.
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESMicah Altman
This talk, is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu)
This talk reviews emerging big data sources for social scientific analysis and explores the challenges these present. Many of these sources pose distinct challenges for acquisition, processing, analysis, inference, sharing, and preservation.
Dr Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology. Dr. Altman is also a Non-Resident Senior Fellow at The Brookings Institution. Prior to arriving at MIT, Dr. Altman served at Harvard University for fifteen years as the Associate Director of the Harvard-MIT Data Center, Archival Director of the Henry A. Murray Archive, and Senior Research Scientist in the Institute for Quantitative Social Sciences.
Dr. Altman conducts research in social science, information science and research methods -- focusing on the intersections of information, technology, privacy, and politics; and on the dissemination, preservation, reliability and governance of scientific knowledge.
Brown Bag: New Models of Scholarly Communication for Digital Scholarship, by ...Micah Altman
In his talk for the MIT Libraries Program on Information Science, Steve Griffin discusses how how research libraries can play a key and expanded role in enabling digital scholarship and creating the supporting activities that sustain it.
This presentation was provided by John Wilbanks of Sage Bionetworks, during the NISO Symposium, Privacy Implications of Research Data held on September 11, 2016 in conjunction with International Data Week in Denver, Colorado
July IAP: Confidential Information - Storage, Sharing, & Publication - with M...Micah Altman
This class focuses on the tools and good practices for storing confidential data, sharing data for collaboration, and publishing data or derivative results for broad use. Topics covered in this class include: an overview of information security standards and frameworks; information security core practices (credentials, authentication, authorization, and auditing); information partitioning and secure linking; file, disk, and network encryption tools and practices; cloud storage practices for confidential information; data “de-identification” tools and practices; statistical disclosure limitation approaches and tools; and data use agreements.
Brandenburg University of Technology, Cottbus, Lecture Series on Smart Regions: Critically Assembling Data, Processes & Things: Toward an Open Smart City. June 5, 2018
This lecture critically examines smart cities through a data-based socio-technological assemblage approach: a theoretical and methodological framework that allows for an empirical examination of how smart cities are socially and technically constructed, and for studying them both as discursive regimes and as large technological infrastructural systems.
The lecture will refer to the research outcomes of the ERC funded Programmable City Project led by Rob Kitchin at Maynooth University and will feature examples of empirical research conducted in Dublin and other Irish cities.
In addition, the lecture will discuss the research outcomes of the Canadian Open Smart Cities project funded by the Government of Canada GeoConnections Program. Examples will be drawn from five case studies, namely the cities of Edmonton, Guelph, Ottawa, and Montreal, and the Ontario Smart Grid, as well as a number of international best practices. The recent Infrastructure Canada Smart City Challenge and the controversial Sidewalk Labs Waterfront Toronto project will also be discussed.
It will be argued that no two smart cities are alike, although technological solutionist and networked urbanist approaches dominate, and it is suggested that these kinds of smart cities may not live up to the promise of being better places to live.
In this lecture, the ideal of an Open Smart City is offered instead: a city in which residents, civil society, academics, and the private sector collaborate with public officials to mobilize data and technologies, when warranted, in an ethical, accountable, and transparent way in order to govern the city as a fair, viable, and livable commons that balances economic development, social progress, and environmental responsibility. Although an Open Smart City does not yet exist, it will be argued that one is possible.
Matching Uses and Protections for Government Data Releases: Presentation at t...Micah Altman
In the work included below, and presented at the Simons Institute, we describe work in progress that aims to align emerging methods of data protection with research uses.
Infrastructure and practices for data citation have made substantial progress over the last decade. This increases the potential rewards for data publication and reproducible science; however, overall incentives remain relatively weak.
Author's note: This summarizes a presentation given at the National Academies of Sciences as part of the [Data Citation Workshop: Developing Policy and Practice](http://sites.nationalacademies.org/pga/brdi/index.htm).
The appearance a few decades ago of the Internet and global connectivity gave rise to an entirely new phenomenon: the accumulation of enormous quantities of data held in digital banks, whose volume doubles every few days and, in prospect, every few hours. This is the reality of Big Data, which is much talked about and debated, often in enthusiastic tones. But Big Data also brings problems of use and interpretation, and risks of distortion. If this matters for data with economic value, the accumulation of information, and how it is handled, has equally significant implications for the formation of knowledge.
To address these challenges, the relationship between ethics and science, critical analysis of how data are produced and presented, and the involvement of all the social actors concerned are crucial.
September 12, 2019 | Turin, Polo del '900
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP2019Micah Altman
Libraries enable patrons to access a wide range of information, but much of the access to this information is now directly managed by publishers. This has led to a significant gap across library values, patrons' perceptions of privacy, and effective privacy protection for access to digital resources.
In the work included below, and presented at NERCOMP 2019, we review privacy principles based on ALA, IFLA, and NISO policies. We then organize and compare the high-level privacy protections required by the ALA checklist, NISO, and the GDPR. This framework of principles and controls is then used to score the privacy policies and practices of major vendors of research library content. We evaluate each element of each vendor's privacy policy, and use instrumented browsers to identify the types of tracking mechanisms used by different vendors. We use this set of privacy scores to support analyses of change over time, and of potential gaps between patron expectations and privacy policies and practices.
Conference of Irish Geographies 2018
The Earth as Our Home
Automating Homelessness May 12, 2018
The research for these studies is funded by a European Research Council Advanced Investigator award ERC-2012-AdG-323636-SOFTCITY.
Data science remains a high-touch activity, especially in life, physical, and social sciences. Data management and manipulation tasks consume too much bandwidth: Specialized tools and technologies are difficult to use together, issues of scale persist despite the Cambrian explosion of big data systems, and public data sources (including the scientific literature itself) suffer curation and quality problems.
Together, these problems motivate a research agenda around “human-data interaction:” understanding and optimizing how people use and share quantitative information.
I’ll describe some of our ongoing work in this area at the University of Washington eScience Institute.
In the context of the Myria project, we're building a big data "polystore" system that can hide the idiosyncrasies of specialized systems behind a common interface without sacrificing performance. In scientific data curation, we are automatically correcting metadata errors in public data repositories with cooperative machine learning approaches. In the Viziometrics project, we are mining patterns of visual information in the scientific literature using machine vision, machine learning, and graph analytics. In the VizDeck and Voyager projects, we are developing automatic visualization recommendation techniques. In graph analytics, we are working on parallelizing best-of-breed graph clustering algorithms to handle multi-billion-edge graphs.
The common thread in these projects is the goal of democratizing data science techniques, especially in the sciences.
Writing Analytics for Epistemic Features of Student Writing #icls2016 talkSimon Knight
Talk presented at #ICLS2016 in Singapore. I discuss levels of description as sites of epistemic cognition, focusing on writing and the use of textual features to associate rubric scores with epistemic cognition.
My thanks to my collaborators (listed on the paper) particularly Laura Allen, who also generously let me adapt the later slides on NLP studies of writing.
Abstract: Literacy, encompassing the ability to produce written outputs from the reading of multiple sources, is a key learning goal. Selecting information, and evaluating and integrating claims from potentially competing documents, is a complex literacy task. Prior research exploring differing behaviours and their association with constructs such as epistemic cognition has used ‘multiple document processing’ (MDP) tasks. Using this model, 270 paired participants wrote a review of a document. Reports were assessed using a rubric associated with features of complex literacy behaviours. This paper focuses on the conceptual and empirical associations between those rubric marks and textual features of the reports, as measured by a set of natural language processing (NLP) indicators. Findings indicate the potential of NLP indicators for providing feedback regarding the writing of such outputs, demonstrating clear relationships both across rubric facets and between rubric facets and specific NLP indicators.
This discussion, convened by the Dubai Future Foundation, focuses on identifying the significance of the concept of well-being for social science and policy, and the opportunities to measure it at scale.
Program on Information Science Brown Bag:David Weinberger on Libraries as Pla...Micah Altman
David Weinberger, who is a Shorenstein Fellow at Harvard University and former co-director of the Harvard Library Innovation Lab, presented a talk on Libraries as Platforms: Enabling Libraries to Become Community Centers of Meaning, as part of the Program on Information Science Brown Bag Series.
BROWN BAG TALK WITH CHAOQUN NI- TRANSFORMATIVE INTERACTIONS IN THE SCIENTIFIC...Micah Altman
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
A competitive scientific workforce is essential for the health and well-being of a society. However, U.S. dominance in the global knowledge economy has been challenged in recent years: the U.S. is outspent by China (in terms of R&D funding) and out-produced by the EU (in terms of doctoral graduates and scientific publications). Furthermore, gender inequalities persist, with men producing more scientific articles than women in every state.
From Dr. Ni, "I argue that, for a country to be scientifically competitive, it must maximize its human intellectual capital-base and support this workforce equitably and efficiently. I propose here a large-scale and heterogeneous analysis of the sociality, equality, and dynamicity of the scientific workforce through novel computational models for understanding and predicting the career trajectory of scientists based on their transformative interactions, gender, and levels of funding. This analysis will be able to isolate factors that contribute to the health and well-being of the scientific workforce. The computational models will quantify the impact of those transformative events and interactions and provide models to predict the career trajectory of scientists based on their gender, the size and position of the social network, and other demographic factors."
Chaoqun Ni received her Bachelor's and Master's degrees in E-Commerce and Information Systems from Wuhan University, and her doctoral degree in Information Science from Indiana University Bloomington.
Chaoqun Ni's research has appeared in a variety of computer science, informatics, library, and scientific publications, including Nature, Scientometrics, Journal of Association for Information Science and Technology, and Simmons SLIS' Library and Information Science Research. In addition to receiving a Dean's Fellowship from the Department of Information & Library Science at Indiana University Bloomington, Ni received the Association for Information Science and Technology's New Leader Award in 2011, and the Association for Library and Information Science Education Doctoral Student Award in 2014.
BROWN BAG: THE VISUAL COMPONENT: MORE THAN PRETTY PICTURES - WITH FELICE FRANKELMicah Altman
Visual representations of all kinds are becoming more important in our ever-growing image-based society, especially in science and technology. Yet there has been little emphasis on developing standards for creating or critiquing those representations. We must begin to consider images as more than tangential components of information and find ways to seamlessly search for accurate and honest depictions of complex scientific phenomena. I will discuss a few ideas to that end and show my own process of making visual representations in science and engineering. I will also make the case that representations are just as "intellectual" as text.
About the discussant:
Science photographer Felice Frankel is a research scientist in the Center for Materials Science and Engineering at the Massachusetts Institute of Technology with additional support from Chemical Engineering, Materials Science and Engineering, and Mechanical Engineering.
She is a fellow of the American Association for the Advancement of Science, a Guggenheim Fellow, and was a Senior Research Fellow in Harvard University’s Faculty of Arts and Sciences and a Visiting Scholar at Harvard Medical School’s Department of Systems Biology.
She most recently developed and instructed the first online MOOC addressing science and engineering photography. Click the following link to access 31 tutorials and supplemental material: “Making Science and Engineering Pictures, A Practical Guide to Presenting Your Work.” (course 0.111x)
Information Science Brown Bag talks, hosted by the Program on Information Science, consist of regular discussions and brainstorming sessions on all aspects of information science and uses of information science and technology to assess and solve institutional, social, and research problems. These are informal talks. Discussions are often inspired by real-world problems being faced by the lead discussant.
Can computers be feminist? Program on Information Science Talk by Gillian SmithMicah Altman
For more on this talk see: informatics.mit.edu
Gillian Smith, who is an Assistant Professor in Art+Design and Computer Science at Northeastern University, gave this talk, entitled Can Computers Be Feminist? Procedural Politics and Computational Creativity, as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated through the slides below, Gillian presented a perspective on computing as a co-creation of developer, algorithm, data, and user, and argued that developers embed specific ethical and epistemic commitments in the software they produce through their selection of algorithms and data.
Gary Price, MIT Program on Information ScienceMicah Altman
Gary Price, who is chief editor of InfoDocket, contributing editor of Search Engine Land, co-founder of Full Text Reports and who has worked with internet search firms and library systems developers alike, gave this talk on Issues in Curating the Open Web at Scale as part of the Program on Information Science Brown Bag Series.
The Open Access Network: Rebecca Kennison’s Talk for the MIT Program on Infor...Micah Altman
Rebecca Kennison, who is the Principal of K|N Consultants, the co-founder of the Open Access Network, and was the founding director of the Center for Digital Research and Scholarship, gave this talk on Come Together Right Now: An Introduction to the Open Access Network as part of the Program on Information Science Brown Bag Series.
Inform- interacting with a dynamic shape displayHari Teja Joshi
ABSTRACT
Past research on shape displays has primarily focused on rendering content and user interface elements through shape output, with less emphasis on dynamically changing UIs. We propose utilizing shape displays in three different ways to mediate interaction: to facilitate by providing dynamic physical affordances through shape change, to restrict by guiding users with dynamic physical constraints, and to manipulate by actuating physical objects. We outline potential interaction techniques and introduce Dynamic Physical Affordances and Constraints with our inFORM system, built on top of a state-of-the-art shape display, which provides for variable stiffness rendering and real-time user input through direct touch and tangible interaction. A set of motivating examples demonstrates how dynamic affordances, constraints and object actuation can create novel interaction possibilities.
Test driven cloud development using Oracle SOA CS and Oracle Developer CSSven Bernhardt
Slides of Oracle Open World presentation about Test-driven cloud development using Oracle SOA CS and Oracle Developer CS by Danilo Schmiedel and Sven Bernhardt.
Abstract:
Automated tests are key for quality assurance and for ensuring business agility from a long-term perspective. That is especially important in complex integration projects if you develop your integrations on-premises or in the cloud. If a hybrid strategy is used, it is important to have a consistent testing approach for cloud and on-premises. In this session learn how to implement a consistent approach based on Oracle SOA Cloud Service that works on-premises and in the cloud. See how this approach can test BPEL, BPMN, SB, Java, human tasks, XSLT, and XQuery across all relevant test layers (elementary unit tests, component tests, end-to-end tests) consistently.
Brown Bag: DMCA §1201 and Video Game Preservation Institutions: A Case Study ...Micah Altman
Kendra Albert, who has served as research associate at the Harvard Law School; as an intern at the Electronic Frontiers Foundation; as a fellow at the Berkman Center for Internet & Society; and is now completing her J.D. at Harvard Law, presented this talk as part of the Program on Information Science Brown Bag Series.
Kendra brings a fresh perspective developed through collaborating with librarians and archivists on projects such as perma.cc, EFF's response to DMCA 1201, and our PrivacyTools project.
In her talk, Kendra discusses the intersection of law, librarianship, and advocacy, focusing on the following question:
Archival institutions and libraries are often on the front lines of battles over ownership of digital content and the legality of ensuring copies are preserved. How can institutions devoted to preservation use their expertise to advocate for users?
Keynote for the initial PyCon AU, 26 June 2010 at the Sydney Masonic Center. This is the grand unveiling of the Plexus project - plexus.relationalspace.org.
The eXtensible Markup Language (XML) is not a language itself, but rather a meta-language used to create markup languages to suit whatever purpose you may have. In this session you will learn the basic rules of XML and the philosophy behind it. You will also be introduced to the basics of the popular XML editor, oxygen.
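The core well-formedness rules such a session typically covers (a single root element, properly nested and closed tags, quoted attribute values) can be demonstrated with Python's standard-library parser; the sample document here is invented for illustration:

```python
import xml.etree.ElementTree as ET

# A minimal well-formed XML document: one root element, properly
# nested and closed child elements, and quoted attribute values.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="b1">
    <title>XML Basics</title>
  </book>
</catalog>"""

root = ET.fromstring(doc)
print(root.tag)                     # catalog
print(root.find("book").get("id"))  # b1

# A document that breaks the rules (mismatched tags) fails to parse.
try:
    ET.fromstring("<catalog><book></catalog>")
except ET.ParseError:
    print("not well-formed")
```

Unlike HTML, an XML parser is required to reject ill-formed input outright rather than guess at the author's intent, which is what makes XML a reliable base for defining new markup languages.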
II Konferencja Naukowa : Nauka o informacji (informacja naukowa) w okresie zmian, Warszawa, 15-16.04.2013 r. Instytut Informacji Naukowej i Studiów Bibliologicznych, Uniwersytet Warszawski
The 2nd Scientific Conference : Information Science in an Age of Change, April 15-16, 2013. Institute of Information and Book Studies, University of Warsaw
On Tuesday 18 September 2007, Ben Shneiderman gave a talk at the Centre for HCI Design, City University London, on the topic of information visualisation for high-dimensional spaces. Over 100 people from industry and academia attended the talk.
http://hcid.soi.cty.ac.uk/
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICSMatt Stubbs
Date: 13th November 2018
Location: AI Lab Theatre
Time: 13:10 - 13:40
Speaker: Normand Peladeau
Organisation: Provalis Research
About: Over the last 10 years, text analytics has become quite popular, as witnessed by the numerous offerings from commercial companies and open-source libraries for automatic information extraction, sentiment analysis, and relation extraction, to name a few applications. Many of these products make bold claims about their high accuracy and impressive ability to tackle the most difficult challenges in the analysis of human language (polysemy, entity resolution, sarcasm, etc.). Their use of buzzwords like AI, NLP, and deep semantics gives them an aura of scientific credibility, yet users who dare to look closely are often disappointed by the performance. In this presentation, we will discuss why human language represents such a challenge for data analysts. We will look inside the black box of some text analytics techniques to get a better understanding of the main challenges that still need to be solved. We will also illustrate some successful applications to help the audience appreciate the true value text analytics can offer. We will go behind the curtain to show you what is questionable, so that you can establish realistic expectations and appreciate the real power and potential of text analytics.
“Big data” in human services organisations: Practical problems and ethical di...husITa
“Big data” initiatives that aim to bring together and mine data from multiple databases across government and non-government agencies promise new insights into human service delivery. Specifically they aim to provide information about what services are being used, how, by whom and with what outcome. However, the process of achieving such insights poses both practical problems and ethical dilemmas. In this presentation, drawing from an extensive literature review and research with government and non-government human service organisations focussing on the design and redevelopment of electronic information systems, the most significant problems and dilemmas will be explored. It will be argued that current frameworks for ethical social work and human service practice will need to be expanded to accommodate developments in technology which have made ‘Big data’ projects possible.
Measuring reliability and validity in human coding and machine classificationStuart Shulman
Slides delivered as a part of #CAQDAS14.
In 1989 the Department of Sociology at the University of Surrey convened the world's first conference on qualitative software, which brought together qualitative methodologists and software developers who debated the pros and cons of the use of technology for qualitative data analysis. The result was a book (Fielding & Lee (1991) Using Computers in Qualitative Research, Sage Publications), the setting-up of the CAQDAS Networking Project and many other conferences concerning the topics over the years.
This conference will be another opportunity for methodologists, developers and researchers to come together and debate the issues. There will be keynote papers by leading experts in the field, software support clinics and opportunities to present work in progress.
http://www.surrey.ac.uk/sociology/files/Programme%20.pdf
People Like You Like Presentations Like ThisDavid Millard
My EUROCALL 2017 keynote. On how Web Science can help us understand how we got the Web we have.
Abstract: The web has its roots in utopian visions of how technology could benefit humankind. Social media systems were part of this vision, using personalisation technologies to adapt the web you see to better suit your interests. But in the last decade we have seen an increasing number of problems, from anti-social behaviour to fake news, and concerns are being raised about the dangers of personal data collection, from mass surveillance to political propaganda. This talk will present the discipline of Web Science, which asks how we got the Web we have, and aims to understand the dynamics of human interaction online, so that we can develop software and policies that help rather than harm. I argue that this is needed now more than ever, as future technology promises to be just as disruptive to our culture and society as the Web, and it is only by understanding these interactions, and acknowledging their consequences, that we can build technology that changes the world in ways that we actually want.
This presentation was provided by Daniella Lowenberg of the California Digital Library during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
Selecting efficient and reliable preservation strategiesMicah Altman
This article addresses the problem of formulating efficient and reliable operational preservation policies that ensure bit-level information integrity over long periods, and in the presence of a diverse range of real-world technical, legal, organizational, and economic threats. We develop a systematic, quantitative prediction framework that combines formal modeling, discrete-event simulation, and hierarchical modeling, and then use empirically calibrated sensitivity analysis to identify effective strategies.
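As a toy illustration of the simulation component (not the article's actual model; the failure rates and horizon are invented), a short Monte Carlo sketch can estimate how replication affects long-term bit-level survival:

```python
import random

def survival_probability(replicas: int, annual_loss: float,
                         years: int, trials: int = 20000,
                         seed: int = 1) -> float:
    """Monte Carlo estimate that at least one replica survives.

    Toy model: each replica fails independently each year with
    probability `annual_loss`, and failed replicas are never
    repaired (a real policy would include audit/repair cycles).
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        alive = replicas
        for _ in range(years):
            alive = sum(1 for _ in range(alive)
                        if rng.random() > annual_loss)
        if alive > 0:
            survived += 1
    return survived / trials

# Even without repair, more replicas sharply improve survival odds.
p1 = survival_probability(replicas=1, annual_loss=0.05, years=20)
p3 = survival_probability(replicas=3, annual_loss=0.05, years=20)
```

With a 5% annual loss rate, a single copy survives 20 years only about a third of the time (0.95^20 ≈ 0.36), while three independent copies survive far more often, which is why replication, combined with auditing, is a baseline preservation strategy.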
Presentation by Philip Cohen on collaborative work with Micah Altman as part of the MIT CREOS research talk series. Presented in fall 2018, in Cambridge, MA.
Contemporary journal peer review is beset by a range of problems. These include (a) long delay times to publication, during which time research is inaccessible; (b) weak incentives to conduct reviews, resulting in high refusal rates as the pace of journal publication increases; (c) quality control problems that produce both errors of commission (accepting erroneous work) and omission (passing over important work, especially null findings); (d) unknown levels of bias, affecting both who is asked to perform peer review and how reviewers treat authors, and; (e) opacity in the process that impedes error correction and more systematic learning, and enables conflicts of interest to pass undetected. Proposed alternative practices attempt to address these concerns -- especially open peer review, and post-publication peer review. However, systemic solutions will require revisiting the functions of peer review in its institutional context.
Presentation by Philip Cohen and Micah Altman on developing an exchange system for peer review in support for open science. Prepared for presentation at the ACRL-SSRC meeting on Open scholarship in the social sciences. Washington DC, Dec 2018
Redistricting in the US -- An OverviewMicah Altman
This presentation was prepared for the International Seminar on Electoral Districting, National Electoral Institute El Colegio de México. http://www.ine.mx/seminario-internacional-distritacion-electoral/
This presentation was prepared for the International Seminar on Electoral Districting, National Electoral Institute El Colegio de México. http://www.ine.mx/seminario-internacional-distritacion-electoral/
A History of the Internet :Scott Bradner’s Program on Information Science Talk Micah Altman
Scott Bradner is a Berkman Center affiliate who worked for 50 years at Harvard in the areas of computer programming, system management, networking, IT security, and identity management. He has been involved in the design, operation, and use of data networks at Harvard University since the early days of the ARPANET, and has served in many leadership roles in the IETF. He presented the talk recorded below, entitled A History of the Internet, as part of the Program on Information Science Brown Bag Series:
Bradner abstracted his talk as follows:
In a way the Russians caused the Internet. This talk will describe how that happened (hint: it was not actually the Bomb) and follow the path that has led to the current Internet of (unpatchable) Things (the IoT) and the Surveillance Economy.
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
The web is now firmly established as the primary communication and publication platform for sharing and accessing social and cultural materials. This networked world has created both opportunities and pitfalls for libraries and archives in their mission to preserve and provide ongoing access to knowledge. How can the affordances of the web be leveraged to drastically extend the plurality of representation in the archive? What challenges are imposed by the intrinsic ephemerality and mutability of online information? What methodological reorientations are demanded by the scale and dynamism of machine-generated cultural artifacts? This talk will explore the interplay of the web, contemporary historical records, and the programs, technologies, and approaches by which libraries and archives are working to extend their mission to preserve and provide access to the evidence of human activity in a world distinguished by the ubiquity of born-digital materials.
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
Cassidy Sugimoto is Associate Professor in the School of Informatics and Computing, Indiana University Bloomington, who researches within the domain of scholarly communication and scientometrics, examining the formal and informal ways in which knowledge producers consume and disseminate scholarship. She presented this talk, entitled Labor And Reward In Science: Do Women Have An Equal Voice In Scholarly Communication? A Brown Bag With Cassidy Sugimoto, as part of the Program on Information Science Brown Bag Series.
Despite progress, gender disparities in science persist. Women remain underrepresented in the scientific workforce and under-rewarded for their contributions. This talk will examine multiple layers of gender disparities in science, triangulating data from scientometrics, surveys, and social media to provide a broader perspective on the gendered nature of scientific communication. The extent of gender disparities and the ways in which new media are changing these patterns will be discussed. The talk will end with a discussion of interventions, with a particular focus on the roles of libraries, publishers, and other actors in the scholarly ecosystem.
Utilizing VR and AR in the Library Space:Micah Altman
Matt Bernhardt is a web developer in the MIT Libraries and a collaborator in our program. He presented this talk, entitled Reality Bytes - Utilizing VR and AR in the Library Space, as part of the Program on Information Science Brown Bag Series.
Terms like "virtual reality" and "augmented reality" have existed for a long time. In recent years, thanks to products like Google Cardboard and games like Pokemon Go, an increasing number of people have gained first-hand experience with these once-exotic technologies. The MIT Libraries are no exception to this trend. The Program on Information Science has conducted enough experimentation that we would like to share what we have learned, and solicit ideas for further investigation.
For slides and comments see: http://informatics.mit.edu/blog
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
Catherine D'Ignazio is an Assistant Professor of Civic Media and Data Visualization at Emerson College, a principal investigator at the Engagement Lab, and a research affiliate at the MIT Media Lab/Center for Civic Media. She presented this talk, entitled, Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots as part of Program on Information Science Brown Bag Series.
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Harsh Environments
1. Searching in Harsh Environments
Ophir Frieder
Computer Science Dept. | Georgetown University &
Biostatistics, Bioinformatics, & Biomathematics| Georgetown University Medical Center
ophir@ir.cs.georgetown.edu | March 2016
2. Correcting the Search Myth
If it’s search, then Google solved it!
Some of what Google solved
Was solved by others first
Google’s focus is computerized data; much data are not digitized
Google is hardly a key social media player
Social media data are everywhere
2
3. Diverse Search Applications
Complex Document Information Processing
The whole is greater than the sum of its parts
Searching is easy
Unless it is in adverse (misspelled) environments
Social Media Search & Surveillance
Detecting outbreaks in their infancy
3
6. Complex documents include:
handwritten notes, diagrams, graphics, printed or formatted text
Point solutions exist: OCR, Information Retrieval, Information Extraction, Image Processing, Text Clustering, Computational Stylistics, …
No definition of state-of-the-art for the integrated problem
Manual partitioning/collating: expensive, time-consuming, error-prone
Some are even more complex!
6
7. Existing Technology Point Solutions
Optical character recognition (OCR)
Document clustering and browsing
Document structure extraction
Extraction from tables/lists
Handwriting analysis and signature recognition
Figure caption identification and extraction
Conventional and image retrieval systems
Entity and relationship extraction
7
13. Integration Helps
Without Logos: At which institution?
Without Text: What positions do I hold?
Ophir Frieder
McDevitt Prof. of Comp. Sci. & Inf. Proc.
&
Prof. of Biostatistics, Bioinformatics, & Biomathematics
13
14. Technology comes and goes
but….
Benchmarks (Collections) are ever (forever) lasting
14
15. Test Collection Characteristics
Cover the richness of inputs: range of formats, lengths, & genres; variance in print and image quality
Documents should include:
Handwritten text and notations
Diverse fonts
Graphical elements: graphs, tables, photos, logos, and diagrams
15
16. Test Collection Characteristics
Sufficiently high volume of documents
Vast volume of redundant & irrelevant documents
Support diverse applications
Include private communications within and between groups planning activities and deploying resources
Publicly available data!
Minimal cost
Minimal licensing
16
17. CDIP Test Collection
Data made public via legal proceedings
Master Settlement Agreement subset of UCSF Legacy Tobacco Document Library
Documents scanned by individual companies; hence scan quality varies widely
~ 7 million documents
~ 42 million scanned TIFF format pages (~ 1.5 TB)
~ 5 GB Metadata
~ 100 GB OCR
Dataset: https://ir.nist.gov/cdip/cdip-images/
17
18. The CDIP Test Collection (NIST TREC V1.0)
Used multiple years in the TREC Legal Track
Records (62 GB) made available to TREC participants (through ftp/dvd)
40 queries simulating legal case investigations, with relevance judgments produced by 35 lawyers
Novel queries with relevance judgments generated by tobacco researchers
18
19. Evaluation
CDIP benchmark data as a novel text test collection for “live scenarios”
NIST TREC Legal Track, 2006–2009
Housed permanently at NIST
Complex document search: ground truth difficult
800-document hand-checked sub-collection
19
20. Preliminary Results
Completed:
Subset of 800 documents, manually labelled for authorship & organizational unit
Evaluated: authorship, organizational, monetary, date, and address-based retrieval tasks
Ongoing:
Subset of 20K documents
Open Problem:
Performance evaluation (measures) for larger sets
20
26. Collaborators
Initial effort
Gady Agam – Illinois Inst. of Tech.
Shlomo Argamon – Illinois Inst. of Tech.
David Doermann – Univ. of Maryland DARPA
David Grossman – Illinois Inst. of Tech. Grossman Lab
David D. Lewis – DDL Consulting
Sargur Srihari – SUNY Buffalo
Ongoing effort
Gideon Frieder – George Washington Univ.
Jon Parker – Georgetown Univ. MITRE
26
27. References
S. Argamon, G. Agam, O. Frieder, D. Grossman, D. Lewis, G. Sohn, and K. Voorhees, “A Complex Document Information Processing Prototype,” ACM SIGIR, 2006.
D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, and J. Heard, “Building a Test Collection for Complex Document Information Processing,” ACM SIGIR, 2006.
G. Agam, S. Argamon, O. Frieder, D. Grossman, and D. Lewis, “Content-Based Document Image Retrieval in Complex Document Collections,” Document Recognition and Retrieval, 2007.
G. Bal, G. Agam, O. Frieder, and G. Frieder, “Interactive Degraded Document Enhancement and Ground Truth Generation,” Document Recognition and Retrieval, 2008.
T. Obafemi-Ajayi, G. Agam, and O. Frieder, “Historical Document Enhancement Using LUT Classification,” International Journal on Document Analysis and Recognition, 13(1), March 2010.
J. Parker, G. Frieder, and O. Frieder, “Automatic Enhancement and Binarization of Degraded Document Images,” International Conference on Document Analysis and Recognition, 2013.
J. Parker, G. Frieder, and O. Frieder, “Robust Binarization of Degraded Document Images using Heuristics,” Document Recognition and Retrieval XXI, San Francisco, California, February 2014.
Parker, et al., “System and Method for Enhancing the Legibility of Degraded Images,” US Patent #8,995,782, March 31, 2015.
Frieder, et al., “System and Method for Enhancing the Legibility of Images,” US Patent #9,269,126, February 23, 2016.
27
29. Spelling in Adverse Conditions
Foreign language (Yizkor Books)
User unfamiliar with character pronunciation
Multiple languages within a document
Domain specific (Medical)
Terms unfamiliar to the general audience
29
30. Yizkor Books
Yizkor = Hebrew word for “remember”
Firsthand accounts of events that preceded, took place during, and followed the Second World War
Document destroyed communities and the people who perished
Started early 1940s; highest activity in 1960s and 1970s
Published in 13 languages, across 6 continents
One of the largest collections resides in the USHMM
Access restricted due to limited numbers, fragile state, and prevention of destruction or theft
30
31. Traditional Access
User requested; archivist driven
Requires “complete” understanding of books
High human resource costs
Inefficient & slow
Often fails to obtain complete, if any, results
31
32. Metadata Search Access
Provides an intuitive search capability for apprehensive but interested users
Creates and queries collection metadata
32
33. Yizkor Interface
Centralized index
Global access
Efficient search
Accurate search
Multi-lingual spelling correction
33
35. Spell Checker
Upon entering a misspelled query, users are presented with a ranked list of suggestions
Percentages represent similarity to the original query as measured by our algorithms
35
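The deck does not name the similarity measure behind these percentages. As an illustrative stand-in (not necessarily the authors' algorithm), candidates can be ranked by character n-gram overlap, e.g. the Dice coefficient; all function names here are hypothetical:

```python
# Sketch: rank spelling suggestions by character n-gram (Dice) overlap.
# This is an assumed, illustrative similarity measure, not the one
# used in the USHMM system.

def ngrams(word, n=2):
    """Return the list of character n-grams of a padded word."""
    padded = "#" + word + "#"  # pad so edge characters form grams too
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def dice_similarity(a, b, n=2):
    """Dice coefficient over character n-gram multisets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    shared = sum(min(ga.count(g), gb.count(g)) for g in set(ga))
    return 2.0 * shared / (len(ga) + len(gb))

def rank_suggestions(query, lexicon, top_k=5):
    """Rank lexicon entries by similarity to the (misspelled) query."""
    scored = [(w, dice_similarity(query, w)) for w in lexicon]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

Reporting each score as a percentage of the top match would reproduce the interface shown on the slide.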
37. Language Independent Correction
Simplistic Rules Work ! or ?
Replace first and last characters by a wild card, in succession;
Retain only first and last characters and insert a wild card;
Retain only first and last two characters and insert a wild card;
Replace middle n-characters by a wild card, in succession;
Replace first half by a wild card;
Replace second half by a wild card;
37
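The wildcard rules above can be rendered as a small generator of query patterns. This is a minimal sketch: the exact handling of "middle n-characters, in succession" is an assumption (single-character succession), as is the minimum term length.

```python
# Sketch of the language-independent wildcard rules listed on the
# slide. "*" stands for any run of characters; parameter choices
# (e.g. single-character middle replacement) are assumptions.

def wildcard_variants(term):
    """Generate wildcard query patterns for a term (length >= 4)."""
    w = term
    half = len(w) // 2
    variants = [
        "*" + w[1:],           # replace first character
        w[:-1] + "*",          # replace last character
        w[0] + "*" + w[-1],    # keep only first and last characters
        w[:2] + "*" + w[-2:],  # keep only first and last two characters
        "*" + w[half:],        # replace first half
        w[:half] + "*",        # replace second half
    ]
    # replace each single middle character, in succession
    for i in range(1, len(w) - 1):
        variants.append(w[:i] + "*" + w[i + 1:])
    # de-duplicate while preserving order
    seen, out = set(), []
    for v in variants:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out
```

Each pattern can then be matched against the collection's lexicon to recover candidate corrections without any language-specific knowledge.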
38. Single Character Correction
Add Single Random Character
Remove Single Random Character
Replace Single Random Character
Swap Random Adjacent Pair of Characters
Mitton 1996 – “Spellchecking by Computers”
            Found (%)   Rank
D-M Sound   41.41       N/A
N-Gram      94.97       2.58
USHMM      100          1.71
                        1.71

            Found (%)   Rank
D-M Sound   41.96       N/A
N-Gram      93.40       3.46
USHMM       99.97       2.54
                        2.57

            Found (%)   Rank
D-M Sound   57.89       N/A
N-Gram      85.02       4.77
USHMM       97.97       3.75
                        3.00

            Found (%)   Rank
D-M Sound   31.45       N/A
N-Gram      92.06       3.24
USHMM      100          2.15
                        2.01
38
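The four synthetic single-character errors used in this evaluation are straightforward to generate. A minimal sketch, with the random-choice details assumed:

```python
# Sketch: generate the four single-character error types from the
# slide (add / remove / replace / swap). The alphabet and random
# policy are assumptions for illustration.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def add_char(word, rng):
    """Insert one random character at a random position."""
    i = rng.randrange(len(word) + 1)
    return word[:i] + rng.choice(ALPHABET) + word[i:]

def remove_char(word, rng):
    """Delete one random character."""
    i = rng.randrange(len(word))
    return word[:i] + word[i + 1:]

def replace_char(word, rng):
    """Replace one random character with a different one."""
    i = rng.randrange(len(word))
    c = rng.choice([c for c in ALPHABET if c != word[i]])
    return word[:i] + c + word[i + 1:]

def swap_adjacent(word, rng):
    """Swap one random adjacent pair of characters."""
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]
```

Applying each corruption to a clean lexicon and checking whether (and at what rank) the correct term is recovered reproduces the Found/Rank evaluation shown above.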
42. Transcription Errors
“What is a prescribing error?”, J. Quality in Health Care, 2000; 9:232–237.
“Reducing medication errors and increasing patient safety: Case studies in clinical pharmacology”, J. Clinical Pharmacology, July 2003, vol. 43 no. 7: 768–783.
“Preventing medication errors in community pharmacy: root-cause analysis of transcription errors”, Quality and Safety in Health Care, 2007;16:285–290.
“10 strategies for minimizing dispensing errors”, Pharmacy Times, Jan. 20th, 2010.
Note: Although many transcription errors are not spelling errors, some indeed are!
42
43. Medical Term Data Set
Hosford Medical Terms Dictionary v.3.0
Number of terms: 9,883
Term length (characters):
Average: 10.58
Minimum: 2
Maximum: 30
Median: 10
Mode: 10
43
44. Single Character Correction
Add Single Random Character
Remove Single Random Character
Replace Single Random Character
Swap Random Adjacent Pair of Characters
            Found (%)   Rank
D-M Sound   38.54       N/A
3-Gram      99.67       1.08
Med-Find   100          1.03
                        1.03

            Found (%)   Rank
D-M Sound   44.84       N/A
3-Gram      99.52       1.16
Med-Find   100          1.07
                        1.07

            Found (%)   Rank
D-M Sound   62.73       N/A
3-Gram      96.39       1.50
Med-Find    99.54       1.42
                        1.27

            Found (%)   Rank
D-M Sound   29.99       N/A
3-Gram      98.76       1.19
Med-Find    99.99       1.10
                        1.08
44
47. Collaborators
Key Personnel
Michlean Amir – USHMM
Rebecca Cathey – BAE Systems
Gideon Frieder – George Washington Univ.
Jason Soo – Georgetown/MITRE
Many comments by “prototype” users
47
48. References
J. Soo, R. Cathey, O. Frieder, M. Amir, and G. Frieder, “Yizkor Books: A Voice for the Silent Past,” ACM Seventeenth Conference on Information and Knowledge Management (CIKM) – Industrial Track, Napa Valley, California, October 2008.
J. Soo and O. Frieder, “On Foreign Name Search,” ACM Thirty-Second European Conference on Information Retrieval (ECIR), Milton Keynes, United Kingdom, March 2010.
J. Soo and O. Frieder, “On Searching Misspelled Collections,” Journal of the Association for Information Science and Technology (JASIST), 66(6), June 2015.
J. Soo and O. Frieder, “Revisiting Known-Item Retrieval in Degraded Document Collections,” Document Recognition and Retrieval (DRR), San Francisco, California, February 2016.
J. Soo and O. Frieder, “Searching Corrupted Document Collections,” Twelfth IAPR Workshop on Document Analysis Systems (DAS), Santorini, Greece, April 2016.
48
50. Motivation
Public health surveillance
Demands considerable human effort
Often delayed identification
Typically: need topic of interest
Ideally: detect without focus
Motivated to expedite detection
Social media the answer?
50
51. Related Efforts
Social Media
Known topic problem
Detection of specific disease (Influenza)
Correlate occurrence of flu-related words with official Influenza-like-illness data
Summarize influenza-related tweets
Complex solutions
Detect multiple health conditions via complex learning algorithms
Use access-limited resources
Query logs
51
52. Hypothesis: Generation vs. Validation
Goal: extract more general health-related information from social media streams
The Old Way: evaluate a pre-existing hypothesis using SM data
Q: “Is flu occurring more frequently?”  A: “Yes”
Our Way: generate a hypothesis from SM data
Q: “Are any illnesses occurring more frequently? If so, which ones?”  A: “Yes, Flu”
52
53. Tweet Corpus
Collected by Johns Hopkins University (JHU)
2 billion tweets (May 2009 – Oct 2010)
Filtered multiple times to yield medically related tweets, using a 20,000 health-related key-phrase list
High-recall / low-precision health tweets
SVM to increase precision
53
56. Frequent Word Set Identification
Preprocessing
Punctuation mark removal
Text lower-cased & tokenized
Stop-word removal
Duplicate term removal
Medical synonym expansion (MedSyn)
56
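The preprocessing steps above can be sketched in a few lines. The stop-word list and one-entry synonym map below are tiny illustrative stand-ins; the deck uses MedSyn for medical synonym expansion:

```python
# Sketch of the tweet preprocessing pipeline from the slide:
# punctuation removal, lower-casing, tokenization, stop-word removal,
# duplicate removal, and synonym normalization. STOP_WORDS and MEDSYN
# are placeholder data, not the real resources.
import string

STOP_WORDS = {"a", "the", "with", "and", "i", "to", "about"}
MEDSYN = {"flu": "influenza"}  # illustrative one-entry synonym map

def preprocess(tweet):
    # remove punctuation, lower-case, tokenize
    table = str.maketrans("", "", string.punctuation)
    tokens = tweet.translate(table).lower().split()
    # drop stop words and duplicate terms (order-preserving)
    seen, out = set(), []
    for t in tokens:
        if t in STOP_WORDS:
            continue
        t = MEDSYN.get(t, t)  # normalize medical synonyms
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out
```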
57. Frequent Word Set Identification
# Tweet Content
T1 Pounding headache, sore throat, low grade fever, flu
T2 Sleep, a perfect cure to forget about the pain!
T3 This morning woke up with fever, sore throat, and flu
T4 Cough, flu, sore throat. I couldn’t ask for a better combination
T5 Got you down? Fever , muscle aches, cough,
Term Set Support
flu, sore throat 3
fever 3
cough 2
Frequent Term Sets: {{flu, sore throat}, {fever}} -- Threshold 3
57
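The support counting in the worked example above amounts to Apriori-style enumeration of small token subsets. A minimal sketch, simplified to sets of size 1 and 2:

```python
# Sketch: count the support of every small token subset across
# preprocessed tweets and keep the sets meeting the threshold. A full
# system would prune candidates Apriori-style; this brute-force
# version is for illustration only.
from collections import Counter
from itertools import combinations

def frequent_word_sets(tweets, threshold=3, max_size=2):
    """tweets: list of token lists. Returns frequent token tuples."""
    support = Counter()
    for tokens in tweets:
        uniq = sorted(set(tokens))  # canonical order for subset keys
        for size in range(1, max_size + 1):
            for combo in combinations(uniq, size):
                support[combo] += 1
    return {s for s, n in support.items() if n >= threshold}
```

On the five example tweets, {flu, sore throat} and {fever} reach the support threshold of 3, while {cough} (support 2) does not, matching the slide.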
59. Track Word Set Time Series
Time-series used to determine word sets with a significant increase in prevalence
Two differing word set tracks by month:
{feel, sick} – very frequent, does not trend
{allergies, feel} – trends in April and May
(Figure: trending decision per word set)
59
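The deck does not specify the trending statistic. One simple test consistent with the idea of a "significant increase in prevalence" flags a period whose frequency is far above the history's mean; the mean-plus-two-standard-deviations rule below is an assumption:

```python
# Sketch of one possible trending decision: flag the latest period if
# its frequency exceeds the historical mean by k standard deviations.
# The statistic and k=2 are assumptions, not the authors' method.
from statistics import mean, stdev

def is_trending(series, k=2.0):
    """series: per-period frequencies, oldest first; test last period."""
    history, latest = series[:-1], series[-1]
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    return latest > mu + k * max(sigma, 1e-9)
```

This is why a word set like {feel, sick} does not trend: its counts are high but stable, so the latest period never stands out from its own history.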
60. Query Wikipedia
Query a trending word set in Wikipedia
Why Wikipedia?
Comprehensive range of topics, including health topics
Written in layman’s English resembling the tweets considered
60
61. Filter Wikipedia Results
Retrieved articles determine if a frequent word set is health-related
Health-related nature judged by two metrics:
Ratio of medical tokens in the introduction
Presence of International Statistical Classification of Diseases and Related Health Problems (ICD) codes
61
62. Ratio of Medical Tokens
Article is health-related if the ratio of medical tokens in its introduction surpasses a threshold
Process:
Tokenize introduction
Remove stop words
Count the tokens and medical tokens
If # medical_tokens / # tokens > 0.75, then health-related
62
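The medical-token-ratio check above is a few lines of code. The tiny vocabulary and stop-word list here are illustrative stand-ins for the real resources:

```python
# Sketch of the medical-token-ratio filter from the slide: an article
# introduction counts as health-related if the fraction of its
# stop-word-filtered tokens found in a medical vocabulary exceeds the
# 0.75 threshold. MEDICAL_VOCAB and STOP_WORDS are placeholder data.
MEDICAL_VOCAB = {"influenza", "fever", "virus", "symptoms", "infection"}
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "by", "caused"}

def is_health_related(introduction, threshold=0.75):
    tokens = [t for t in introduction.lower().split()
              if t not in STOP_WORDS]
    if not tokens:
        return False
    medical = sum(1 for t in tokens if t in MEDICAL_VOCAB)
    return medical / len(tokens) > threshold
```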
63. ICD Codes
Health-related Wikipedia articles typically contain an info-box with ICD-9 & ICD-10 codes
ICD code – a strong health-related indicator
(Figure: a Wikipedia article’s info-box with ICD codes)
63
64. Detection – 2010 Flu Season
Tweet time series from June 09 to Oct 10
Weekly flu cases in US from June 09 to Oct 10
64
65. Social Media Mining Accuracy
Landing on Hudson and Mumbai Terror Attack
Flu Tweets (Lampos and Cristianini 2010; Culotta 2010)
…
Hurricane Sandy Coordination Communication
…
…
Fake Celebrity Deaths (Jeff Goldblum)
65
69. Summary
Our Approach:
Filter a corpus to be topic specific
Identify trending word sets
Connect multiple trending word sets to topics of interest
Detect trending topic of interest – Generate Hypotheses
69
70. Future Work
Run framework on a larger scale
Increase data volume: 2 billion → 200 billion
Increase temporal resolution: months → weeks → days
Use resources besides Wikipedia and ICD to filter out non-medically related trending topics
Detect other types of trends by changing the filters to suit a new topic of interest
Deploy globally
70
71. Collaborators
Key Personnel
Nazli Goharian – Georgetown University
Alek Kolcz – Twitter PushD
Jon Parker – Johns Hopkins/Georgetown MITRE
Andrew Yates – Georgetown University
Many comments by “prototype” users
71
72. References
A. Yates, J. Parker, N. Goharian, and O. Frieder, “A Framework for Public Health Surveillance,” 9th Language Resources and Evaluation Conference (LREC-2014), Reykjavik, Iceland, May 2014.
J. Parker, A. Yates, N. Goharian, and O. Frieder, “Health Related Hypothesis Generation using Social Media Data,” Social Network Analysis and Mining, 5(7), March 2015.
A. Yates, N. Goharian, and O. Frieder, “Learning the Relationships between Drug, Symptom, and Medical Condition Mentions in Social Media,” AAAI 10th International Conference on Web and Social Media (ICWSM), Cologne, Germany, May 2016.
A. Yates, A. Kolcz, N. Goharian, and O. Frieder, “Effects of Sampling on Twitter Trend Detection,” 10th Language Resources and Evaluation Conference (LREC-2016), Portoroz, Slovenia, May 2016.
72
73. Summary
Complex Document Information Processing
The whole is greater than the sum of its parts
Searching is easy
Unless it is in adverse (misspelled) environments
Social Media Search: Surveillance in a positive light
Detecting outbreaks in their infancy
73