AutoMap is a text-mining tool that enables the extraction of concepts, relationships, and networks from text corpora. It allows users to create semantic networks and meta-networks through both automated and manual coding of texts. The tool generates visualizations of textual networks and statistical analyses of network structures that can provide insight into themes, knowledge structures, and dynamics within texts. While computational models have limitations, validating results against human analysis of sample texts and domain expertise can help improve the models and lead to new research insights.
"Mass Surveillance" through Distant ReadingShalin Hai-Jew
Â
Distant reading refers to the use of computers to "read" texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work draws on research into "mass surveillance" across five text sets: academic writing, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture, indirectly, some insights about the collective social discussions occurring around this issue. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Some computational linguistic analysis tools also enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
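As a very rough illustration of the first of those operations, word counting, here is a minimal Python sketch; the two-sentence corpus and stopword list are invented placeholders, and this shows only the simplest layer of machine "reading," not how NVivo or LIWC work internally:

    # Minimal sketch: word-frequency counting over a tiny, invented text set.
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on"}

    def word_frequencies(texts):
        counts = Counter()
        for text in texts:
            words = re.findall(r"[a-z']+", text.lower())
            counts.update(w for w in words if w not in STOPWORDS)
        return counts

    corpus = [
        "Mass surveillance of communication data raises privacy questions.",
        "Leaked documents describe surveillance programs and data collection.",
    ]
    for word, n in word_frequencies(corpus).most_common(10):
        print(word, n)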
Using "Distant Reading" to Explore Discussion Threads in Online Courses (Shalin Hai-Jew)
In this age of mass data, "distant reading" has come to the fore as a way to deal with large amounts of text data, including text from student discussion threads in online courses. Kansas State University has a site license for NVivo 11 Plus, software that enables multimedia data curation and qualitative and mixed-methods data analysis. Two new features in NVivo, sentiment analysis and theme extraction (topic modeling), enable users to "distant read" large amounts of text to extract some early insights.
What are the expressed sentiments of learners when discussing a particular issue? Do these trend positive or negative?
What topics or themes or concepts are brought up by students given a certain discussion thread prompt?
What do the sentiment and topic insights suggest about where students stand on a particular issue? Are there latent (hidden) insights?
These new features, in combination with text frequency counts (with related text clustering), text searches, and other text-query and data-visualization capabilities in NVivo, enable distant reading for use in online courses. This digital slideshow will introduce NVivo 11 Plus (a local software tool with both Windows and Mac versions) and walk through how it may be applied to textual data extracted from an online course.
Understanding what students are thinking is a critical part of transformational teaching and learning. Using computational means to listen and to hear is important to this end.
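NVivo's theme extraction is proprietary, but the general shape of topic modeling can be sketched with scikit-learn's LDA; the three discussion posts below are invented placeholders, and a real thread would supply far more text:

    # Sketch of topic modeling over discussion posts with scikit-learn's LDA
    # (a stand-in for NVivo's proprietary theme extraction; posts are invented).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    posts = [
        "I think the reading on surveillance raised real privacy concerns",
        "The group project deadline feels too tight given the material",
        "Privacy tradeoffs seem acceptable if the data improves services",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(posts)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(doc_term)

    terms = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top = [terms[j] for j in topic.argsort()[-5:][::-1]]
        print(f"Topic {i}: {', '.join(top)}")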
Capitalizing on Machine Reading to Engage Bigger Data (Shalin Hai-Jew)
What are some ways to select, say, 200 research articles to "close read" from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in a flood of Tweets and retweets based on a particular hashtag (#) or keyword search, or an especially lively Tweetstream from a particular social media account? People are dealing with ever more prodigious amounts of information from a number of sources. Those who are savvy to the uses of computers to aid their reading (through "distant reading" or "not-reading") may find that they are able to cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid academic and research work.
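One hedged sketch of the winnowing idea, assuming scikit-learn and SciPy are available: vectorize the articles with TF-IDF, cluster them hierarchically (the same computation that underlies a dendrogram), and sample a few articles per cluster for close reading. The article texts and cluster count below are invented placeholders:

    # Sketch: cluster article texts by TF-IDF similarity, then sample per cluster.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from scipy.cluster.hierarchy import linkage, fcluster

    articles = {
        "a1.txt": "surveillance law and privacy policy debates",
        "a2.txt": "privacy policy and data protection law",
        "a3.txt": "network protocols for packet capture at scale",
    }

    names = list(articles)
    X = TfidfVectorizer(stop_words="english").fit_transform(articles.values())

    Z = linkage(X.toarray(), method="ward")          # hierarchical clustering (dendrogram-ready)
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters here

    for cluster in sorted(set(labels)):
        members = [n for n, lab in zip(names, labels) if lab == cluster]
        print(f"cluster {cluster}: close read {members[:2]} of {len(members)} articles")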
Writing and Publishing about Applied Technologies in Tech Journals and Books (Shalin Hai-Jew)
This slideshow provides insights on how to write and publish about applied technologies in tech journals and books, including the following:
Getting started in tech publishing
Cost-benefit calculations
Parts to an article; parts to a chapter
Writing process
Collaborating
Publishing process
Acquiring readers (and citations)
Post-publishing
Next works
Building a Digital Learning Object w/ Articulate Storyline 2 (Shalin Hai-Jew)
The digital learning object (DLO) is still a staple of online learning. One of the more sophisticated authoring tools for building DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay), the building of animations, branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML5. This presentation will introduce the software tool and some of its capabilities to provide a sense of where digital learning objects may be headed.
The Semantic Web is used to provide users with relevant data drawn from the massive amount of data available on the web. This presentation introduces the Semantic Web and how NLP can be applied within it.
"Mass Surveillance" through Distant ReadingShalin Hai-Jew
Â
Distant reading refers to the uses of computers to âreadâ texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work is based on research on âmass surveillanceâ based on five text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture some insights about the collective social discussions occurring around this issue in an indirect way. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Also, some computational linguistic analysis tools enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
Using âDistant Readingâ to Explore Discussion Threads in Online CoursesShalin Hai-Jew
Â
In this age of mass data, âdistant readingâ has come to the fore as a way to deal with large amounts of text dataâincluding from student discussion threads in online courses. Kansas State University has a site license for NVivo 11 Plus, a software that enables multimedia data curation and qualitative and mixed methods data analysis. Two new features in NVivoâsentiment analysis and theme extraction (topic modeling)âenable users to âdistant readâ large amounts of text to extract some early insights.
What are the expressed sentiments of learners when discussing a particular issue? Do these trend positive or negative?
What topics or themes or concepts are brought up by students given a certain discussion thread prompt?
What do the sentiment and topic insights suggest about where students are at with a particular issue? Are there latent (hidden) insights?
These new features, in combination with text frequency counts (with related text clustering), text searches, and other text data query capabilities (and related data visualization capabilities) in NVivo enable distant reading for use in online courses. This digital slideshow will introduce NVivo 11 Plus (a local software tool with both Windows and Mac platform versions) and walk-through how it may be applied to textual data extracted from an online course.
Understanding what students are thinking is a critical part of transformational teaching and learning. Using computational means to listen and to hear is important to this end.
Capitalizing on Machine Reading to Engage Bigger DataShalin Hai-Jew
Â
What are some ways to select, say, 200 research articles to âclose readâ from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in the flood of Tweets and RT based on a particular hashtag (#) or keyword search or an especially lively Tweetstream based on a particular social media account? People are dealing with ever more prodigious amounts of informationâfrom a number of sources. Those who are savvy to the uses of computers to aid their reading (through âdistant readingâ or ânot-readingâ) may find that they are able to cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid in academic and research work.
Writing and Publishing about Applied Technologies in Tech Journals and BooksShalin Hai-Jew
Â
This slideshow provides insights on how to write and publish about applied technologies in tech journals and books, including the following:
Getting started in tech publishing
Cost-benefit calculations
Parts to an article; parts to a chapter
Writing process
Collaborating
Publishing process
Acquiring readers (and citations)
Post-publishing
Next works
Building a Digital Learning Object w/ Articulate Storyline 2Shalin Hai-Jew
Â
The digital learning object (DLO) is still a common staple in online learning. One of the more sophisticated authoring tools to build DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay); the building of animations; branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML 5. This presentation will introduce the software tool and some of its capabilities to provide a sense of where digital learning objects may be headed.
To provide relevant data to users form massive data available on web the Semantic Web technique is used. This presentation gives introduction of semantic web and how NLP can be used in it.
Influence of Timeline and Named-entity Components on User Engagement (Roi Blanco)
Nowadays, successful applications are those whose features captivate and engage users. Using an interactive news retrieval system as a use case, in this paper we study the effect of timeline and named-entity components on user engagement. This contrasts with previous studies, where the importance of these components was studied from a retrieval-effectiveness point of view. Our experimental results show significant improvements in user engagement when named-entity and timeline components were installed. Further, we investigate whether we can predict user-centred metrics from users' interaction with the system. Results show that we can successfully learn a model that predicts all dimensions of user engagement and whether users will like the system or not. These findings might steer systems toward a more personalised user experience, tailored to the user's preferences.
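The paper's actual features and model are not reproduced here; the following hypothetical sketch only shows the general shape of learning an engagement score from interaction-log features with scikit-learn (all feature names and values are invented):

    # Hypothetical sketch: predict an engagement score from interaction-log features.
    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    # columns: dwell_time_sec, timeline_clicks, entity_clicks, queries_issued (invented)
    X = np.array([[120, 3, 5, 2], [30, 0, 1, 4], [200, 6, 9, 1], [45, 1, 0, 3]])
    y = np.array([4.5, 2.0, 4.8, 2.5])  # e.g., self-reported engagement on a 1-5 scale

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    print("predicted engagement:", model.predict(X_te))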
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC (Valentina Presutti)
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible not so far future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches, and human computing.
An Introduction to Information Retrieval and Applications (sathish sak)
An Introduction to Information Retrieval and Applications
The score you get depends on the functions, difficulty, and quality of your project.
For system development:
System functions and correctness
For academic paper presentation:
Quality and your presentation of the paper
Major methods/experimental results *must* be presented
Papers from top conferences are strongly suggested
E.g., SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, etc.
Proposals are *required* for each team, and will be counted in the score
IJRET-V1I1P5 - A User Friendly Mobile Search Engine for fast Accessing the Da... (ISAR Publications)
The mobile search engine described here is a meta search engine that captures users' preferences in the form of concepts by mining their clickthrough data. Search queries on mobile devices tend to be limited to a few short words, unlike those used when interacting with search engines on computers, and mobile search has become popular because of the huge number of available applications. Smartphones carry large amounts of personal information, such as users' personal details, contacts, messages, emails, and credit card information. The system supports user-type-specific search and, finally, ontology-based search. Opinion mining is also conducted on the feedback and suggestions given by mobile users. Because content concepts and location concepts have different characteristics, different techniques are used for their concept extraction and ontology formulation. Individual users can use this search engine, which runs on the Android platform, and can give feedback and suggestions about the search results. Based on that feedback, other users can get valuable information about the services available in their own or nearby locations.
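As a toy illustration of the clickthrough idea (not the paper's actual algorithm), one can count concept terms from the titles of results a user clicks to build a rough preference profile; the clicked titles below are invented:

    # Toy sketch: derive a concept-preference profile from clickthrough data.
    from collections import Counter

    clicked_titles = [
        "best seafood restaurants near the harbor",
        "harbor seafood market opening hours",
        "cheap hotels near the harbor",
    ]

    profile = Counter()
    for title in clicked_titles:
        profile.update(title.lower().split())

    # drop generic words, keep the strongest "concepts"
    for word in ("the", "near", "best", "cheap"):
        profile.pop(word, None)
    print(profile.most_common(5))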
Text REtrieval Conference (TREC) Dynamic Domain Track 2015 (Grace Hui Yang)
This is the introductory talk for the TREC Dynamic Domain Track. The Track ran from 2015 to 2017, aiming to evaluate and advance research in dynamic search and domain-specific search. This talk was prepared to introduce the ideas and setups in the upcoming Track to the research community.
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM... (dannyijwest)
Due to the explosion of information and knowledge on the web and the wide use of search engines to find it, the role of knowledge management (KM) is becoming more significant in organizations. Knowledge management in an organization is used to create, capture, store, share, retrieve, and manage information efficiently. The semantic web, an intelligent and meaningful web, tends to provide a promising platform for knowledge management systems and vice versa, since the two have the potential to give each other the real substance for machine-understandable web resources, which in turn will lead to intelligent, meaningful, and efficient information retrieval on the web. Today the challenge for the web community is to integrate distributed, heterogeneous resources on the web with the objective of an intelligent web environment focused on data semantics and user requirements. Semantic annotation (SA), which assigns entities in the text links to their semantic descriptions, is widely used; tools such as KIM and Amaya may be used for semantic annotation.
Workshop one of a two-workshop series for graduate-level English students. Find part two here: https://www.slideshare.net/gesinaphillips/data-visualization-through-network-graphing-100293824
Text mining has become one of the trendier approaches adopted across several research fields, such as computational linguistics, information retrieval (IR), and data mining. Natural language processing (NLP) methods are used to extract knowledge from text written by people. Text mining parses unstructured data to surface meaningful patterns of information quickly. Social networking sites are a major channel of communication, as most people now use them daily to stay connected with one another. It has become common practice not to write sentences with correct grammar and spelling, and this practice can produce various kinds of ambiguity (lexical, syntactic, and semantic); because of such unclear data, it is difficult to find the genuine information structure. Accordingly, we are conducting a study aimed at surveying different text mining techniques for handling various kinds of textual queries on social media sites. This review describes how studies of social media have used text analysis and text mining methods to identify the key topics in the data. The study concentrates on text mining work related to Facebook and Twitter, the two dominant social media platforms in the world. The results of this survey can serve as baselines for future text mining research.
Technical Whitepaper: A Knowledge Correlation Search Engine (s0P5a41b)
For the technically oriented reader, this brief paper describes the technical foundation of the Knowledge Correlation Search Engine - patented by Make Sence, Inc.
Would you like to know more? Find presentations, reports, conference videos, photos and much more in our institutional repository at: http://repozitar.techlib.cz/?ln=en
Formations & Deformations of Social Network GraphsShalin Hai-Jew
Â
Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft's CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms which identify intensities of connections. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically "deform" the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
This presentation explains the research I did while working at the Social Computing Lab at KAIST.
The main goal was to expand the LIWC vocabulary and adapt it for Twitter sentiment analysis.
Download it to see the animations :)
Exploring Article Networks on Wikipedia with NodeXL (Shalin Hai-Jew)
With 4.7 million articles in the English version of Wikipedia, this crowd-sourced online encyclopedia is regularly one of the top-ten visited sites online. For many, this is the go-to source for a first read on a topic. The open-source and free Network Overview, Discovery and Exploration for Excel (NodeXL), which is an add-on to Microsoft Excel, enables the capture of "article networks" from Wikipedia. Such content network analysis-based data visualizations enable the development of research leads; some understandings of public conceptualizations of related concepts, peoples, events, and phenomena; the profiling of Wikipedia editors (both humans and 'bots); and other research insights. This presentation will showcase this affordance of NodeXL and provide some ideas for practical applications of this channel of research and knowing.
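NodeXL itself is a spreadsheet add-in, but the underlying notion of an "article network" can be sketched in Python by pulling a seed article's outbound links from the public MediaWiki API and building a one-hop graph; the seed title is arbitrary, and error handling and paging are omitted:

    # Sketch: build a one-hop Wikipedia "article network" via the MediaWiki API
    # (not NodeXL's own capture; seed title is arbitrary, no error handling).
    import requests
    import networkx as nx

    API = "https://en.wikipedia.org/w/api.php"

    def outbound_links(title, limit=50):
        params = {"action": "query", "prop": "links", "titles": title,
                  "pllimit": limit, "plnamespace": 0, "format": "json"}
        pages = requests.get(API, params=params).json()["query"]["pages"]
        page = next(iter(pages.values()))
        return [link["title"] for link in page.get("links", [])]

    seed = "Mass surveillance"
    G = nx.DiGraph()
    for target in outbound_links(seed):
        G.add_edge(seed, target)

    print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")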
Coding Social Imagery: Learning from a #selfie #humor Image Set from Instagram (Shalin Hai-Jew)
Social media messaging has long been harnessed to inform faculty about their respective learners. The textual channel is often used because of the ease of interpretation and analysis. Social imagery (tagged images, #selfies, grouped imagery, and others) has been less used, in part because images are more complex and multi-meaninged to analyze. Also, there are not many generalist models that inform how to code or even understand social imagery in an emergent way. (There are large-scale computational means to interpret online images, such as the AlchemyAPI of IBM Watson, for various types of feature extraction. There are ways to code imagery based on specific research questions in particular fields-of-practice.)
The presenter recently analyzed a 941-image #selfie + #humor image set from Instagram, with three main research questions:
What does identity-based humor look like in terms of a #selfie #humor- tagged image set from the Instagram photo-sharing mobile app?
Do more modern forms of mediated social humor link to more traditional forms theoretically? Is it possible to apply the Humor Styles Model to the images from the #selfie #humor Instagram image set to better understand #selfie #humor?
What are some constructive and systematized ways to analyze social image sets manually (with some computational support)?
This digital poster session will highlight some of the initial research findings (forthcoming in a near-future publication) and share insights about effectively coding social imagery in a bottom-up and emergent way.
This slideshow highlights the Tweet Analyzer machine, a tool created by Paterva and enabled through Maltego Carbon 3.5.3 and Maltego Chlorine 3.6.0. The Tweet Analyzer enables real-time captures of Tweets (from Twitter's streaming API) along with real-time sentiment analysis (based on polarities: positive, negative, and neutral), based on the Alchemy API.
Hashtag Conversations, Eventgraphs, and User Ego Neighborhoods: Extracting So... (Shalin Hai-Jew)
This introduces methods for extracting and analyzing social network data from Twitter for hashtag conversations (and emergent events), event graphs, search networks, and user ego neighborhoods (using NodeXL). There will be direct demonstrations and discussions of how to analyze social network graphs. This information may be extended with human- and / or machine-based sentiment analysis.
LIWC-ing at Texts for Insights from Linguistic Patterns (Shalin Hai-Jew)
Since the mid-1990s, researchers have been using the Linguistic Inquiry and Word Count (LIWC, pronounced "luke") software tool to explore various text corpora for hidden insights from linguistic patterns. The LIWC tool has evolved over the years. Simultaneously, research using computational text analysis has evolved and shed light on areas of deception, threat assessment, personality, predictive analytics, and other areas. This presentation will highlight some of the applications of LIWC in the research literature and showcase the tool on some original text sets.
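LIWC ships validated dictionaries, but the mechanic it applies is essentially word-category counting; the sketch below uses a tiny invented two-category dictionary to show the shape of the output (the percentage of words falling into each category), and is not LIWC's lexicon or code:

    # Sketch of LIWC-style scoring: percent of words that fall into each category.
    import re

    CATEGORIES = {
        "posemo": {"good", "happy", "love", "great"},
        "negemo": {"bad", "sad", "hate", "awful"},
    }

    def liwc_style_scores(text):
        words = re.findall(r"[a-z']+", text.lower())
        total = len(words) or 1
        return {cat: 100 * sum(w in vocab for w in words) / total
                for cat, vocab in CATEGORIES.items()}

    print(liwc_style_scores("I love this great course but hate the awful deadlines"))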
Researchers have long known that the words of a text have always contained more information than on the surface. As such, texts have been studied for subtexts and other latent or hidden information. One approach has involved the machine-enabled analysis of human sentiment, usually mapped out on a positive-negative polarity. NVivo 11 Plus (a qualitative research tool released in late 2015) enables the automated sentiment analysis of texts (coded research, formal articles, text corpora, Tweetstream datasets, Facebook wall posts, websites, and other sources) based on four categories: very positive, moderately positive, moderately negative, and very negative. The tool feature compares the target text set against a sentiment dictionary and enables coding at different units of analysis: sentence, paragraph, or cell. Further, the sentiment capability extracts the coded text into respective text sets which may be further analyzed using text frequency counts, text searches, automated theme and sub-theme extractions (topic modeling), and data visualizations.
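NVivo's sentiment dictionary and weighting are proprietary; the following toy sketch only illustrates the idea of coding each sentence into one of four bands, using an invented lexicon and arbitrary thresholds:

    # Toy sketch: code sentences into four bands, loosely mirroring NVivo's
    # very/moderately positive/negative categories (lexicon is invented).
    import re

    WEIGHTS = {"excellent": 2, "good": 1, "poor": -1, "terrible": -2}

    def band(sentence):
        score = sum(WEIGHTS.get(w, 0) for w in re.findall(r"[a-z]+", sentence.lower()))
        if score >= 2:
            return "very positive"
        if score == 1:
            return "moderately positive"
        if score == -1:
            return "moderately negative"
        if score <= -2:
            return "very negative"
        return "neutral / uncoded"

    for s in ["The readings were excellent.", "The pacing felt poor and terrible."]:
        print(band(s), "<-", s)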
Eavesdropping on the Twitter Microblogging Site (Shalin Hai-Jew)
Research analysts go to Twitter to capture the general trends of public conversations, identify and profile influential accounts, and extract subgroups within larger collectives and larger discourses; they also go to eavesdrop on individual self-talk and individual-to-individual conversations. So what is technically in your tweets, asked Dave Rosenberg famously in a CNET article (2010). The answer: a whole lot more than 140 characters. How are the most influential social media accounts identified through #hashtag graphs? How are themes extracted? How are sentiments understood? How can users be profiled through their Tweetstreams? How can locations be mapped in terms of the Twitter conversations occurring in particular physical areas? How can live and trending issues be identified and categorized in terms of sentiment (positive, negative, and neutral)? This presentation will summarize some of the free and open-source tools as well as commercial and proprietary ones that enable increased knowability.
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume related challenges, including the use of high performance computing. In this talk, we will mainly focus on other challenges from the perspective of collaborative sharing and reuse of broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state of the art work in supporting physicists (including astrophysicists) [1], life sciences [2], material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice oriented talk will complement more vision oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis's work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
A Comprehensive Guide to Data Science Technologies.pdf (GeethaPratyusha)
In the fast-paced realm of data science, staying ahead requires a deep understanding of the tools and technologies that drive insights from data. From programming languages to advanced frameworks, the world of data science technologies is vast and dynamic. In this blog, we embark on a comprehensive guide, navigating through the essential tools that empower data scientists to unravel the mysteries hidden within datasets and shape the future of information analysis. For those seeking a structured and immersive learning experience, complementing this tech-centric journey with a well-crafted data science course is the key to unlocking boundless opportunities in this evolving field.
http://www.escience2009.org/
Web Semantics in Action: Web 3.0 in e-Science
11:00 - 11:25 Torsten Reimer: Classifying the (digital) Arts and Humanities
2015 03 19 (EDUCON2015) eMadrid UPM Towards a Learning Analytics Approach for... (eMadrid network)
2015 03 19 (EDUCON2015) eMadrid UPM Towards a Learning Analytics Approach for Supporting discovery and reuse of OER. An approach based on Social Networks Analysis and Linked Open Data
This is the presentation of Juan Cruz-Benito's PhD thesis, "On data-driven systems analyzing, supporting and enhancing users' interaction and experience," which was defended on September 3rd, 2018 in the Faculty of Sciences at the University of Salamanca, Spain. The PhD was graded with the maximum qualification, "Sobresaliente Cum Laude."
This presentation was provided by Gerald Benoit of Simmons College during the NISO webinar, Enabling Discovery and Retrieval of Non-Traditional and Granular Content, held on June 7, 2017
Sands Fish - Knowing in the Age of Networked Knowledge (sandsfish)
Knowledge representation has become extremely complex since the advent of the internet, online education, and commons-based peer production. This talk discusses the thresholds we've crossed and what it means to know something when knowledge is massively interlinked.
http://wiki.knoesis.org/index.php/MaterialWays
http://www.knoesis.org/?q=research/semMat
http://wiki.knoesis.org/index.php/MaterialWays
Abstract
The sharing, discovery, and application of materials science and engineering data and documents are possible only if domain scientists are able and willing to do so. We need to overcome technological challenges such as the development of convenient computational tools and repositories conducive to easy exchange, curation, attribution, and analysis of data, and cultural challenges such as proper protection, control, and credit for sharing data. Our thesis and value proposition is that associating machine-processable semantics with materials science and engineering data and documents can provide a solid foundation for overcoming challenges associated with data discovery, integration, and interoperability caused by data heterogeneity. Specifically, easy to use and low upfront cost lightweight semantics in the form of file-level annotation can enable document discovery and sharing, while deeper data-level annotation using standardized ontologies can benefit semantic search and summarization. Machine processability achieved through fine-grained semantic annotation, extraction, and translation can enable data integration, interoperability and reasoning, ultimately leading to Linked Open Materials Science Data. Thus, a different granularity of semantics provides a continuum of cost/ease of use and expressiveness trade-off. In this presentation, we also show the application of semantic techniques for content extraction from materials and process specifications which are semi-structured and table-rich, and the application of semantic web techniques and technologies for materials vocabulary integration and curation (via semantic media wiki), semantic web visualization, efficient representation of provenance metadata and access control (via singleton property), and biomaterials information extraction
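As a minimal sketch of the "lightweight, file-level annotation" idea, the rdflib snippet below attaches a few ontology terms to a data file; the MAT namespace, its terms, and the file URI are invented placeholders rather than an actual materials-science ontology:

    # Minimal sketch of file-level semantic annotation with rdflib
    # (the MAT namespace and terms are invented, not a real ontology).
    from rdflib import Graph, Namespace, URIRef, Literal
    from rdflib.namespace import RDF, DCTERMS

    MAT = Namespace("http://example.org/materials#")
    doc = URIRef("http://example.org/data/tensile_test_042.csv")

    g = Graph()
    g.add((doc, RDF.type, MAT.ExperimentDataset))
    g.add((doc, DCTERMS.creator, Literal("J. Doe")))
    g.add((doc, MAT.material, MAT.Ti6Al4V))
    g.add((doc, MAT.measures, MAT.TensileStrength))

    print(g.serialize(format="turtle"))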
EgoSystem: Presentation to LITA, American Library Association, Nov 8 2014 (James Powell)
The Internet represents the connections among computers and devices, the world wide web is a network of interconnected documents, and the semantic web is the closest thing we have today to a network of interconnected facts. Noticeably absent from these global networks is any sort of open, formal representation for an online global social network. Each users' online presence, and its immediate social network, are isolated and typically only available within the confines of the social networking site that hosts it. Discovery across explicit online social networks and implicit social networks such as those that can be inferred from co-authorship relationships and affiliations is, for all practical purposes, impossible. And yet there are practical and non-nefarious reasons why an organization might be interested in exploring portions of such a network. Outreach is one such interest. Los Alamos National Laboratory (LANL) prototyped EgoSystem to harvest and explore the professional social networks of post doctoral students. The project's goal is to enlist past students and other Lab alumni as ambassadors and advocates for LANL's ongoing mission. During this talk we will discuss the various technologies that support the EgoSystem and demonstrate some of its capabilities.
Long nonfiction chapters are not in style and may never have been. Where average nonfiction book chapters run about 4,000 - 7,000 words, some chapters are several times that upper bound. The usual explanation is that the chapter addresses some irreducible complexity that cannot be handled in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest, and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in Academia (Shalin Hai-Jew)
Starting as an organization's new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire some funds (small and big, one-off and continuing) to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-... (Shalin Hai-Jew)
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIs (Shalin Hai-Jew)
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. That image may be used to seed further iterations. Deep Dream Generator (DDG) enables "modifiers" of various types (artist styles, visual adjectives, and others) to be applied in addition to the text prompt.
Another approach involves beginning with a "seeding image," a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations on how to think about seeding images, particularly in terms of how DDG handles them with its "algorithmic pareidolia" ("Deep Dream," Wikipedia, July 3, 2023).
Human art-making is often about throwing mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer Mistakes (Shalin Hai-Jew)
The work of academic book manuscript reviewing, most often as a volunteer, is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work over several decades. There have been lessons learned over time about avoidable mistakes, from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AI (Shalin Hai-Jew)
CrAIyon (formerly DALL-E, after Salvador "Dali") is a web-facing art-making generative AI tool (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombinations, remixes, and re-creations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive (because of how the generative AI program was trained on mass-scale visuals). There is an art, and occasional indirection, to working the prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple... (Shalin Hai-Jew)
Augmented reality (AR), the use of digital overlays over physical space, manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space with unaided human vision; in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and... (Shalin Hai-Jew)
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) (Shalin Hai-Jew)
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and Accessibility (Shalin Hai-Jew)
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,... (Shalin Hai-Jew)
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o... (Shalin Hai-Jew)
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i... (Shalin Hai-Jew)
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
These are my and Rik Marselis's slides from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with an enjoyable workshop in which participants explored different ways to think about quality and testing across the different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder - active learning and UiPath LLMs for do... (UiPathCommunity)
Speed, accuracy, and scaling: discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing, with little to no training required
Get an exclusive demo of the new family of UiPath LLMs: GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
Andras Palfi, Senior Product Manager, UiPath
Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What's changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there's more:
In a second workflow supporting the same use case, you'll see:
Your campaign sent to target colleagues for approval
If the "Approve" button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But if the "Reject" button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties - USA
Expansion of bot farms - how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks - Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf (91mobiles)
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Auto Mapping Texts for Human-Machine Analysis and Sensemaking
1. AUTOMAP-PING TEXTS FOR
HUMAN-MACHINE ANALYSIS
AND SENSE-MAKING
SHALIN HAI-JEW
KANSAS STATE UNIVERSITY
SIDLIT 2014 (OF C2C)
JULY 31 - AUG. 1, 2014
2. PRESENTATION OVERVIEW
⢠Today, there are masses of texts being generated and shared publicly. There are
microblogging Tweet streams and conversations; long-running blogs and wikis; open-ended
text responses on large-scale surveys; digitally-released novels, and machine-generated texts
(such as SciGen). In LMSes, there are numerous threads of student-generated conversations.
Text has long been plumbed for meaningâbased on context and so-called âclose readingâ
by scholars. Of late, widely accessible computational tools have enabled text-mining.
AutoMap and ORA NetScenes are tools created by CASOS (Center for Computational
Analysis of Social and Organizational Systems) of Carnegie Mellon University that enable
basic text-mining and the visualization of networked text in 2D and 3D formats. AutoMap, a
text-mining tool, offers some basic methods for sentiment analysis, the extraction of ngrams,
the definition of network text relationships, and other revelatory insights.
2
3. WELCOME!
⢠Hi! Who are you? ď
⢠What are your research areas of interest? What sorts of text datasets do
you have access to that you might want to use for research?
⢠What have your experiences been with network-based text analysis (if any)?
Any experiences with AutoMap? ORA-NetScenes?
⢠What would you like to learn during this session?
3
4. A REVIEW OF TERMS
⢠Text network analysis: The study of connectivity between words and phrases in a
text in order to identify core meanings in a text
⢠AutoMap: A software tool that enables the mapping of textual data based on
Network Text Analysis
⢠N-gram: A contiguous string that serves as a unit in computational linguistics
⢠ORA NetScenes: A software tool that enables data extraction from social media
platforms and the visualization of relational data in network graphs
⢠DyNetML: XML interchange language containing relational and network data
4
5. A REVIEW OF TERMS (CONT.)
⢠Text mining: Extracting meaning from text
⢠Text-level concepts: Specific concepts (ideas) from the contents of the dataset
⢠Higher-level concepts: Generalized text-level concepts (ideas)
⢠Semantics: Meaning contained in words and phrases
⢠Syntax: Orderly rule-based structure of words and phrases to make meaning in a
language
⢠Anaphora: A word which refers to a prior used word (to avoid repeating the initial
term) 5
6. A REVIEW OF TERMS (CONT.)
⢠Scalability: The ability to handle larger and larger amounts of data; the ability to
âscale upâ or âscale downâ adaptably
⢠Big data: n = all
⢠Script: A human-readable high-level programming language
⢠Manual or hand coding: The human extraction of nodes / codes through a close
reading of text in a dataset (vs. automated coding or âautocodingâ using automatic
extractions or human-created thesauruses applied to text in a computer tool)
6
7. A REVIEW OF TERMS (CONT.)
⢠Data extraction: The retrieval or downloading of data from a database
⢠Meta-network: An extracted network (from text corpuses) which is created either
automatically or through a human-coded thesaurus (or a mix of both)
⢠Graphical User Interface (GUI): A screen-based user interface that enables people
to interact with a computing system
⢠Encapsulation: The hiding of complexity within a simple interface (which may mask
the actual functions of the software tool to users)
7
8. MACHINE-BASED TEXT ANALYSIS
8
Notes: This presentation is really to provide an overview of some generally readily available capabilities. It is not necessarily to show "how" to achieve this in this short session.
Also, the presenter has only recently started experimenting with this software tool. This only shows one basic "use case," with many other software functionalities and use cases left unaddressed.
9. WHY NETWORK TEXT ANALYSIS?
⢠A Structured Way of Understanding the Main Concepts and Concept
Interrelationships in Text Corpuses: The application of network science to texts,
enabling fast captures of âgistsâ both within and between text corpuses
⢠Plenty of Source Material: Broad prevalence of âbig dataâ for analysis (archival
data, digitized materials, social media contents, online prosumer-created contents,
and others), which requires speed and scalability to reduce data to a manageable
size, to ultimately capture gist and meaning
⢠Scalability and Efficiencies: The ability to harness computational resources to map
texts and text corpuses speedily and then to create data visualizations 9
10. WHAT IS NETWORK-BASED TEXT ANALYSIS?
⢠Decomposition (and decontextualization) of language as symbols and code
(semantics / meaning and syntactics / non-meaning structures / parts of speech
tagging)
⢠Application of a âbag of wordsâ (multiset) approach
⢠Use of a âwindowâ to look at word proximity as an indicator of relatedness and
meaning (semantics); frequency counts of ngrams (words, symbols, phrases);
clustering; word-pair linkages; adjacency matrices, and other statistical approaches
⢠Frequency as an indicator of importance and focus of the text corpuses
⢠A form of text mining through systematic and quantitative analysis of texts
⢠Lexical link analysis is a type 10
11. WHAT IS NETWORK-BASED TEXT ANALYSIS? (CONT.)
⢠Scalable with computational machine support for large-scale textual data
⢠Data reduction through stopwords / delete words lists; generalizations of
concepts to sentiments or other coding (machine-encoding and human-
encoding)
⢠Human-driven process, with researchers having to apply their knowledge of
the field
⢠Analysis based on statistical analysis, graph analysis, and close reading of some sample
documentsâŚalong with domain expertise
11
12. NETWORK-BASED TEXT ANALYSIS GRAPH
VISUALIZATIONS
UNDERLYING STATISTICS
⢠Matrix analysis
⢠Windowing (proximity measures)
⢠Centrality-betweenness measures
⢠Centrality-eigenvector
⢠Clique (subnetwork) count
⢠Geodesic distance,
⢠Probability, and others
GRAPH VISUALIZATIONS
⢠Meta-networks depicted as node-link diagrams
(words and phrases as actors / entities; relatedness
as links / edges)
⢠Interdependence of nodes to make meaning in the
texts
⢠Variety of graph layout algorithms based on
algebra
⢠2D or 3D
12
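As a rough illustration of a few of the statistics and node-link layouts listed above (not the ORA NetScenes interface itself), the following Python sketch assumes the third-party networkx and matplotlib libraries are installed and uses a small, made-up set of word-pair links:

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical word-pair links, e.g., weights from a windowed co-occurrence count
edges = [("surveillance", "data", 4), ("surveillance", "privacy", 3),
         ("data", "collection", 2), ("privacy", "law", 2), ("law", "reform", 1)]

G = nx.Graph()
G.add_weighted_edges_from(edges)

print("Density:", nx.density(G))
print("Betweenness centrality:", nx.betweenness_centrality(G))
print("Eigenvector centrality:", nx.eigenvector_centrality(G, max_iter=500))

# Node-link diagram with a force-directed (spring) layout
pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(G, pos, node_color="lightblue")
plt.show()
```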
13. BASIC TENETS OF NETWORK TEXT ANALYSIS
⢠Networks of semantically linked concepts may be created to summarize some
aspects of the contents of a text or text corpus (to serve as mental maps)
⢠Quantitative methods have a role in complementing qualitative analysis and
other more traditional types of exploratory text analysis (critique)
⢠Enables textual-visual summarizations of texts and text corpuses
⢠May be applied to large-scale text corpuses
13
26. RESEARCH DESIGN
⢠Research objective(s); definition of
research questions and data queries
⢠Identification and acquisition of textual
datasets (access matters)
⢠Openness to discovery
26
27. CLASSIC INSTITUTIONAL REVIEW BOARD (IRB) ROLE
IN REVIEW OF HUMAN SUBJECTS RESEARCH
• "Common Rule" of IRB oversight (particularly in research related to health and other sensitive issues)
• Multi-domain and trained IRB team analyzes all IRB proposals: the entire research study plan, the research rationale, the research team (and their credentialing and knowledge of research ethics and practices), recruitment strategies for participants (and the fair treatment of people), participant recruitment materials, informed consent forms, the maintenance of data throughout and after the study
• Caution with vulnerable populations, potential harm, any deception, and other aspects
27
28. INSTITUTIONAL REVIEW BOARD (IRB) ROLE IN THE
ANALYSIS OF SOCIAL MEDIA CONTENTS?
YES TO IRB GUIDANCE
• Potential intrusiveness of social media data when analyzed using a variety of mining tools (structure mining, text mining, and others) beyond the general public's knowledge
• Ease of re-identification of individuals with a few data points (with resulting privacy violations)
NO TO IRB GUIDANCE
• The data is already public and broadly available
• Infeasible to contact all to get permissions to use their information
• Already have rights to the data within the EULAs of the social media platforms
28
29. INSTITUTIONAL REVIEW BOARD (IRB) ROLE IN THE
ANALYSIS OF SOCIAL MEDIA CONTENTS? (CONT.)
YES TO IRB GUIDANCE
• Control for a researcher spontaneously going off the approved plan and interacting with the participants (continuing oversight of IRB)
• IRB seeing potential unintended consequences
• Some legal cover
NO TO IRB GUIDANCE
• If data is properly handled, no information will be traceable back to the individual person
• IRB process requires effort and time
29
30. SOME TENETS IN THE FAIR TREATMENT OF PEOPLE
IN RESEARCH
PROPER TREATMENT
• Sufficiently informed?
• Protected from harm?
• Ultimately benefitting from the research?
• Sufficient opt-out?
IMPROPER TREATMENT
• Unaware and uninformed about the data use?
• Exposed to potential harm (risk) or actual harm
• Left out from actual benefits of the research
• "Outed" or identified / re-identified against their awareness or will?
• Unable to opt out
30
31. IDENTIFICATION AND ACQUISITION OF DATA FOR
TEXT CORPUSES
⢠Some types of text-based datasets
⢠Raw from-world datasets (microblogging data from #hashtag conversations,
Tweetstreams, wiki data)
⢠Formal and semi-formal datasets from periodicals (vetted by journalists and editors)
⢠Processed datasets from other studies
⢠Large-scale datasets made available to the public for study, and others
31
32. QUALITY ISSUES WITH TEXTUAL DATASETS
⢠Factual vetting or not?
⢠Translated or not (and the possible unintended introduction of error)?
⢠Provenance?
⢠Digitized paper copies (with potential introduced errors from the OCR scanning)?
⢠Transcription of audio or video (with potential introduced errors from the
transcription)? Etc.
⢠Comprehensiveness of the text corpuses?
32
33. CREATION AND SELECTION OF TEXT CORPUS(ES)
STRUCTURED TEXT DATA
• Labeled text in databases and spreadsheets
UNSTRUCTURED TEXT DATA
• May relate to textual contents from social media platforms (like #hashtag conversations around particular issues, #eventgraphs, Tweet streams, blogs, websites, or other types of textual information)
• Manuscripts or other collections of text
33
34. DATA PRE-PROCESSING
⢠Combinations of various texts into oneâŚor into separate but related text
corpuses that will be processed together (.txt)
⢠Effectively removes images and formatting
⢠Creation of a âstopwordsâ / âdeleteâ lists
⢠Removal of ânoiseâ (conjunctions, articles, proper nouns, relative pronouns, and
words with syntactic but often not semantic value) from the data to leave the
semantic data; removal of punctuation; removal of repeated data
⢠Use of the âremainder dataâ
34
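A minimal pre-processing sketch in Python, under the assumption that the texts have already been saved as plain .txt strings; the stopword list here is a tiny stand-in for a real universal delete list:

```python
import re

UNIVERSAL_STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "while"}

def preprocess(text, domain_stopwords=frozenset()):
    """Lowercase, strip punctuation, and drop stopwords, keeping the 'remainder data'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    delete_list = UNIVERSAL_STOPWORDS | set(domain_stopwords)
    return [t for t in tokens if t not in delete_list]

print(preprocess("The agencies collect the data, the metadata, and the call records.",
                 domain_stopwords={"agencies"}))
# -> ['collect', 'data', 'metadata', 'call', 'records']
```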
35. DATA CLEANING (AKA SCRUBBING, CLEANSING)
⢠Application of the universal stopwords list, then the domain stopwords list
⢠Getting data into the optimal state on which to run queries without losing critical
data (and without limiting the findings from the tool)
⢠Elimination of corrupt records (which may crash the software)
⢠Elimination of duplicate data (which can skew results)
⢠Filling in gaps in the data (which may introduce error)
⢠Omitting inaccurate records (which may introduce noise)
⢠Often done per runâŚwithout permanent effects on the textual corpuses
35
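A small, non-destructive cleaning sketch in Python (an assumption-laden illustration, not AutoMap's own routine) that skips empty records, drops exact duplicates, and then applies a universal stopword list followed by a domain stopword list, leaving the source documents untouched:

```python
def clean_corpus(docs, universal_stopwords, domain_stopwords):
    """Per-run cleaning: skip empty/corrupt records, dedupe, then apply stopword lists in order."""
    seen, cleaned = set(), []
    for doc in docs:
        if not doc or not doc.strip():          # skip empty (possibly corrupt) records
            continue
        key = doc.strip().lower()
        if key in seen:                         # skip exact duplicates, which can skew results
            continue
        seen.add(key)
        tokens = [t for t in key.split() if t not in universal_stopwords]
        tokens = [t for t in tokens if t not in domain_stopwords]
        cleaned.append(tokens)
    return cleaned                              # the original docs list is never modified

docs = ["Mass surveillance programs", "mass surveillance programs", "", "Data retention law"]
print(clean_corpus(docs, universal_stopwords={"mass"}, domain_stopwords={"programs"}))
# -> [['surveillance'], ['data', 'retention', 'law']]
```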
36. DATA CODING
⢠A nontrivial and effortful process of structuring the data extraction through the uses
of âthesaurusesâ (to reduce the data using delete lists and then to identify out
important concepts and generalize the contents into this structure) using AutoMap
⢠Creation of generalization thesauruses by machine or by hand or by both
⢠Generalization thesauruses serve as a kind of code-book for AutoMap
⢠Application of the Concept List / Union Concept List
⢠Data run
⢠Data visualization of meta-networks
⢠The inherent ambiguity of language (synonymy / synonymous; polysemy or multi-
meaningnedness / openness to interpretation) 36
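The generalization step can be pictured as applying a two-column code-book to the tokens. A minimal Python sketch with a hypothetical thesaurus (the mappings are invented for illustration, not drawn from the presentation's data):

```python
# Hypothetical generalization thesaurus: text-level concept -> higher-level concept
thesaurus = {
    "nsa": "intelligence_agency",
    "gchq": "intelligence_agency",
    "prism": "surveillance_program",
    "xkeyscore": "surveillance_program",
}

def generalize(tokens, thesaurus):
    """Replace text-level concepts with higher-level concepts; leave unmapped tokens as-is."""
    return [thesaurus.get(t, t) for t in tokens]

print(generalize(["nsa", "operates", "prism"], thesaurus))
# -> ['intelligence_agency', 'operates', 'surveillance_program']
```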
38. STATISTICAL ANALYSES
⢠Network structure (degree distribution, density, linkages, geodesic structure)
⢠Edges (weight)
⢠Paths (degrees of separation between words)
⢠Nodes (degree, position of words or phrases in the text document or text corpuses;
frequency counts)
⢠Hubs or regions or topics of interest
⢠Subgraph densities (interconnected semantic clusters or fields)
⢠Clusters (connected components based on clustering coefficient, filtering, to show
deeply related ideas or concepts) 38
39. GRAPH VISUALIZATION ANALYSES
TEXT-BASED GRAPHS
⢠Word hubs with linkages and branching
off
⢠Linked words and phrases
⢠The identification of themes
⢠Knowledge structures
⢠Center-periphery dynamics of texts
OTHER RELATED VISUALIZATIONS
⢠Word-tree diagrams and linked
ideas
⢠Word clouds
39
40. âPRESSURE TESTINGâ THE RESULTS:
VALIDATION, INVALIDATION
Historically ...
• Computational models have been compared against the evaluations of human experts who have closely read the datasets and created ontologies or other "maps" of the data
• Datasets have been run on various software tools to compare outputs
• Comparability to the researchers' close readings of a sampling of the information; comparability to the researchers' domain expertise and knowledge of the field
40
41. âPRESSURE TESTINGâ THE RESULTS:
VALIDATION, INVALIDATION (CONT.)
Quality Sample Question List
⢠How comprehensive or inclusive is the extracted meta-network? What data is
not appearing (which should appear)? Any anomalies? Are these anomalies
informative?
⢠How apparently accurate is the frequency count for main terms or phrases or
agents? Any insights (not knowable otherwise)?
⢠How (in)accurate are the relationships identified by the visualization?
41
42. âPRESSURE TESTINGâ THE RESULTS:
VALIDATION, INVALIDATION (CONT.)
⢠How coherent or intelligible are the visualizations?
⢠What new insights have been raised by the network-based text analysis?
⢠How useful are these insights or leads in follow-up research through other
channels?
⢠From such validation / invalidation feedback, there may be changes to the
research design, including the choice of datasets, the data scrubbing, the
types of data visualizations, and the analytical techniques.
42
44. AUTOMAP
BY CASOS (CENTER FOR COMPUTATIONAL ANALYSIS OF SOCIAL AND ORGANIZATIONAL SYSTEMS) AT
CARNEGIE MELLON UNIVERSITY
AUTOMAP = (HUMAN-MACHINE) MANUAL AND AUTOMATED CODING -> TEXT MAP (GRAPH)
44
45. A BRIEF HISTORY OF THE TOOL
⢠Originated in 2001 (albeit with years of lead-up work and research)
⢠Enables map analysis of texts (including meta-matrix analysis and sub-matrix analysis)
⢠Latest Version: AutoMap 3.0.10.18 for Windows 32-bit and Windows 64-bit
⢠AutoMap Userâs Guide 2013 (Carley, Columbus, & Landwehr, 2013)
45
46. CONDITIONS FOR USE
⢠Research usage only; commercial usage possible through Netanomics
⢠Crediting required:
⢠COPYRIGHT (c) 2001-2014 Kathleen M. Carley - Center for Computational Analysis of
Social and Organizational Systems (CASOS), Institute for Software Research International
(ISRI), School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue -
Pittsburgh, PA 15213-3890 - ALL RIGHTS RESERVED.
46
48. TYPES OF DATA EXTRACTIONS FROM AUTOMAP
FOR NETWORK MAPPING
1. "content (concepts, frequencies and meta-data such as sentence length);
2. semantic networks (concepts and relationships);
3. meta-networks (ontologically coded concepts and relationships - named entities and links);
4. sentiment and node attributes (attributes of named entities)" (Bigrigg, Carley, Kunkel, Diesner, Eisenberg, Chieffallo, & Columbus, 2010)
48
49. DATA EXTRACTORS FROM SOCIAL MEDIA
⢠Blogs
⢠Facebook
⢠Email
⢠Newsgroups
⢠RSS feeds
⢠Twitter
⢠Web
⢠Wiki
49
50. SIX TYPES OF THESAURUS FORMATS
1. Single column format for stopwords / delete lists
2. Two-column generalization format
3. Two-column meta-network format
4. Master format
5. Reduced format
6. Change format (Sangal, Carley, Altman, & Martin, 2012)
50
51. TYPE 1: SINGLE COLUMN FORMAT (FOR
STOPWORDS / DELETE LISTS)
⢠Column A1
⢠No column header
⢠List of words which should not appear in the dataset to be analyzed
⢠A âstopwordsâ or âdeleteâ list (pronouns, prepositions, punctuation, verbs,
articles, prepositions, and proper nouns, etc.)
⢠Data reduction
51
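A single-column delete list can be kept as a one-word-per-line file with no header, as described above. A small Python sketch that writes such a file and applies it for data reduction (the filename is hypothetical):

```python
# Write a single-column delete list: one term per line, no column header
delete_terms = ["the", "and", "he", "she", "of", "at"]
with open("delete_list.csv", "w", encoding="utf-8") as f:   # hypothetical filename
    f.write("\n".join(delete_terms))

# Apply the delete list to a token stream
with open("delete_list.csv", encoding="utf-8") as f:
    delete_list = {line.strip() for line in f if line.strip()}

tokens = ["the", "analyst", "reviewed", "the", "transcripts"]
print([t for t in tokens if t not in delete_list])
# -> ['analyst', 'reviewed', 'transcripts']
```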
53. TYPE 2: TWO-COLUMN GENERALIZATION FORMAT
⢠No column header
⢠Changing text-level concepts to high-level concepts (moving from the specific
to the general)
⢠One example is a two-column list associating emails in Column A to names of the persons
in Column B (with underscores used in lieu of spaces as in firstname_lastname)
⢠Another example may be the conversion of specific dates (Column A) into larger
categories like years (Column B)
⢠Generalization function / coding / data reduction
53
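Conceptually, the two-column generalization format is a header-less mapping from Column A (text-level concept) to Column B (higher-level concept). A short Python sketch with hypothetical rows, following the email-to-name and date-to-year examples above:

```python
import csv

# Hypothetical two-column rows, no header: Column A = text-level, Column B = higher-level
rows = [
    ("jdoe@example.com", "jane_doe"),
    ("j.doe@agency.gov", "jane_doe"),
    ("2013-06-05", "2013"),
]

with open("generalization_thesaurus.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

with open("generalization_thesaurus.csv", newline="", encoding="utf-8") as f:
    mapping = {a: b for a, b in csv.reader(f)}

print(mapping["jdoe@example.com"])   # -> jane_doe
```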
54. TYPE 3: TWO-COLUMN META NETWORK FORMAT
⢠No column header
⢠High-level (generalization) concepts -> metaOntology (node class: agent,
resource, knowledge, task, organization, location, event, action, belief, role,
group)
54
56. TYPE 4: MASTER FORMAT
⢠The two-column format plus attribute information
⢠Column headers: conceptFrom; conceptTo; metaOntology (node class: agent,
resource, knowledge, task, organization, location, event, action, belief, role, group);
metaName (metatype) (generic or specific concept)
⢠Text from the mss in A (conceptFrom); text-level concept in B (conceptTo); node class in
C (metaOntology); whether concept is generic or specific in D (metaName)
⢠Also known as Union Concept List, Concept List
56
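To illustrate the four-column master format described above, here is a sketch that writes a few hypothetical rows; only the column headers follow the slide, while the concepts and classifications are invented for illustration:

```python
import csv

header = ["conceptFrom", "conceptTo", "metaOntology", "metaName"]
rows = [
    # text from the manuscripts, text-level concept, node class, generic/specific
    ["edward snowden", "whistleblower",  "agent",    "specific"],
    ["fort meade",     "nsa_facility",   "location", "specific"],
    ["metadata",       "data",           "resource", "generic"],
]

with open("master_thesaurus.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
```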
59. CONCEPT LIST WITH PART OF SPEECH (POS)
EXTRACTIONS
(7,238 CONCEPTS, 1,911 UNIQUE CONCEPTS)
59
⢠Can select certain
concepts and resave
the list as a âdelete
listâ
60. TYPE 5: REDUCED (OR REVIEW) FORMAT
⢠7 columns
⢠Column headers: FREQUENCY, CURRENT_CONCEPT, NEW_CONCEPT,
CURRENT_METAONTOLOGY, NEW_METAONTOLOGY, CURRENT_METATYPE,
NEW_METATYPE
⢠Subsumes Types 2, 3, and 4
⢠New_Metaontology enables deletions of words, ignoring of words, and splitting of phrases
⢠Enables the changing of the meta-network structure
Also known as Review Format
60
62. TYPE 6: CHANGE FORMAT
⢠Comprises 10 columns
⢠Subsumes all prior formats of thesauruses
⢠Column headers: NUMBER_OF_TEXTS, FREQUENCY, CURRENT_CONCEPT, NEW_CONCEPT,
CURRENT_METAONTOLOGY, NEW_METAONTOLOGY, CURRENT_METATYPE, NEW_METATYPE,
casosWhat, casosWhy
⢠NUMBER_OF_TEXTS: The number of texts containing the concept
⢠casosWhat: Name of the thesaurus file âfrom where the concept comesâ (Sangal, Carley, Altman, & Martin,
2012, p. 9)
⢠casosWhy: Explanation for mapping from Initial concept to the New concept
⢠Other columns may be added to allow for more complex graph visualizations and data queries
62
63. REMIXING OF THESAURUSES
⢠AutoMap enables thesauruses to be combined and re-arranged through
âsplitâ and âmergeâ routines
⢠Universal thesauruses, for example, may be collated from a range of other
thesauri
⢠Domain thesauri are specific to a project and contain only entities related to
the project (and unique to the project); domain thesauri have precedence over
universal thesauri (in terms of coding)
63
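The precedence rule (domain thesaurus over universal thesaurus) amounts to a simple ordered merge. A minimal Python sketch with invented entries:

```python
def merge_thesauri(universal, domain):
    """Merge two generalization thesauri; domain entries override universal ones on conflict."""
    merged = dict(universal)   # start from the universal thesaurus
    merged.update(domain)      # domain-specific coding takes precedence
    return merged

universal = {"usa": "united_states", "nsa": "organization"}
domain = {"nsa": "intelligence_agency"}              # project-specific entry wins
print(merge_thesauri(universal, domain)["nsa"])      # -> intelligence_agency
```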
64. ORA-NETSCENES
BY CASOS (CENTER FOR COMPUTATIONAL ANALYSIS OF SOCIAL AND ORGANIZATIONAL SYSTEMS) AT
CARNEGIE MELLON UNIVERSITY
64
65. A BRIEF HISTORY OF THE TOOL
⢠ORA NetScenes 3.0.8.6 available for download in Windows 32-bit and 64-bit
formats (from March 12, 2014)
⢠Created in 2001
⢠Available for non-commercial research use
⢠Crediting required:
⢠COPYRIGHT (c) 2001-2014 Kathleen M. Carley - Center for Computational Analysis of Social
and Organizational Systems (CASOS), Institute for Software Research International (ISRI),
School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue - Pittsburgh, PA
15213-3890 - ALL RIGHTS RESERVED.
65
66. TWO-DIMENSIONAL OR
THREE-DIMENSIONAL
VISUALIZATIONS
2D
⢠Offers relative clarity of layout even
with some fairly complex graphs
⢠Provides a fine sense of overview
⢠Works better as a print visual
3D
⢠Can evoke a sense of complexity and
interdependencies (but generally only a
small piece at a time)
⢠Offers interactive exploration through a
semi-immersive visualization
66
67. SOME VARIATIONS
⢠Sentiment Analysis: Using Concept Lists / Union Concept Lists (generalization
thesauruses) to categorize sentiment based on common phrases in the text
corpuses -> general sentiments
⢠The State of a Field: Application of network-based text analysis to the entire
corpuses of domain fields to understand salient concepts (or clusters of studies)
in that field âŚand analyses of bibliographies and references lists (for co-
author networks and related topics and years of varying productivity)
67
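One way to picture the sentiment variation above is a phrase-to-sentiment lookup applied across the corpus. A rough Python sketch with an invented mini-lexicon (not a validated sentiment dictionary):

```python
from collections import Counter

# Hypothetical mapping of common phrases to general sentiment categories
sentiment_lexicon = {
    "privacy violation": "negative",
    "abuse of power": "negative",
    "public safety": "positive",
    "keeps us safe": "positive",
}

def sentiment_counts(texts, lexicon):
    """Tally general sentiment categories by matching lexicon phrases in each text."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for phrase, label in lexicon.items():
            if phrase in lowered:
                counts[label] += 1
    return counts

texts = ["Critics call it a privacy violation.", "Supporters say it keeps us safe."]
print(sentiment_counts(texts, sentiment_lexicon))
# -> Counter({'negative': 1, 'positive': 1})
```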
69. A REVIEW OF THE PROCESS
⢠Objectives
⢠Research Design
⢠Institutional Review Board (IRB) Process
⢠Text Corpus Selection
⢠Stopwords or Delete Lists
⢠Concept Lists / Union Concept Lists
(Generalization Thesauruses)
⢠Machine-generated ones tend to be quite
inclusive and complex (requiring whittling
down)
⢠Text Processing
⢠Graph Visualizations Created
⢠Human-Based Analysis
69
70. POTENTIALS?
⢠What are some ways that these software tools and methods may be used in
your areas of expertise?
⢠Any hesitations? Concerns?
70
72. CURRENT RESEARCH
Textually
⢠Applications to computational folkloristics and domain-specific knowledge structures
⢠Applications to sentiment analysis from keyword extractions from microblogging
corpuses (conversations, account Tweet streams) and Web networks
Organizationally
⢠Applications to terror networks, covert networks, and law enforcement interests
⢠Applications to community resilience for emergency responses
72
73. LEARNING THE METHODS AND TOOLS
⢠Understand the rationale for the tools. (There are plenty of widely available
publications related to the toolsâ origins and the rationales of the development
teams.)
⢠Acclimate to the tools first and actually sample their capabilities in depth (by
using a simple text corpus initially).
⢠Itâs critical to know the datasets well. This includes knowledge of where the
data came from, how itâs cleaned, and other aspects.
73
74. LEARNING THE METHODS AND TOOLS (CONT.)
⢠Automation of text mining provides different insights than manual reading.
Such machine-based text analysis offers some types of knowledge and not
others. When making assertions, itâs important to be nuanced and clear about
what is being asserted (gist, broad themes, possible relationships between
semantics and concepts) and what isnât (in-depth and fine-tuned insights).
⢠Text network analysis offers evocative visualizations in both 2D and 3D.
Visualizations are eye-catching, but they have to be presented properly to be
valuable in an information setting.
74
75. LEARNING THE METHODS AND TOOLS (CONT.)
⢠There may be machine-based artifacts in terms of the data. This is why itâs
critical to understand both the text corpuses and the software tools thoroughly.
It is important to explain the findings with clarity (to avoid misconceptions).
⢠Both software tools can ingest a wide range of data. AutoMap can ingest
text data; it can ingest all sorts of thesauruses. ORA NetScenes can ingest
various DyNetML files to output graph visualizations.
75
76. LEARNING THE METHODS AND TOOLS (CONT.)
⢠Researchers will need to bring their domain knowledge to bear on the tool
and the network-based text analysis findings. Generally, this does not work as
a stand-alone tool but a complementary one with other research methods and
insights.
⢠It helps to read up on the research literature to get a sense of the language
used around assertions of analytical value based on the graphs and graph
metrics.
⢠It helps to read up on dynamic network-based text analysis to see how this is
applied in even more challenging real-time contexts.
76
77. DESIRABLE RESEARCHER SKILLSETS
⢠Grammar and syntax (linguist skills)
⢠Mid-level statistical savvy
⢠Mid-level computational skills and
methodical step-by-step approaches
⢠Basic data structures and handling
of datasets
⢠Graph analysis and spatial
reasoning (the identification of
patterns)
⢠Wide reading
⢠Traits: Patience and doggedness
(even if forced by self-will)
77
79. RELATED LINKS
Software Downloads
⢠AutoMap: http://www.casos.cs.cmu.edu/projects/automap/
⢠ORA Software: http://www.casos.cs.cmu.edu/projects/ora/software.php
Support Group
⢠ORA Google Group:
http://www.casos.cs.cmu.edu/projects/automap/ORAGoogleGroup.php
Originating Organization
⢠Center for Computational Analysis of Social and Organizational Systems (CASOS of
Carnegie Mellon University): http://www.casos.cs.cmu.edu/ 79
80. OTHER GRAPH VISUALIZATION TOOLS
Free and Open-Source Tools
⢠NodeXL (Network Overview, Discovery and Exploration for Excel):
http://nodexl.codeplex.com/ (of CodePlex)
⢠Pajek: http://pajek.imfm.si/doku.php
⢠Gephi: https://gephi.org/
Commercial Tool
⢠UCINET and NetDraw (Analytic Technologies): http://www.analytictech.com/
80
81. REFERENCES
⢠Bigrigg, M.W., Carley, K.M., Kunkel, F., Diesner, J., Eisenberg, T., Chieffallo, D.,
& Columbus, D. (2010). AutoMap: Extracting usable information from
unstructured texts.
⢠Sangal, A., Carley, K.M., Altman, N., & Martin, M.K. (2012). Creating, using
and updating thesauri files for AutoMap and ORA. CASOS Technical Report:
CMU-ISR-12-108.
81
82. CONCLUSION AND CONTACT
⢠Dr. Shalin Hai-Jew
⢠Instructional Designer
⢠Information Technology Assistance Center
⢠Kansas State University
⢠212 Hale Library
⢠785-532-5262
⢠shalin@k-state.edu
⢠The presenter has no ties with CASOS or Carnegie Mellon University.
82