This document provides an introduction to text analytics. It discusses perspectives on text analytics from different roles like IT support, researchers, and solution providers. It explains how text analytics can boost business results by analyzing unstructured text data from sources like emails, social media, surveys etc. It discusses how text analytics transforms information retrieval to information access by extracting semantics, entities, topics and relationships from text. It also provides definitions and explanations of key concepts in text analytics like entities, features, metadata, natural language processing, information extraction, categorization, classification and evaluation metrics.
In this talk we outline some of the key challenges in text analytics, describe some of Endeca's current research work in this area, examine the current state of the text analytics market and explore some of the prospects for the future.
In this talk we outline some of the key challenges in text analytics, describe some of Endeca's current research work in this area, examine the current state of the text analytics market and explore some of the prospects for the future.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
This presentation introduces text analytics, its applications and various tools/algorithms used for this process. Given below are some of the important tools:
- Decision trees
- SVM
- Naive-Bayes
- K-nearest neighbours
- Artificial Neural Networks
- Fuzzy C-Means
- Latent Dirichlet Allocation
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation will go through different areas of text analytics as well as provide some real work examples that help to make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
https://www.youtube.com/watch?v=nvlHJgRE3pU
Won ITAC Graduation Projects Competition, ITAC ID: GP2015.R10.75
A web application that analyze big volumes of product reviews, social networks posts and tweets related to a given product. Then, present these results of this big data analytical job in a user friendly, understandable, and easily interpreted manner that can be used by different customers for different purposes.
Technologies used:
1- Hadoop
2- Hadoop Streaming
3- R Statistical
4- PHP
5- Google Charts API
Directed versus undirected network analysis of student essaysRoy Clariana
IWALS 2018
6th International Workshop on Advanced Learning Sciences
Perspectives on the Learner: Cognition, Brain, and Education
University of Pittsburgh, USA JUNE 6-8, 2018
A set of practical strategies and techniques for tackling vagueness in data modeling and creating models that are semantically more accurate and interoperable.
Project: Interfacing Chatbot with Data Retrieval and Analytics Queries for De...Gan Keng Hoon
This project covers several areas of research and development such as data storage and retrieval, domain analytics specification, natural language conversational modelling and query transformation. Prospective collaborators include dashboard solution providers, HPC centers and relevant interested parties.
The Process of Information extraction through Natural Language ProcessingWaqas Tariq
Information Retrieval (IR) is the discipline that deals with retrieval of unstructured data, especially textual documents, in response to a query or topic statement, which may itself be unstructured, e.g., a sentence or even another document, or which may be structured, e.g., a boolean expression. The need for effective methods of automated IR has grown in importance because of the tremendous explosion in the amount of unstructured data, both internal, corporate document collections, and the immense and growing number of document sources on the Internet.. The topics covered include: formulation of structured and unstructured queries and topic statements, indexing (including term weighting) of document collections, methods for computing the similarity of queries and documents, classification and routing of documents in an incoming stream to users on the basis of topic or need statements, clustering of document collections on the basis of language or topic, and statistical, probabilistic, and semantic methods of analyzing and retrieving documents. Information extraction from text has therefore been pursued actively as an attempt to present knowledge from published material in a computer readable format. An automated extraction tool would not only save time and efforts, but also pave way to discover hitherto unknown information implicitly conveyed in this paper. Work in this area has focused on extracting a wide range of information such as chromosomal location of genes, protein functional information, associating genes by functional relevance and relationships between entities of interest. While clinical records provide a semi-structured, technically rich data source for mining information, the publications, in their unstructured format pose a greater challenge, addressed by many approaches.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
This presentation introduces text analytics, its applications and various tools/algorithms used for this process. Given below are some of the important tools:
- Decision trees
- SVM
- Naive-Bayes
- K-nearest neighbours
- Artificial Neural Networks
- Fuzzy C-Means
- Latent Dirichlet Allocation
This is an introduction to text analytics for advanced business users and IT professionals with limited programming expertise. The presentation will go through different areas of text analytics as well as provide some real work examples that help to make the subject matter a little more relatable. We will cover topics like search engine building, categorization (supervised and unsupervised), clustering, NLP, and social media analysis.
https://www.youtube.com/watch?v=nvlHJgRE3pU
Won ITAC Graduation Projects Competition, ITAC ID: GP2015.R10.75
A web application that analyze big volumes of product reviews, social networks posts and tweets related to a given product. Then, present these results of this big data analytical job in a user friendly, understandable, and easily interpreted manner that can be used by different customers for different purposes.
Technologies used:
1- Hadoop
2- Hadoop Streaming
3- R Statistical
4- PHP
5- Google Charts API
Directed versus undirected network analysis of student essaysRoy Clariana
IWALS 2018
6th International Workshop on Advanced Learning Sciences
Perspectives on the Learner: Cognition, Brain, and Education
University of Pittsburgh, USA JUNE 6-8, 2018
A set of practical strategies and techniques for tackling vagueness in data modeling and creating models that are semantically more accurate and interoperable.
Project: Interfacing Chatbot with Data Retrieval and Analytics Queries for De...Gan Keng Hoon
This project covers several areas of research and development such as data storage and retrieval, domain analytics specification, natural language conversational modelling and query transformation. Prospective collaborators include dashboard solution providers, HPC centers and relevant interested parties.
The Process of Information extraction through Natural Language ProcessingWaqas Tariq
Information Retrieval (IR) is the discipline that deals with retrieval of unstructured data, especially textual documents, in response to a query or topic statement, which may itself be unstructured, e.g., a sentence or even another document, or which may be structured, e.g., a boolean expression. The need for effective methods of automated IR has grown in importance because of the tremendous explosion in the amount of unstructured data, both internal, corporate document collections, and the immense and growing number of document sources on the Internet.. The topics covered include: formulation of structured and unstructured queries and topic statements, indexing (including term weighting) of document collections, methods for computing the similarity of queries and documents, classification and routing of documents in an incoming stream to users on the basis of topic or need statements, clustering of document collections on the basis of language or topic, and statistical, probabilistic, and semantic methods of analyzing and retrieving documents. Information extraction from text has therefore been pursued actively as an attempt to present knowledge from published material in a computer readable format. An automated extraction tool would not only save time and efforts, but also pave way to discover hitherto unknown information implicitly conveyed in this paper. Work in this area has focused on extracting a wide range of information such as chromosomal location of genes, protein functional information, associating genes by functional relevance and relationships between entities of interest. While clinical records provide a semi-structured, technically rich data source for mining information, the publications, in their unstructured format pose a greater challenge, addressed by many approaches.
This chapter gives information about Social media analytics, Social network analysis, Text analytics, stopwords, tokenization, n-grams, Trend analytics, TF-IDF, Stemming and lemmatization
This white paper provides an overview of the patented technologies and associated processes that combine to deliver a unique and highly scalable solution for unstructured text/data analytics
Fundamentals Concepts on Text Analytics.pptxaini658222
Text analytics, also known as text mining, is the process of deriving high-quality information from text sources using software. It is a multidisciplinary field that combines elements of data mining, machine learning, statistics, and natural language processing (NLP) to process and analyze large amounts of natural language data effectively.
A DECADE OF USING HYBRID INFERENCE SYSTEMS IN NLP (2005 – 2015): A SURVEYijaia
In today’s world of digital media, connecting millions of users, large amounts of information is being
generated. These are potential mines of knowledge and could give deep insights about the trends of both
social and scientific value. However, owing to the fact that most of this is highly unstructured, we cannot
make any sense of it. Natural language processing (NLP) is a serious attempt in this direction to organise
the textual matter which is in a human understandable form (natural language) in a meaningful and
insightful way. In this, text entailment can be considered a key component in verifying or proving the
correctness or efficiency of this organisation. This paper tries to make a survey of various text entailment
methods proposed giving a comparative picture based on certain criteria like robustness and semantic
precision.
The whitepaper addresses the challenges in the data–driven organizations, medical research and health care. It summarizes how the context-enabled and semantic enrichment can transform the traditional method to search optimum data. 3RDi has advanced content enrichment with Named Entity Recognition, Semantic similarity, Content classification and Content summarization. Get the right data at the right time that helps medical researchers and health care practitioners.
Text mining has turned out to be one of the in vogue handle that has been joined in a few research
fields, for example, computational etymology, Information Retrieval (IR) and data mining. Natural
Language Processing (NLP) methods were utilized to extricate learning from the textual text that is
composed by people. Text mining peruses an unstructured form of data to give important
information designs in a most brief day and age. Long range interpersonal communication locales
are an awesome wellspring of correspondence as the vast majority of the general population in this
day and age utilize these destinations in their everyday lives to keep associated with each other. It
turns into a typical practice to not compose a sentence with remedy punctuation and spelling. This
training may prompt various types of ambiguities like lexical, syntactic, and semantic and because of
this kind of indistinct data; it is elusive out the genuine data arrange. As needs be, we are directing
an examination with the point of searching for various text mining techniques to get different
textual requests via web-based networking media sites. This review expects to depict how
contemplates in online networking have utilized text investigation and text mining methods to
identify the key topics in the data. This study concentrated on examining the text mining
contemplates identified with Facebook and Twitter; the two prevailing web-based social networking
on the planet. Aftereffects of this overview can fill in as the baselines for future text mining research.
Decision Support for E-Governance: A Text Mining ApproachIJMIT JOURNAL
Information and communication technology has the capability to improve the process by which governments involve citizens in formulating public policy and public projects. Even though much of government regulations may now be in digital form (and often available online), due to their complexity and diversity, identifying the ones relevant to a particular context is a non-trivial task. Similarly, with the advent of a number of electronic online forums, social networking sites and blogs, the opportunity of gathering citizens’ petitions and stakeholders’ views on government policy and proposals has increased greatly, but the volume and the complexity of analyzing unstructured data makes this difficult. On the other hand, text mining has come a long way from simple keyword search, and matured into a discipline capable of dealing with much more complex tasks. In this paper we discuss how text-mining techniques can help in retrieval of information and relationships from textual data sources, thereby assisting policy makers in discovering associations between policies and citizens’ opinions expressed in electronic public forums and blogs etc. We also present here, an integrated text mining based architecture for e-governance decision support along with a discussion on the Indian scenario.
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...IJERA Editor
Although publicly accessible databases containing speech documents. It requires a great deal of time and effort
required to keep them up to date is often burdensome. In an effort to help identify speaker of speech if text is
available, text-mining tools, from the machine learning discipline, it can be applied to help in this process also.
Here, we describe and evaluate document classification algorithms i.e. a combo pack of text mining and
classification. This task asked participants to design classifiers for identifying documents containing speech
related information in the main literature, and evaluated them against one another. Expected systems utilizes a
novel approach of k -nearest neighbour classification and compare its performance by taking different values of
k.
Semantic interoperability is often an afterthought. QSi is proposing a radical shift in the way we currently view the nature and relationship between Information, Language, and Data. In the process, semantic interoperability is an emergent characteristic of data management.
SPIRIT: A TREE KERNEL-BASED METHOD FOR TOPIC PERSON INTERACTION DETECTIONNexgen Technology
TO GET THIS PROJECT COMPLETE SOURCE ON SUPPORT WITH EXECUTION PLEASE CALL BELOW CONTACT DETAILS
MOBILE: 9791938249, 0413-2211159, WEB: WWW.NEXGENPROJECT.COM,WWW.FINALYEAR-IEEEPROJECTS.COM, EMAIL:Praveen@nexgenproject.com
NEXGEN TECHNOLOGY provides total software solutions to its customers. Apsys works closely with the customers to identify their business processes for computerization and help them implement state-of-the-art solutions. By identifying and enhancing their processes through information technology solutions. NEXGEN TECHNOLOGY help it customers optimally use their resources.
Similar to An Introduction to Text Analytics: 2013 Workshop presentation (20)
Creating an AI Startup: What You Need to KnowSeth Grimes
Seth Grimes presented "Creating an AI Startup: What You Need to Know," at a May 20, 2021 Launch Annapolis + Maryland AI (https://www.meetup.com/MarylandAI) program, focusing on opportunity and resources for Maryland tech entrepreneurs.
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Seth Grimes
Moshe Wasserblat, Intel AI, presents on Efficient Deep Learning in Natural Language Processing Production to an online NLP meetup audience, August 3, 2020. Visit https://www.meetup.com/NY-NLP for the New York NLP meetup.
From Customer Emotions to Actionable Insights, with Peter DorringtonSeth Grimes
From Customer Emotions to Actionable Insights -- A presentation by Peter Dorrington, founder, XMplify Consulting, at the 2020 CX Emotion conference (https://cx-emotion.com), July 22, 2020.
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AISeth Grimes
Dan Lee from Dentuit AI presented an Intro to Deep Learning for Medical Image Analysis at the Maryland AI meetup (https://www.meetup.com/Maryland-AI), May 27, 2020. Visit https://www.youtube.com/watch?v=xl8i7CGDQi0 for video.
Emotion AI refers to a set of technologies -- natural language processing, voice tech, facial coding, neuroscience, and behavioral analytics -- applied to interactions to extract, convey, and induce emotion. Emotion AI is a presentation by Seth Grimes at AI for Human Language, March 5, 2020 in Tel Aviv.
Text Analytics for NLPers, a presentation by Seth Grimes, created for the December 2, 2019 Natural Language Processing-New York (NYC-NLP) meetup, https://www.meetup.com/NLP-NY/events/266093296/
Our FinTech Future – AI’s Opportunities and Challenges? Seth Grimes
"Our FinTech Future – AI’s Opportunities and Challenges?" is a presentation by Jim Kyung-Soo Liew, Ph.D. to the Artificial Intelligence Maryland (MD-AI) meetup (https://www.meetup.com/Maryland-AI/), November 20, 2019. Dr. Liew is Co-Founder of SoKat.co and Associate Professor at Johns Hopkins Carey Business School.
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Seth Grimes
Presentation by Nathan Schneider, Assistant Professor of Linguistics and Computer Science at Georgetown University, to the Washington DC Natural Language Processing meetup, October 14, 2019 (https://www.meetup.com/DC-NLP/events/264894589/).
The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpu...Seth Grimes
Presentation by Nathan Scheider, Georgetown University, to the Washington DC Natural Language Processing meetup, October 14, 2019, https://www.meetup.com/DC-NLP/events/264894589/.
Nick Schmidt of BLDS, LLC to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and also demonstrate the techniques that model builders can use to ameliorate bias when it is found.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
An Introduction to Text Analytics: 2013 Workshop presentation
1. An Introduction to Text Analytics:
Social, Online, and Enterprise
Seth Grimes
Alta Plana Corporation
@sethgrimes
Text & Social Analytics Summit 2013
Workshop
June 4, 2013
2. Text Analytics Introduction
2013 Text Analytics Summit
2
Perspectives
Perspective #1: You support research or work in IT.
You help end users who have lots of text.
Perspective #2: You’re a researcher, business analyst, or
other “end user.”
You have lots of text. You want an automated way to deal
with it.
Perspective #3: You work for a solution provider.
Perspective #4: Other?
3. Text Analytics Introduction
2013 Text Analytics Summit
3
From the Analytics/Business Perspective
1. If you are not analyzing text – if you're analyzing only
transactional information – you're missing opportunity
or incurring risk.
2. Text analytics can boost business results –
“Organizations embracing text analytics all report having an
epiphany moment when they suddenly knew more than
before.”
-- Philip Russom, the Data Warehousing Institute, 2007
http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx
– via established BI / data-mining programs, or
independently.
4. Text Analytics Introduction
2013 Text Analytics Summit
4
Agenda
1. The “Unstructured” Data Challenge.
2. Text analytics for information retrieval and BI.
3. Text analysis technologies and processes.
4. Applications.
5. The market: Software, services, and solutions.
Note:
I will not cover the agenda in a linear fashion.
Product images and references are used to illustrate only.
Some slides are included for reference purposes.
5. Text Analytics Introduction
2013 Text Analytics Summit
5
Value in Text
It’s a truism that 80% of enterprise-relevant information
originates in “unstructured” form:
E-mail and messages.
Web pages, news & blog articles, forum postings, and other
social media.
Contact-center notes and transcripts.
Surveys, feedback forms, warranty claims.
Scientific literature, books, legal documents.
...
http://breakthroughanalysis.com/2008/08/01/unstructured-data-and-the-80-percent-rule/
Non-text “unstructured” content?
Images
Audio including speech
Video
Value derives from patterns.
6. Text Analytics Introduction
2013 Text Analytics Summit
6
Unstructured Sources
These sources may contain “traditional” data.
The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard
& Poor's 500 index fell 1.44, or 0.11 percent, to 1,263.85.
And they may not.
www.stanford.edu/%7ernusse/wntwindow.html
Axin and Frat1 interact with dvl and GSK, bridging
Dvl to GSK in Wnt-mediated regulation of LEF-1.
Wnt proteins transduce their signals through
dishevelled (Dvl) proteins to inhibit glycogen synthase
kinase 3beta (GSK), leading to the accumulation of
cytosolic beta-catenin and activation of TCF/LEF-1
transcription factors. To understand the mechanism
by which Dvl acts through GSK to regulate LEF-1, we
investigated the roles of Axin and Frat1 in Wnt-
mediated activation of LEF-1 in mammalian cells. We
found that Dvl interacts with Axin and with Frat1,
both of which interact with GSK. Similarly, the Frat1
homolog GBP binds Xenopus Dishevelled in an
interaction that requires GSK.
www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&
cmd=Retrieve&list_uids=10428961&dopt=Abstract
7. Text Analytics Introduction
2013 Text Analytics Summit
7
http://www.tropicalisland.de/NYC_New_York_
Brooklyn_Bridge_from_World_Trade_Center_
b.jpg
x(t) = t
y(t) = ½ a (et/a + e-t/a)
= acosh(t/a)
http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
Structure in “unstructured” sourcesStructure in “unstructured” sources
8. Text Analytics Introduction
2013 Text Analytics Summit
8
Business Intelligence
Conventional BI feeds off:
It runs off:
"SUMLEV","STATE","COUNTY","STNAME","CTYNAME","YEAR","POPESTIMATE",
50,19,1,"Iowa","Adair County",1,8243,4036,4207,446,225,221,994,509
50,19,1,"Iowa","Adair County",2,8243,4036,4207,446,225,221,994,509
50,19,1,"Iowa","Adair County",3,8212,4020,4192,442,222,220,987,505
50,19,1,"Iowa","Adair County",4,8095,3967,4128,432,208,224,935,488
50,19,1,"Iowa","Adair County",5,8003,3924,4079,405,186,219,928,495
50,19,1,"Iowa","Adair County",6,7961,3892,4069,384,183,201,907,472
50,19,1,"Iowa","Adair County",7,7875,3855,4020,366,179,187,871,454
50,19,1,"Iowa","Adair County",8,7795,3817,3978,343,162,181,841,439
50,19,1,"Iowa","Adair County",9,7714,3777,3937,338,159,179,805,417
“The bulk of information value
is perceived as coming from
data in relational tables. The
reason is that data that is
structured is easy to mine and
analyze.”
-- Prabhakar Raghavan,
Google, formerly
Yahoo Research
10. Text Analytics Introduction
2013 Text Analytics Summit
10
Text-BI: Back to the Future
Business intelligence (BI) was first defined in 1958:
“In this paper, business is a collection of activities carried on for
whatever purpose, be it science, technology, commerce,
industry, law, government, defense, et cetera... The notion of
intelligence is also defined here... as ‘the ability to apprehend
the interrelationships of presented facts in such a way as to
guide action towards a desired goal.’”
-- Hans Peter Luhn
“A Business Intelligence System”
IBM Journal, October 1958
What was IT like in the ‘50s?
12. Text Analytics Introduction
2013 Text Analytics Summit
12
From the Information Retrieval Perspective
What do people do with electronic documents?
1. Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata &
contents.
4. Information Extraction.
For textual documents, text analytics enhances #1 & #2 and
enables #3 & #4.
You need linguistics to do #1 & #4 well, to deal with meaning
(a.k.a. semantics).
Search is not enough...
13. Text Analytics Introduction
2013 Text Analytics Summit
13
Semantics
Text analytics adds semantic understanding of –
Named entities: people, companies, places, etc.
Pattern-based entities: e-mail addresses, phone numbers, etc.
Concepts: abstractions of entities.
Facts and relationships.
Concrete and abstract attributes (e.g., 10-year, expensive,
comfortable).
Subjectivity in the forms of opinions, sentiments, and
emotions: attitudinal data.
Call these elements, collectively, features.
14. Text Analytics Introduction
2013 Text Analytics Summit
14
Semantics, Analytics, and IR
Text analytics generates semantics to bridge search, BI, and
applications, enabling next-generation information
systems.
Search BI
Applica-
tions
Search based
applications
(search + text +
apps)
Information access
(search + text + BI)
Integrated analytics
(text + BI)
Text analytics
(inner circle)
Semantic search
(search + text)
NextGen CRM, EFM,
MR, marketing, …
15. Text Analytics Introduction
2013 Text Analytics Summit
15
Information Access
Text analytics transforms Information Retrieval (IR) into
Information Access (IA).
• Search terms become queries.
• Retrieved material is mined for larger-scale structure.
• Retrieved material is mined for features such as entities and
topics or themes.
• Retrieved material is mined for smaller-scale structure such
as facts and relationships.
• Results are presented intelligently, for instance, grouping
on mined topics-themes.
• Extracted information may be visualized and explored.
16. Text Analytics Introduction
2013 Text Analytics Summit
16
Information Access
Text analytics enables results that suit the information and
the user, e.g., answers –
17. Text Analytics Introduction
2013 Text Analytics Summit
17
Text Data Mining Enables Content Exploration
Decisive Analytics
http://www.dac.us/
18. Text Analytics Introduction
2013 Text Analytics Summit
18
Text Analytics Definition
Text analytics automates what researchers, writers,
scholars, and all the rest of us have been doing for years.
Text analytics –
Applies linguistic and/or statistical techniques to extract
concepts and patterns that can be applied to categorize and
classify documents, audio, video, images.
Transforms “unstructured” information into data for
application of traditional analysis techniques.
Discerns meaning and relationships in large volumes of
information that were previously unprocessable by
computer.
19. Text Analytics Introduction
2013 Text Analytics Summit
19
Glossary: Information in Text
Entity: Typically a name (person, place, organization, etc.) or
a patterned composite (phone number, e-mail address).
Concept: An abstract entity or collection of entities.
Feature: An element of interest, e.g., an entity, concept,
topic, event, fact, relationship, etc.
Metadata: Descriptive information such as author, title,
publication date, file type and size.
Information Extraction (IE) involves pulling metadata,
features & their attributes out of textual sources.
Co-reference: Multiple expressions that describe the same
thing. Anaphora including pronoun use is an example:
John pushed Max. He fell.
John pushed Max. He laughed.
-- Laure Vieu and Patrick Saint-Dizier
20. Text Analytics Introduction
2013 Text Analytics Summit
20
Glossary: Methods
Natural Language Processing (NLP): Computers hear
humans.
Parsing: Evaluating the content of a document or text.
Tokenization: Identification of distinct elements, e.g.,
words, punctuation marks, n-grams.
Stemming/Lemmatization: Reducing word variants
(conjugation, declension, case, pluralization) to bases.
Term reduction: Use of synonyms, taxonomy, similarity
measures to group like terms.
Tagging: Wrapping XML tags around distinct features, a.k.a.
text augmentation. May involve text enrichment.
POS Tagging: Specifically identifying parts of speech via
syntactic analysis.
21. Text Analytics Introduction
2013 Text Analytics Summit
21
Glossary: Organizing and Structuring
Categorization: Specification of feature/doc groupings.
Clustering: Creating categories according to outcome-
similarity criteria.
Taxonomy: An exhaustive, hierarchical categorization of
entities and concepts, either specified or generated by
clustering.
Classification: Assigning an item to a category, perhaps
using a taxonomy.
Ontology : In practice, a classification of a set of items in a
way that represents knowledge.
An oak is a tree. A rose is a flower. A deer is an animal. A sparrow is a
bird. Russia is our fatherland. Death is inevitable.
-- P. Smirnovskii, A Textbook of Russian Grammar
Semantics relates features to others.
22. Text Analytics Introduction
2013 Text Analytics Summit
22
Glossary: Evaluation
Precision: The proportion of decisions (e.g., classifications)
that are correct.
Recall: The proportion of actual correct decisions (e.g.,
classifications) relative to the total number of correct
decisions.
Find the even numbers:
9 17 12 4 1 6 2 20 7 3 8 10
What is my Precision? What is my Recall?
Accuracy: How well an IE or IR task has been performed,
computed as an F-score weighting Precision & Recall,
typically:
f = 2*(precision * recall) / (precision + recall)
Relevance: Do results match the individual user’s needs?
23. Text Analytics Introduction
2013 Text Analytics Summit
23
Text Analytics Pipeline
Typical steps in text analytics include –
1. Identify and retrieve documents for analysis.
2. Apply statistical &/ linguistic &/ structural techniques to
discern, tag, and extract entities, concepts, relationships,
and events (features) within document sets.
3. Apply statistical pattern-matching & similarity techniques
to classify documents and organize extracted features
according to a specified or generated categorization /
taxonomy.
– via a pipeline of statistical & linguistic steps.
Let’s look at them, at steps to model text...
24. Text Analytics Introduction
2013 Text Analytics Summit
24
Modelling Text
Metadata.
E.g., title, author, date.
Statistics.
E.g., term frequency, co-occurrence, proximity.
Linguistics.
Lexicons, gazetteers, phrase books.
Word morphology, parts of speech, syntactic rules.
Semantic networks.
Larger-scale structure including discourse.
Machine learning.
25.
26.
27. Text Analytics Introduction
2013 Text Analytics Summit
27
“Statistical information derived from word frequency and distribution is
used by the machine to compute a relative measure of significance, first
for individual words and then for sentences. Sentences scoring highest in
significance are extracted and printed out to become the auto-abstract.”
-- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
http://wordle.net
28. Text Analytics Introduction
2013 Text Analytics Summit
28
Modelling Text
The text content of a document can be considered an
unordered “bag of words.”
Particular documents are points in a high-dimensional vector
space.
Salton, Wong &
Yang, “A Vector
Space Model for
Automatic
Indexing,”
November 1975.
29. Text Analytics Introduction
2013 Text Analytics Summit
29
Modelling Text
We might construct a document-term matrix...
D1 = “I like databases”
D2 = “I hate hate databases”
and use a weighting such as TF-IDF (term frequency–inverse
document frequency)…
in computing the cosine of the angle between weighted
doc-vectors to determine similarity.
I like hate databases
D1 1 1 0 1
D2 1 0 2 1
http://en.wikipedia.org/wiki/Term-document_matrix
30. Text Analytics Introduction
2013 Text Analytics Summit
30
Modelling Text
Analytical methods make text tractable.
Latent semantic indexing utilizing singular value
decomposition for term reduction / feature selection.
Creates a new, reduced concept space.
Takes care of synonymy, polysemy, stemming, etc.
Classification technologies / methods:
Naive Bayes.
Support Vector Machine.
K-nearest neighbor.
31. Text Analytics Introduction
2013 Text Analytics Summit
31
Modelling Text
In the form of query-document similarity, this is Information
Retrieval 101.
See, for instance, Salton & Buckley, “Term-Weighting
Approaches in Automatic Text Retrieval,” 1988.
A useful basic tech paper: Russ Albright, SAS, “Taming Text
with the SVD,” 2004.
Given the complexity of human language, statistical models
may fall short.
“Reading from text in general is a hard problem, because it
involves all of common sense knowledge.”
-- Expert systems pioneer Edward A. Feigenbaum
32. Text Analytics Introduction
2013 Text Analytics Summit
32
“Tri-grams” here
are pretty good at
describing the
Whatness of the
source text. Yet...
“This rather unsophisticated argument on ‘significance’
avoids such linguistic implications as grammar and syntax...
No attention is paid to the logical and semantic
relationships the author has established.”
-- Hans Peter Luhn, 1958
34. Text Analytics Introduction
2013 Text Analytics Summit
34
Advanced Term Counting
Counting term hits, in one source, at the doc level, doesn’t
take you far...
Good or bad? What’s behind the posts?
35. Text Analytics Introduction
2013 Text Analytics Summit
35
Why Do We Need Linguistics?
To get more out of text than can be delivered by a
bag/vector of words and term counting.
The Dow fell 46.58, or 0.42 percent, to 11,002.14. The Standard
& Poor's 500 index gained1.44, or 0.11 percent, to 1,263.85.
The Dow gained 46.58, or 0.42 percent, to 11,002.14. The
Standard & Poor's 500 index fell 1.44, or 0.11 percent, to
1,263.85.
-- Luca Scagliarini, Expert System
Time flies like an arrow. Fruit flies like a banana.
-- Groucho Marx
(Statistical co-occurrence to build a model for analysis of
text such as these is possible but still limited.)
39. Text Analytics Introduction
2013 Text Analytics Summit
39
From POS to Relationships
When we under-
stand, for instance,
parts of speech
(POS), e.g. –
<subject> <verb>
<object>
– we’re in a position
to discern facts and
relationships...
Semantic networks
such as WordNet
are an asset for
word-sense
disambiguation.
40.
41. Text Analytics Introduction
2013 Text Analytics Summit
41
Tagging with GATE
Annotation (tagging) in action via GATE, an open-source
tool:
45. Text Analytics Introduction
2013 Text Analytics Summit
45
Information Extraction
For content analysis, key in on extracting information.
Text features are typically marked up (annotated) in-place
with XML.
Entities and concepts may correspond to dimensions in a
standard BI model.
Both classes of object are hierarchically organized and have
attributes.
We can have both discovered and predetermined classifications
(taxonomies) of text features.
Dimensional modelling facilitates extraction to
databases...
46. Text Analytics Introduction
2013 Text Analytics Summit
46
Database Insert
Illustrated via an IBM example:
“The standard features are stored in the STANDARD_KW table,
keywords with their occurrences in the KEYWORD_KW_OCC
table, and the text list features in the TEXTLIST_TEXT table.
Every feature table contains the DOC_ID as a reference to the
DOCUMENT table.”
47. Text Analytics Introduction
2013 Text Analytics Summit
47
Sophisticated Pattern Matching
Lexicons and language rules boost accuracy. An example –
a bit complicated – GATE Extension via JAPE Rules...
/* locationcontext2.jape
* Dhavalkumar Thakker, Nottingham Trent University/PA Photos 15 Sept 2008*/
Phase: locationcontext2
Input: Lookup Token
Options: control = all debug = false
//Manchester, UK
Rule: locationcontext2
Priority:50
({Token.string == "at"})
( ({Token.string =~ "[Tt]he"})?
(
(
{Token.kind == word, Token.category == NNP, Token.orth == upperInitial}
({Token.kind == punctuation})?
{Token.kind == word, Token.category == NNP, Token.orth == upperInitial}
({Token.kind == punctuation})?
{Token.kind == word, Token.category == NNP, Token.orth == upperInitial} ) |
( {Token.kind == word, Token.category == NNP, Token.orth == upperInitial}
({Token.kind == punctuation})?
( {Token.kind == word, Token.category == NNP, Token.orth == allCaps} |
{Token.kind == word, Token.category == NNP, Token.orth == upperInitial} )
) |
...
49. Text Analytics Introduction
2013 Text Analytics Summit
49
Predictive Modeling
In the text context, predictive analytics is mostly about
classification and automated processing.
Modeling also helps, operationally, with:
Completion
Disambiguation
: use dictionaries, context
Error correction
http://en.wikipedia.org/wiki/File:
ITap_on_Motorola_C350.jpg
50. Text Analytics Introduction
2013 Text Analytics Summit
50
Error Correction
“Search logs suggest that from 10-15% of queries contain
spelling or typographical errors. Fittingly, one important
query reformulation tool is spelling suggestions or
corrections.”
-- Marti Hearst, Search User Interfaces
51. Text Analytics Introduction
2013 Text Analytics Summit
51
Accuracy and Semi-Structured Sources
An e-mail message is “semi-structured,” which facilitates
extracting metadata --
Date: Sun, 13 Mar 2005 19:58:39 -0500
From: Adam L. Buchsbaum <alb@research.att.com>
To: Seth Grimes <grimes@altaplana.com>
Subject: Re: Papers on analysis on streaming data
seth, you should contact divesh srivastava,
divesh@research.att.com regarding at&t labs data streaming
technology.
Adam
“Reading from text in structured domains I don’t think is as
hard.”
-- Edward A. Feigenbaum
Surveys are also typically s-s in a different way...
52. Text Analytics Introduction
2013 Text Analytics Summit
52
Structured &‘Unstructured’
The respondent is invited to explain his/her attitude:
53. Text Analytics Introduction
2013 Text Analytics Summit
53
Structured &‘Unstructured’
We typically look at frequencies and distributions of coded-
response questions:
Linkage of responses to
coded ratings helps in
analyses of free text:
54. Text Analytics Introduction
2013 Text Analytics Summit
54
“Sentiment analysis is the task of identifying positive and
negative opinions, emotions, and evaluations.”
-- Wilson, Wiebe & Hoffman, 2005, “Recognizing Contextual
Polarity in Phrase-Level Sentiment Analysis”
“Sentiment analysis or opinion mining is the computational
study of opinions, sentiments and emotions expressed in
text… An opinion on a feature f is a positive or negative
view, attitude, emotion or appraisal on f from an opinion
holder.”
-- Bing Liu, 2010, “Sentiment Analysis and Subjectivity,” in Handbook
of Natural Language Processing
“Dell really... REALLY need to stop overcharging... and when i
say overcharing... i mean atleast double what you would
pay to pick up the ram yourself.”
-- From Dell’s IdeaStorm.com
Sentiment Analysis
55. Text Analytics Introduction
2013 Text Analytics Summit
55
What are people saying? What’s hot/trending?
What are they saying about {topic|person|product} X?
... about X versus {topic|person|product} Y?
How has opinion about X and Y evolved?
How has opinion correlated with {our|competitors’|general}
{news|marketing|sales|events}?
What’s behind opinion, the root causes?
(How) Can we link opinions & transactions?
(How) Can we link opinion & intent?
Who are opinion leaders?
How does sentiment propagate across channels?
Sentiment Analysis
56. Text Analytics Introduction
2013 Text Analytics Summit
56
Sentiment Analysis
Applications include:
Brand / Reputation Management.
Competitive intelligence.
Customer Experience Management.
Enterprise Feedback Management.
Quality improvement.
Trend spotting.
There are significant challenges…
57.
58. Text Analytics Introduction
2013 Text Analytics Summit
58
Complications Illustrated
Unfiltered
duplicates
External
reference
“Kind” =
type, variety,
not a
sentiment.
Complete
misclassification
59. Text Analytics Introduction
2013 Text Analytics Summit
59
Sentiment Complications
There are many complications.
Sentiment may be of interest at multiple levels.
Corpus / data space, i.e., across multiple sources.
Document.
Statement / sentence.
Entity / topic / concept.
Human language is noisy and chaotic!
Jargon, slang, irony, ambiguity, anaphora, polysemy, synonymy,
etc.
Context is key. Discourse analysis comes into play.
Must distinguish the sentiment holder from the object:
“Geithner said the recession may worsen.”
61. Text Analytics Introduction
2013 Text Analytics Summit
61
Intent Analysis
http://www.aiaioo.com/whitepapers/intention_analysis_use_cases.pdf
http://sentibet.com/
62. Text Analytics Introduction
2013 Text Analytics Summit
62
Applications
Text analytics has applications in –
• Intelligence & law enforcement.
• Life sciences.
• Media & publishing including social-media analysis and
contextual advertizing.
• Competitive intelligence.
• Voice of the Customer: CRM, product management &
marketing.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.
63. Text Analytics Introduction
2013 Text Analytics Summit
63
Online Commerce
Text analytics is applied for marketing, search optimization,
competitive intelligence.
Analyze social media and enterprise feedback to understand
the Voice of the Market:
• Opportunities
• Threats
• Trends
Categorize product and service offerings for on-site search
and faceted navigation and to enrich content delivery.
Annotate pages to enhance Web-search findability, ranking.
Scrape competitor sites for offers and pricing.
Analyze social and news media for competitive information.
64. Text Analytics Introduction
2013 Text Analytics Summit
64
Voice of the Customer
Text analytics is applied to enhance customer service and
satisfaction.
Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses
• Forum & blog posting and other social media
– to –
• Address customer product & service issues
• Improve quality
• Manage brand & reputation
If you can link qualitative information from text you can –
• Link feedback to transactions
• Assess customer value
• Understand root causes
• Mine data for measures such as churn likelihood
65. Text Analytics Introduction
2013 Text Analytics Summit
65
E-Discovery and Compliance
Text analytics is applied for compliance, fraud and risk, and
e-discovery.
Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronic stored information for production in
event of litigation
Sources include e-mail (!!), news, social media
Risk avoidance and fraud detection are key to effective
decision making
• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights
68. Text Analytics Introduction
2013 Text Analytics Summit
68
8%
21%
25%
27%
36%
21%
34%
35%
44%
47%
21%
22%
23%
27%
29%
30%
35%
35%
41%
62%
0% 10% 20% 30% 40% 50% 60% 70%
text messages/SMS/chat
Web-site feedback
contact-center notes or transcripts
scientific or technical literature
e-mail and correspondence
review sites or forums
customer/market surveys
on-line forums
news articles
blogs and other social media
What textual information are you analyzing or do you plan to
analyze?
2011 (n=215)
2009 (n=100)
71. Text Analytics Introduction
2013 Text Analytics Summit
71
Getting Started
A best practices approach…
Assess:
• Assess business goals.
• Understand information sources.
• Consult and educate stakeholders.
Evaluate:
• Evaluate installed, hosted/SaaS, database-integrated options.
• Determine performance and business requirements.
• Match methods to goals, sources, and work practices.
Implement:
• Start with basic functions such as search, modest goals, or
with a single information source.
• Go for clear wins to gain support.
• Build out applications, capacity, BI/research integration.
72. Text Analytics Introduction
2013 Text Analytics Summit
72
Software & Platform Options
Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or
server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface
(API).
• Code library or component of a business/vertical
application, for instance for CRM, e-discovery, search.
Text analytics is frequently embedded in search or other
end-user applications.
The slides that follow next will present leading options in
each category except Hosted…
73. Text Analytics Introduction
2013 Text Analytics Summit
73
Text Analysis Applications
Vendors:
Attensity, Clarabridge, Daedalus, Digital Reasoning, Expert
System, Fido Labs, IBM, Linguamatics, Medallia, NetBase,
Open Text (Nstein), Provalis Research, SAP, SAS, SRA
NetOwl, Sysomos, Temis.
Typical uses:
Customer experience management (CEM), survey analysis,
social-media analysis, law enforcement, life sciences.
Typical characteristics:
Interface that allows the user to configure a processing
pipeline.
Interface for text exploration and visualization.
Export to databases
74. Text Analytics Introduction
2013 Text Analytics Summit
74
Data Mining Workbench
Vendors:
IBM SPSS Modeler, Megaputer PolyAnalyst, RapidMiner, SAS
Text Analytics, WEKA.
Typical uses:
Customer experience management (CEM), marketing
analytics, survey analysis, social-media analysis, law
enforcement.
Predictive modeling.
Typical characteristics:
Same as text-analysis applications, but with more
sophisticated modeling and analysis capabilities.
75. Text Analytics Introduction
2013 Text Analytics Summit
75
Programming/Development Tool
Vendors:
GATE, OpenNLP, Python NLTK, R, Stanford NLP – open
source.
NooJ – free, non-open source.
Typical uses:
Language modeling.
Data exploration.
Up to the programmer.
Typical characteristics:
Text is an add-in to a programming language/environment.
76. Text Analytics Introduction
2013 Text Analytics Summit
76
As a Service, via API
Vendors:
Alchemy API, Clarabridge, Open Amplify, Pingar, Saplo,
Semantria (Lexalytics), Thomson Reuters Calais, others.
Typical uses:
Annotation and content enrichment with the application
domain up to the user.
Typical characteristics:
Relies on remote, server-resident processing resources.
May or, more likely, may not be end-user customizable.
77. Text Analytics Introduction
2013 Text Analytics Summit
77
Code Library or Annotation Engine
Vendors:
Alias-I LingPipe, GATE.
Basis Technology Rosette, Lexalytics Salience, SAP Inxight,
SAS Teragram, TEMIS Luxid.
Typical uses:
Information extraction in support of business applications.
Typical characteristics:
Same as text-analysis applications, but with more
sophisticated modeling and analysis capabilities.
79. Text Analytics Introduction
2013 Text Analytics Summit
79
http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
http://hpccsystems.com/ (GNU Affero GPL)
A Big Data Platform (example)
83. An Introduction to Text Analytics:
Social, Online, and Enterprise
Seth Grimes
Alta Plana Corporation
@sethgrimes
Text & Social Analytics Summit 2013
Workshop
June 4, 2013