1. + Jan Žižka, František Dařena
Department of Informatics, Faculty of Business and Economics
Mendel University in Brno, Czech Republic
Mining Textual Significant Expressions Reflecting Opinions in Natural Languages
2. + Introduction
Many companies collect opinions expressed by their customers.
These opinions can hide valuable knowledge.
Discovering that knowledge manually can be a very demanding task because:
- the opinion database can be very large,
- the customers can use different languages,
- people may handle the opinions subjectively,
- sometimes additional resources (like lists of positive and negative words) might be needed.
3. + Introduction
Text mining can reveal units of the texts (words, phrases, sentences, etc.) that represent the meaning or sentiment.
Individual words usually do not carry enough information.
Phrases can provide more information, but their extraction, based on linguistic analysis, requires additional knowledge that is unique to every language.
4. + Objective
The objective is to find a way for a computer to reveal phrases that express a certain opinion, without the demanding and time-consuming linguistic analysis that differs for every natural language.
5. + Data description
The processed data included reviews written by hotel clients, collected from publicly available sources.
The reviews were labeled as positive or negative.
Review characteristics:
- more than 5,000,000 reviews,
- written in more than 25 natural languages,
- written only by real customers, based on a real experience,
- written relatively carefully, but still containing the errors typical of natural language.
6. + Review examples
Positive:
- The breakfast and the very clean rooms stood out as the best features of this hotel.
- Clean and moden, the great loation near station. Friendly reception!
- The rooms are new. The breakfast is also great. We had a really nice stay.
- Good location - very quiet and good breakfast.
Negative:
- High price charged for internet access which actual cost now is extreamly low.
- water in the shower did not flow away
- The room was noisy and the room temperature was higher than normal.
- The air conditioning wasn't working
7. + Data preparation
- Data collection, cleaning (removing tags and non-letter characters), converting to upper case.
- Transforming into the bag-of-words representation, with term frequencies (TF) used as attribute values.
- Removing the words with a global frequency < 2.
- Stemming, stopword removal, spell checking, diacritics removal, etc. were not carried out.
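As a rough illustration, a minimal Python sketch of this preparation step follows; the function names and the tag-stripping regex are illustrative assumptions, not taken from the original work:

```python
import re
from collections import Counter

def prepare_review(text):
    """Clean one raw review: strip tags and non-letter characters, upper-case."""
    text = re.sub(r"<[^>]+>", " ", text)                      # remove markup tags
    text = "".join(c if c.isalpha() else " " for c in text)   # letters only; diacritics kept
    return text.upper().split()

def bag_of_words(reviews, min_global_tf=2):
    """Turn reviews into term-frequency dictionaries, dropping any word whose
    global frequency across the whole collection is below min_global_tf."""
    tokenized = [prepare_review(r) for r in reviews]
    global_tf = Counter(w for tokens in tokenized for w in tokens)
    kept = {w for w, f in global_tf.items() if f >= min_global_tf}
    return [Counter(w for w in tokens if w in kept) for tokens in tokenized]
```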
8. + Data characteristics – number of reviews
[Bar chart: counts of positive and negative reviews per language (English, French, Spanish, German, Italian, Czech); the y-axis ranges from 0 to 1,200,000 reviews.]
9. + Data characteristics – dictionary sizes
[Bar chart: numbers of unique words per language (English, German, French, Spanish, Italian, Czech), for MinTF=1 and MinTF=2; the y-axis ranges from 0 to 250,000 words.]
10. + Finding significant words
Thanks to having a large collection of labeled examples, a classifier that separates positive and negative reviews could be created.
To reveal the significant attributes (words), a decision tree was built using the tree-generating algorithm C5, which is based on entropy minimization.
The goal was not to achieve the best classification accuracy but to find the relevant attributes that contribute to assigning a text to a given class.
The significant words appear in the nodes of the decision tree.
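A rough equivalent of this step can be sketched with scikit-learn, where a CART tree with the entropy criterion stands in for C5 (the tiny corpus below reuses the review examples from slide 6; everything else is an illustrative assumption):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Stand-in corpus: review examples from slide 6, already cleaned and upper-cased.
reviews = [
    "THE ROOMS ARE NEW THE BREAKFAST IS ALSO GREAT",
    "GOOD LOCATION VERY QUIET AND GOOD BREAKFAST",
    "THE ROOM WAS NOISY AND THE ROOM TEMPERATURE WAS HIGHER THAN NORMAL",
    "THE AIR CONDITIONING WASNT WORKING",
]
labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer(lowercase=False)        # texts are pre-normalized
X = vectorizer.fit_transform(reviews)                # term-frequency matrix

tree = DecisionTreeClassifier(criterion="entropy")   # split by entropy minimization
tree.fit(X, labels)

# The significant words are those tested in the internal nodes of the tree;
# tree_.feature holds -2 for leaves, so keep only non-negative indices.
words = vectorizer.get_feature_names_out()
significant = sorted({words[i] for i in tree.tree_.feature if i >= 0})
print(significant)
```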
11. + Finding the significant words
The classification accuracy, which is proportional to the relevancy of the words, was between 89.5% and 92.5%.
The decision tree provided a list of about 200–300 words significant for classification from the sentiment perspective.
These words are used as the basis for the extraction of significant expressions, in order to avoid considering all possible combinations of words.
12. + Extracting significant expressions
The extraction of significant expressions starts from the list of significant words; the reviews are searched in the proximity of these words.
Parameters of the significant-expression extracting algorithm:
- D – the distance from a significant word within which the search is carried out,
- N – the number of words forming the significant expressions,
- M – the minimal number of occurrences of a specific group of words.
13. + An example
Searching for significant expressions in a review, with the algorithm parameters D = 3, N = 3.
[The worked example shown on the slide did not survive transcription; a code sketch of the procedure follows.]
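A minimal sketch of one plausible reading of the procedure; the treatment of window words as unordered N-word groups, and the handling of overlapping windows, are assumptions not spelled out on the slides:

```python
from collections import Counter
from itertools import combinations

def extract_expressions(reviews, significant_words, D=3, N=3, M=2):
    """For each occurrence of a significant word, take the window of words
    within distance D and count every unordered N-word group from that window
    containing the significant word; keep groups seen at least M times.
    NOTE: this is a sketch of one interpretation, not the authors' exact code."""
    counts = Counter()
    for tokens in reviews:                          # each review is a token list
        for i, word in enumerate(tokens):
            if word not in significant_words:
                continue
            window = sorted(set(tokens[max(0, i - D): i + D + 1]))
            for group in combinations(window, N):
                if word in group:
                    counts[group] += 1
    return {group: c for group, c in counts.items() if c >= M}

# Usage with the slide's parameters D = 3, N = 3, on one review from slide 6:
review = "GOOD LOCATION VERY QUIET AND GOOD BREAKFAST".split()
print(extract_expressions([review], {"GOOD"}, D=3, N=3, M=1))
```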
14. + Results
Lists of significant expressions extracted from the original text reviews were obtained.
The expressions still need to be reviewed by people.
19. + Discussion
- Some of the significant expressions were very similar.
- The significant expressions were mostly quite meaningful and potentially useful for the target audience.
- Some of the expressions were, naturally, not useful at all.
- It is necessary to find a trade-off between the size of the expressions, the length of the texts where the search is carried out, and the informative value of the expressions.
20. + Discussion
Examples of different distances between the words forming the same significant expression, "good location".
[The examples shown on the slide did not survive transcription.]
21. + Discussion
But the same expression can be formed from words coming from more than one context:
"... Breakfast was really good. The location is a little out of the center ..."
or
"Good service. Convenient location"
or
"It is a quiet location for a good nights sleep"
22. +
Handling large collections
For
languages with large amount of reviews the
datasets were randomly split into subsets
consisting of 50,000 reviews because of memory
requirements and a decision tree was created for
each such subset
Each
of the 50,000-sample subsets gave almost the
same list of words
The relevancies of extracted words were averaged
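A hedged sketch of this subset scheme, again using scikit-learn; feature_importances_ serves here as a stand-in for the per-word relevancy score, since the scoring used with the original C5 trees is not specified on the slides:

```python
import random
from collections import defaultdict

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

def averaged_word_relevancies(reviews, labels, subset_size=50_000):
    """Build one entropy-based tree per random subset of 50,000 reviews
    (to bound memory use) and average each word's relevancy over the subsets.
    feature_importances_ is an assumed stand-in for the relevancy score."""
    data = list(zip(reviews, labels))
    random.shuffle(data)
    scores = defaultdict(list)
    for start in range(0, len(data), subset_size):
        texts, ys = zip(*data[start:start + subset_size])
        vec = CountVectorizer(lowercase=False)
        X = vec.fit_transform(texts)
        tree = DecisionTreeClassifier(criterion="entropy").fit(X, ys)
        for word, imp in zip(vec.get_feature_names_out(), tree.feature_importances_):
            if imp > 0:                    # the word appears in some tree node
                scores[word].append(imp)
    return {w: sum(v) / len(v) for w, v in scores.items()}
```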
23. + Conclusions
- A procedure for applying computers, machine learning, and natural language processing to automatically find significant expressions was presented.
- From the total number of words (80,000–200,000), only about 200–300 were identified as significant and used as the basis for expression extraction.
- The simple, unified procedure worked well for many languages.
- Follow-up research focuses on the preprocessing phase (e.g. eliminating meaningless words).
- The procedure might be used in marketing research or marketing intelligence, for filtering reviews, generating lists of keywords, etc.
24. Thank you for your attention
Vielen Dank für Ihre Aufmerksamkeit
Gracias por vuestra atención
Merci de votre attention
Grazie per la vostra attenzione
Děkuji za vaši pozornost