This document presents an overview of text mining. It discusses how text mining differs from data mining in that it involves natural language processing of unstructured or semi-structured text data rather than structured numeric data. The key steps of text mining include pre-processing text, applying techniques like summarization, classification, clustering and information extraction, and analyzing the results. Some common applications of text mining are market trend analysis and filtering of spam emails. While text mining allows extraction of information from diverse sources, it requires initial learning systems and suitable programs for knowledge discovery.
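The pre-processing step mentioned above can be sketched in a few lines. This is a minimal illustration, not the method used in the slides: the stop-word list, the function names and the two sample documents are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "this"}


def preprocess(text):
    """Lowercase the text, split it into runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(documents):
    """Count how often each remaining term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts


# Hypothetical sample documents.
docs = [
    "Win a free prize! Claim the prize now.",
    "The quarterly market report is attached.",
]
print(term_frequencies(docs).most_common(3))
```

Term counts like these are one simple structured representation on which summarization, classification or clustering techniques can then operate.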
Text mining is an important part of data mining and is now used on a large scale. The technique is used to find patterns in text data collected from many online sources and to gain insights from the patterns observed. Because text is everywhere on the internet and is rarely available in a structured format, text mining plays a large role in making it usable. It uses natural language processing (NLP) techniques to automate the analysis, and the same concept is used in machine learning.
A college-level presentation covering the following topics:
Introduction
Comparison of Text Mining with Other Mining
Text Mining Process
How an Algorithm Is Derived for Text Mining
Text Analysis for Google Sheets
Conclusion
Big Data & Text Mining: Finding Nuggets in Mountains of Textual Data
A large amount of information is available in textual form in databases or online sources, and for many enterprise functions (marketing, maintenance, finance, etc.) it represents a huge opportunity to improve business knowledge. For example, text mining is starting to be used in marketing, more specifically in analytical customer relationship management, in order to achieve the holy 360° view of the customer (integrating elements from inbound mails, web comments, surveys, internal notes, etc.).
Facing this new domain, I did some personal research and produced a synthesis, which helped me clarify some ideas. The presentation below is not intended to be exhaustive on the subject, but it may bring you some useful insights.
Slides for the class, From Pattern Matching to Knowledge Discovery Using Text Mining and Visualization Techniques, presented June 13, 2010, at the Special Libraries Association 2010 annual meeting.
SA2: Text Mining from User Generated Content (John Breslin)
ICWSM 2011 Tutorial
Lyle Ungar and Ronen Feldman
The proliferation of documents available on the Web and on corporate intranets is driving a new wave of text mining research and application. Earlier research addressed extraction of information from relatively small collections of well-structured documents such as newswire or scientific publications. Text mining from other corpora such as the web requires new techniques drawn from data mining, machine learning, NLP and IR. Text mining requires preprocessing document collections (text categorization, information extraction, term extraction), storage of the intermediate representations, analysis of these intermediate representations (distribution analysis, clustering, trend analysis, association rules, etc.), and visualization of the results. In this tutorial we will present the algorithms and methods used to build text mining systems. The tutorial will cover the state of the art in this rapidly growing area of research, including recent advances in unsupervised methods for extracting facts from text and methods used for web-scale mining. We will also present several real-world applications of text mining. Special emphasis will be given to lessons learned from years of experience in developing real-world text mining systems, including recent advances in sentiment analysis and how to handle user-generated text such as blogs and user reviews.
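The idea of an intermediate representation that is stored and then analyzed can be illustrated with a small sketch. It assumes TF-IDF weighting and cosine similarity as the representation and the analysis; the tutorial abstract does not name either, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split lowercased text into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical user-generated snippets: two reviews and one unrelated headline.
docs = [
    "great phone with a great camera",
    "the camera on this phone is great",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_reviews = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(sim_reviews > sim_unrelated)
```

Pairwise similarities of this kind are a typical input to the clustering and trend analysis steps listed in the abstract.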
Lyle H. Ungar is an Associate Professor of Computer and Information Science (CIS) at the University of Pennsylvania. He also holds appointments in several other departments at Penn in the Schools of Engineering and Applied Science, Business (Wharton), and Medicine. Dr. Ungar received a B.S. from Stanford University and a Ph.D. from M.I.T. He directed Penn's Executive Masters of Technology Management (EMTM) Program for a decade, and is currently Associate Director of the Penn Center for BioInformatics (PCBI). He has published over 100 articles and holds eight patents. His current research focuses on developing scalable machine learning methods for data mining and text mining.
Ronen Feldman is an Associate Professor of Information Systems at the Business School of the Hebrew University in Jerusalem. He received his B.Sc. in Math, Physics and Computer Science from the Hebrew University and his Ph.D. in Computer Science from Cornell University in NY. He is the author of the book "The Text Mining Handbook" published by Cambridge University Press in 2007.
Emotion detection from text using data mining and text mining (Sakthi Dasans)
Based on a research paper published by the Faculty of Engineering, The University of Tokushima, at IEEE 2007, we built an intelligent system called Emotelligence on Text to recognize human emotion from textual content.
That is, given an input string, the system attempts to identify the emotion behind that text.
Post 1:
What is text analytics? How does it differ from text mining?
Text analytics is the application of statistical and machine learning techniques to predict, prescribe, or infer information from text-mined data. Text mining is a tool that helps get the data cleaned up.
Differences between Text Mining and Text Analytics:
• Text Mining and Text Analytics solve the same problems, but use different techniques and are complementary ways to automatically extract meaning from text.
• Text Analytics was developed within the field of computational linguistics. It can encode human understanding into a series of linguistic rules. Rules generated by humans are high in precision, but they do not adapt automatically and are usually fragile when tried in new situations.
• Text mining is a newer discipline arising out of the fields of statistics, data mining, and machine learning. Its strength is the ability to inductively create models from collections of historical data. Because statistical models are learned from training data, they are adaptive and can identify “unknown unknowns”, leading to better recall. Still, they can be prone to missing something that would seem obvious to a human.
• Text analytics and text mining approaches have essentially equivalent performance. Text analytics requires an expert linguist to produce complex rule sets, whereas text mining requires the analyst to hand-label cases with outcomes or classes to create training data.
• Due to their different perspectives and strengths, combining text analytics with text mining often leads to better performance than either approach alone.
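The contrast drawn in the bullets above, hand-written rules versus models learned from hand-labeled cases, can be sketched as follows. This is only an illustration under stated assumptions: the keyword rule set, the choice of a naive Bayes model, and the four training examples are all hypothetical and are not taken from the post.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split lowercased text into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


# Rule-based approach: a hand-written rule set (hypothetical keywords).
SPAM_KEYWORDS = {"free", "winner", "prize"}


def rule_based_is_spam(text):
    """Flag the text if any token matches a hand-written keyword."""
    return any(tok in SPAM_KEYWORDS for tok in tokenize(text))


# Statistical approach: multinomial naive Bayes with add-one smoothing,
# trained on hand-labeled examples.
def train_naive_bayes(labeled_docs):
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training data.
training = [
    ("claim your free prize now", "spam"),
    ("you are a winner claim cash", "spam"),
    ("meeting notes attached for review", "ham"),
    ("please review the project schedule", "ham"),
]
model = train_naive_bayes(training)

message = "claim cash today"
print(rule_based_is_spam(message), predict(model, message))
```

In this toy setup the keyword rules miss the message because none of the listed keywords appears in it, while the learned model picks up "claim" and "cash" from the labeled examples. This mirrors the point that rules are precise but fragile and learned models adapt to the training data.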
2. What technologies were used in building Watson (both hardware and software)?
Watson is an extraordinary computer system (a novel combination of advanced hardware and software) designed to answer questions posed in natural human language. This artificially intelligent system was developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first CEO, the industrialist Thomas J. Watson. The computer system was specifically developed to answer questions on the quiz show Jeopardy! In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings.
Watson received the first prize of $1 million. The goal was to advance computer science by exploring new ways for computer technology to affect science, business, and society. IBM undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show Jeopardy! The extent of the challenge in ...
Text mining is a technique that helps users find useful information in a large number of text documents on the web or in a database. Most popular text mining and classification methods have adopted term-based approaches. Both term-based approaches and pattern-based methods are used to describe user preferences. This review paper analyses how text mining works at three levels: sentence level, document level, and feature level. It reviews the related work done previously, describes the problems that arise when text mining is done at the feature level, and presents a technique for text mining of compound sentences.
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...kevig
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze, understand, and generate languages that humans use naturally to address computers.
In recent years digital data has grown dramatically, and knowledge discovery and data mining have attracted immense attention as the need to turn such data into useful information and knowledge has grown. Keyword extraction is considered an essential task in natural language processing (NLP) that facilitates mapping documents to a concise set of representative single- and multi-word phrases. This paper investigates the use of Word2Vec and a decision tree for keyword extraction from textual documents. The Sem-Eval (2010) dataset is used as the main input for the proposed study. After pre-processing operations are applied to the dataset, the words are represented as vectors using the Word2Vec technique. The method is based on word similarity between candidate keywords drawn from both the keywords collected for each label and one sample from the same label. An appropriate threshold has been determined, and the percentages that exceed this threshold are exported to the decision tree so that an appropriate classification can be made for the text document.
Several similarity measures were used for the classification process. The efficiency and accuracy of the algorithm were measured in the classification process using precision, recall and F-score rates. The results indicated that using a vector representation for each keyword is an effective way to identify the most similar words, which increases the chance of recognizing the correct classification of the document. With Word2Vec CBOW, the F-score was 64% using the Gini method and the WordNet lemmatizer. With Word2Vec SG, the F-score was 82% using the Gini index and English Porter stemming, which was the highest score across all our experiments.
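For readers unfamiliar with the evaluation measures named above, precision, recall and F-score can be computed for a keyword extraction run as in the following sketch. The function name and the example keyword sets are hypothetical and are not taken from the paper's experiments.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted keywords against gold keywords.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold keywords
    F-score   = harmonic mean of precision and recall
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 extracted keywords, 3 gold keywords, 2 in common.
extracted = ["text mining", "keyword", "vector", "tree"]
gold = ["text mining", "keyword", "word2vec"]
p, r, f = precision_recall_f1(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```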
Semantic Web techniques are used to provide relevant data to users from the massive amount of data available on the web. This presentation introduces the Semantic Web and how NLP can be used in it.
2. Outline
Introduction
Data Mining Vs. Text Mining
Motivation for Text Mining
I/O Model for Text Mining
Steps for Text Mining
Key Terms in Text Mining
Text Mining Frameworks
Merits of Text Mining
Applications of Text Mining
Demerits of Text Mining
References
Prakhyath Rai, Asst. Professor, Dept. of ISE, SCEM, Mangaluru-575007
3. Introduction
Text Mining is a Discovery
Text Mining is also referred to as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT).
Text Mining is used to extract relevant information, knowledge or patterns from different sources that are in unstructured or semi-structured form.
4. Introduction Cont.
Extract and discover knowledge hidden in text
automatically
Aid domain experts by automatically:
identifying concepts
extracting facts/relations
discovering implicit links
generating hypotheses
5. Data Mining vs. Text Mining
Data Mining | Text Mining
Process directly | Linguistic processing or natural language processing (NLP)
Identify causal relationships | Discover heretofore unknown information
Structured data | Semi-structured and unstructured data (text)
Structured numeric transaction data residing in a relational data warehouse | Applications deal with much more diverse and eclectic collections of systems and formats
6. Motivation for Text Mining
Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation).
Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
7. Input-Output Model for Text Mining
Input: Documents
Process: Text Mining Technique
Output: Patterns, Connections, Trends
8. Steps for Text Mining
Pre-Processing the Text
Applying Text Mining Techniques
Summarization
Classification
Clustering
Visualization
Information Extraction
Analyzing the Text
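The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the presentation: the stop-word list, the function names `preprocess` and `top_terms`, and the sample documents are all assumptions made for the example, and simple term counting stands in for a real text mining technique.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Step 1: lowercase the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_terms(documents, n=3):
    """Step 2: apply a (very simple) technique, here term-frequency counting."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts.most_common(n)


# Assumed sample input: two short unstructured documents.
docs = [
    "Text mining extracts patterns from text.",
    "Mining text is a discovery of patterns and trends.",
]

# Step 3: analyze the result, here by inspecting the most frequent terms.
for term, freq in top_terms(docs):
    print(term, freq)
```

In practice the second step would be replaced by summarization, classification, clustering or information extraction; the pre-processing and analysis steps keep the same role.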
9. Key Terms in Text Mining
Information Extraction (IE)
The science of searching for information in documents, for documents themselves, for metadata which describe documents, or for text, sound, images or data within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets.
Artificial Intelligence (AI)
A branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines.
10. Merits of Text Mining
A database limits itself to storing a restricted amount of information, whereas text mining overcomes this limitation
Extraction of relevant information and relationships from natural-language documents
Extraction of information from unstructured or semi-structured documents
11. Applications of Text Mining
Analysis of Market Trends: Classification Technique, Information Extraction Technique
Analysis and Screening of Junk Emails: Classification on the basis of pre-defined frequently occurring items
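Screening junk email on the basis of pre-defined, frequently occurring items can be illustrated with a small keyword-based classifier. This is a rough sketch under stated assumptions: the term list `JUNK_TERMS`, the function names `junk_score` and `is_junk`, and the 0.2 threshold are all hypothetical choices for the example, not values from the presentation.

```python
import re

# Hypothetical list of terms assumed to occur frequently in junk mail.
JUNK_TERMS = {"free", "winner", "prize", "offer", "click"}


def junk_score(message):
    """Return the fraction of word tokens that are pre-defined junk terms."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in JUNK_TERMS)
    return hits / len(tokens)


def is_junk(message, threshold=0.2):
    """Classify a message as junk when its score reaches the assumed threshold."""
    return junk_score(message) >= threshold


print(is_junk("Click now to claim your free prize"))
print(is_junk("The meeting is moved to Monday"))
```

A fixed term list is easy to explain but easy to evade; a learned classifier would normally replace it once labeled examples are available.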
12. Demerits of Text Mining
Requires an initially learned information system for the initial extraction
Suitable programs have not yet been defined to analyze text for mining knowledge or information