Text mining and data mining

These slides explain the basic meaning of text mining, its comparison with other data retrieval methods, its subtasks and applications, its limitations, and the present and future of text mining. The topic of data mining, with its goals and applications, is also included.

Transcript

  • 1. The next step in Search Technology
  • 2. OUTLINE What is Text Mining? What is unstructured text Need for Text Mining? Text Mining sub tasks  Applications of text mining  Barriers  Today of text mining  Tomorrow of text mining Data Mining Goals of Data Mining What can Data mining do?
  • 3. Text Mining v/s. Data Mining Web Mining Information Retrieval
  • 4. Text retrieval Information is retrieved so as to fulfill the needs of customers. Does not discover anything new about the query. IRS find the result from a large database by matching the query. E.g.: the search engines, which identify the relevant documents according to a given set of words on www.
  • 5. Information extraction: IE is the process of automatically extracting structured data from unstructured, machine-readable documents. It relies heavily on natural language processing (NLP) systems. NLP converts samples of human language into formal representations that a computer can process. Its types are natural language generation systems and natural language understanding systems.
  • 6. Applications of Text Mining: Spam filtering • A spam filter is a program that detects unwanted email and prevents those messages from reaching a user's inbox. Sophisticated programs, such as Bayesian filters, attempt to identify spam through suspicious word patterns or word frequencies. • Bayesian spam filtering: it estimates how likely a message is to be spam from how often its words appeared in previously labeled spam and legitimate email.
  • 7. Creating suggestions and recommendations • Text mining helps online stores such as Amazon offer customers suggestions based on their interests. Prediction algorithms are hugely important to online stores: the more accurate they are, the more the store will sell. • A large online store like Amazon may have millions of customers and millions of items in stock. Little is known about the preferences of new customers, while established customers may have too much data. • The data these algorithms work on is constantly updated and changed: customers keep browsing the site, and the prediction algorithm should take recently browsed items into account. • Traditionally, these recommendation algorithms have worked by finding similar customers in the database.
  • 8. Barriers we need to overcome to make the best use of text mining tools in the future: 1) Text mining is a complex technical process that requires skilled staff. 2) It requires unrestricted access to information sources. 3) Copyright can be a barrier.
  • 9. Text Mining Today • Text mining is already producing efficiencies and new knowledge in areas as diverse as biological science, particle physics, and media and communications. It has been used to hypothesise the causes of rare diseases and how existing drugs could be used to target other diseases. • The technique was also recently used to analyse the vast amount of text produced on websites, blogs and social media such as Twitter (where copyright holders allowed it), and showed that messages exchanged on Twitter during the 2011 English riots were not to blame for inciting the riots. • The business benefit of text mining lies in identifying emerging trends and exploring consumer preferences and competitor developments. Text mining is used particularly in larger companies, as part of their customer relationship management strategy, and in the pharmaceutical industry, as part of its research and development strategy.
  • 10. Text mining has gained considerable importance in recent years and has had a strong industrial impact, so the outlook for text mining companies is promising. Innovation in the field is not over: in the coming years, advanced text mining services offered by professional text mining companies are likely to open up many new opportunities.
  • 11. DATA MINING: Data mining is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses or other information repositories. Why data mining? Huge amounts of data are widely available in electronic form, and there is an imminent need to turn such data into useful information and knowledge for broad applications, including business management, decision support and market analysis. For these reasons, data mining has attracted a great deal of attention in the information industry in recent years.
  • 12. Goals of Data Mining • Prediction: how certain attributes within the data will behave in the future. • Identification: identify the existence of an item, an event or an activity. • Classification: partition the data into categories. • Optimization: optimize the use of limited resources.
  • 13. Applications of Data Mining. Marketing: • analysis of human behavior • advertising campaigns • targeted mailings • segmentation of customers, stores or products. Finance: • creditworthiness of clients • performance analysis of financial investments • fraud detection
  • 14. Manufacturing:  optimization of resources.  optimization of manufacturing processes.  product design based on customer requirements. Healthcare:  discovering patterns in X-ray images.  analyzing the side effects of drugs.  analyzing the effectiveness of treatments Continued
  • 16. By: Bhawana
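Slide 4 describes retrieval by matching a query against a large collection. The Python sketch below shows that matching and ranking for a small in-memory collection. It uses plain word counting instead of the weighting schemes real search engines use, such as TF-IDF. The function names and sample documents are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query, documents):
    """Return (doc_id, score) pairs for documents sharing at least one query
    term, ranked by how many query-term occurrences each document contains."""
    terms = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((doc_id, score))
    # Highest score first; ties broken by document id for stable output.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results

# Hypothetical sample collection.
docs = {
    "d1": "Text mining finds patterns in unstructured text.",
    "d2": "Data mining works on structured databases.",
    "d3": "Search engines match query words against web pages.",
}
print(search("text mining", docs))  # [('d1', 3), ('d2', 1)]
```

As the slide says, this only returns documents that already contain the query words. It discovers nothing new.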
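Slide 5 defines information extraction as turning unstructured text into structured data. The sketch below swaps in simple hand-written regular expressions for the NLP systems the slide mentions. It therefore illustrates only the input and output shape of IE. The field names and patterns are hypothetical.

```python
import re

# Hypothetical patterns for two field types; real IE systems usually rely on
# NLP models rather than hand-written regular expressions.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract(text):
    """Turn free text into a structured record of email addresses and ISO dates."""
    return {
        "emails": EMAIL.findall(text),
        "dates": ["-".join(parts) for parts in DATE.findall(text)],
    }

note = "Contact alice@example.com before 2024-03-15; cc bob@example.org."
print(extract(note))
# {'emails': ['alice@example.com', 'bob@example.org'], 'dates': ['2024-03-15']}
```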
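Slide 6 mentions Bayesian spam filters. One common form is multinomial naive Bayes, and the sketch below assumes that variant. It uses add-one smoothing and a tiny made-up training set. It illustrates the idea only and does not reproduce how any particular mail filter works.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, messages, labels):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter(labels)  # number of messages per label
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def log_score(self, text, label):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda lab: self.log_score(text, lab))

# Hypothetical training data.
training_messages = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
training_labels = ["spam", "spam", "ham", "ham"]
spam_filter = NaiveBayesSpamFilter().fit(training_messages, training_labels)
print(spam_filter.predict("free prize inside"))          # spam
print(spam_filter.predict("project meeting on monday"))  # ham
```

The filter works with log probabilities so that long messages do not underflow to zero. Smoothing keeps a single unseen-in-class word from ruling a label out entirely.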
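Slide 7 notes that recommendation algorithms have traditionally worked by finding similar customers. The sketch below implements that idea as user-based collaborative filtering with cosine similarity. The customers, items and ratings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(target, ratings, k=2):
    """Suggest items the target has not rated, scored by the ratings of the
    k most similar customers weighted by their similarity."""
    others = [(cosine(ratings[target], r), user)
              for user, r in ratings.items() if user != target]
    neighbours = sorted(others, reverse=True)[:k]
    scores = {}
    for sim, user in neighbours:
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Best-scoring items first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical ratings table.
ratings = {
    "ann": {"book": 5, "lamp": 3},
    "ben": {"book": 4, "lamp": 3, "pen": 5},
    "cat": {"book": 5, "mug": 4},
    "dan": {"tent": 5, "rope": 4},
}
print(recommend("ann", ratings))  # ['pen', 'mug']
```

The slide points out that new customers have little history, and this approach has the same weakness. A customer with no ratings has zero similarity to everyone, so the neighbours carry no signal.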
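Slide 11 lists associations among the kinds of knowledge data mining discovers. The sketch below counts the support of item pairs in hypothetical shopping baskets. This brute-force counting is only the starting point of association-rule mining, not a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions containing
    both items) is at least min_support, mapped to that support value."""
    pair_counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each pair once per basket, in a fixed order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in pair_counts.items()
            if count / n >= min_support}

# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
print(frequent_pairs(baskets, 0.5))
# {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
```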
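Slide 12 lists classification as a goal of data mining, and slide 13 mentions the creditworthiness of clients. The slides do not name a technique, so the sketch below uses k-nearest neighbours as one option among many. The client features are hypothetical: income in thousands and number of late payments.

```python
import math
from collections import Counter

def knn_classify(point, training, k=3):
    """Assign `point` the majority label among its k nearest training
    examples, using Euclidean distance."""
    nearest = sorted(training, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled clients: ((income in thousands, late payments), label).
training = [
    ((20, 5), "high risk"),
    ((25, 4), "high risk"),
    ((30, 6), "high risk"),
    ((70, 0), "low risk"),
    ((80, 1), "low risk"),
    ((90, 0), "low risk"),
]
print(knn_classify((75, 1), training))  # low risk
print(knn_classify((22, 5), training))  # high risk
```

The features are deliberately left unscaled to keep the example short, so income dominates the distance. A real analysis would normalise each feature first.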