Building Tools to Data Mine Unstructured Text using a Machine Learning API
 


ai-one's Topic-Mapper API enables programmers to develop software that can learn like a human. This presentation describes how to build an application to mine data from unstructured text using a combination of machine learning, natural language processing (NLP) and clustering technologies.

The source code for this text analytics program is available as a reference design for others to modify to meet specific industry and use case requirements.

Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Building Tools to Data Mine Unstructured Text using a Machine Learning API Presentation Transcript

  • 1. ai-one™ | biologically inspired intelligence. Building Tools to Data Mine Unstructured Text using a Machine Learning API. Outline of discussion: Topic-Mapper (ai-one for Text); overview of ai-Browser. Learn | Solve | Evolve | Inspire. © ai-one inc. 2011. February 2012
  • 2. Agenda
    • ai-one technology & company
    • Topic-Mapper: A machine learning API
    • ai-Browser: A prototype application
      – Research tool for knowledge workers
      – Combines NLP and other technologies
      – Enables machine-human collaboration
      – Reference design can be modified for specific domains
      – Source code included with Topic-Mapper SDK
  • 3. biologically inspired intelligence: logic, creativity
  • 4. ai-one | where we are
    • Seeking early-adopter customers who will use the technology to gain competitive advantage.
    • Positioned as an “easy step” to build mainstream artificial intelligence applications that understand unstructured text.
  • 5. ai-one | our business model
    • Technology licensing ONLY
      – NO professional services
      – NO end-user applications
    • We focus on continuous evolution of core technology
    • Our consulting and OEM partners focus on application of technology to solve problems
  • 6. ai-one’s biologically inspired intelligence
    Our technology “learns” dynamically like you do, enabling you to extract the inherent intelligence from any content.
    [Diagram labels: inputs (text docs, Twitter, RSS, FB feeds, data); outputs (associations, relevance, patterns); stages (Reading, Learning, Intelligence)]
  • 7. our technology | ai-one description
    ai-one’s technology is an adaptive holosemantic dataspace (“biologically inspired intelligence”) that allows users to quickly analyze and discover meaningful patterns of interleaved text, time-related data, and images. The holosemantic dataspace (HSDS) provides complex AI with reasoning and learning capability.
    … it provides answers to questions you didn’t know you wanted to ask …
  • 8. ai-one SDKs | APIs for building learning machines
    Three products, each optimized for the unique “grammar” of each type of data:
    • Topic-Mapper for Text: text analytics, genomics. Available NOW
    • UltraMatch API for Computer Vision: image recognition, robotics. Estimated availability July 2012
    • Graphalizer for Signal Processing: financial markets, N-dimensional time series. Planned for 2013
    [Diagram labels: ai-one “Sensors”; HSDS; Text, Images, Signal Processing; Smallest Input = Data Quant]
  • 9. the fundamental theory | our universal sales proposition: biologically inspired intelligence
    • Self-optimized information processing
    • Self-controlled content organization
    • Multiple higher-order concept formation
    • Autonomic learning via recognition of multiple contexts
    • Self-generalization of learned concepts
  • 10. big discovery | new form of machine learning
    Holosemantic dataspace (HSDS)
    • Operates at byte-level
    • Data agnostic
    • Any language, any sensor, any type of digital image
    • “Listens” to data
    • Records every unique byte pattern only once
    • Heterarchical and hierarchical structures, temporal & spatial
    • Detects how every byte pattern relates to every other
    • Autonomic: no training or human intervention
    • Modeled on neurophysiology
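The slide gives no implementation detail for the HSDS, so the following is only a toy illustration of two of its bullets: storing each unique byte pattern once and recording how patterns relate to each other. The class name `PatternStore`, the fixed window `width`, and the "follows" link are all assumptions made for this sketch, not ai-one's design.

```python
from collections import defaultdict


class PatternStore:
    """Toy store: each unique byte pattern gets one id; repeats only add links."""

    def __init__(self, width=3):
        self.width = width              # length of each byte window (assumed)
        self.ids = {}                   # pattern -> id, each pattern stored once
        self.links = defaultdict(int)   # (id_a, id_b) -> times b directly followed a

    def feed(self, data: bytes):
        """Slide a window over the bytes, registering patterns and their order."""
        prev = None
        for i in range(len(data) - self.width + 1):
            pattern = data[i:i + self.width]
            pid = self.ids.setdefault(pattern, len(self.ids))
            if prev is not None:
                self.links[(prev, pid)] += 1
            prev = pid


store = PatternStore(width=3)
store.feed(b"abcabcabc")
print(len(store.ids))        # 3 unique patterns: abc, bca, cab
print(store.links[(0, 1)])   # 2: "bca" followed "abc" twice
```

Because the store works on raw bytes, the same code accepts text in any encoding or any other binary input, which is the sense in which a byte-level approach is data agnostic.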
  • 11. BII | a disruptive, revolutionary invention
    Artificial Intelligence
    • Questionable reputation
    • Widely used, nobody knows
    • Mostly used in static areas where models, data & requests do not change fast
    • Needs domain expertise to set up and host, needs a lot of mentoring
    • Behavior of the solutions is not very consistent or close to human behavior or decisions
    Biologically Inspired Intelligence
    • New approach; new to market
    • First to market w/ SDK
    • Useful in dynamic areas where models, data and requests change rapidly
    • No domain expertise needed. Less than a day to train a developer. Less than a day to build apps.
    • Behavior is very similar to how a human would behave or decide.
  • 12. Topic-Mapper | our product for human languages
    • Generates lightweight ontology
    • Contextual learning
    • Finds patterns
    • Easy to combine with Natural Language Processing (NLP) and other technologies
    • Provides inherent semantic associative search and phonetic analysis
    • Human language independent
    • Requires only basic structuring of input text (XML)
    • Ongoing/incremental learning
    • Works with and without external ontologies (RDF, OWL, etc.)
  • 13. Topic-Mapper | benefits
    … language is not math …
    1. Detects more words of higher relevance
    2. Faster processing of the corpus
    3. Much faster incremental updates
    4. Enables NLP to find patterns and learn without human intervention
    5. Works on very small data sets (e.g. Tweets)
    = Faster implementation of semantic solutions
  • 14. Topic-Mapper | technical description
    • ai-one core Text library (out-of-process COM server)
      – .NET 3.5 CLR wrapper (dll)
    • Small footprint instantiation (<700k)
    • API documentation & developer’s guide
    • Code examples
      – REST & SOAP deployments
      – BrainBrowser application for pattern search on Internet
      – BrainView application to visualize lightweight ontologies (LWO)
    • BrainBoard workbench application for rapid proof of concept development
    • Text focused support libraries and tools to assist in text preparation, processing, parsing, and loading into ai-one
  • 15. BrainBoard | Topic-Mapper prototyping tool
  • 16. Topic-Mapper | semantic commands
    • Association: returns the associative network for semantic correlation with the (one or more) input words; referred to as “brainstorm”
    • AssociationReverse: the inverse of Association; referred to as “focus”
    • AssociationCheck: returns a list of all associative paths between two input words (source and target)
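To make the idea of an associative network and of associative paths concrete, here is a minimal sketch built on plain sentence co-occurrence. It is not the Topic-Mapper API: the function names `build_network`, `associate` and `paths`, the co-occurrence weighting, and the `max_len` cut-off are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_network(sentences):
    """Link every pair of words sharing a sentence; weight = co-occurrence count."""
    net = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            net[a][b] += 1
            net[b][a] += 1
    return net


def associate(net, word):
    """'Brainstorm' stand-in: neighbours of a word, strongest first."""
    return sorted(net[word].items(), key=lambda kv: (-kv[1], kv[0]))


def paths(net, source, target, max_len=3):
    """All simple paths from source to target using at most max_len edges."""
    found, stack = [], [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            found.append(path)
            continue
        if len(path) > max_len:      # path already holds max_len edges
            continue
        for nxt in net[path[-1]]:
            if nxt not in path:      # keep paths simple (no revisits)
                stack.append(path + [nxt])
    return sorted(found)


net = build_network(["gene protein", "protein disease", "disease drug"])
print(associate(net, "protein"))   # [('disease', 1), ('gene', 1)]
print(paths(net, "gene", "drug"))  # [['gene', 'protein', 'disease', 'drug']]
```

The path search shows why two words that never appear together can still be related: "gene" and "drug" are connected only through intermediate terms.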
  • 17. Topic-Mapper | semantic commands
    • KeyWords: given a pointer to a context, returns the words and a score indicating the semantic significance between the words and the information contained within the context.
    • Phonetic: returns a list of words with phonetic similarity to the input word; includes a score for each word.
    • Statistic: returns frequency counts for the input word; counts total occurrences, subtotals by structure, and includes handles for each structure.
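As a rough picture of what a Statistic-style result contains (a total, per-structure subtotals, and a handle for each structure), consider the sketch below. The function `statistic`, the dictionary-of-texts input, and the use of dictionary keys as "handles" are assumptions for illustration; the real command's signature and return type are not given in the slides.

```python
from collections import Counter


def statistic(structures, word):
    """Return total occurrences of word plus a subtotal per structure handle."""
    word = word.lower()
    per_structure = {}
    for handle, text in structures.items():
        count = Counter(text.lower().split())[word]
        if count:
            per_structure[handle] = count
    return {"total": sum(per_structure.values()), "by_structure": per_structure}


docs = {"doc1": "Topic maps link topic names", "doc2": "a map of names"}
result = statistic(docs, "topic")
print(result)   # {'total': 2, 'by_structure': {'doc1': 2}}
```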
  • 18. Topic-Mapper SDK | teaching commands
    • StopWords{Get|Set|Erase}: maintenance of a stop word list. Stop words are words found in the dataspace, but not used for any of the semantic commands.
    • Context{Get|Set|Erase|Find}: maintenance of contexts; contexts are bags of words which, by definition, have a strong relation among themselves.
    • ContextTighten: increases the semantic relation within the reference handle
    • Relation{Get|Set|Erase|Find}: maintenance of relational triples: subject, object and predicate. Used to teach explicit relationships from entities like thesauri, taxonomies, and ontologies.
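The teaching commands can be pictured as maintenance operations on two small stores: a stop-word set and a set of subject/predicate/object triples. The sketch below is an in-memory stand-in under that assumption; the class `Teaching` and its method names are hypothetical and only loosely mirror the Get/Set/Erase/Find pattern named on the slide.

```python
class Teaching:
    """Toy in-memory stand-in for stop-word and relation maintenance."""

    def __init__(self):
        self.stop_words = set()
        self.relations = set()   # (subject, predicate, object) triples

    def stop_words_set(self, words):
        self.stop_words.update(w.lower() for w in words)

    def stop_words_erase(self, words):
        self.stop_words.difference_update(w.lower() for w in words)

    def relation_set(self, subject, predicate, obj):
        self.relations.add((subject, predicate, obj))

    def relation_find(self, subject=None, predicate=None, obj=None):
        """Return triples matching every field that is not None."""
        wanted = (subject, predicate, obj)
        return sorted(
            t for t in self.relations
            if all(w is None or w == v for w, v in zip(wanted, t))
        )

    def visible(self, text):
        """Words of text that survive the stop-word filter."""
        return [w for w in text.lower().split() if w not in self.stop_words]


t = Teaching()
t.stop_words_set(["the", "a"])
t.relation_set("aspirin", "treats", "headache")
t.relation_set("aspirin", "is_a", "drug")
print(t.visible("The cat saw a dog"))          # ['cat', 'saw', 'dog']
print(t.relation_find(predicate="is_a"))       # [('aspirin', 'is_a', 'drug')]
```

Loading triples like these from a thesaurus or taxonomy is the kind of explicit teaching the slide describes, as opposed to patterns learned from the text itself.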
  • 19. ai-Browser | combines NLP with Topic-Mapper to find, compare, understand documents
    [Screenshot labels: Select Article; Extract raw text; Display text for selected Keyword; Associations for selected Keyword]
  • 20. ai-Browser | architecture
    Combines four technologies to understand complex behavior and language.
    • Topic-Mapper
    • NLP: works with any NLP API (e.g., OpenNLP or NLTK)
    • Ontology: works with any OWL, RDF or unstructured (free form) text as an ontology
    • Clustering: works with any tool that can read an XGMML or PMML file (Cytoscape, MATLAB, etc.)
    “language is not math; rules-based and statistical approaches fail to deliver when the problem is complex, chaotic or data is very dynamic”
  • 21. ai-Browser | Use Cases
    Proves that technology can find the single best answer from millions of choices by combining human knowledge (free text ontology) with a reference point.
    • Medical research (PubMed)
    • Finding the best job candidate (LinkedIn)
    • Finding the ideal matching item in classifieds (Craigslist)
    • Create searchable topic maps for conversations (Twitter, talk radio)
    Transforms search into research.
  • 22. ai-Browser | What’s Inside?
    Finds what you need, not what you ask for.
    • Filter
      – Categorize content based on rules
      – NLP is trained to understand parts of speech (OpenNLP)
      – Manually updated and developed
    • Find Relationships
      – Get all connections between all words
      – Identify Keywords (Topic-Mapper)
      – Identify Associations
    • Add Knowledge
      – Define concepts that are important to you (free form text or ontology)
      – Introduce additional knowledge
      – Learn from external sources
  • 23. NLP | Processes Structured Language
    • Categorizing words into parts of speech
    • Provides rules of grammar for language
    • Enables machine to understand structure of language
    • Provides named-entity extraction
    • Filtering of Topic-Mapper results
    • Domain lexicon
    ai-Browser uses NLP to pre-process text to isolate nouns, verbs and modifiers
  • 24. ontology | provides domain expertise
    • Enables faster incremental learning, precision on small data sets
    • Add enterprise and public domain knowledge
    • Add user generated knowledge to enhance desired patterns
    • Generate LWOs from document(s)
    • Increases pattern and term relevance for higher keyword rankings for search engines
    ai-Browser uses ontologies to sharpen results, which is especially valuable for small texts (like tweets)
  • 25. clustering | finds similarity of meaning
    • Enables customers to develop proprietary models
    • Data mining applications
    • Enables graph analysis using off-the-shelf tools using XGMML and/or PMML representation of HSDS
    • Useful for:
      – Reporting
      – Visualizing light-weight ontology
      – Comparing multiple documents
      – Knowledge representation
    ai-Browser works with many analytical tools to post-process into clusters for reporting, further analysis, etc. We use Cytoscape.org! (but you can use others)
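A graph of associated terms can be handed to a tool such as Cytoscape by writing it out as XGMML. The sketch below shows one way to do that with Python's standard library; it does not reproduce how ai-Browser exports the HSDS. The helper name `to_xgmml`, the edge tuple layout, and the `weight` attribute are assumptions, and the element names and namespace follow XGMML as commonly written, so they should be checked against the target tool.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def to_xgmml(name, edges):
    """Serialise weighted (source, target, weight) edges as an XGMML string."""
    graph = ET.Element("graph", {"label": name, "directed": "0", "xmlns": XGMML_NS})
    ids = {}
    # Emit one <node> per distinct term, in order of first appearance.
    for source, target, _ in edges:
        for term in (source, target):
            if term not in ids:
                ids[term] = str(len(ids) + 1)
                ET.SubElement(graph, "node", {"id": ids[term], "label": term})
    # Emit one <edge> per association, carrying its weight as an <att>.
    for source, target, weight in edges:
        edge = ET.SubElement(graph, "edge", {
            "source": ids[source],
            "target": ids[target],
            "label": f"{source} - {target}",
        })
        ET.SubElement(edge, "att", {"name": "weight", "type": "real", "value": str(weight)})
    return ET.tostring(graph, encoding="unicode")


xgmml = to_xgmml("lightweight-ontology", [("gene", "protein", 2), ("protein", "disease", 1)])
print(xgmml)
```

Saving the returned string to a `.xgmml` file should give a network that graph tools can import for layout and clustering.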
  • 26. Topic-Mapper | roadmap
    • ai-one develops series of minimally viable products (MVP) to generate customer interest
    • ai-one licenses source code to others to modify
    • Extends functionality by compensating for the “Fail Modes” of existing technologies
    • Potential MVP Applications
      – Automated ontology builder
      – Data mining free form text (AI search)
      – Data aggregation
      – Data cleansing
      – Automated RDF tagging
      – Genome sequence assembly & analysis
  • 27. Topic-Mapper | technical limitations
    • 32-bit, single thread with 4 GB capacity per instance
    • Only knows what you feed it
      – Machine learning ≠ computer programming
      – Influence results… not control them
    • Deployment options to overcome limitations:
      – Moving windows or full-capture
      – Series, parallel or single instance
      – REST or SOAP
    • Next Step: 64-bit, multithread version to be released in mid-2012 with 18 exabyte capacity/instance
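The two capacity figures line up with the sizes of 32-bit and 64-bit address spaces. The slide does not state that this is where the numbers come from, so treat the link as an assumption; the arithmetic itself is a quick check:

```python
addressable_32 = 2 ** 32            # bytes reachable with a 32-bit address
addressable_64 = 2 ** 64            # bytes reachable with a 64-bit address

gib_32 = addressable_32 / 2 ** 30   # binary gigabytes
eb_64 = addressable_64 / 10 ** 18   # decimal exabytes

print(f"32-bit: {gib_32:.0f} GiB, 64-bit: {eb_64:.1f} EB")
# 32-bit: 4 GiB, 64-bit: 18.4 EB
```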
  • 28. ai-one | success stories
    • Early customers: Manor, SwissPort, BKA, global telecom carrier, others.
    • Significant funding to maintain long-term focus on transitioning invention into innovation
    • Our technology shows promise to disrupt markets
      – Data mining
      – Text analytics
      – Bioinformatics
      – Knowledge management
      – Personalized medicine
      – Behavioral marketing
  • 29. ai-one | how to start using technology
    1. Contact ai-one to schedule personalized demonstration
    2. Develop use cases detailing data sources, problem sets and desired outcomes
    3. Attend ai-one training seminar
    4. Refine use cases and project plan
    5. License ai-one technology & source code for sample application(s)
    6. Build your app!
    Most developers can build an app within a day.
  • 30. ai-one | summary
    • Big Idea: Machines can learn like humans!
    • Startup with solid funding & initial customers
    • “Lean startup” model to develop customers
    • Technology is a general use technology, not an end-user application.
    • Extends capabilities of existing programming languages.
    • Takes less than a day for a developer to start building apps, but requires a “different way of thinking”
  • 31. Contact
    Olin Hyde, Business Development, oh@ai-one.com
    • ai-one inc. (World Headquarters): 5711 La Jolla Blvd., Bird Rock, La Jolla, CA 92037, United States of America. cell: +13232365938, direct: +18583815897, main: +18583641951
    • ai-one ag: Flughofstrasse 55, Zürich-Kloten, 8152 Glattbrugg, Switzerland. cell: +41794000589, main: +41448284530
    • ai-one gmbh: Koenigsallee 35a, Grunewald, 14193 Berlin, Germany. cell: +4915112830531, main: +493047890050
  • 32. ai-one | additional information & case studies
  • 33. ai-one | additional information
    • YouTube Channel, Demos & Videos: http://www.youtube.com/user/semsys
    • MIT Forum Jan 17, 2012 Presentation: http://prezi.com/k1hsog309uji/ai-one-presentation-to-mit-enterprise-forum/
    • Case Study and Technical Evaluation in International Journal of Knowledge Management (Sept 2011): http://www.irma-international.org/viewtitle/56362/
  • 34. Case Study - SEMPER Project: Concept Based Retrieval and Lightweight Ontologies
    The SEMPER Team is creating an interactive, web based platform for out-patient assistance for alcohol dependency and work related disorders.
    "Learning a Lightweight Ontology for Semantic Retrieval in Patient-Center Information Systems". Prof. Dr. Ulrich Reimer, University of Applied Sciences St. Gallen et al.
    In this paper Prof. Reimer describes the use of ai-one (Association command) to learn associated nets of related terms to build “lightweight ontologies”, and then how the team created “seed concepts” of overlapping related terms with the teaching commands to give the content a notion of relevance. A keyword query then returned content that included related concepts. The paper also describes the testing of the ai-one approach versus the classical cosine similarity measure on a tf-idf document term matrix.
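For readers unfamiliar with the baseline named above, here is a minimal sketch of cosine similarity over tf-idf vectors. It illustrates the classical measure only, not the ai-one approach, and the exact tf-idf weighting used in the paper is not stated in the slides; raw term frequency times `log(N / df)` is an assumption, as are the sample documents.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (a dict) per tokenised document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    return [
        {term: tf * math.log(n / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


docs = [
    "alcohol dependency therapy".split(),
    "alcohol dependency counselling".split(),
    "shoe track forensics".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))   # True
```

Documents score above zero only when they share literal terms, which is the limitation that retrieval through related concepts is meant to address.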
  • 35. ASTIS™ Automatic Shoe Track Information System. Followed by test with University of Lausanne, CSI LAB.
  • 36. Genome SQ: JV with ibionics / UNI Wildau
    Improving the matching quality and dramatically increasing the speed per analysis by using ai-one™ technology for pattern analysis and matching. In addition, the HSDS can be used as cellarer data space for medical prognostics. The HSDS offers a perfect environment for modeling and weather report for patient health: “In Silico Care Cycles”!