LIGHTNING TALKS
Powered by Lucene:
IBM Content Analytics with Enterprise Search




Wolfgang Jung



Barcelona, 19th October 2011               © 2011 IBM Corporation

Our agenda for the next 10 minutes
    IBM is committed to Open Source
     – Decade of contribution to the community.

    Adoption of Apache Lucene to IBM Content Analytics
    – The Why, What & examples.

    Demonstration of IBM Content Analytics
    – see the development results live.
    Be enlightened!


IBM is committed to Open Source

    Decade of lineage and contributions to the open source community
      – Apache Hadoop.
          IBM's use of BigIndex for Search is mentioned in Chuck Lam's “Hadoop in Action”
      – Apache Derby
      – Apache Geronimo and Jetty
      – Eclipse: Founded by IBM, PMC Board of Directors
      – Apache UIMA: Unstructured Information Management Architecture.
          Developed by IBM, Contributed to Apache
      – Apache Jakarta: Lucene. PMC members
          Significant contributions via IBM Lucene Extension Library (ILEL)
      – Linux ... and more!



Adoption of Apache Lucene
to IBM Content Analytics with Enterprise Search
    UIMA has been in use since the first release of IBM OmniFind in 2005, through
    IBM Content Analytics, and continues in today's IBM Content Analytics with
    Enterprise Search
         http://www-01.ibm.com/software/data/content-management/analytics/uima.html


    Why IBM decided to use Lucene
      – The index is a common technology and is easier to improve
      – Lower cost of maintenance
      – Advantage in incremental indexing
      – Extensibility





Adoption of Apache Lucene
to IBM Content Analytics with Enterprise Search
    IBM is a very active contributor. Look for PMC members:
      – Michael McCandless; Shai Erera; Doron Cohen
         http://lucene.apache.org/who.html

    IBM extended Lucene based on its own needs. Two examples already
    contributed to the community:
      – Query Parser
      – Facets






Adoption of Apache Lucene
to IBM Content Analytics with Enterprise Search
    On 13th December 2006, IBM and Yahoo! announced IBM OmniFind Yahoo! Edition as a
    “no-cost, entry level enterprise search product developed to help eliminate financial and
    technology barriers to intranet and Web search.”
         http://www-03.ibm.com/press/us/en/pressrelease/20767.wss

    The product used Lucene as its index technology and was fully supported by IBM
      – 45,000+ downloads from the website http://omnifind.ibm.yahoo.net
      – IBM support contracts for clients with “IBM Elite Support for OmniFind Yahoo Edition”
      – Fewer than 15 incidents related to the index technology


    The technology is seen as a success for IBM





Content Analytics generates new insights and aggregates key
findings gathered from large data volumes in a visualized form

    [Figure: from sources of information to visualized results]
      – Sources of information: internal (ECM, files, DBMS, etc.) and external (social, news, etc.)
      – Analysed documents with identified concepts
          Example sentence: “Claus sprained his ankle on the step”
          Claus: noun, Person
          sprained: verb, Injury
          his ankle: noun phrase, Body Part
          on the step: prepositional phrase, Location
          Extracted concept: Claimant: Soft Tissue Injury
      – Automatic visualizing: results of concept evaluation are displayed to the users







Rapid Insights from Automotive Complaints

    We will use publicly available data from the National Highway Traffic Safety Administration (NHTSA)
    to demonstrate how IBM Content Analytics can be used to identify problems with automobiles.
    NHTSA receives reports about malfunctions, accidents, and other issues with automobiles
    from dealerships, repair facilities, and the general public, and publishes the data at
    http://www.nhtsa.gov. For this demo we have created a collection from the NHTSA “complaints”
    data spanning several years and ending in early 2010. We will show how this and similar data can be
    analyzed to arrive at rapid insights that are not possible when reading through the complaint records manually.






See Content Analytics live!





Be enlightened!


