MarkLogic Text Analytics
Fernando Mesa
Introduction to MarkLogic and Text Analytics
1.
Slide 1
© 2012 MarkLogic® Corporation. All rights reserved.
Evolving MyBuzzMetrics with Text Analytics
September 2012
Eric Austvold, Insights Executive
Fernando Mesa, WW Director of Enterprise Solution
Pete Aven, Systems Engineer
2.
Slide 2
Agenda
§ Introductions
§ NM Incite goals for text analytics
§ MarkLogic: evolving MyBuzzMetrics with text analytics
§ Entity Extraction
§ Topic Discovery / Theme Extraction
§ Data Faceting
§ Trend Spotting
§ Visualization
§ Use Cases and Demos
§ Next Steps
3.
Slide 3
Goals for Text Analytics
§ What's your goal?
§ What are your clients asking you for?
§ How do you want to serve your internal clients (analysts, researchers, account managers)?
§ How do you want to serve your external clients? Self-service reporting? Ad-hoc analysis? Integration with their data?
§ How do you envision your new solution complementing other Nielsen services?
4.
Slide 4
Text Analytics: Evolution
Traditional Methods
§ Reliant on relational data structures that are challenging to manage; data sits in silos
§ Not indexed immediately, so content cannot be queried in real time
§ New parses = re-ingestion
§ Re-ingestion = new schema design, which creates delays
§ Not real time, so it is difficult to determine buzz
§ Impossible at 30+ billion docs
§ Pre-processing required to handle batches of data
§ Extraction methods lose context and the full perspective
MarkLogic Enabled Methods
§ Flexible: built on an infrastructure that can integrate text mining output
§ Context aware: without schema redesigns, the context of the original document persists as text miners enrich that content, preserving relationships to the original data
§ Scales: can accommodate real-time ad-hoc queries and reports across a corpus of 30+ billion documents
§ Enrichment: a better method of leveraging text mining work
5.
Slide 5
The Parse
§ The Parse
  § Actor, Action, Object
  § Fact
  § Entity
  § Qualifier
  § Etc.
§ Basis Entity Enrichment
§ Open Enrichment Framework
  § Calais
  § Temis
  § Data Harmony
  § NetOwl
What it means… We can integrate with any of these enrichment engines.
6.
Slide 6
The Platform
§ The Platform
  § Flexibility
  § Speed
  § Scale
  § Delivery of Insight
What it means… Clients can deliver insight rapidly and in real time, helping their users make new discoveries.
7.
Slide 7
MarkLogic and Text Analytics
Diagram components: Web Services; ETL Connector (*); Social Media Connector (*); RDBMS Connector; Search; Unified Index for all data structures; Transactional Database; Data Retrieval; Repository; Classification; Concept Extraction; Entity Enrichment; Taxonomies; Third-party Partners; App Server; Web Applications; Decision Support; APIs/Services; Analytics
Leverage value generated from text mining. Generate opinions (in the form of data).
8.
Slide 8
Traditional Enrichment = Extraction
Data:
First Name | Last Name | Other Comments
Chris | Smith | Chris bought an upgrade package for his black, 2011, Honda Pilot on 9/16. Car returned for service on 9/21. The bolt on the undercarriage cracked due to heat. He doesn't think it's the transmission however as …..
Actor | Action | Object
Chris | buy | package
Fact
package-buy
car-return
bolt-cracked
Entity | Type
Chris | person
Honda | organization
9/16 | date
Qualifiers
upgrade
black
More Parsing = More Tables/Rows = More Joins = Does Not Scale!
And What About Context?
9.
Slide 9
Enrichment with MarkLogic
<actor><person>Chris</person></actor> <action>bought</action> an <qualifier>upgrade</qualifier> <object>package</object> for his <qualifier>black</qualifier>, <qualifier>2011</qualifier>, <organization>Honda</organization> Pilot on <date>9/16</date>. Car returned for service on <date>9/21</date>. The bolt on the undercarriage cracked due to heat. <person name="Chris">He</person> doesn't think it's the transmission however as …..
A single term can carry several tags: <name><brand><drink>Pepsi</drink></brand></name>
Markup Inline! Every Tag Becomes a Candidate for an Index!
What it means… Enrichment persists context and scales without a schema redesign, saving time and resources as client needs evolve.
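The contrast between slides 8 and 9 can be sketched in a few lines: rather than copying extracted entities into separate tables, wrap them in place. This minimal sketch uses Python's ElementTree; `enrich_inline` is a hypothetical helper, not a MarkLogic API. The character spans are supplied by hand as a stand-in for an enrichment engine's output. Because only tags are added, the original sentence, and therefore its context, can still be read back unchanged.

```python
import xml.etree.ElementTree as ET


def enrich_inline(text, spans, root_tag="comment"):
    """Wrap each non-overlapping (start, end, tag) span of `text` in an element.

    Text between spans is kept as element text/tails, so no context is lost.
    """
    root = ET.Element(root_tag)
    root.text = ""
    last = None  # most recently added entity element

    def append_plain(s):
        # Plain text goes into root.text before the first entity,
        # and into the previous entity's tail afterwards.
        nonlocal last
        if last is None:
            root.text += s
        else:
            last.tail += s

    cursor = 0
    for start, end, tag in sorted(spans):
        append_plain(text[cursor:start])
        last = ET.SubElement(root, tag)
        last.text = text[start:end]
        last.tail = ""
        cursor = end
    append_plain(text[cursor:])
    return root


comment = "Chris bought an upgrade package for his black, 2011, Honda Pilot on 9/16."


def span(word, tag):
    """Hypothetical stand-in for an engine: locate the first occurrence of word."""
    i = comment.index(word)
    return (i, i + len(word), tag)


doc = enrich_inline(comment, [
    span("Chris", "person"), span("upgrade", "qualifier"),
    span("package", "object"), span("Honda", "organization"), span("9/16", "date"),
])
print(ET.tostring(doc, encoding="unicode"))
print([e.text for e in doc.iter("organization")])  # query by tag, context intact
```

Each tag is then available to query on, while `"".join(doc.itertext())` still yields the original comment.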
10.
Slide 10
Text Mining Is Part of the Big Picture
Words and phrases: ... Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) ...
Structure: Label Author Ing Comp ID Para Org
Data/Metadata: name:sorbitol, date:2012-06-04, company:Roche
Entities in context: ... diabetes, since the risk of blindness is very high in such patients...
Geospatial: <location><lat>46.946584</lat><lng>93.076172</lng></location>
Universal Index
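The idea that "every tag becomes a candidate for an index" can be illustrated with a toy inverted index. It maps each word to documents, and each (element name, word) pair to documents. This is a minimal sketch under simplifying assumptions (naive tokenizer, in-memory sets) and does not show how MarkLogic's Universal Index is actually built. The `build_index` helper and the sample `DOCS` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def tokens(text):
    """Lower-cased word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Index every word, and every (element name, word) pair, by document URI."""
    word_index = defaultdict(set)     # word -> {uri}
    element_index = defaultdict(set)  # (element name, word) -> {uri}
    for uri, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for word in tokens("".join(root.itertext())):
            word_index[word].add(uri)
        for elem in root.iter():
            # itertext() covers the element's own text and its descendants.
            for word in tokens("".join(elem.itertext())):
                element_index[(elem.tag, word)].add(uri)
    return word_index, element_index


DOCS = {
    "d1": "<comment><person>Chris</person> bought a <organization>Honda</organization></comment>",
    "d2": "<comment>A review of the Chris Craft boat by <person>Dana</person></comment>",
}
word_index, element_index = build_index(DOCS)

print(sorted(word_index["chris"]))                 # word anywhere in a document
print(sorted(element_index[("person", "chris")]))  # word inside a <person> tag
```

The enrichment markup is what lets the second query separate the person "Chris" from the boat brand "Chris Craft".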
11.
Slide 11
Demo
12.
Slide 12
Agenda
§ Introductions
§ Nielsen's goals and challenges related to unstructured data
§ MarkLogic Beyond Big Data Search
§ Entity Extraction
§ Topic Discovery / Theme Extraction
§ Data Faceting
§ Trend Spotting
§ Visualization
§ Use Cases and Demos
§ Next Steps
13.
Slide 15
MarkLogic Analytics
§ Why use MarkLogic Analytics?
§ Term list analytics
§ Range index analytics
§ Combining term lists and range indexes
§ Range index best practices & references
14.
Slide 16
MarkLogic Analytics: Why Use It?
Applications increasingly combine structured and unstructured information (e.g., electronic healthcare records). Example query, mixing structured and unstructured/contextual criteria:
Show me male patients under the age of 45 with an ADMITTING DIAGNOSIS that included Chest Pain, or with a HISTORY OF PRESENT ILLNESS including symptoms of Chest Pain, Shortness of Breath, or Dizziness. Additionally, identify patients within this population with regular alcohol consumption in the SOCIAL HISTORY, alcoholism in the FAMILY HISTORY, and one of the following 17 synonyms for stress diagnoses in the ASSESSMENT AND TREATMENT PLAN.
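The first sentence of the example query above can be sketched as plain Python. It combines structured equality and range checks (gender, age) with text checks that are scoped to named sections of each record. This is a minimal sketch: `PatientRecord`, `chest_pain_cohort` and the sample records are hypothetical. Matching is naive substring search with no negation handling ("no chest pain" would match). In MarkLogic, the structured part would typically be served by range indexes and the text part by the search index; here both are simple loops.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    patient_id: str
    gender: str   # structured field
    age: int      # structured field
    sections: dict = field(default_factory=dict)  # section title -> free text


def mentions_any(text, phrases):
    """Naive case-insensitive substring match; no negation handling."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


def chest_pain_cohort(records):
    """First half of the slide's query: male, under 45, plus text criteria."""
    symptoms = ["chest pain", "shortness of breath", "dizziness"]
    cohort = []
    for r in records:
        # Structured constraints: cheap equality and range checks.
        if r.gender != "M" or r.age >= 45:
            continue
        # Unstructured constraints, scoped to named sections of the record.
        admitting = r.sections.get("ADMITTING DIAGNOSIS", "")
        history = r.sections.get("HISTORY OF PRESENT ILLNESS", "")
        if mentions_any(admitting, ["chest pain"]) or mentions_any(history, symptoms):
            cohort.append(r.patient_id)
    return cohort


RECORDS = [
    PatientRecord("p1", "M", 38, {"ADMITTING DIAGNOSIS": "Chest pain, rule out MI"}),
    PatientRecord("p2", "M", 52, {"ADMITTING DIAGNOSIS": "Chest pain"}),
    PatientRecord("p3", "F", 40, {"HISTORY OF PRESENT ILLNESS": "Dizziness for two days"}),
    PatientRecord("p4", "M", 30, {"HISTORY OF PRESENT ILLNESS": "Reports dizziness after exertion"}),
    PatientRecord("p5", "M", 41, {"HISTORY OF PRESENT ILLNESS": "No acute complaints"}),
]
print(chest_pain_cohort(RECORDS))
```

The second half of the query (social and family history, stress synonyms) would narrow this cohort further in the same way.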
15.
Slide 32
Agenda
§ Text Enrichment
16.
Slide 33
Text Enrichment with Entities
§ Load, manipulate, query content as-is
§ … then enrich the content over time
§ Entity extraction
  § Specialized technology
  § Identifies people, places, things in free text
§ Entity extraction -> Entity enrichment
  § Entities are marked up in-line
  § Gives you
    § More focused search (includes proximity, structure)
    § Analytics
    § Alerting
17.
Slide 34
Enrich Your Content … With Entities: Example
<Article xmlns:e="http://marklogic.com/entity">
  <title><e:person>John Louis</e:person></title>
  <acknowledgement><e:gpe>Wikipedia</e:gpe>, the free encyclopedia</acknowledgement>
  <section>
    <para>"Tiger" <e:person>John Louis</e:person> (born <e:date>14 June 1941</e:date>)[<refto ID="1">1</refto>] was an <e:gpe>England</e:gpe> international speedway rider who rode for <e:organization>Ipswich Witches</e:organization>. He is the father of <e:gpe>Great Britain</e:gpe> international <e:person>Chris Louis</e:person>. <e:person>John</e:person> rode a weslake for most of his career.</para>
  </section>
  <section>
    <title>Career history</title>
    <para><e:person>John</e:person> finished third in the 1975 Speedway World Championship and was part of the <e:organization>England Speedway World Cup</e:organization> winning teams of 1972, 1974 and 1975. He was also World Pairs Champion in 1976 with <e:person>Malcolm Simmons</e:person>. He also captained <e:gpe>Ipswich</e:gpe> when they were <e:nationality>British</e:nationality> Champions in 1976. <e:person>John</e:person> won the <e:nationality>British</e:nationality> Speedway Championship in 1975. He was also <e:organization>National League Riders</e:organization> champion in 1971 and <e:organization>British League Riders</e:organization> champion in 1979.</para>
    <para>He retired in 1984 and is now the promoter of <e:organization>Ipswich Witches</e:organization>.</para>
  </section>
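Slide 33 lists analytics as one payoff of in-line entity markup. As a minimal sketch, the namespaced markup from the example above can be turned directly into facet counts of (entity type, value) pairs. This uses Python's ElementTree on an abridged copy of the article. It is not how MarkLogic computes facets (which it does from indexes), and `entity_facets` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ENTITY_NS = "http://marklogic.com/entity"

# Abridged copy of the enriched "Career history" paragraph from the example.
ARTICLE = """<Article xmlns:e="http://marklogic.com/entity">
  <section>
    <para><e:person>John</e:person> finished third in the 1975 Speedway World
    Championship. He was also World Pairs Champion in 1976 with
    <e:person>Malcolm Simmons</e:person>. He also captained <e:gpe>Ipswich</e:gpe>
    when they were <e:nationality>British</e:nationality> Champions in 1976.
    <e:person>John</e:person> won the <e:nationality>British</e:nationality>
    Speedway Championship in 1975.</para>
  </section>
</Article>"""


def entity_facets(xml_text, ns=ENTITY_NS):
    """Count (entity type, entity text) pairs marked up in the given namespace."""
    counts = Counter()
    prefix = "{" + ns + "}"  # ElementTree's Clark notation for namespaced tags
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(prefix):
            entity_type = elem.tag[len(prefix):]
            counts[(entity_type, (elem.text or "").strip())] += 1
    return counts


for (entity_type, value), n in sorted(entity_facets(ARTICLE).items()):
    print(f"{entity_type:12} {value:16} {n}")
```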
18.
Slide 35
Entity Enrichment With MarkLogic Server
Three Approaches to Entity Enrichment
1. Rule based, using a built-in function
  § Can leverage a taxonomy to drive entity definition
  § Uses the Content Processing Framework to automate the process
2. Statistical analysis, using the built-in entity enricher
  § Licensed BASIS for enrichment
  § For automated entity enrichment
3. External, using the partner network
  § Seamless integration using the Open Enrichment Framework
  § Can use a combination of tools (best of breed)
  § Can leverage both internal and external solutions
19.
Slide 36
Entity Enrichment: Built-in
cts:entity-highlight( $node as node(), $expr as item()* ) as node()
§ Take an XML node and mark up the entities in that node
§ Substitute $expr for each entity in $node
§ Use any style of markup, using $expr plus these variables:
  § $cts:node
  § $cts:text
  § $cts:entity-type
§ Advantage: the most flexible
  § Choose your style of markup
  § Choose which parts you want to mark up
  § Choose which entities you want to use/ignore
20.
Slide 37
Entity Enrichment: Built-in: Example (2)
cts:entity-highlight(
  <a>John went to England</a>,
  <entity>{ element {$cts:entity-type} {$cts:text} }</entity>
)
Result:
<a>
  <entity><PERSON>John</PERSON></entity> went to <entity><GPE>England</GPE></entity>
</a>
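The key design point of `cts:entity-highlight` is that the caller supplies the markup expression, and it is evaluated once per entity with the entity's type and text bound. The Python analog below imitates that callback style. It is only a sketch under stated assumptions: a hypothetical `GAZETTEER` dictionary stands in for MarkLogic's built-in recognizer, and `entity_highlight` handles only an element whose content is plain text (no child elements).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical gazetteer standing in for a real entity recognizer.
GAZETTEER = {"John": "PERSON", "England": "GPE"}


def entity_highlight(node, make_markup, gazetteer=GAZETTEER):
    """Return a copy of a text-only element in which each recognized entity
    is replaced by the element returned from make_markup(entity_type, text)."""
    # Longest names first so overlapping names prefer the longer match.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    text = node.text or ""
    out = ET.Element(node.tag, node.attrib)
    out.text = ""
    last, pos = None, 0
    for m in pattern.finditer(text):
        before = text[pos:m.start()]
        if last is None:
            out.text += before
        else:
            last.tail += before
        last = make_markup(gazetteer[m.group(0)], m.group(0))
        last.tail = ""
        out.append(last)
        pos = m.end()
    if last is None:
        out.text += text[pos:]
    else:
        last.tail += text[pos:]
    return out


def wrap_in_entity(entity_type, text):
    # Mirrors the slide's $expr: <entity>{ element {$cts:entity-type} {$cts:text} }</entity>
    entity = ET.Element("entity")
    ET.SubElement(entity, entity_type).text = text
    return entity


result = entity_highlight(ET.fromstring("<a>John went to England</a>"), wrap_in_entity)
print(ET.tostring(result, encoding="unicode"))
```

Swapping in a different `make_markup` callback changes the output style without touching the recognition step. That is the flexibility the previous slide describes.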
21.
Slide 38
BASIS Enrichment: What Gets Tagged?
With the built-in entity enrichment, you can tag: person, organization, location, GPE (geopolitical entity), facility, religion, nationality, credit card number, email, latitude/longitude, money, percent, ID (personal ID number), phone number, URL, UTM, date, time.
22.
Slide 39
Entity Enrichment Framework
§ You have a choice …
§ There are several entity extraction engines available
§ No engine is best-of-breed for all knowledge domains and all languages
§ The Open Enrichment Framework lets you choose an engine that suits your needs, to extract more domain-specific entities and/or support additional languages
§ Pipelines available
  § Temis Luxid
  § Open Calais
  § Data Harmony
  § NetOwl
  § Add other pipelines yourself
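The "no engine is best-of-breed for every domain or language" argument leads naturally to a pluggable design: register engines per (domain, language) and route each document to the right one. The sketch below shows that routing idea only. `EnrichmentRegistry` and `dictionary_engine` are hypothetical names, and this is not the Open Enrichment Framework's API. The toy dictionary engines stand in for external services such as those listed above.

```python
from typing import Callable, Dict, List, Tuple

Entity = Tuple[int, int, str]        # (start offset, end offset, entity type)
Engine = Callable[[str], List[Entity]]


class EnrichmentRegistry:
    """Route each document to the engine registered for its (domain, language)."""

    def __init__(self, default: Engine):
        self._default = default
        self._engines: Dict[Tuple[str, str], Engine] = {}

    def register(self, domain: str, language: str, engine: Engine) -> None:
        self._engines[(domain, language)] = engine

    def extract(self, text: str, domain: str, language: str) -> List[Entity]:
        engine = self._engines.get((domain, language), self._default)
        return engine(text)


def dictionary_engine(terms: Dict[str, str]) -> Engine:
    """Hypothetical engine: report every exact occurrence of known terms."""
    def run(text: str) -> List[Entity]:
        found = []
        for term, entity_type in terms.items():
            start = text.find(term)
            while start != -1:
                found.append((start, start + len(term), entity_type))
                start = text.find(term, start + len(term))
        return sorted(found)
    return run


general = dictionary_engine({"Roche": "organization"})
pharma = dictionary_engine({"Roche": "organization", "sorbitol": "compound"})

registry = EnrichmentRegistry(default=general)
registry.register("pharma", "en", pharma)

text = "Roche reported results for sorbitol."
print(registry.extract(text, "pharma", "en"))  # domain-specific engine
print(registry.extract(text, "news", "en"))    # falls back to the default
```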
23.
Slide 40
Agenda
§ Classification
24.
Slide 41
Classification With MarkLogic Server
Three Approaches to Classification
1. Rule based, using reverse queries
  § Match documents against predefined rules and automatically tag the content
  § Can combine forward and reverse queries for sophisticated scenarios; we call this match-making
2. Statistical classification, using the built-in SVM classifier
3. External, using the partner network
  § Seamless integration using the Open Enrichment Framework
  § Can use a combination of tools (best of breed)
  § Can leverage both internal and external solutions
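A reverse query flips the usual direction of search. Instead of running one query over many documents, you store many queries and ask which of them a single new document satisfies. The matching queries become the document's categories. The sketch below illustrates only these semantics. `StoredQuery`, `classify` and the rules are hypothetical. It also tests every rule in a loop, whereas a production system would index the stored queries so that it does not have to check them one by one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """A saved rule: every all_of word must appear, plus at least one any_of word."""
    category: str
    all_of: frozenset = frozenset()
    any_of: frozenset = frozenset()

    def matches(self, words):
        if not self.all_of <= words:
            return False
        return not self.any_of or bool(self.any_of & words)


def classify(document, stored_queries):
    """Reverse query: which stored queries does this one document satisfy?"""
    words = set(re.findall(r"\w+", document.lower()))
    return [q.category for q in stored_queries if q.matches(words)]


RULES = [
    StoredQuery("vehicle-defect", all_of=frozenset({"honda"}),
                any_of=frozenset({"cracked", "recall", "defect"})),
    StoredQuery("purchase", any_of=frozenset({"bought", "purchased"})),
    StoredQuery("beverage", all_of=frozenset({"pepsi"})),
]

review = "Chris bought an upgrade package for his Honda Pilot. The bolt cracked due to heat."
print(classify(review, RULES))
```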
25.
Slide 42
Agenda
§ Trend Spotting
26.
Slide 43
Trend Spotting With MarkLogic Server
1. Co-occurrences with frequency rules
  § Spot trends in business entities and their relationships to other concepts as they bubble up above the noise
  § Use co-occurrence analytical indexes paired with alerting to signal trends and anomalies in real time
Analytics + Alerting
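"Co-occurrences with frequency rules" can be sketched as two steps. First, count which entity pairs appear together in documents during a time window. Second, raise an alert when a pair's count jumps well above its baseline. The functions, sample data and thresholds (`min_count`, `ratio`) below are hypothetical. This shows only the arithmetic, not MarkLogic's co-occurrence indexes or its alerting API.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(docs_entities):
    """Count unordered entity pairs that appear in the same document."""
    counts = Counter()
    for entities in docs_entities:
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def trending_pairs(baseline, current, min_count=3, ratio=2.0):
    """Flag pairs seen at least min_count times now and at least `ratio` times
    their baseline count (a baseline of 0 is treated as 1)."""
    alerts = []
    for pair, n in current.items():
        before = baseline.get(pair, 0)
        if n >= min_count and n >= ratio * max(before, 1):
            alerts.append((pair, before, n))
    return sorted(alerts)


# Hypothetical entity lists extracted from two weeks of social-media posts.
last_week = [["Honda", "transmission"], ["Honda", "price"], ["Honda", "price"], ["Pepsi", "ad"]]
this_week = [["Honda", "transmission"], ["Honda", "transmission", "recall"],
             ["Honda", "transmission"], ["Honda", "price"], ["Pepsi", "ad"]]

print(trending_pairs(cooccurrences(last_week), cooccurrences(this_week)))
```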
27.
Slide 44
Agenda
§ Other Text Analytics
28.
Slide 45
Additional Text Analytics
1. Linking of unstructured information
  § cts:similar-query to find related pieces of information in unstructured documents
  § External tools for fingerprinting (finding loose associations)
2. Query expansion using synonyms and taxonomies
  § Narrow / broaden analytics
  § Parent / child
  § Associative & equivalent
3. Type-ahead using lexicons
  § Support for high-speed retrieval of distinct values across the entire database or a segment of it
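Two items on this slide reduce to small, well-known techniques. Type-ahead over a lexicon is a binary search into a sorted list of distinct values, followed by a forward scan while the prefix still matches. Query expansion walks a taxonomy (parent to child) and adds synonyms. This is a minimal sketch: `LEXICON`, `NARROWER`, `SYNONYMS` and both functions are hypothetical illustrations, not MarkLogic's lexicon or thesaurus APIs.

```python
from bisect import bisect_left

# Hypothetical lexicon: the sorted distinct values of some indexed field.
LEXICON = sorted(["pepsi", "hotel", "honda pilot", "horizon", "honda", "hondo"])


def type_ahead(lexicon, prefix, limit=5):
    """Binary-search to the first value >= prefix, then read forward while values match."""
    matches = []
    i = bisect_left(lexicon, prefix)
    while i < len(lexicon) and len(matches) < limit and lexicon[i].startswith(prefix):
        matches.append(lexicon[i])
        i += 1
    return matches


# Hypothetical taxonomy (term -> narrower terms) and synonym rings.
NARROWER = {"soft drink": ["cola", "lemon-lime soda"], "cola": ["pepsi"]}
SYNONYMS = {"soft drink": ["soda", "pop"]}


def expand(term):
    """Broaden a query term with its synonyms and every narrower (child) term.

    Synonyms are added but not expanded further in this simple version.
    """
    terms, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current in terms:
            continue
        terms.add(current)
        terms.update(SYNONYMS.get(current, []))
        stack.extend(NARROWER.get(current, []))
    return terms


print(type_ahead(LEXICON, "hond"))
print(sorted(expand("soft drink")))
```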
29.
Slide 46
Math and Statistical Analytical Functions
1. math:variance-p
2. cts:variance-p
3. math:variance
4. cts:variance
5. math:stddev-p
6. cts:stddev-p
7. math:stddev
8. cts:stddev
9. math:covariance-p
10. cts:covariance-p
11. math:covariance
12. cts:covariance
13. math:correlation
14. cts:correlation
15. math:linear-model
16. cts:linear-model
17. math:median
18. cts:median
19. math:percentile
20. cts:percentile
21. math:mode
22. math:rank
23. cts:rank
24. math:percent-rank
25. cts:percent-rank
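In the list above, the "-p" suffix distinguishes population statistics from sample statistics. Population statistics divide the sum of squared deviations by n; sample statistics divide it by n - 1. As we read the MarkLogic function reference, the cts: variants compute over range-index (lexicon) values rather than an in-memory sequence; check the reference for exact signatures. This sketch uses only Python's standard `statistics` module to show the numeric difference. It is not a MarkLogic call.

```python
import statistics

# Sample data: mean = 5, sum of squared deviations from the mean = 32, n = 8.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(values)  # like a "-p" function: 32 / 8 = 4
sample_variance = statistics.variance(values)       # no suffix: 32 / 7, about 4.571
population_stddev = statistics.pstdev(values)       # sqrt(4) = 2.0
sample_stddev = statistics.stdev(values)            # sqrt(32 / 7)
median = statistics.median(values)                  # (4 + 5) / 2 = 4.5

print(population_variance, sample_variance, population_stddev, sample_stddev, median)
```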
30.
Slide 47
Next Steps
31.
Slide 48
The Only Operational Database Technology for Mission-Critical Big Data Applications