Text Analytics End to End
Gary Robinson, IBM

© 2013 IBM Corporation
Scenario
Source and analyze blogs and news articles about a popular brand or service across various social media sites
− “IBM Watson”
− Analytics include
  Watson applications by industry and within an industry
  Watson association with Jeopardy!
  Simple sentiment/tone scoring
Scenario
Process
− Collect data
− Transform and subset
− Develop and test a Text Analytics extractor using Eclipse
− Publish and deploy the extractor to a BigInsights cluster
− Apply the Text Analytics extractor from BigSheets
− Analyze and chart the results
Text Analytics
Identify and extract structured information from unstructured and semi-structured text
To enable analytics
− chart, report, join, aggregate, slice, dice and drill, model, mine…
Text Analytics
80% of the world’s data is unstructured or semi-structured text
Social media is rife with information about products and services
− Discussions, blogs, tweets…
Applications often lock up useful information in blobs, description fields and semi-structured records that are difficult or impossible to open up for analysis
− Call center records, log files…
How do you get a metrics-based understanding of facts from unstructured text?

I had an iphone, but it's dead @JoaoVianaa. (I've no idea where it's) !Want a blackberry now !!!
@rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle
I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR
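
To make the question above concrete, here is a minimal Python sketch (not AQL, and not the BigInsights runtime) that turns the three sample posts into structured records that can be counted. The `PRODUCTS` list, the `to_record` helper and the record field names are assumptions introduced only for illustration.

```python
import re
from collections import Counter

# The three sample posts from the slide, kept verbatim (typos included).
TWEETS = [
    "I had an iphone, but it's dead @JoaoVianaa. (I've no idea where it's) !Want a blackberry now !!!",
    "@rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle",
    "I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR",
]

# Hypothetical product dictionary, chosen only to match the sample posts.
PRODUCTS = ("iphone", "blackberry")


def to_record(tweet):
    """Extract a few structured fields from one unstructured post."""
    lowered = tweet.lower()
    return {
        "mentions": re.findall(r"@\w+", tweet),
        "urls": re.findall(r"https?://\S+", tweet),
        "products": [
            p for p in PRODUCTS
            if re.search(r"\b" + re.escape(p) + r"\b", lowered)
        ],
    }


records = [to_record(t) for t in TWEETS]

# Once the fields are structured, ordinary aggregation applies.
product_counts = Counter(p for r in records for p in r["products"])
```

Each record is now a row with columns, which is the form that charting, joining and aggregation tools expect.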
BigInsights & Streams Text Analytics
High-performance, rule-based information extraction engine
Highly scalable solution for at-rest and in-motion analytics
Pre-built extractors and a toolkit for building custom extractors
Declarative Information Extraction (IE) system based on an algebraic framework
Sophisticated tooling to help build, test, and refine rules
Developed at IBM Research since 2004
Embedded in several IBM products
Applications of Text Analytics
Broad range of applications in many industries
− CRM Analytics - Voice of customer, Product and Services gap analysis, Customer churn
− Social Media Analytics - Purchase intent, Customer churn prediction, Reputational Risk
− Digital Piracy - illegal broadcast of streaming and video content
− Log Analytics - Failure analysis and root cause identification, Availability assurance
− Regulatory Compliance - Data Redaction to identify and protect sensitive information
Deploy to Streams and BigInsights
[Diagram: an extractor written in the AQL language passes through the optimizer and becomes a compiled plan in a Text Analytics Module. Streams or a BigInsights cluster applies the module to input documents, and the extracted information feeds downstream integration and processing.]
Developing an Extractor

Top down
− Select documents to work with
− Label examples of interesting text
− Label clues or elements within or around the examples
Bottom up
− Create or refine AQL to extract basic features
− Create or refine AQL to generate candidate concepts
− Create or refine AQL to filter and consolidate
Annotation Query Language
− SQL-like
  Familiar syntax and concepts make it easier to learn and understand
− Declarative
  Describes what computation should be performed and not how to compute it
  Separates semantics from implementation
− Compiled and optimized for execution
  Text Analytics Module (TAM) is deployed to the cluster for execution by the Text Analytics runtime
AQL
Fundamental concepts
− Views
  Created with Select or Extract expressions
  Are not materialized unless explicitly requested using ‘output view <name>’ or ‘select into’
  The ‘Document’ view identifies the set of input documents
  − select… from Document d
AQL
Fundamental concepts
− Extract expressions
  Typically used to extract basic features
  Extract from columns in other views including the text column in the Document view
  Basic capabilities include extraction using regex, dictionary and sequence
  Other operations include splits, blocks and parts of speech
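
As a rough illustration of what dictionary and regex extraction produce, the Python sketch below returns spans (begin offset, end offset, matched text) over a document's text. It is an analogue written for this transcript, not AQL and not the Text Analytics runtime. The `Span` type, both helper functions, the sample sentence and the dictionary terms are assumptions.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A matched region of the document text, addressed by character offsets."""
    begin: int
    end: int
    text: str


def extract_dictionary(terms, text):
    """Return a Span for every case-insensitive whole-word match of a term."""
    spans = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), m.group()))
    return sorted(spans)  # order by position in the document


def extract_regex(pattern, text):
    """Return a Span for every match of a regular expression."""
    return [Span(m.start(), m.end(), m.group())
            for m in re.finditer(pattern, text)]


# Invented sample document and dictionary, for illustration only.
doc = "IBM Watson won Jeopardy! in 2011. Watson now supports healthcare."
industries = extract_dictionary(["healthcare", "finance"], doc)
years = extract_regex(r"\b(?:19|20)\d{2}\b", doc)
```

Keeping offsets rather than bare strings is what lets later steps reason about position, for example whether one match follows another.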
AQL
Fundamental concepts
− Select expressions
  Typically used to combine, aggregate and filter extracted fields to create candidate concepts and final values
  Select existing columns and extract from columns
  − Specified using <from list>
  Rich set of operators and clauses
  − where, consolidate, group by, order by, and limit clauses are optional
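
The combine-then-consolidate pattern can be sketched in Python as follows. This is an analogue of the idea only, not AQL: a proximity predicate pairs rows from two sets of spans, and a consolidation step drops spans fully contained in another span. The `Span` type, the function names, the `max_gap` parameter and the sample text are all assumptions.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Character offsets of a match: text[begin:end]."""
    begin: int
    end: int


def follows_within(a, b, max_gap):
    """True when b starts after a ends, at most max_gap characters later."""
    return 0 <= b.begin - a.end <= max_gap


def join(brands, industries, max_gap):
    """Pair each brand span with every industry span that closely follows it,
    merging each pair into one candidate span."""
    return [Span(a.begin, b.end)
            for a in brands
            for b in industries
            if follows_within(a, b, max_gap)]


def consolidate(spans):
    """Drop any span that is fully contained in a different span."""
    unique = sorted(set(spans))
    return [s for s in unique
            if not any(o != s and o.begin <= s.begin and s.end <= o.end
                       for o in unique)]


# Invented sample text with hand-computed offsets.
text = "Watson for healthcare and Watson in finance"
brands = [Span(0, 6), Span(26, 32)]        # the two "Watson" mentions
industries = [Span(11, 21), Span(36, 43)]  # "healthcare", "finance"

candidates = join(brands, industries, max_gap=10)
final = consolidate(candidates + brands)
```

Here `final` keeps only the two merged candidates, because each bare "Watson" span lies inside a longer candidate.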
Select vs Extract
Which do I use when?
− Both have a <select list>
− But you can only specify an <extraction specification> in an extract expression
− Both have a <from list>
− You can apply simple predicate-based filters in the <having clause> of an extract expression or in the <where clause> of a select expression
− But you can only use predicates to combine rows from views (a join) using the <where clause> of a select expression
− You can apply a <consolidation policy> or a <limit> in either an extract or a select expression
− But you can only <group> and <order> using a select expression
extract
  <select list>,
  <extraction specification>
from <from list>
[having <having clause>]
[consolidate on <column> [using '<policy>' [with priority from <column> [priority order]]]]
[limit <maximum number of output tuples for each document>];

select
  <select list>
from <from list>
[where <where clause>]
[consolidate on <column> [using '<policy>' [with priority from <column> [priority order]]]]
[group by <group by list>]
[order by <order by list>]
[limit <maximum number of output tuples for each document>];
Select vs Extract
If you need to extract – use an extract expression
If you need to group, order or join – use a select expression
Scenario
Acquire the Data

Source social media data from BoardReader, an IBM business partner with a commercial offering that provides a searchable archive of various web-based data sources
BoardReader App
Transform and Export using BigSheets

Extract a subset of social media data from a BigSheets workbook populated with data from IBM’s sample BoardReader application.

Inside a BigSheets workbook, press the 'Export As' button and export the workbook to DFS using the aspects specified
Download this file to the local FS of the Eclipse development environment to use as sample input data for text analytics development
Building a Text Analytics Extractor
Working in the Eclipse environment, you will build an Extraction Plan and use the Extraction tasks Workflow to develop and test a simple extractor
Building a Text Analytics Extractor
Using the Eclipse tools
Developing Simple AQL
Simple dictionary based extraction
Testing the Extractor
Run from the workflow and examine the results
Publish the Extractor
Configure and Deploy Application
Back in the BigInsights Web Console, the extractor is available to be deployed
Run the Extractor from BigSheets
Additional Analytics
Develop and deploy additional extractors
− Understand Watson applications in Healthcare
− Understand the link with Jeopardy!
− Understand the tone/sentiment
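
Simple tone scoring of the kind named in the scenario can be approximated with two word lists. The Python sketch below is a deliberately naive analogue, not the extractor built in the deck. The `POSITIVE` and `NEGATIVE` word lists and both function names are assumptions, and the approach ignores negation, sarcasm and context.

```python
import re

# Hypothetical tone dictionaries, invented for this sketch.
POSITIVE = {"great", "impressive", "love"}
NEGATIVE = {"dead", "broken", "disappointing"}


def tone_score(text):
    """Return the count of positive words minus the count of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def tone_label(score):
    """Map a numeric score to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sample = "I had an iphone, but it's dead"
label = tone_label(tone_score(sample))
```

A per-document score like this becomes one more column in the extracted results, so it can be charted alongside the industry and Jeopardy! associations.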
Additional Resources
Big Data Hub
http://www.ibmbigdatahub.com/

DeveloperWorks
http://www.ibm.com/developerworks/bigdata/

Big Data and Analytics on YouTube
http://www.youtube.com/ibmbigdata

Big Data University
http://www.bigdatauniversity.com/

[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Text Analytics

  • 1. Text Analytics End to End Gary Robinson, IBM © 2013 IBM Corporation
  • 2. Scenario
      Source and analyze blogs and news articles about a popular brand or service across various social media sites
      − “IBM Watson”
      − Analytics include
          Watson applications by industry and within an industry
          Watson association with Jeopardy!
          Simple sentiment/tone scoring
  • 3. Scenario Process
      − Collect data
      − Transform and subset
      − Develop and test a Text Analytics extractor using Eclipse
      − Publish and deploy the extractor to a BigInsights cluster
      − Apply the Text Analytics extractor from BigSheets
      − Analyze and chart the results
  • 4. Text Analytics
      Identify and extract structured information from unstructured and semi-structured text
      To enable analytics
      − chart, report, join, aggregate, slice, dice and drill, model, mine…
  • 5. Text Analytics
      80% of the world’s data is unstructured or semi-structured text
      Social media is rife with information about products and services
      − Discussions, blogs, tweets…
      Applications often lock up useful information in blobs, description fields and semi-structured records that are difficult or impossible to open up for analysis
      − Call center records, log files…
      How do you get a metrics-based understanding of facts from unstructured text?
      Example posts:
      − I had an iphone, but it's dead @JoaoVianaa. (I've no idea where it's) !Want a blackberry now !!!
      − @rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle
      − I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR
  • 6. BigInsights & Streams Text Analytics
      High Performance rule based Information Extraction Engine
      Highly scalable solution for at-rest and in-motion analytics
      Pre-built extractors, and toolkit to build custom Extractors
      Declarative Information Extraction (IE) system based on an algebraic framework
      Sophisticated tooling to help build, test, and refine rules
      Developed at IBM Research since 2004
      Embedded in several IBM products
  • 7. Applications of Text analytics
      Broad range of applications in many industries
      − CRM Analytics - Voice of customer, Product and Services gap analysis, Customer churn
      − Social Media Analytics - Purchase intent, Customer churn prediction, Reputational Risk
      − Digital Piracy - illegal broadcast of streaming and video content
      − Log Analytics - Failure analysis and root cause identification, Availability assurance
      − Regulatory Compliance - Data Redaction to Identify and protect sensitive information
  • 8. Deploy to Streams and BigInsights
      [Diagram labels: AQL Language, Extractor, Optimizer, Compiled Plan, Text Analytics Module, Streams, BigInsights Cluster, Input Documents, Extracted Information, Downstream Integration and Processing]
  • 9. Developing an Extractor
      Top down
      − Select documents to work with
      − Label examples of interesting text
      − Label clues or elements within or around the examples
      Bottom up
      − Create or refine AQL to extract basic features
      − Create or refine AQL to generate candidate concepts
      − Create or refine AQL to filter and consolidate
  • 10. AQL
      Annotation Query Language
      − SQL like
          Familiar syntax and concepts make it easier to learn and understand
      − Declarative
          Describes what computation should be performed and not how to compute it
          Separates semantics from implementation
      − Compiled and optimized for execution
          Text Analytics Module (TAM) is deployed to the cluster for execution by the Text Analytics run time
  • 11. AQL
      Fundamental concepts
      − Views
          Created with Select or Extract expressions
          Are not materialized unless explicitly requested using ‘output view <name>’ or ‘select into’
          The ‘Document’ view identifies the set of input documents
          − select… from Document d
  • 12. AQL
      Fundamental concepts
      − Extract expressions
          Typically used to extract basic features
          Extract from columns in other views including the text column in the Document view
          Basic capabilities include extraction using regex, dictionary and sequence
          Other operations include splits, blocks and parts of speech
  • 13. AQL
      Fundamental concepts
      − Select expressions
          Typically used to combine, aggregate and filter extracted fields to create candidate concepts and final values
          Select existing columns and extract from columns
          − Specified using <from list>
          Rich set of operators and clauses
          − where, consolidate, group by, order by, and limit clauses are optional
  • 14. Select vs Extract
      Which do I use when?
      − Both have a <select list>
      − But you can only specify an <extract specification> in an extract expression
      − Both have a <from list>
      − You can apply simple predicate based filters in the <having clause> of an extract expression or in the <where clause> of a select expression
      − But you can only use predicates to combine rows from views – join – using the <where clause> of a select expression
      − You can apply a <consolidation policy> or a <limit> in either an extract or a select expression
      − But you can only <group> and <order> using a select expression
      Extract expression syntax:
          extract <select list>, <extraction specification>
          from <from list>
          [having <having clause>]
          [consolidate on <column> [using '<policy>' [with priority from <column> [priority order]]]]
          [limit <maximum number of output tuples for each document>];
      Select expression syntax:
          select <select list>
          from <from list>
          [where <where clause>]
          [consolidate on <column> [using '<policy>' [with priority from <column> [priority order]]]]
          [group by <group by list>]
          [order by <order by list>]
          [limit <maximum number of output tuples for each document>];
  • 15. Select vs Extract
      If you need to extract, use an extract expression
      If you need to group, order or join, use a select expression
  • 17. Acquire the Data Source social media data from BoardReader, an IBM business partner with a commercial offering that provides a searchable archive of various web based data sources
  • 19. Transform and Export using BigSheets
      Extract a subset of social media data from a BigSheets workbook populated with data from IBM’s sample BoardReader application.
      Inside a BigSheets workbook, press the 'Export As' button and export the workbook to DFS using the aspects specified.
      Download this file to the local FS of the Eclipse development environment to use as sample input data for text analytics development.
  • 20. Building a Text Analytics Extractor Working in the Eclipse environment you will build an Extraction Plan and use the Extraction tasks Workflow to develop and test a simple extractor
  • 21. Building a Text Analytics Extractor Using the Eclipse tools
  • 22. Developing Simple AQL Simple dictionary based extraction
  • 23. Testing the Extractor Run from the workflow and examine the results
  • 25. Configure and Deploy Application Back in the BigInsights Web Console the extractor is available to be deployed
  • 26. Run the Extractor from BigSheets
  • 27. Additional Analytics
      Develop and deploy additional extractors
      − Understand Watson applications in Healthcare
      − Understand the link with Jeopardy!
      − Understand the tone/sentiment
  • 28. Additional Resources
      − Big Data Hub: http://www.ibmbigdatahub.com/
      − DeveloperWorks: http://www.ibm.com/developerworks/bigdata/
      − Big Data and Analytics on YouTube: http://www.youtube.com/ibmbigdata
      − Big Data University: http://www.bigdatauniversity.com/
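
The extract-then-select pattern described on slides 12 to 15 and the simple dictionary-based extraction of slide 22 can be approximated outside BigInsights. The following is a minimal Python sketch, not AQL and not the Text Analytics run time: the `Span` type, the function names, the sample sentence, the dictionary terms and the 40-character window are all assumptions made for illustration. It extracts basic features with a dictionary, consolidates overlapping matches by keeping the containing span, and then joins two sets of matches with a proximity predicate.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    """A matched region of the input text (hypothetical stand-in for an AQL span)."""
    begin: int
    end: int
    text: str


def extract_dictionary(terms: List[str], text: str) -> List[Span]:
    """Return a Span for every case-insensitive, whole-word match of any term."""
    spans = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            spans.append(Span(m.start(), m.end(), m.group()))
    return sorted(spans)


def consolidate_contained_within(spans: List[Span]) -> List[Span]:
    """Drop every span that lies entirely inside a different span."""
    return [
        s for s in spans
        if not any(o != s and o.begin <= s.begin and s.end <= o.end for o in spans)
    ]


def follows_within(a: Span, b: Span, max_chars: int) -> bool:
    """Join predicate: b starts at or after the end of a, at most max_chars later."""
    return 0 <= b.begin - a.end <= max_chars


# Illustrative input, written for this sketch (not taken from the deck's data).
TEXT = ("IBM Watson first drew attention on Jeopardy! "
        "and Watson is now used in healthcare.")

# "Extract" step: basic features from two small dictionaries.
brand = consolidate_contained_within(
    extract_dictionary(["Watson", "IBM Watson"], TEXT)
)
jeopardy = extract_dictionary(["Jeopardy"], TEXT)

# "Select" step: join the two views on a proximity predicate.
pairs = [
    (b.text, j.text)
    for b in brand
    for j in jeopardy
    if follows_within(b, j, 40)
]

print([s.text for s in brand])  # ['IBM Watson', 'Watson']
print(pairs)                    # [('IBM Watson', 'Jeopardy')]
```

Consolidation removes the shorter "Watson" match that sits inside "IBM Watson", which is why only two brand mentions remain. The join keeps only the first mention because the second one appears after "Jeopardy" rather than before it.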