It's increasingly clear that Big Data is not just about volume, but also about the variety, complexity and velocity of enterprise information. Integrating data with insights from unstructured information such as documents, call logs, and web content is essential to driving sustainable business value. Aggregating and analyzing unstructured content is challenging because human expression is diverse, varies by location, and changes over time. To understand the causes of data trends, you need advanced text analytic capabilities. Furthermore, you need a system that provides direct, real-time access to discover hidden insights. In this session, you will learn how unified information access (UIA) uniquely completes the picture by integrating Big Data directly with unstructured content and advanced text analytics, and making it directly accessible to business users.
3. Big Data vs. Extreme Information
Source: 'Big Data' Is Only the Beginning of Extreme Information Management, April 7, 2011, Gartner Group
4. Completing the Big Data Picture
Structured Data
• Stored and/or sourced from relational databases
• “Normalized” so that each piece of data is stored once
• Organized in tables that are related to each other
Unstructured Data
• Also known as Unstructured Data or Non-Relational Data
• Contains tags or other markers to readily parse data fields, etc.
• Clickstream data, web logs, etc.
Unstructured Content
• Any type of free-form text information
• Documents (500+ formats); scanned documents; email; web content; SharePoint; knowledge bases; etc.
5. Unstructured Content – Valuable Opportunity
57% of data and IT managers surveyed say unstructured content is extremely or very important to their businesses…
• Extremely Important: 18%
• Very Important: 39%
• Somewhat Important: 30%
• Not Important at this time: 8%
• Don’t know/Not sure: 6%
…yet 52% also say more resources are committed to structured data.
• More resources for Structured Data: 52%
• Equal resources for Unstructured & Structured: 14%
• More resources for Unstructured Content: 13%
• Don’t know/Not sure: 20%
• Other: 1%
Source: Unisphere Research (June 2011)
6. Unified Information Access (UIA)
• Query any information – structured or unstructured – with the precision of SQL and the fuzziness of search
• Build applications and rapidly create value by avoiding the typical risks presented by information silos
• Use text analytics, language modeling and machine learning components to enrich and link information together across silos
• Allow users to consume information the way they want to consume it, with search or Business Intelligence (BI)
“[Attivio is] on the forefront of a shift away from reliance on relational databases…” --Nick Patience
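The idea of combining "the precision of SQL and the fuzziness of search" can be illustrated with a minimal sketch. It is not the AIE API: the class name, record layout and field names are hypothetical, and Levenshtein edit distance is used only as a simple stand-in for whatever fuzzy matching the engine performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is not the AIE API.
final class UnifiedQuerySketch {

    // Classic dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if any whitespace-separated word in text is within maxEdits of term.
    static boolean fuzzyContains(String text, String term, int maxEdits) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (editDistance(word, term.toLowerCase()) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    // Exact match on a structured field AND fuzzy match on a free-text field.
    static List<String> query(List<Map<String, String>> records,
                              String field, String exactValue,
                              String textField, String fuzzyTerm, int maxEdits) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> r : records) {
            boolean exact = exactValue.equals(r.get(field));
            boolean fuzzy = fuzzyContains(r.getOrDefault(textField, ""), fuzzyTerm, maxEdits);
            if (exact && fuzzy) {
                ids.add(r.get("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("id", "1", "region", "EMEA", "note", "service desk was wonderfull"),
            Map.of("id", "2", "region", "APAC", "note", "wonderful onboarding"),
            Map.of("id", "3", "region", "EMEA", "note", "invoice dispute"));
        // Structured filter (region = EMEA) plus a typo-tolerant text match.
        System.out.println(query(records, "region", "EMEA", "note", "wonderful", 1));
    }
}
```

The structured predicate stays strict while the text predicate tolerates a misspelling, which is the combination the first bullet describes.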
7. Unified Information Access – Enabling Variety
INFORMATION
Enrich unstructured content and link it to structured data
to find “WHAT” and “WHY”
8. Unified Information Access – Conceptual View
John Smith <jsmith@customer.com>
New engagement
I am delighted that we were able to move forward … your service desk has been wonderful and helped resolve…
Analyze & enrich unstructured data
Retain & respect normalized structure
9. SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING
ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0
WEB SERVER | FILE SERVER | EMAIL SERVER | CONTENT MGMT | HADOOP++ | CRM, ERP | ADBMS/EDW
10. SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING
SEARCH API: JAVA, WSDL, REST | ANSI-92 SQL: ODBC, JDBC
QUERY & RESPONSE WORKFLOWS: PREDICTIVE AUTOCOMPLETE, FUZZY MATCHING, FACET FINDER™, ACTIVE SECURITY, BEHAVIORAL ANALYTICS*, CONTENT SPOTLIGHTING, ALERTS
UNIVERSAL ENGINE: INCREMENTAL REAL-TIME INDEXING, QUERY RESOLUTION, JOIN/GRAPH, RELEVANCY, CONTENT STORE
INGESTION & TRIGGER WORKFLOWS: LANGUAGE PROCESSING, TEXT EXTRACTION, TEXT ANALYTICS, DATA MINING, CLASSIFICATION*, ONTOLOGY*
CONTENT API: JAVA, WSDL, REST | CONNECTORS
WEB SERVER | FILE SERVER | EMAIL SERVER | CONTENT MGMT | HADOOP++ | CRM, ERP | ADBMS/EDW
12. AIE – ANSI-92 SQL with ODBC, JDBC
• Use a wide array of existing BI products with AIE
• Easily integrate AIE with existing BI/DW infrastructure
• AIE ODBC 3.5 compliant driver included
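As a rough sketch of what SQL access over JDBC might look like from a client, the example below uses only the standard `java.sql` API. The table name `email_facts` and the columns `customer` and `sentiment` are hypothetical, no driver class or connection URL is assumed (the caller supplies the `Connection`), and whether a given engine accepts this exact statement is an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side sketch; table and column names are invented for illustration.
final class AieSqlSketch {

    // Builds a parameterized aggregate query over a validated table name.
    static String topCustomersSql(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        return "SELECT customer, COUNT(*) AS mentions FROM " + table
             + " WHERE sentiment = ? GROUP BY customer ORDER BY mentions DESC";
    }

    // Runs the query on a caller-supplied connection and formats each row.
    static List<String> run(Connection conn, String table, String sentiment) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(topCustomersSql(table))) {
            ps.setString(1, sentiment);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString("customer") + "=" + rs.getLong("mentions"));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Only prints the statement; connecting requires a real driver and URL.
        System.out.println(topCustomersSql("email_facts"));
    }
}
```

Because the statement is ordinary parameterized SQL, the same code path would apply to any BI tool or application that already speaks JDBC.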
16. AIE – Triples & Graphs
<triple id="1">
<entityId>P01</entityId>
<name>Joe</name>
<is>person</is>
...
</triple>
All people who live in a college town:
JOIN(is:person, INNER(JOIN(is:city, INNER(is:college, on="name=locatedIn")),
on="livesIn=name"))
All people who live in a college town with “happy students”:
JOIN(is:person, INNER(JOIN(is:city, INNER(JOIN(is:college,
INNER(AND(table:news, NEAR(happiest, students)), ON="name=college")),
ON="name=locatedIn")), ON="livesIn=name"))
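The first query above can be mimicked over in-memory data to show what the nested JOIN computes. This is a minimal sketch, not the AIE query engine: entities are plain maps, the attribute names `is`, `name`, `livesIn` and `locatedIn` follow the slide, and the town and college names are invented sample data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory imitation of the nested JOIN; not the AIE query engine.
final class TripleJoinSketch {

    // Keep entities whose "is" attribute equals the given type (like is:person).
    static List<Map<String, String>> ofType(List<Map<String, String>> entities, String type) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> e : entities) {
            if (type.equals(e.get("is"))) {
                out.add(e);
            }
        }
        return out;
    }

    // Inner join used as a filter: keep left rows whose leftKey value
    // appears as the rightKey value of at least one right row.
    static List<Map<String, String>> innerJoin(List<Map<String, String>> left,
                                               List<Map<String, String>> right,
                                               String leftKey, String rightKey) {
        Set<String> rightValues = new HashSet<>();
        for (Map<String, String> r : right) {
            if (r.get(rightKey) != null) {
                rightValues.add(r.get(rightKey));
            }
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            if (rightValues.contains(l.get(leftKey))) {
                out.add(l);
            }
        }
        return out;
    }

    // Cities joined to colleges on name=locatedIn, then people joined on livesIn=name.
    static List<String> peopleInCollegeTowns(List<Map<String, String>> entities) {
        List<Map<String, String>> collegeTowns =
            innerJoin(ofType(entities, "city"), ofType(entities, "college"), "name", "locatedIn");
        List<Map<String, String>> residents =
            innerJoin(ofType(entities, "person"), collegeTowns, "livesIn", "name");
        List<String> names = new ArrayList<>();
        for (Map<String, String> p : residents) {
            names.add(p.get("name"));
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, String>> entities = List.of(
            Map.of("is", "person", "name", "Joe", "livesIn", "Elmwood"),
            Map.of("is", "person", "name", "Ann", "livesIn", "Harbor"),
            Map.of("is", "city", "name", "Elmwood"),
            Map.of("is", "city", "name", "Harbor"),
            Map.of("is", "college", "name", "Elmwood College", "locatedIn", "Elmwood"));
        System.out.println(peopleInCollegeTowns(entities));
    }
}
```

The second query on the slide adds one more nesting level of the same pattern, with a text predicate (`NEAR(happiest, students)`) selecting the innermost set.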
17. SEARCH & DISCOVERY | ACTIVE DASHBOARDS | PACKAGED APPLICATIONS | AD HOC QUERY TOOLS | BI & REPORTING
ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0 | ADBMS BIG DATA OR ANALYTIC PLATFORM
WEB SERVER | FILE SERVER | EMAIL SERVER | HADOOP++ | STRUCTURED DATA
21. SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING
ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0
WEB SERVER | FILE SERVER | EMAIL SERVER | CONTENT MGMT | HADOOP++ | CRM, ERP | ADBMS/EDW
22. AIE – Non-Collocated JOIN
Unlimited scaling of JOIN capabilities
No special planning required to JOIN across content/data spread across partitions
Diagram labels: Node 1, Node 2; Table A, Table B; Hash-based Partitioning of Ingested Documents and Records; Query; Cross-Node JOIN Coordination
JOIN(table:A, INNER(table:B), INNER(table:email), on="emailaddress")
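A small sketch of the idea on this slide: records are hash-partitioned by id, so matching rows from two tables can land on different nodes, and a coordinator merges join keys across nodes before filtering. The slide does not describe the real coordination protocol, so the two-phase approach, the class name and the record layout here are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy coordinator, not AIE's actual cross-node JOIN.
final class PartitionedJoinSketch {

    // Hash-based placement: the record id alone decides the node.
    static int nodeFor(String recordId, int nodeCount) {
        return Math.floorMod(recordId.hashCode(), nodeCount);
    }

    // Distribute records over nodeCount partitions by hashing their ids.
    static List<List<Map<String, String>>> partition(List<Map<String, String>> records, int nodeCount) {
        List<List<Map<String, String>>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ArrayList<>());
        }
        for (Map<String, String> r : records) {
            nodes.get(nodeFor(r.get("id"), nodeCount)).add(r);
        }
        return nodes;
    }

    // Returns ids of leftTable rows that have a rightTable row with the same key value.
    static List<String> crossNodeJoin(List<List<Map<String, String>>> nodes,
                                      String leftTable, String rightTable, String key) {
        // Phase 1: gather the right table's join-key values from every node.
        Set<String> rightKeys = new HashSet<>();
        for (List<Map<String, String>> node : nodes) {
            for (Map<String, String> r : node) {
                if (rightTable.equals(r.get("table")) && r.get(key) != null) {
                    rightKeys.add(r.get(key));
                }
            }
        }
        // Phase 2: each node filters its local left rows against the merged key set.
        List<String> matches = new ArrayList<>();
        for (List<Map<String, String>> node : nodes) {
            for (Map<String, String> r : node) {
                if (leftTable.equals(r.get("table")) && rightKeys.contains(r.get(key))) {
                    matches.add(r.get("id"));
                }
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("id", "A1", "table", "A", "emailaddress", "jsmith@customer.com"),
            Map.of("id", "A2", "table", "A", "emailaddress", "other@customer.com"),
            Map.of("id", "B1", "table", "B", "emailaddress", "jsmith@customer.com"));
        System.out.println(crossNodeJoin(partition(records, 2), "A", "B", "emailaddress"));
    }
}
```

The result does not depend on how many nodes the records are spread over, which is the sense in which no special placement planning is needed for the JOIN.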
23. Hadoop & AIE – Complementary
Hadoop is great for…
• Rapidly collecting an extremely large volume of unprocessed information
• Providing a flexible (if sometimes complicated) way to ask almost any question of information
• Bringing information to data scientists
• Batch processing where latency is not a concern
AIE is great for…
• Deep insight across structured and especially unstructured information
• Handling the Variety of Extreme Information
• Getting answers quickly
• Providing simple ways of asking questions
• Bringing information to end users using their desired method
• Real-time / high-velocity analysis
24. SEARCH & DISCOVERY | UIA APPLICATIONS | PACKAGED APPLICATIONS | ACTIVE DASHBOARDS | AD HOC QUERY TOOLS | BI & REPORTING
ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) 3.0
HADOOP (HIVE, HDFS, HBASE)
FILE SERVER | EMAIL SERVER | CONTENT MGMT | ADBMS/EDW | WEB SERVER | MONITORED SYSTEM | SENSOR
30. AIE XT Module – Key Features
• Connectors to Big Data sources
  • Hadoop (Hive, HBase, HDFS)
  • Cloudera
  • Others coming soon…
• Data integration in the engine/workflow
  • Text analytics
  • Data cleansing & mining
  • Correlate at query time
• Universal information repository
• Natively parallel, scales without excessive hardware costs
• ODBC/JDBC Connectivity Module
• Attivio Classification Engine
• Attivio Behavioral Analytics Module
31. AIE & Hadoop – Find “Mapreduce Tutorial”
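Using MapReduce | Using AIE to Index | Using AIE Workflow
// Ingest-workflow transformer that scans every field value of each incoming document.
public class SampleSimpleIngestTransformer extends AbstractSingleDocumentTransformer {
    // Text to look for; configurable through setValue().
    private String value = "mapreduce tutorial";

    @Override
    public ProcessingResult processDocument(AttivioDocument doc) throws AttivioException {
        for (Field<?> f : doc) {
            for (FieldValue<?> fv : f) {
                // As written, a document containing the text gets the drop result.
                if (fv.getValueAsString().contains(value)) {
                    return dropResult();
                }
            }
        }
        // No field value matched, so the document gets the OK result.
        return okResult();
    }

    public String getValue() {
        return value;
    }

    public void setValue(String value) {
        this.value = value;
    }
}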
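For contrast with the workflow transformer above, the "Using MapReduce" approach can be imitated in plain Java. This sketch does not use the Hadoop API; it only mirrors the two phases (emit a count per matching line, then sum per document) on an in-memory map, and the document ids and text are invented sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the map and reduce phases; this is not the Hadoop API.
final class MapReduceStyleSketch {

    // Map phase: emit (docId, 1) for every line that contains the phrase.
    static List<Map.Entry<String, Integer>> mapPhase(Map<String, List<String>> docs, String phrase) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        String needle = phrase.toLowerCase();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String line : doc.getValue()) {
                if (line.toLowerCase().contains(needle)) {
                    emitted.add(Map.entry(doc.getKey(), 1));
                }
            }
        }
        return emitted;
    }

    // Reduce phase: sum the emitted counts per document id.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = Map.of(
            "doc1", List.of("A MapReduce tutorial", "See the mapreduce tutorial appendix"),
            "doc2", List.of("Search relevancy tuning"));
        System.out.println(reducePhase(mapPhase(docs, "mapreduce tutorial")));
    }
}
```

A batch job of this shape rescans the whole corpus on every run, whereas the transformer above inspects each document once as it is ingested.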
32. Case Study
Problem
• Content aggregator needed to provide faster, better customer experience to build business
• Needed to replace Lucene implementation, which couldn't be adapted to meet requirements
• Goals: reduce latency, serve more queries, improve relevancy of results, streamline white-label business
Why AIE?
• Handles massive query volume, rapid updates and low latency better than competitors
• Able to improve relevancy with: information about past purchases, fuzzy search and language modeling
• Workflow supports white-label strategy without writing more software
Results
• Thumbplay can offer more content and handle more demand; customer experience improved
• Operations simplified by reduced complexity
• Rapid development reduces cost and time in serving revenue-generating partnerships
Decision Drivers
• High query volume
• Low latency
• Rapid updates
• Results relevancy tuning
• Workflow
• Rapid development & deployment
33. Case Study
Problem
• Launch a major new online music service incorporating streaming music, local caching, internet radio, personalization and multiple subscription/service levels
• Expect 2,000+ queries per second during beta, up to 5x that in production
Why AIE?
• Handles massive query volume, rapid updates and low latency better than competitors
• Able to improve relevancy using fuzzy name matching, artist aliases and transaction history
• Workflow supports white-label strategy without writing more software
Results
• iHeartRadio launched with no scalability or performance problems
• Operations simplified by reduced complexity
• Rapid development reduces cost and time in serving revenue-generating partnerships
Decision Drivers
• High query volume
• Low latency
• Rapid updates
• Relevancy based on sales history
• Workflow
35. Case Study: DCS eMap
1. Review Saved Searches
• Review Issues
• Review MDi Reports
2. New or refined search
• Include/exclude threads
• Mark relevance
• Review related thread
• Refine search query
3. Who are custodians? What are their profiles?
• Who’s active and who’s passive?
• Where should we start looking?
4. What is all this about?
• Show Tag Cloud
• Show Facets
5. What are the specific and related conversations?
• Who are the participants?
• What other conversations did they participate in?
6. Who is talking to who?
• Who’s been dropped/added?
• Annotate conversations
• Create issues for further review
7. Review email, tweets, posts
• Annotate items
• Review attachments
• Tag issues for further review
8. Compare threads across email, Facebook and Twitter
• Annotate items
• Review attachments
36. Case Study: Database Archiving
Problem
• Leading database archiving suite reduces IT costs and demands on production systems through archiving and legacy application decommissioning
• Needed an easy way to quickly find data across archives without a priori knowledge of archive structure
Why AIE?
• Ease of integration, rapid development model and high degree of flexibility; grey-box platform
• Discovery-oriented information access capabilities such as FacetFinder, fuzzy query operators, spelling correction, etc…
• Support for multiple query types - keyword search and SQL
• Strategic unified information access vision alignment
Results
• Eight months from project inception to GA
• High-speed access to archive information; no more concerns about ability to access data once archived
• Powerful cross-archive query capabilities enhance legal discovery and data retention use cases
Decision Drivers
• Addresses key gap in product line
• Easy integration with existing architecture
• Unified information access strategic direction
• Multiple query modes – search and SQL
37. Case Study: Financial Services Regulation
Problem
• Leading financial firm needed to reduce costs and risk by expediting rule monitoring and policy updates
• 700+ staff track 200+ global regulators, who publish in different formats (Word, web, PDF, etc.)
• Needed to streamline collection/reporting of metrics and policy activity for oversight and audits
Why AIE?
• Ability to harvest, analyze and link diverse information types
• Provide immediate notification of new rules and interactive, role-based dashboards
• Ability to both push and pull information, generate audit reports and issue alerts
Results
• No more manual monitoring of regulators
• Changes and drafts are detected and tracked on the dashboard
• Users see a roll up of the risk that matters to them
• Workflow automates compliance processes
Decision Drivers
• Information harvesting and analysis
• Unified information
• Workflow, alerts and triggers
• Role-based Active Dashboard
• SharePoint Integration
38. Case Study: IT Incident Management
Problem
• Disruptions in application availability affected a large financial services firm's ability to meet SLA
• The company’s goal is to identify warning conditions so fixes can be applied before an incident
• Key challenge: required information for issue resolution is scattered across more than 60 diverse sources
Why AIE?
• Ease of use in retrieving and linking information across data and content sources
• Query processing speed and scalability
• Reporting and analysis of service metrics with all relevant data - pushed to users via role-based Active Dashboard and alerts
Results
• Executive dashboards with comprehensive information and push-delivery ensure insight and rapid results
• Reduced MTTR for 17,000 annual service events from 27 to 3 minutes
Decision Drivers
• Unified information across content and data sources
• Facets and JOINs for tracing and connecting related tickets, knowledge base docs and experts
• Ability to push information
Powered by Active Dashboard
39. Convergence Architecture Using AIE
(1) Important functions in legacy apps are wrapped and generalized
(2) Content from legacy applications is loaded into a UIA engine and normalized/rationalized using various information structures like an Ontology
(3) The new convergence application consumes information from the UIA engine, and when necessary acts on it using the wrapped/generalized methods
Diagram labels: Convergence App; Attivio AIE; Taxonomies, Ontologies, Lexicons; API/Methods; Content/Data; Legacy App 1, Legacy App 2, Legacy App 3, … Legacy App N
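The three steps of the convergence architecture can be sketched in a few lines. Everything here is hypothetical: `LegacyAction` stands in for a wrapped legacy function, `UiaStore` for the UIA engine with a tiny term-to-concept ontology, and `actOn` for the convergence application. None of these names come from AIE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three numbered steps; not an AIE interface.
final class ConvergenceSketch {

    // (1) A legacy function wrapped behind one generalized method.
    interface LegacyAction {
        String apply(String recordId);
    }

    // (2) Stand-in for the UIA engine: content is normalized through an ontology.
    static final class UiaStore {
        private final Map<String, String> ontology = new HashMap<>();
        private final Map<String, List<String[]>> byConcept = new HashMap<>();

        // Declare that a legacy term means the given canonical concept.
        void mapTerm(String legacyTerm, String concept) {
            ontology.put(legacyTerm, concept);
        }

        // Load one legacy record under its canonical concept.
        void load(String app, String legacyTerm, String recordId) {
            String concept = ontology.getOrDefault(legacyTerm, legacyTerm);
            byConcept.computeIfAbsent(concept, k -> new ArrayList<>())
                     .add(new String[] {app, recordId});
        }

        // All {app, recordId} pairs stored under a concept.
        List<String[]> find(String concept) {
            return byConcept.getOrDefault(concept, new ArrayList<>());
        }
    }

    // (3) The convergence app queries the store, then acts through the wrappers.
    static List<String> actOn(UiaStore store, String concept, Map<String, LegacyAction> actions) {
        List<String> results = new ArrayList<>();
        for (String[] rec : store.find(concept)) {
            LegacyAction action = actions.get(rec[0]);
            if (action != null) {
                results.add(action.apply(rec[1]));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        UiaStore store = new UiaStore();
        store.mapTerm("client", "customer");
        store.mapTerm("acct", "customer");
        store.load("app1", "client", "17");
        store.load("app2", "acct", "A-9");

        Map<String, LegacyAction> actions = new HashMap<>();
        actions.put("app1", id -> "app1 flagged " + id);
        actions.put("app2", id -> "app2 flagged " + id);
        System.out.println(actOn(store, "customer", actions));
    }
}
```

The convergence application never needs to know that one legacy system says "client" and another says "acct"; the ontology mapping applied at load time hides that difference.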
41. QUESTIONS?
FOR MORE INFORMATION PLEASE VISIT
WWW.ATTIVIO.COM OR STOP BY OUR BOOTH
Editor's Notes
Volume: machine generated transactions (sensors, log files, click streams, automated trades)
Variety: includes other sources of all kinds of data which may have very high volume (such as email or IM)
Velocity: batch is not always sufficient, especially for “point of sale” analytics
Complexity: variable data types, unstructured content, extracting meaning by referencing other transactions