NoSQL & MongoDB..Part III
Arindam Chatterjee
Aggregation in MongoDB
• Aggregations are operations that process data records and return computed results.
• MongoDB provides a rich set of aggregation operations that examine and perform calculations on the data sets.
• Running data aggregation on the mongod instance simplifies application code and limits resource requirements.
• Like queries, aggregation operations in MongoDB use collections of documents as an input and return results in the form of one or more documents.
• In MongoDB aggregations are implemented using
– Aggregation Pipeline
– Map-Reduce
Aggregation Pipeline
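The idea of a pipeline can be sketched in plain JavaScript, without a running server: each stage receives the documents produced by the previous stage. The sample documents, the `status` filter, and the helper names (`matchStage`, `groupSumStage`, `runPipeline`) are assumptions made for illustration; they only imitate what stages such as `$match` and `$group` do inside MongoDB.

```javascript
// Hypothetical in-memory stand-in for documents in an "orders" collection.
const sampleOrders = [
  { cust_id: "abc123", status: "A", price: 25 },
  { cust_id: "abc123", status: "A", price: 10 },
  { cust_id: "xyz789", status: "A", price: 40 },
  { cust_id: "xyz789", status: "D", price: 99 },
];

// Stage 1: keep only documents that pass a predicate (plays the role of $match).
const matchStage = (predicate) => (docs) => docs.filter(predicate);

// Stage 2: group by a key and sum a numeric field (plays the role of $group with $sum).
const groupSumStage = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

// A pipeline feeds the output of each stage into the next one.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const pipelineResult = runPipeline(sampleOrders, [
  matchStage((doc) => doc.status === "A"),
  groupSumStage("cust_id", "price"),
]);
console.log(pipelineResult);
```

In the mongo shell, a comparable pipeline would be written roughly as db.orders.aggregate([ { $match: { status: "A" } }, { $group: { _id: "$cust_id", total: { $sum: "$price" } } } ]).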
Map Reduce
• MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition).
• The map function emits key-value pairs.
• For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data.
• MongoDB then stores the results in a collection.
• MongoDB supports sharded collections both as input and output.
Map Reduce
Illustration
Map Reduce
Map Reduce..more example
• Insert data in collection “orders” as follows
    db.orders.insert({
        _id: ObjectId("50a8240b927d5d8b5891743c"),
        cust_id: "abc123",
        ord_date: new Date("Oct 04, 2012"),
        status: 'A',
        price: 25,
        items: [ { sku: "mmm", qty: 5, price: 2.5 },
                 { sku: "nnn", qty: 5, price: 2.5 } ]
    });
• Task: Find the total price per customer
• Step I: Define a map function that emits the “cust_id” and “price” pair
    var mapFunction1 = function() {
        emit(this.cust_id, this.price);
    };
Map Reduce..more example..2
• Define a reduce function with two arguments, keyCustId and valuesPrices
– The valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
– The function reduces the valuesPrices array to the sum of its elements.
    var reduceFunction1 = function(keyCustId, valuesPrices) {
        return Array.sum(valuesPrices);
    };
• Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function.
    db.orders.mapReduce(
        mapFunction1,
        reduceFunction1,
        { out: "map_reduce_example" }
    )
• Do a find() to check the new collection “map_reduce_example”
– db.map_reduce_example.find();
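To see what the map and reduce phases do to the data, the steps above can be imitated in plain JavaScript on an in-memory array. This is only a sketch: `orderDocs` and `simulateMapReduce` are hypothetical names, `emit` is passed in as a parameter instead of being a global, and a plain `reduce()` stands in for the shell helper `Array.sum`.

```javascript
// Hypothetical in-memory stand-in for the "orders" collection.
const orderDocs = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 10 },
  { cust_id: "xyz789", price: 40 },
];

// Imitates the map and reduce phases: map each document, group emitted values
// by key, then reduce every key that collected more than one value.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // "this" is the current document
  return [...grouped].map(([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduceFn(key, values) : values[0],
  }));
}

// Same logic as mapFunction1 / reduceFunction1, with emit passed in explicitly.
const mapPrices = function (emit) { emit(this.cust_id, this.price); };
const reducePrices = (keyCustId, valuesPrices) => valuesPrices.reduce((a, b) => a + b, 0);

const totalsPerCustomer = simulateMapReduce(orderDocs, mapPrices, reducePrices);
console.log(totalsPerCustomer);
```

Note that the reduce function is only called for keys with more than one emitted value, which mirrors the rule stated on the Map Reduce slide.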
Full Text Search in MongoDB
• Important Concepts
– Stop Words: words that are filtered out because they are irrelevant for searching. Examples are is, at, the, am, I, your etc.
– Stemming: the process of reducing words to their root or base form. E.g. “waiting”, “waited”, “waits” have the same root “wait”
• Example: I am your father, Luke
– “I”, “am”, “your” are Stop Words
– After removing the Stop Words, the words left are “father” and “Luke”
– These are processed in the next step
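The two concepts can be tried out with a few lines of JavaScript. This is a toy sketch: the stop-word list is a small assumed sample and `naiveStem` strips only three suffixes, whereas real stemmers (for example the Porter/Snowball family) apply many more rules.

```javascript
// Tiny illustrative stop-word list (an assumption, not MongoDB's actual list).
const STOP_WORDS = new Set(["i", "am", "your", "is", "at", "the"]);

// Split text into lowercase word tokens.
const tokenize = (text) => text.toLowerCase().match(/[a-z]+/g) || [];

// Drop tokens that appear in the stop-word list.
const removeStopWords = (tokens) => tokens.filter((t) => !STOP_WORDS.has(t));

// Toy stemmer: strips a few common English suffixes.
const naiveStem = (word) => word.replace(/(ing|ed|s)$/, "");

const remaining = removeStopWords(tokenize("I am your father, Luke"));
const stems = ["waiting", "waited", "waits"].map(naiveStem);
console.log(remaining); // [ 'father', 'luke' ]
console.log(stems);     // [ 'wait', 'wait', 'wait' ]
```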
Text Search process in MongoDB
• Tokenizes and stems the search term(s) during both the index creation and the text command execution.
• Assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
• By default, the text command returns at most the top 100 matching documents as determined by the scores.
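The score-then-limit behaviour can be illustrated with a small in-memory sketch. The scoring rule used here (a count of matching terms) is a deliberately crude assumption and is not MongoDB's relevance formula; `corpus`, `scoreDocument`, and `textSearch` are hypothetical names.

```javascript
// Lowercase word tokens; a real engine would also drop stop words and stem.
const toTerms = (text) => text.toLowerCase().match(/[a-z]+/g) || [];

// Score = number of occurrences of the search terms in the document text.
function scoreDocument(text, searchTerms) {
  const wanted = new Set(searchTerms);
  return toTerms(text).filter((t) => wanted.has(t)).length;
}

// Return matching documents ordered by descending score, capped at `limit`.
function textSearch(docs, query, limit = 100) {
  const searchTerms = toTerms(query);
  return docs
    .map((doc) => ({ doc, score: scoreDocument(doc.txt, searchTerms) }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const corpus = [
  { txt: "I am your father, Luke" },
  { txt: "Father and son; the father left" },
  { txt: "No relatives here" },
];
const hits = textSearch(corpus, "father");
console.log(hits.map((h) => [h.doc.txt, h.score]));
```

Documents that do not contain the term are dropped, and the document mentioning “father” twice ranks first.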
Full Text Search in MongoDB..Example
• While starting the MongoDB server, use the following parameter (needed in MongoDB 2.4, where text search is disabled by default)
– mongod --setParameter textSearchEnabled=true
• Create a text index on collection “txt”
– db.txt.ensureIndex( {txt: "text"} )
• To show the text index, use the following
– db.txt.getIndices()
• Insert data in collection “txt”
– db.txt.insert( {txt: "I am your father, Luke"} )
• Stop word filtering has already happened. The following command shows only 2 keys in the index txt.txt.$txt_text
– db.txt.validate()
• Perform a Full Text Search using the following
– db.txt.runCommand( "text", { search : "father" } )
Text Analytics
What is Text Analytics
• Process of identifying meaningful information from unstructured content
• Social Media Analytics: Facebook, Twitter
– What do people feel about the latest movie?
– What is our competitor doing in the market?
– What is the response to the last ad campaign?
– What is the sentiment of people in the organization?
– What are people feeling about the new brand of product?
Text Analytics..2

Email Analytics
• Customer Support
• Regulatory Compliance

Log Analytics
• IT Server Log
Text Analytics..3

Fraud Detection Analytics
• Insurance Claims
• Credit Card Transactions
• Tax Return claims
Text Analytics: Scenarios
• Obtain reviews about a new movie from various blogs and review sites
• Highlight important viewers’ comments on the movie

In the process, the Text Analytics engine does the following
• Understands human language
• Understands positive vs. negative comments
• Identifies sarcasm, criticism, puns
• Tries to interpret like a human being
Sentiment Analysis of the movie
Krrish 3 (2013) (Hindi) (U)
152 min - Action - November 2013 (India)
Ratings: 6.5/10 from 6,762 users
Reviews: 135 user | 26 critic
Krrish and his scientist father have to save the world and their own family from an evil man named Kaal and his team of human-animal mutants led by the ruthless Kaya. Will they succeed? How?
Director: Rakesh Roshan
Writers: Robin Bhatt (screenplay), Honey Irani (screenplay), 5 more credits
Stars: Priyanka Chopra, Hrithik Roshan, Amitabh Bachchan

User reviews
• “Wish I were 12 again”, Author: shahin mahmud, 1 November 2013
• “Plagiarism..Plagiarism... Everywhere”, Author: venugopal19196 from Guntur, 2 November 2013
• “Krrish ek soch hain jo hum tak nahi pahunch paye”, Author: darkshadowsxtreme from India, 4 November 2013
• “Far below expectations”, Author: Arpan Mallik from India, 3 November 2013
• “Krrish 3: No more than a mere rubbish..”, Author: amruthvvkp from India, 3 November 2013
Text Analytics: Information Extraction
• Distill structured data from unstructured and semi-structured text
• Exploit the extracted data in your applications

Unstructured content → Text Extraction Engine (extraction logic) → Structured content

Structured content (example)
– Noun: Krrish 3, Rakesh Roshan, Priyanka Chopra, Hrithik Roshan, Amitabh Bachchan, Robin Bhatt, Honey Irani
– Adjective: good, worst, more, below
– Comment: “Krrish ek soch hain jo hum tak nahi pahunch paye”, “rubbish”, “plagiarism”
Text Analytics: Information Extraction..2
Pattern Recognition
• Phone numbers
• Date formats
• Email addresses
• URL

Entities and Relations
• Person
• Location
• Organization
• Association between entities

Linguistic Annotation
• Tokenization
• Parts of Speech
• Normalization
• Co-reference resolution

Others
• Topic identification
• Sentiment / Opinion
• Classification
• Ontology
Text Analytics Terminology
• RegEx: Regular expression to recognize patterns of text, e.g. a phone number
• Dictionaries: A list of entries containing domain-specific terms. Example: dictionary of city names, dictionary of IT companies
• Text Extraction Script: A script that uses dictionaries and regex on a set of text documents and performs extraction of text. Example: GATE Extractor program
• Annotation: A labeled piece of text matching a particular criterion. Example: Person name
• Precision: Measure of exactness, i.e. the fraction of extracted items that are correct
• Recall: Measure of completeness, i.e. the fraction of correct items that were extracted
• The higher the precision and recall, the better the program is
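The terms above fit together as in the following sketch, which extracts annotations with a dictionary and a regex and then measures precision and recall against a hand-labeled answer. The city dictionary, the simplified phone pattern, the sample sentence, and the gold labels are all made up for illustration.

```javascript
// Hypothetical dictionary of city names and a simplified phone-number regex
// (both are illustrative assumptions, not taken from a real extractor).
const CITY_DICTIONARY = ["Kolkata", "Mumbai", "Delhi"];
const PHONE_REGEX = /\b\d{3}-\d{3}-\d{4}\b/g;

// Annotate a text: label dictionary hits as "City" and regex hits as "Phone".
function extractAnnotations(text) {
  const annotations = [];
  for (const city of CITY_DICTIONARY) {
    if (text.includes(city)) annotations.push(`City:${city}`);
  }
  for (const phone of text.match(PHONE_REGEX) || []) {
    annotations.push(`Phone:${phone}`);
  }
  return annotations;
}

// Precision = correct extractions / all extractions.
// Recall    = correct extractions / all expected (gold) annotations.
function precisionRecall(extracted, gold) {
  const goldSet = new Set(gold);
  const correct = extracted.filter((a) => goldSet.has(a)).length;
  return {
    precision: extracted.length ? correct / extracted.length : 0,
    recall: gold.length ? correct / gold.length : 0,
  };
}

const sampleText = "Call 555-123-4567 from Mumbai or Delhi, not Pune.";
const extracted = extractAnnotations(sampleText);
const gold = ["City:Mumbai", "City:Delhi", "City:Pune", "Phone:555-123-4567"];
const quality = precisionRecall(extracted, gold);
console.log(extracted, quality);
```

Everything the extractor returns is correct, so precision is 1, but “Pune” is missing from the dictionary, so recall is only 0.75.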
Text Analytics Approaches
• Grammar based
– Input text viewed as a sequence of tokens
– Rules expressed as regular expression patterns over these tokens
• Algebra based
– Extract SPANs matching a dictionary or regex
– Create an operator for each basic operation
– Compose operators to build complex extractors
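A rough sketch of the algebra-based idea follows: basic operators return spans, and a composite operator combines them. The operator names (`regexOp`, `dictionaryOp`, `followedBy`) and the example extractor are hypothetical and do not correspond to any specific product's operator set.

```javascript
// A span is { begin, end, text }. Every operator maps a text to a list of spans.

// Basic operator: spans matching a regular expression.
const regexOp = (pattern) => (text) =>
  [...text.matchAll(new RegExp(pattern, "g"))].map((m) => ({
    begin: m.index,
    end: m.index + m[0].length,
    text: m[0],
  }));

// Basic operator: spans matching any entry of a dictionary.
// Assumes entries contain no regex metacharacters.
const dictionaryOp = (entries) => regexOp(`\\b(?:${entries.join("|")})\\b`);

// Composite operator: a span from `left` followed by a span from `right`
// within `maxGap` characters, merged into a single span.
const followedBy = (left, right, maxGap) => (text) => {
  const merged = [];
  for (const l of left(text)) {
    for (const r of right(text)) {
      const gap = r.begin - l.end;
      if (gap >= 0 && gap <= maxGap) {
        merged.push({ begin: l.begin, end: r.end, text: text.slice(l.begin, r.end) });
      }
    }
  }
  return merged;
};

// Hypothetical extractor: a title from a dictionary followed by a capitalized word.
const titleOp = dictionaryOp(["Director", "Writer"]);
const capitalizedOp = regexOp("[A-Z][a-z]+");
const titledPerson = followedBy(titleOp, capitalizedOp, 2);

const titledSpans = titledPerson("Director: Rakesh Roshan made the film");
console.log(titledSpans);
```

Because each operator has the same shape (text in, spans out), composites such as `followedBy` can themselves be fed into further operators.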
MongoDB as Analytics Platform
• The flexibility of MongoDB makes it well suited for storing analytics data.
• Customers have different types of analytics engines on the MongoDB platform, like
– usage metrics,
– business domain specific metrics,
– financial platforms.
• The most generic type of metrics that most clients start tracking are events (e.g. “how many people walked into my stores” or “how many people opened an iPhone application”).
• The queries to support the above questions should be efficient in a distributed environment.
MongoDB as Analytics Platform…2
• Example: Insert data as follows
    {
        store_id: ObjectId(), // Object id of a store
        event: "door open", // will be one of "door open", "sale made", or "phone calls"
        created_at: new Date("2013-01-29T08:43:00Z")
    }
• To query the events of a store by store_id and created_at, you run the following query.
    db.events.find({store_id: ObjectId("aaa"),
                    created_at: {$gte: new Date("2013-01-29T00:00:00Z"),
                                 $lte: new Date("2013-01-30T00:00:00Z")}})
• The above query runs fast in a local environment but is painfully slow in a distributed environment with a large database
• Multiple compound indexes are created to increase speed.
    db.events.ensureIndex({store_id: 1, created_at: 1})
    db.events.ensureIndex({event: 1, created_at: 1})
    db.events.ensureIndex({store_id: 1, event: 1, created_at: 1})
MongoDB as Analytics Platform…2
• Achieving Optimization
– Each of the indexes should fit into RAM.
– Any new document will have a seemingly randomly chosen “store_id”.
– An insert command will therefore have a high probability of inserting the document record into the middle of an index.
– To minimize RAM usage, it is best to insert sequentially: termed “writing to the right side of the index”.
– That is, any new key should be greater than or equal to the previous index key.
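The difference between writing to the middle and to the right side of an index can be seen by modelling the index as a sorted array. This is a simplified sketch (a real index is a B-tree, not an array), and the key values are invented for illustration.

```javascript
// Model an index as a sorted array and report where each new key lands.
function insertionOffsets(keys) {
  const index = [];
  const offsets = []; // distance from the right edge at insertion time (0 = appended)
  for (const key of keys) {
    // Binary search for the first position whose key is greater than `key`.
    let lo = 0;
    let hi = index.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (index[mid] <= key) lo = mid + 1;
      else hi = mid;
    }
    offsets.push(index.length - lo);
    index.splice(lo, 0, key);
  }
  return offsets;
}

// Hypothetical keys: time-ordered values versus store ids arriving in arbitrary order.
const sequentialKeys = ["2013-01-29", "2013-01-29", "2013-01-30", "2013-01-31"];
const scatteredKeys = ["store-m", "store-c", "store-x", "store-f"];

const sequentialOffsets = insertionOffsets(sequentialKeys);
const scatteredOffsets = insertionOffsets(scatteredKeys);
console.log(sequentialOffsets); // every offset is 0
console.log(scatteredOffsets);  // some offsets are greater than 0
```

With non-decreasing keys every insert touches only the right edge, so only that part of the index needs to stay in memory.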
MongoDB as Analytics Platform…3
• Achieving Optimization using “time bucket”
– Create a time_bucket attribute that breaks down acceptable date ranges to hour, day, month, week, quarter, and/or year.
    {
        store_id: ObjectId(), // Object id of a store
        event: "door open",
        created_at: new Date("2013-01-29T08:43:00Z"),
        time_bucket: [
            "2013-01-29 08-hour", "2013-01-29-day", "2013-04-week",
            "2013-01-month", "2013-01-quarter", "2013-year" ]
    }
– Create the following indexes
    db.events.ensureIndex({time_bucket: 1, store_id: 1, event: 1})
    db.events.ensureIndex({time_bucket: 1, event: 1})
– Instead of running the query on the entire range, run the following
    db.events.find({store_id: ObjectId("aaa"), "time_bucket": "2013-01-29-day"})
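The time_bucket array can be derived from created_at before the document is inserted. The helper below is a sketch with hypothetical names (`pad2`, `timeBuckets`). The week rule in particular is an assumption: zero-based blocks of 7 days from 1 January happen to reproduce the "2013-04-week" value in the example, but the slides do not say how weeks are numbered.

```javascript
const pad2 = (n) => String(n).padStart(2, "0");

// Build the time_bucket array for a timestamp, using UTC fields.
function timeBuckets(date) {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth(); // 0-based
  const day = date.getUTCDate();
  const ymd = `${year}-${pad2(month + 1)}-${pad2(day)}`;

  // Assumption: weeks are zero-based blocks of 7 days counted from January 1.
  const dayOfYear = Math.floor((Date.UTC(year, month, day) - Date.UTC(year, 0, 1)) / 86400000);
  const week = Math.floor(dayOfYear / 7);

  const quarter = Math.floor(month / 3) + 1;

  return [
    `${ymd} ${pad2(date.getUTCHours())}-hour`,
    `${ymd}-day`,
    `${year}-${pad2(week)}-week`,
    `${year}-${pad2(month + 1)}-month`,
    `${year}-${pad2(quarter)}-quarter`,
    `${year}-year`,
  ];
}

const buckets = timeBuckets(new Date("2013-01-29T08:43:00Z"));
console.log(buckets);
```

A document would then be stored with time_bucket set to the returned array, so a single equality match on one bucket string replaces the date-range query.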
MongoDB as Analytics Platform…4
• Benefit of “time bucket”
– Using the optimized time_bucket, new documents are added to the right side of the index.
– Any inserted document will have a greater time_bucket value than the previous documents.
– By adding to the right side of the index and using time_bucket to query, MongoDB will swap rarely accessed older documents to disk, resulting in minimal RAM usage.
– The “hot data” will be the most recently accessed data (typically 1-3 months with most analytics applications), and the older data will settle nicely to disk.
– Neither queries nor inserts will access the middle of the index, and older index chunks can swap to disk.
Thank You

More Related Content

What's hot

Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...MongoDB
 
Back to basics Italian webinar 2 Mia prima applicazione MongoDB
Back to basics Italian webinar 2  Mia prima applicazione MongoDBBack to basics Italian webinar 2  Mia prima applicazione MongoDB
Back to basics Italian webinar 2 Mia prima applicazione MongoDBMongoDB
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamskyData Con LA
 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseMongoDB
 
Back to Basics: My First MongoDB Application
Back to Basics: My First MongoDB ApplicationBack to Basics: My First MongoDB Application
Back to Basics: My First MongoDB ApplicationMongoDB
 
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...MongoDB
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherMongoDB
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupMongoDB
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...MongoDB
 
MongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingMongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingManish Kapoor
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsJoins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsAndrew Morgan
 
User Data Management with MongoDB
User Data Management with MongoDB User Data Management with MongoDB
User Data Management with MongoDB MongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregationsenterprisesearchmeetup
 
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphMongoDB
 

What's hot (20)

Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
Webinarserie: Einführung in MongoDB: “Back to Basics” - Teil 3 - Interaktion ...
 
Back to basics Italian webinar 2 Mia prima applicazione MongoDB
Back to basics Italian webinar 2  Mia prima applicazione MongoDBBack to basics Italian webinar 2  Mia prima applicazione MongoDB
Back to basics Italian webinar 2 Mia prima applicazione MongoDB
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
 
Using MongoDB As a Tick Database
Using MongoDB As a Tick DatabaseUsing MongoDB As a Tick Database
Using MongoDB As a Tick Database
 
Back to Basics: My First MongoDB Application
Back to Basics: My First MongoDB ApplicationBack to Basics: My First MongoDB Application
Back to Basics: My First MongoDB Application
 
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...
How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to...
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
Doing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookupDoing Joins in MongoDB: Best Practices for Using $lookup
Doing Joins in MongoDB: Best Practices for Using $lookup
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
MongoDB .local Paris 2020: Tout savoir sur le moteur de recherche Full Text S...
 
MongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingMongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and Profiling
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsJoins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
 
MongoDB + Spring
MongoDB + SpringMongoDB + Spring
MongoDB + Spring
 
User Data Management with MongoDB
User Data Management with MongoDB User Data Management with MongoDB
User Data Management with MongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
ElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to AggregationsElasticSearch - Introduction to Aggregations
ElasticSearch - Introduction to Aggregations
 
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social GraphSocialite, the Open Source Status Feed Part 2: Managing the Social Graph
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
 

Similar to Nosql part3

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRaghunath A
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)MongoDB
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB
 
Making App Developers More Productive
Making App Developers More ProductiveMaking App Developers More Productive
Making App Developers More ProductivePostman
 
MongoDB Days Silicon Valley: Building Applications with the MEAN Stack
MongoDB Days Silicon Valley: Building Applications with the MEAN StackMongoDB Days Silicon Valley: Building Applications with the MEAN Stack
MongoDB Days Silicon Valley: Building Applications with the MEAN StackMongoDB
 
Back to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDBBack to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDBMongoDB
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & AggregationMongoDB
 
Introduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWIntroduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWAnkur Raina
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)MongoDB
 
Mongophilly indexing-2011-04-26
Mongophilly indexing-2011-04-26Mongophilly indexing-2011-04-26
Mongophilly indexing-2011-04-26kreuter
 
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-stepsMatteo Moci
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesignMongoDB APAC
 
Back to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDBBack to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDBMongoDB
 
Novedades de MongoDB 3.6
Novedades de MongoDB 3.6Novedades de MongoDB 3.6
Novedades de MongoDB 3.6MongoDB
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDBElieHannouch
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBTakahiro Inoue
 

Similar to Nosql part3 (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
 
Nosql part 2
Nosql part 2Nosql part 2
Nosql part 2
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
Making App Developers More Productive
Making App Developers More ProductiveMaking App Developers More Productive
Making App Developers More Productive
 
MongoDB Days Silicon Valley: Building Applications with the MEAN Stack
MongoDB Days Silicon Valley: Building Applications with the MEAN StackMongoDB Days Silicon Valley: Building Applications with the MEAN Stack
MongoDB Days Silicon Valley: Building Applications with the MEAN Stack
 
Back to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDBBack to Basics, webinar 2: La tua prima applicazione MongoDB
Back to Basics, webinar 2: La tua prima applicazione MongoDB
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
Introduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWIntroduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUW
 
Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)Indexing and Query Optimizer (Mongo Austin)
Indexing and Query Optimizer (Mongo Austin)
 
Mongophilly indexing-2011-04-26
Mongophilly indexing-2011-04-26Mongophilly indexing-2011-04-26
Mongophilly indexing-2011-04-26
 
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
 
Elasticsearch first-steps
Elasticsearch first-stepsElasticsearch first-steps
Elasticsearch first-steps
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesign
 
Back to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDBBack to Basics 2017: Mí primera aplicación MongoDB
Back to Basics 2017: Mí primera aplicación MongoDB
 
Novedades de MongoDB 3.6
Novedades de MongoDB 3.6Novedades de MongoDB 3.6
Novedades de MongoDB 3.6
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
MongoDB_ppt.pptx
MongoDB_ppt.pptxMongoDB_ppt.pptx
MongoDB_ppt.pptx
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDB
 

More from Ruru Chowdhury

The One With The Wizards and Dragons. Prelims
The One With The Wizards and Dragons. PrelimsThe One With The Wizards and Dragons. Prelims
The One With The Wizards and Dragons. PrelimsRuru Chowdhury
 
The One With The Wizards and Dragons. Finals
The One With The Wizards and Dragons. FinalsThe One With The Wizards and Dragons. Finals
The One With The Wizards and Dragons. FinalsRuru Chowdhury
 
Statr session 25 and 26
Statr session 25 and 26Statr session 25 and 26
Statr session 25 and 26Ruru Chowdhury
 
Statr session 23 and 24
Statr session 23 and 24Statr session 23 and 24
Statr session 23 and 24Ruru Chowdhury
 
Statr session 21 and 22
Statr session 21 and 22Statr session 21 and 22
Statr session 21 and 22Ruru Chowdhury
 
Statr session 19 and 20
Statr session 19 and 20Statr session 19 and 20
Statr session 19 and 20Ruru Chowdhury
 
Statr session 17 and 18
Statr session 17 and 18Statr session 17 and 18
Statr session 17 and 18Ruru Chowdhury
 
Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)Ruru Chowdhury
 
Statr session 15 and 16
Statr session 15 and 16Statr session 15 and 16
Statr session 15 and 16Ruru Chowdhury
 
Statr session14, Jan 11
Statr session14, Jan 11Statr session14, Jan 11
Statr session14, Jan 11Ruru Chowdhury
 
JM Statr session 13, Jan 11
JM Statr session 13, Jan 11JM Statr session 13, Jan 11
JM Statr session 13, Jan 11Ruru Chowdhury
 
Statr sessions 11 to 12
Statr sessions 11 to 12Statr sessions 11 to 12
Statr sessions 11 to 12Ruru Chowdhury
 
Nosql part1 8th December
Nosql part1 8th December Nosql part1 8th December
Nosql part1 8th December Ruru Chowdhury
 
Statr sessions 9 to 10
Statr sessions 9 to 10Statr sessions 9 to 10
Statr sessions 9 to 10Ruru Chowdhury
 

More from Ruru Chowdhury (20)

The One With The Wizards and Dragons. Prelims
The One With The Wizards and Dragons. PrelimsThe One With The Wizards and Dragons. Prelims
The One With The Wizards and Dragons. Prelims
 
The One With The Wizards and Dragons. Finals
The One With The Wizards and Dragons. FinalsThe One With The Wizards and Dragons. Finals
The One With The Wizards and Dragons. Finals
 
Statr session 25 and 26
Statr session 25 and 26Statr session 25 and 26
Statr session 25 and 26
 
Statr session 23 and 24
Statr session 23 and 24Statr session 23 and 24
Statr session 23 and 24
 
Statr session 21 and 22
Statr session 21 and 22Statr session 21 and 22
Statr session 21 and 22
 
Statr session 19 and 20
Statr session 19 and 20Statr session 19 and 20
Statr session 19 and 20
 
Statr session 17 and 18
Statr session 17 and 18Statr session 17 and 18
Statr session 17 and 18
 
Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)Statr session 17 and 18 (ASTR)
Statr session 17 and 18 (ASTR)
 
Statr session 15 and 16
Statr session 15 and 16Statr session 15 and 16
Statr session 15 and 16
 
Statr session14, Jan 11
Statr session14, Jan 11Statr session14, Jan 11
Statr session14, Jan 11
 
JM Statr session 13, Jan 11
JM Statr session 13, Jan 11JM Statr session 13, Jan 11
JM Statr session 13, Jan 11
 
Statr sessions 11 to 12
Statr sessions 11 to 12Statr sessions 11 to 12
Statr sessions 11 to 12
 
Nosql part1 8th December
Nosql part1 8th December Nosql part1 8th December
Nosql part1 8th December
 
Statr sessions 9 to 10
Statr sessions 9 to 10Statr sessions 9 to 10
Statr sessions 9 to 10
 
R part iii
R part iiiR part iii
R part iii
 
R part II
R part IIR part II
R part II
 
Statr sessions 7 to 8
Statr sessions 7 to 8Statr sessions 7 to 8
Statr sessions 7 to 8
 
R part I
R part IR part I
R part I
 
Statr sessions 4 to 6
Statr sessions 4 to 6Statr sessions 4 to 6
Statr sessions 4 to 6
 
Statistics with R
Statistics with R Statistics with R
Statistics with R
 

Nosql part3

  • 1. NoSQL & MongoDB..Part III Arindam Chatterjee
  • 2. Aggregation in MongoDB
      • Aggregations are operations that process data records and return computed results.
      • MongoDB provides a rich set of aggregation operations that examine and perform calculations on data sets.
      • Running data aggregation on the mongod instance simplifies application code and limits resource requirements.
      • Like queries, aggregation operations in MongoDB take collections of documents as input and return results in the form of one or more documents.
      • In MongoDB, aggregations are implemented using
          – Aggregation Pipeline
          – Map-Reduce
  • 5. Map Reduce
      • MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition).
      • The map function emits key-value pairs.
      • For keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data.
      • MongoDB then stores the results in a collection.
      • MongoDB supports sharded collections as both input and output.
  • 8. Map Reduce..more example
      • Insert data in collection “orders” as follows
          – db.orders.insert({
              _id: ObjectId("50a8240b927d5d8b5891743c"),
              cust_id: "abc123",
              ord_date: new Date("Oct 04, 2012"),
              status: 'A',
              price: 25,
              items: [ { sku: "mmm", qty: 5, price: 2.5 },
                       { sku: "nnn", qty: 5, price: 2.5 } ]
            });
      • Task: Find the total price per customer
      • Step I: Define a map function that emits the “cust_id” and “price” pair
          – var mapFunction1 = function() {
              emit(this.cust_id, this.price);
            };
  • 9. Map Reduce..more example..2
      • Define a reduce function with two arguments, keyCustId and valuesPrices
          – valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
          – The function reduces the valuesPrices array to the sum of its elements.
      • var reduceFunction1 = function(keyCustId, valuesPrices) {
          return Array.sum(valuesPrices);
        };
      • Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function.
          – db.orders.mapReduce( mapFunction1, reduceFunction1, { out: "map_reduce_example" } )
      • Do a find() to check the new collection “map_reduce_example”
          – db.map_reduce_example.find();
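The map and reduce steps in slides 8 and 9 can be traced without a server. The following is a minimal sketch in plain JavaScript, not MongoDB's implementation: `simulateMapReduce` is a hypothetical helper, the second and third orders are made-up sample data, and `emit` is passed in as an argument because it only exists as a global inside MongoDB.

```javascript
// Plain-JavaScript simulation of the map and reduce phases (no MongoDB needed).
// Only the first order mirrors the slide; the other two are hypothetical sample data.
const orders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 30 },
  { cust_id: "xyz789", price: 10 },
];

// Array.sum is a helper of the legacy mongo shell, not standard JavaScript,
// so this sketch sums with Array.prototype.reduce instead.
const sum = (values) => values.reduce((total, v) => total + v, 0);

function simulateMapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map();
  // Map phase: each document is bound to `this`; emit() groups values by key.
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit);

  // Reduce phase: applied only to keys that collected more than one value.
  const results = [];
  for (const [key, values] of grouped) {
    const value = values.length > 1 ? reduceFn(key, values) : values[0];
    results.push({ _id: key, value });
  }
  return results;
}

const mapFunction1 = function (emit) { emit(this.cust_id, this.price); };
const reduceFunction1 = (keyCustId, valuesPrices) => sum(valuesPrices);

const totals = simulateMapReduce(orders, mapFunction1, reduceFunction1);
console.log(totals);
```

Each result has the `{ _id, value }` shape that documents in the mapReduce output collection use, with one document per customer.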
  • 10. Full Text Search in MongoDB
      • Important Concepts
          – Stop Words: words that are filtered out because they are irrelevant for searching. Examples are is, at, the, am, I, your etc.
          – Stemming: the process of reducing words to their root or base form. E.g. “waiting”, “waited”, “waits” have the same root “wait”
      • Example: I am your father, Luke
          – “I”, “am”, “your” are Stop Words
          – After removing the Stop Words, the words left are “father” and “Luke”
          – These are processed in the next step
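Stop-word filtering and stemming can be illustrated with a few lines of plain JavaScript. This is a rough sketch under stated assumptions: the stop-word list is a tiny hand-picked sample, and `naiveStem` is a hypothetical suffix stripper, not the stemmer MongoDB uses.

```javascript
// Tiny illustrative stop-word list; real engines ship longer, per-language lists.
const STOP_WORDS = new Set(["i", "am", "your", "is", "at", "the"]);

// Naive suffix stripping, for illustration only.
function naiveStem(word) {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

function toSearchTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/) // tokenize on anything that is not a letter
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token))
    .map(naiveStem);
}

const terms = toSearchTerms("I am your father, Luke");
const stems = ["waiting", "waited", "waits"].map(naiveStem);
console.log(terms, stems);
```

For the slide's sentence only the tokens for “father” and “Luke” survive, and the three forms of “wait” collapse to the same stem.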
  • 11. Text Search process in MongoDB
      • Tokenizes and stems the search term(s) during both index creation and text command execution.
      • Assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
      • By default, the text command returns at most the top 100 matching documents as determined by the scores.
  • 12. Full Text Search in MongoDB..Example
      • While starting the MongoDB server, use the following parameter
          – mongod --setParameter textSearchEnabled=true
      • Create a text index on collection “txt”
          – db.txt.ensureIndex( {txt: "text"} )
      • To show the text index, use the following
          – db.txt.getIndices()
      • Insert data in collection “txt”
          – db.txt.insert( {txt: "I am your father, Luke"} )
      • Stop word filtering has already happened. The following command shows only 2 keys in the index txt.txt.$txt_text
          – db.txt.validate()
      • Perform a Full Text Search using the following
          – db.txt.runCommand( "text", { search : "father" } )
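The commands on this slide reflect the MongoDB release current when the deck was written. From MongoDB 2.6 onward, text search is enabled by default and is queried with the `$text` operator rather than the `text` command. The sketch below shows that form; the function names are hypothetical, and `db` is passed in as a parameter (in the shell it is the global database handle) so the functions can be read and exercised on their own.

```javascript
// Sketch for MongoDB 2.6 or later. `db` is the database handle the shell provides.
function createTextIndex(db) {
  // createIndex is the current name for the older ensureIndex helper.
  return db.txt.createIndex({ txt: "text" });
}

function searchText(db, term) {
  // $text matches against the text index; $meta: "textScore" exposes the
  // relevance score so results can be projected and sorted by it.
  return db.txt
    .find({ $text: { $search: term } }, { score: { $meta: "textScore" } })
    .sort({ score: { $meta: "textScore" } });
}
```

In a shell session this would be called as `createTextIndex(db)` followed by `searchText(db, "father")`.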
  • 14. What is Text Analytics
      • Process of identifying meaningful information from unstructured content
      • Social Media Analytics: Facebook, Twitter
          – What do people feel about the latest movie?
          – What is our competitor doing in the market?
          – What is the response to the last ad campaign?
          – What is the sentiment of people in the organization?
          – What are people feeling about the new brand of product?
  • 15. Text Analytics..2
      • Email Analytics
          – Customer Support
          – Regulatory Compliance
      • Log Analytics
          – IT Server Log
  • 16. Text Analytics..3
      • Fraud Detection Analytics
          – Insurance Claims
          – Credit Card Transactions
          – Tax Return claims
  • 17. Text Analytics: Scenarios
      • Obtain reviews of a new movie from various blogs and review sites
      • Highlight important viewers’ comments on the movie
      In the process, the Text Analytics engine does the following
      • Understands human language
      • Distinguishes positive from negative comments
      • Identifies sarcasm, criticism, puns
      • Tries to interpret text like a human being
  • 18. Sentiment Analysis of the movie Krrish 3 (Hindi)
      • Krrish 3 (2013), certificate U, 152 min, Action, November 2013 (India)
      • Ratings: 6.5/10 from 6,762 users. Reviews: 135 user | 26 critic
      • Synopsis: Krrish and his scientist father have to save the world and their own family from an evil man named Kaal and his team of human-animal mutants led by the ruthless Kaya. Will they succeed? How?
      • Director: Rakesh Roshan
      • Writers: Robin Bhatt (screenplay), Honey Irani (screenplay), 5 more credits
      • Stars: Priyanka Chopra, Hrithik Roshan, Amitabh Bachchan
      • Sample user reviews
          – “Wish I were 12 again”, Author: shahin mahmud, 1 November 2013
          – “Plagiarism..Plagiarism... Everywhere”, Author: venugopal19196 from Guntur, 2 November 2013
          – “Krrish ek soch hain jo hum tak nahi pahunch paye”, Author: darkshadowsxtreme from India, 4 November 2013
          – “Far below expectations”, Author: Arpan Mallik from India, 3 November 2013
          – “Krrish 3: No more than a mere rubbish..”, Author: amruthvvkp from India, 3 November 2013
  • 19. Text Analytics: Information Extraction
      • Distill structured data from unstructured and semi-structured text
      • Exploit the extracted data in your applications
      • [Diagram: unstructured content is fed to the Text Extraction Engine (extraction logic), which produces structured content]
      • Structured content extracted in the example
          – Noun: Krrish 3, Rakesh Roshan, Priyanka Chopra, Hrithik Roshan, Amitabh Bachchan, Robin Bhatt, Honey Irani
          – Adjective: good, worst, more, below
          – Comment: “Krrish ek soch hain jo hum tak nahi pahunch paye”, “rubbish”, “plagiarism”
  • 20. Text Analytics: Information Extraction..2
      • Pattern Recognition
          – Phone numbers
          – Date formats
          – Email addresses
          – URL
      • Entities and Relations
          – Person
          – Location
          – Organization
          – Association between entities
      • Linguistic Annotation
          – Tokenization
          – Parts of Speech
          – Normalization
          – Co-reference resolution
      • Others
          – Topic identification
          – Sentiment / Opinion
          – Classification
          – Ontology
  • 21. Text Analytics Terminology
      • RegEx: Regular expression used to recognize patterns of text, e.g. a phone number
      • Dictionaries: Lists of entries containing domain-specific terms. Example: dictionary of city names, dictionary of IT companies
      • Text Extraction Script: A script that applies dictionaries and regexes to a set of text documents and extracts text. Example: GATE Extractor program
      • Annotation: A labeled piece of text matching particular criteria. Example: Person name
      • Precision: Measure of exactness or accuracy of a pattern recognition program
      • Recall: Measure of completeness
      • The higher the precision and recall, the better the program is
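Precision and recall have standard definitions: precision is the share of extracted annotations that are correct, and recall is the share of expected annotations that were found. A small sketch follows; the `extracted` and `expected` lists are hypothetical, with names borrowed from the Krrish 3 example.

```javascript
// Hypothetical extractor output versus the person names actually present.
const extracted = ["Rakesh Roshan", "Hrithik Roshan", "Guntur"];
const expected = ["Rakesh Roshan", "Hrithik Roshan", "Priyanka Chopra", "Amitabh Bachchan"];

function precisionRecall(extractedItems, expectedItems) {
  const truth = new Set(expectedItems);
  const truePositives = extractedItems.filter((item) => truth.has(item)).length;
  return {
    // Precision: correct extractions divided by all extractions.
    precision: extractedItems.length ? truePositives / extractedItems.length : 0,
    // Recall: correct extractions divided by everything that should have been found.
    recall: expectedItems.length ? truePositives / expectedItems.length : 0,
  };
}

const scores = precisionRecall(extracted, expected);
console.log(scores);
```

Here two of the three extracted items are correct (precision 2/3) and two of the four expected names are found (recall 1/2).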
  • 22. Text Analytics Approaches
      • Grammar based
          – Input text viewed as a sequence of tokens
          – Rules expressed as regular expression patterns over these tokens
      • Algebra based
          – Extract SPANs matching a dictionary or regex
          – Create an operator for each basic operation
          – Compose operators to build complex extractors
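The algebra-based approach can be sketched as small functions that each return spans and can be composed. Everything below is a hypothetical illustration, not the API of any particular extraction engine: the span shape, the operator names, the date pattern, and the city dictionary are all assumptions.

```javascript
// A span is { begin, end, text, type }.

// Basic operator: spans matching a regular expression (pattern must use the g flag).
function regexExtractor(type, pattern) {
  return (text) =>
    [...text.matchAll(pattern)].map((m) => ({
      begin: m.index, end: m.index + m[0].length, text: m[0], type,
    }));
}

// Basic operator: spans matching any entry of a dictionary.
function dictionaryExtractor(type, entries) {
  return (text) => {
    const spans = [];
    for (const entry of entries) {
      let from = text.indexOf(entry);
      while (from !== -1) {
        spans.push({ begin: from, end: from + entry.length, text: entry, type });
        from = text.indexOf(entry, from + entry.length);
      }
    }
    return spans;
  };
}

// Composition operator: merge the output of several extractors, ordered by position.
function union(...extractors) {
  return (text) =>
    extractors.flatMap((extract) => extract(text)).sort((a, b) => a.begin - b.begin);
}

const extractDate = regexExtractor("Date", /\b\d{1,2} [A-Z][a-z]+ \d{4}\b/g);
const extractCity = dictionaryExtractor("City", ["Guntur", "Mumbai"]);
const extractAll = union(extractDate, extractCity);

const spans = extractAll("Author: venugopal19196 from Guntur 2 November 2013");
console.log(spans);
```

Because each operator has the same signature (text in, spans out), more complex extractors can be built by composing further operators on top of `union`.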
  • 23. MongoDB as Analytics Platform
      • The flexible document model of MongoDB makes it well suited to storing analytics data.
      • Customers run different types of analytics engines on the MongoDB platform, such as
          – usage metrics,
          – business domain specific metrics,
          – financial platforms.
      • The most generic type of metric that most clients start tracking is events (e.g. “how many people walked into my stores” or “how many people opened an iPhone application”).
      • The queries that answer such questions should be efficient in a distributed environment
  • 24. MongoDB as Analytics Platform…2
      • Example: Insert data as follows
          – {
              store_id: ObjectId(),   // Object id of a store
              event: "door open",     // will be one of "door opened", "sale made", or "phone calls"
              created_at: new Date("2013-01-29T08:43:00Z")
            }
      • To query events by store_id and created_at, run the following query.
          – db.events.find({
              store_id: ObjectId("aaa"),
              created_at: { $gte: new Date("2013-01-29T00:00:00Z"),
                            $lte: new Date("2013-01-30T00:00:00Z") }
            })
      • The above query runs fast in a local environment but is painfully slow in a distributed environment with a large database
      • Multiple compound indexes are created to increase speed.
          – db.events.ensureIndex({store_id: 1, created_at: 1})
            db.events.ensureIndex({event: 1, created_at: 1})
            db.events.ensureIndex({store_id: 1, event: 1, created_at: 1})
  • 25. MongoDB as Analytics Platform…2
      • Achieving Optimization
          – Each of the indexes should fit into RAM.
          – Any new document will have a seemingly randomly chosen “store_id”.
          – An insert command will therefore have a high probability of inserting the document record into the middle of an index.
          – To minimize RAM usage, it is best to insert sequentially: termed “writing to the right side of the index”.
          – With sequential inserts, any new key is greater than or equal to the previous index key.
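The difference between inserting into the middle of an index and writing to its right side can be seen in a toy model. The sketch below treats an index as a plain sorted array, which is a deliberate simplification of the B-tree MongoDB actually uses; the key values and helper names are hypothetical.

```javascript
// Binary search for the slot where a new key would be inserted.
function insertPosition(sortedKeys, key) {
  let low = 0;
  let high = sortedKeys.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (sortedKeys[mid] <= key) low = mid + 1;
    else high = mid;
  }
  return low;
}

// Insert keys one by one and count how many landed at the right edge.
function insertAll(keys) {
  const index = [];
  let rightSideInserts = 0;
  for (const key of keys) {
    const position = insertPosition(index, key);
    if (position === index.length) rightSideInserts += 1;
    index.splice(position, 0, key);
  }
  return { index, rightSideInserts };
}

// Hypothetical keys: time-based keys arrive in increasing order,
// store ids arrive in no particular order.
const byTime = insertAll([100, 101, 102, 103, 104, 105]);
const byStore = insertAll([42, 7, 99, 13, 58, 21]);
console.log(byTime.rightSideInserts, byStore.rightSideInserts);
```

With increasing keys every insert touches only the right edge, so only that part of the index has to stay in memory. With unordered keys most inserts land somewhere in the middle.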
  • 26. MongoDB as Analytics Platform…3
      • Achieving Optimization using “time bucket”
          – Create a time_bucket attribute that breaks down acceptable date ranges into hour, day, week, month, quarter, and/or year.
              {
                store_id: ObjectId(),   // Object id of a store
                event: "door open",
                created_at: new Date("2013-01-29T08:43:00Z"),
                time_bucket: [ "2013-01-29 08-hour", "2013-01-29-day", "2013-04-week",
                               "2013-01-month", "2013-01-quarter", "2013-year" ]
              }
          – Create the following indexes
              db.events.ensureIndex({time_bucket: 1, store_id: 1, event: 1})
              db.events.ensureIndex({time_bucket: 1, event: 1})
          – Instead of running the query on the entire range, run the following
              db.events.find({store_id: ObjectId("aaa"), "time_bucket": "2013-01-29-day"})
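The time_bucket labels can be generated from created_at before the document is inserted. The helper below is a hypothetical sketch, not part of MongoDB. It uses UTC fields, and the week numbering is an assumption (week 1 starts on the first Sunday of the year) chosen because it reproduces the “2013-04-week” label in the example above.

```javascript
const pad2 = (n) => String(n).padStart(2, "0");

// Builds the time_bucket labels for a timestamp, using UTC fields.
function timeBuckets(date) {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + 1;
  const day = date.getUTCDate();
  // Zero-based day of the year.
  const dayOfYear = Math.floor(
    (Date.UTC(year, month - 1, day) - Date.UTC(year, 0, 1)) / 86400000
  );
  // Assumed convention: weeks start on Sunday; days before the first Sunday are week 0.
  const week = Math.floor((dayOfYear + 7 - date.getUTCDay()) / 7);
  const quarter = Math.floor((month - 1) / 3) + 1;
  const ymd = `${year}-${pad2(month)}-${pad2(day)}`;
  return [
    `${ymd} ${pad2(date.getUTCHours())}-hour`,
    `${ymd}-day`,
    `${year}-${pad2(week)}-week`,
    `${year}-${pad2(month)}-month`,
    `${year}-${pad2(quarter)}-quarter`,
    `${year}-year`,
  ];
}

const buckets = timeBuckets(new Date("2013-01-29T08:43:00Z"));
console.log(buckets);
```

The returned array can be stored directly as the document's time_bucket field, so a query for one day, week, or month becomes a single equality match on that field.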
  • 27. MongoDB as Analytics Platform…4
      • Benefit of “time bucket”
          – Using the optimized time_bucket, new documents are added to the right side of the index.
          – Any inserted document will have a time_bucket value greater than or equal to that of the previous documents.
          – By adding to the right side of the index and using time_bucket to query, MongoDB will swap rarely accessed older documents to disk, resulting in minimal RAM usage.
          – The “hot data” will be the most recently accessed data (typically the last 1-3 months in most analytics applications), and the older data will settle nicely to disk.
          – Neither queries nor inserts will access the middle of the index, and older index chunks can swap to disk.