“Segmentation” as the Workhorse of Business Analytics
Overview
We research hierarchies of topics extracted from documents (news,
publications, discussions, etc.).
Our system is aimed at data researchers.
It provides:
• Trend tracking
• Detection of similar and related topics
• Topic segmentation, which aims to solve the information overload
  problem (http://mlvl.github.io/Hierarchie/)
The topic model we use is not a collection of tags but a combination of
NLP and statistical analysis.
Possible applications
• Creating concept infographics (http://findtheconversation.com/concept-map/)
• Estimating concept influence (http://brightpointinc.com/political_influence/)
• Detecting semantic relations (http://bl.ocks.org/mbostock/1153292)
• Visualizing nested segments as a concept hierarchy (http://bl.ocks.org/mbostock/7607535)
Research plans
Test prototype
We developed a prototype called Data Mining Tool (DMT) for
testing the analytics model.
As test data, we use tech and political news (about 2k + 1k RSS
feeds delivering about 10k news items daily).
DMT workflow
1. Import Documents into the Index
2. Extract metadata for each Document (NLP: keywords, labels, terms, etc.)
3. Extract Chains using Cluster Analysis
4. Assign Weights to Topics
5. Build Trends by ranking on Current Weight and Weight Dynamics
6. Build Segments (related topics, nested topics)
7. Visualize Data (Trends Statistics, Segments Hierarchy)
8. Explore Data (flexible search UI: Trends, Documents, Segments, Keywords, etc.)
9. Use the API to communicate with the system
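Step 5 of the workflow ranks Trends by Current Weight and Weight Dynamics. The slides do not give the formula, so the following is only a minimal sketch under stated assumptions: the `TrendStats` class, the definition of dynamics as "latest weight minus the mean of earlier weights", and the `dynamics_factor` parameter are all hypothetical and are not taken from DMT.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrendStats:
    """Weight history of one trend (hypothetical structure, not from DMT)."""
    keywords: Tuple[str, ...]
    daily_weights: List[float]  # oldest first, most recent last

    @property
    def current_weight(self) -> float:
        return self.daily_weights[-1]

    @property
    def dynamics(self) -> float:
        """Assumed definition: latest weight minus the mean of earlier weights."""
        earlier = self.daily_weights[:-1]
        if not earlier:
            return 0.0
        return self.current_weight - sum(earlier) / len(earlier)


def rank_trends(trends: List[TrendStats], dynamics_factor: float = 0.5) -> List[TrendStats]:
    """Sort by current weight plus a scaled dynamics term (assumed scoring)."""
    def score(t: TrendStats) -> float:
        return t.current_weight + dynamics_factor * t.dynamics
    return sorted(trends, key=score, reverse=True)


# Made-up example data, for illustration only.
trends = [
    TrendStats(("china", "trade"), [10.0, 10.0, 10.0]),
    TrendStats(("japan", "election"), [2.0, 4.0, 9.0]),
    TrendStats(("apple", "iphone"), [8.0, 6.0, 4.0]),
]
ranked = rank_trends(trends)
for t in ranked:
    print(t.keywords, t.current_weight, round(t.dynamics, 2))
```

With this scoring, a fast-growing trend can outrank a heavier but flat one, which is one plausible reading of "ranking by Current Weight and Weight Dynamics".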
DMT workflow chart
Test documents
About 370k news items were imported in September-November 2014.
Document & Terms distribution statistics
NLP analysis (metadata extraction)
[Screenshot; visible labels: Sentence, NLP in index, Current NLP, Metadata]
Clustering analysis
[Screenshot; visible labels: Assign weight, By Summary, Creating topics tree, By Terms, By Labels]
Clustering visualization
[Figure; visible labels: Clustering histogram, Statistics attributes]
Trends
[Screenshot; visible labels: Weight of trends, Segments and related topics, Trends Weight Dynamics]
Segments
Visualization of Related and Nested Topics
[Figure with a zoomed-in detail]
Visualization. Related Topics
[Figure, zoomed in; visible labels: Main Topic, Related Topic, China]
Visualization. Nested Topics
[Figure, zoomed in; visible labels: Nested Topics, Main Topic, China]
Visualization of topics for “Japan” keyword
[Figure; visible labels: United States, election, Japan]
Hierarchies Topic Tree. Graph
[Figure; visible labels: 42 Topics for “Japan” keyword, JAPAN, 20 Topics for “Japan” keyword]
Grouping similar Topics (all topics)
[Figure; visible labels: 790 topics, 526 topics]
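The slide shows two topic counts (790 and 526), which may be the totals before and after grouping. The grouping method itself is not described, so the sketch below only illustrates one common approach: merging topics whose keyword sets overlap strongly. The use of Jaccard similarity, the 0.5 threshold, and the function names are assumptions, not the DMT algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar_topics(topics, threshold=0.5):
    """Greedy single pass: a topic joins the first group whose first member
    is at least `threshold` similar; otherwise it starts a new group."""
    groups = []
    for topic in topics:
        for group in groups:
            if jaccard(topic, group[0]) >= threshold:
                group.append(topic)
                break
        else:
            groups.append([topic])
    return groups


# Made-up topics, each represented as a set of keywords.
topics = [
    {"japan", "election", "abe"},
    {"japan", "election", "abe", "vote"},
    {"china", "trade"},
    {"japan", "economy"},
]
groups = group_similar_topics(topics)
print(len(topics), "topics ->", len(groups), "groups")
```

The greedy pass depends on input order and compares only against the first member of each group; a production system would more likely use a proper clustering method, but the effect on the topic count is the same in kind.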
Document search
Search by metadata
Keep track of the analyzed articles
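Since Elasticsearch is listed among the technologies, a metadata search could plausibly be expressed as an Elasticsearch bool query with filters. The slides do not show the index mapping or how search is actually implemented, so the field names (`keywords`, `labels`, `published_at`) and the helper below are hypothetical; the sketch only builds the request body and does not contact a server.

```python
import json


def build_metadata_query(keyword, label=None, date_from=None, date_to=None, size=10):
    """Build an Elasticsearch search body that filters documents by metadata.

    All field names are assumptions made for illustration.
    """
    filters = [{"term": {"keywords": keyword}}]
    if label is not None:
        filters.append({"term": {"labels": label}})
    if date_from or date_to:
        date_range = {}
        if date_from:
            date_range["gte"] = date_from
        if date_to:
            date_range["lte"] = date_to
        filters.append({"range": {"published_at": date_range}})
    return {"size": size, "query": {"bool": {"filter": filters}}}


query = build_metadata_query(
    "japan", label="COUNTRY", date_from="2014-09-01", date_to="2014-11-30"
)
print(json.dumps(query, indent=2))
```

Filter clauses are used rather than scored clauses because a metadata match is a yes/no condition; relevance ranking, if needed, would be added separately.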
Glossary
Term - a sequence of characters used for training the NLP application
(represents a Named Entity).
Trend - a unique chain of keywords with a weight.
Topic - an abstract 'cluster' of relations between particular keywords
that occur in a Trend.
Segment - a group of similar Trends, intersected by search results.
Segmentation - relations between topics from different segments,
based on subtopic dynamics. Represents 'new knowledge'.
Thread - a sequence of keywords extracted from a given sentence.
Label - an attribute of a Term that defines its properties.
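The glossary terms can be read as a small data model. The sketch below is one possible rendering of it; every class and field name is an assumption, and the choice to represent a Topic's relations as unordered keyword pairs within a Trend is only an illustration of the definition above, not the DMT schema.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Label:
    name: str  # property of a Term, e.g. "COUNTRY" (hypothetical value)


@dataclass(frozen=True)
class Term:
    text: str  # character sequence representing a named entity
    labels: Tuple[Label, ...] = ()


@dataclass
class Thread:
    sentence: str
    keywords: List[str]  # keywords extracted from the sentence, in order


@dataclass
class Trend:
    keywords: Tuple[str, ...]  # unique keyword chain
    weight: float


@dataclass
class Topic:
    relations: Set[Tuple[str, str]]  # keyword pairs that occur in a Trend


@dataclass
class Segment:
    name: str
    trends: List[Trend] = field(default_factory=list)  # group of similar Trends


def topic_from_trend(trend: Trend) -> Topic:
    """Assumed derivation: every unordered pair of keywords in the chain."""
    return Topic(set(combinations(sorted(trend.keywords), 2)))


trend = Trend(("japan", "election", "abe"), weight=9.0)
segment = Segment("japan", [trend])
topic = topic_from_trend(trend)
print(sorted(topic.relations))
```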
Technologies
PHP: CakePHP framework
Python: NLTK, Django, Django REST framework
Java: Jersey framework, Stanford CoreNLP
Elasticsearch, MySQL
Team
We are a team with more than 3 years of experience in data mining
research and projects.
We are interested in making sense of big data and experimenting with
machine learning techniques. We build semantic networks and NLP
projects based on open-source projects as well as our own.
Oleksandr Shamrai - PHP software engineer, responsible for core
algorithm implementation and performance, and for team development
tools and rules
Pavel Yakovlev - Business analyst and QA, with a passion for data mining:
cluster analysis and recommendation solutions
Max Leonov - Python software engineer, responsible for modeling,
development, testing and deployment of NLP (Natural Language
Processing) applications
