The document discusses business event recognition from online news articles. It describes extracting and labeling data on acquisition, vendor-supplier, and job events. Features are engineered from the text, which is then converted to vectors for machine learning. Semi-supervised techniques like Naive Bayes with Expectation Maximization are applied and achieve up to 77% accuracy on vendor-supplier data recognition.
Business Event Recognition From Online News Articles
Machine Learning Graduate Program
Mohan Kashyap.P
SC13M055
Supervisor: Dr. Sumitra.S
Department of Mathematics
IIST
Mentor: Mahesh CR
CEO
Tataatsu Idealabs
May 18, 2015
Basic Overview | Data Extraction, Data Pre-processing and Feature Engineering | Algorithms and Results | Conclusion
Acknowledgement
TATAATSU IDEALABS for allowing me to carry out my thesis work.
Business Event Recognition From Online News Articles Mohan Kashyap P
Tataatsu Idealabs
• An organization that works on two main products: Collablayer and Disquery.
• Disquery is an NLP analytics engine that extracts semantic signals and identifies patterns from unstructured text. Quicker insight into data helps in making better decisions.
• Business event recognition falls under Disquery.
Illustrative Example
Outline
1 Basic Overview
   Introduction
2 Data Extraction, Data Pre-processing and Feature Engineering
   Data Extraction
   Text To Numeric Conversion
3 Algorithms and Results
   Semi-Supervised Techniques
   Machine Learning Approach
   Unsupervised Feature Vector Learning Approach
4 Conclusion
   Challenges Encountered
   Future Work
   References
   Appendix
Introduction
• The project work deals with the identification of business events.
• The process starts with crawling the data.
• This is followed by labeling of the extracted data.
• Next, data pre-processing and feature engineering techniques are applied.
• Finally, the results are evaluated using machine learning approaches.
Objective
• Given an online article or content of interest from the end user, the developed automated model must predict whether the document contains a business event or not.
• Business events in our scenario are restricted to merger and acquisition, vendor-supplier and job events.
• If the model predicts a business event, it has to give out additional information.
• The additional information is the "entities", i.e. the organizations and persons involved.
Motivation
• Major business events happen every day around the globe.
• An organization, as a competitor, will be interested in understanding the analytics of another organization:
• To develop better business strategies.
• To enhance decision making, which leads to the development and growth of the organization.
Related Works
• The paper closest to our work is Recognition of Named-Event Passages in News Articles [1].
• This paper describes a method for finding named events in the violent behaviour and business domains.
• In the business domain it covers management changes, mergers and acquisitions, strikes, legal troubles and bankruptcy.
Extraction And Labeling Of The Data
• Crawlers were written to extract the business event data.
• The gathered data was split into sentences using an NLP sentence tokenizer.
• The three classes labeled were acquisition, vendor-supplier and job.
Data Description
Vendor-Supplier Event Data
Title: Tri-State signs agreement with NextEra Energy Resources for new wind facility in eastern Colorado; WESTMINSTER, Colo., Feb. 5, 2014 /PRNewswire/ -- Tri-State Generation and Transmission Association, Inc. announced that it has entered into a 25-year agreement with a subsidiary of NextEra Energy Resources, LLC for a 150 megawatt wind power generating facility to be constructed in eastern Colorado, in the service territory of Tri-State member cooperative K. C. Electric Association (Hugo, Colo.)
Acquisition Event Data
Sun Pharmaceutical Industries announced on Monday that it would acquire troubled rival Ranbaxy Laboratories in a USD 4-billion deal that includes USD 800 million debt.
Job Event Data
Bank of America Merrill Lynch has hired Tristan Cheesman as head of European ABS syndicate according to a source.
Data Pre-processing And Feature Engineering
• Cleansing of the tagged sentences by removing stopwords and special symbols.
• Building hand-crafted features by observing patterns in the data.
• Type1 features: capture the semantics and patterns in the data [2].
• Type2 features: entity type features.
• Type3 features: rhetorical features.
Type1 Features
• Nouns and noun phrases
  example: Title agreement Next Era Energy wind facility eastern Colorado WESTMINSTER Colo. Feb. Generation Transmission Association Inc.
• Capital words
  example: WESTMINSTER LLC K. C.
• POS tag patterns in adjective-noun and adjective-adjective-noun format
  example: new wind 25-year agreement Tri-State member
Type2 Features
• Organization names
  example: K. C. Electric Association NextEra Energy Resources
• Organization references
  example: k. c. electric association nextera energy resources
• Location
  example: WESTMINSTER Colo. Colorado
• Person
  example: Jack stone
Bag Of Words Approach
• Data obtained from pre-processing and feature engineering has to be converted into vectors.
• Bag of words is a method used to convert words to vectors [9].
• The two vectorizers used under this method are count vectorizers and tf-idf vectorizers.
• Count vectorizers: use the counts of the words to convert them into vectors.
• TF-IDF vectorizers: convert words into vectors based on the importance of each word in the sentence.
Illustration Of Count And Tf-Idf Vectorizers
• Count vectorizer illustration:
  document = [[John likes to watch movies. Mary likes movies too]; [John also likes to watch football games.]]
  sentence1: [0,0,0,1,2,1,2,1,1,1]
  sentence2: [1,1,1,1,1,0,0,1,0,1]
  (columns follow the alphabetically sorted vocabulary: also, football, games, john, likes, mary, movies, to, too, watch)
• Tf-idf vectorizer illustration (logarithms are base 10):
  TF(movies, sentence1) = 1 + log(2) = 1.3010
  IDF(movies, document) = log(2/1) = 0.3010
  TF-IDF = TF(movies, sentence1) × IDF(movies, document) = 1.3010 × 0.3010 = 0.3916
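The two illustrations above can be reproduced with a short sketch. It uses only the Python standard library rather than any particular vectorizer package, and the helper names (`tokenize`, `count_vectors`, `tf_idf`) are hypothetical. The tokenizer (lowercase, alphabetic tokens only) and the base-10 logarithm are assumptions chosen to match the numbers on the slide.

```python
import math
import re


def tokenize(text):
    # Assumption: lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def count_vectors(sentences):
    """Return the sorted vocabulary and one count vector per sentence."""
    tokens = [tokenize(s) for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    vectors = [[sent.count(w) for w in vocab] for sent in tokens]
    return vocab, vectors


def tf_idf(word, sentence_index, sentences):
    """TF = 1 + log10(count), IDF = log10(N / sentences containing the word)."""
    tokens = [tokenize(s) for s in sentences]
    count = tokens[sentence_index].count(word)
    if count == 0:
        return 0.0
    tf = 1 + math.log10(count)
    containing = sum(1 for sent in tokens if word in sent)
    idf = math.log10(len(sentences) / containing)
    return tf * idf


sentences = [
    "John likes to watch movies. Mary likes movies too",
    "John also likes to watch football games.",
]
vocab, vectors = count_vectors(sentences)
print(vocab)
print(vectors[0])  # [0, 0, 0, 1, 2, 1, 2, 1, 1, 1]
print(vectors[1])  # [1, 1, 1, 1, 1, 0, 0, 1, 0, 1]
print(round(tf_idf("movies", 0, sentences), 4))  # 0.3916
```

Sorting the vocabulary alphabetically is what fixes the column order of the count vectors.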
Word-Embedding
• In this method each word is represented by a 100 to 300 dimensional vector [8].
• The word vectors are initialized in one of two ways:
• Random values drawn from the uniform distribution U[-1,1].
• Pre-trained word vectors.
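As a minimal sketch of the two initialization options, the snippet below draws each word vector from U[-1,1] and falls back to it only when no pre-trained vector is supplied. The function names, the `seed` parameter and the `pretrained` dictionary are hypothetical and only serve the illustration.

```python
import random


def init_embeddings(vocab, dim=100, seed=0):
    """Assign every word a vector with components drawn from U[-1, 1]."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def lookup(word, embeddings, pretrained=None):
    """Prefer a pre-trained vector when one exists, else the random one."""
    if pretrained is not None and word in pretrained:
        return pretrained[word]
    return embeddings[word]


embeddings = init_embeddings(["wind", "facility", "agreement"], dim=100)
print(len(lookup("wind", embeddings)))  # 100
```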
Naive Bayes With Expectation Maximization [3]
• Train a naive Bayes classifier using the labeled data.
• Predict probabilistic labels for the unlabeled data.
• Retrain the classifier using the labeled data together with these probabilistic labels.
• Repeat this process until convergence.
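The loop above can be sketched with a small multinomial naive Bayes written from scratch. This is an illustration under stated assumptions, not the implementation used in the project: Laplace smoothing, the convergence tolerance `tol`, the iteration cap `max_iter` and all function names are choices made here.

```python
import math
from collections import Counter


def train_nb(docs, weights, vocab, classes):
    """Fit multinomial naive Bayes from token lists with soft class weights."""
    prior = {c: 1.0 for c in classes}  # Laplace smoothing on class priors
    word_counts = {c: Counter() for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for t in tokens:
                word_counts[c][t] += w[c]
    total = sum(prior.values())
    model = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + len(vocab)
        log_words = {t: math.log((word_counts[c][t] + 1.0) / denom) for t in vocab}
        model[c] = (math.log(prior[c] / total), log_words)
    return model


def predict_proba(model, tokens):
    """Posterior class probabilities for one tokenized document."""
    scores = {
        c: log_prior + sum(log_words[t] for t in tokens if t in log_words)
        for c, (log_prior, log_words) in model.items()
    }
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def nb_em(labeled, unlabeled, classes, max_iter=20, tol=1e-4):
    """labeled: list of (tokens, label); unlabeled: list of token lists."""
    vocab = {t for tokens, _ in labeled for t in tokens}
    vocab |= {t for tokens in unlabeled for t in tokens}
    l_docs = [tokens for tokens, _ in labeled]
    l_weights = [{c: float(c == y) for c in classes} for _, y in labeled]
    model = train_nb(l_docs, l_weights, vocab, classes)  # step 1: labeled data only
    previous = None
    for _ in range(max_iter):
        # E-step: probabilistic labels for the unlabeled documents.
        soft = [predict_proba(model, tokens) for tokens in unlabeled]
        # M-step: retrain on labeled data plus the soft-labeled data.
        model = train_nb(l_docs + unlabeled, l_weights + soft, vocab, classes)
        if previous is not None:
            change = max(
                (abs(p[c] - q[c]) for p, q in zip(soft, previous) for c in classes),
                default=0.0,
            )
            if change < tol:
                break
        previous = soft
    return model


labeled = [("acquire rival deal".split(), "event"), ("weather sunny today".split(), "other")]
unlabeled = ["acquire company deal".split(), "sunny weather weekend".split()]
model = nb_em(labeled, unlabeled, ["event", "other"])
print(predict_proba(model, ["acquire", "deal"]))
```

The toy sentences are invented for the example; convergence is judged here by how much the soft labels move between iterations.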
Results For Vendor-Supplier Event Dataset
Table: Variation in accuracies and F-scores in semi-supervised learning using naive Bayes for vendor-supplier data

Training data (%)   Accuracy   F-score   Testing data   Training data
30                  0.5597     0.5915    527            227
40                  0.7434     0.65      454            300
50                  0.7765     0.674     376            376
Results For Job Event Dataset
Table: Variation in accuracies and F-scores in semi-supervised learning using naive Bayes for job event data

Training data (%)   Accuracy   F-score   Testing data   Training data
30                  0.7483     0.4444    1967           842
40                  0.7544     0.4863    1686           1123
50                  0.8014     0.52      1405           1404
Results For Acquisition Event Data Set
Table: Variation in accuracies and F-scores in semi-supervised learning using naive Bayes for acquisition event data

Training data (%)   Accuracy   F-score   Testing data   Training data
30                  0.7929     0.8178    966            413
40                  0.7989     0.82      828            521
50                  0.8057     0.8241    689            690
Active Learning
• Active learning was implemented using the query-by-committee (QBC) approach [10].
• The classifiers used in the committee were the AdaBoost classifier, random forest classifier and gradient boosting classifier.
• This method performed better than the semi-supervised naive Bayes classifier.
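A minimal sketch of the query-by-committee selection step follows. It assumes vote entropy as the disagreement measure, which the slides do not specify, and it stands in simple keyword rules for the three trained classifiers; `vote_entropy`, `select_queries` and the toy committee are all hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement among committee votes; 0 when all members agree."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((k / n) * math.log(k / n) for k in counts.values())


def select_queries(unlabeled, committee, batch_size=1):
    """Return indices of the samples the committee disagrees on most."""
    scored = [
        (vote_entropy([member(x) for member in committee]), i)
        for i, x in enumerate(unlabeled)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:batch_size]]


# Toy committee: each member is a callable returning a predicted label.
committee = [
    lambda s: "acquire" in s,
    lambda s: "deal" in s,
    lambda s: "acquire" in s or "agreement" in s,
]
unlabeled = [
    "firm to acquire rival in deal",
    "signs agreement for wind facility",
    "weather sunny",
]
print(select_queries(unlabeled, committee))  # [1]
```

The selected samples would then be labeled by a human and added to the training set before the committee is retrained.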
Results For Vendor-Supplier Event Dataset
Table: Variation in accuracies and F-scores using active learning (QBC approach) for vendor-supplier event data

Training data (%)   Accuracy   F-score   Testing data   Training data
30                  0.842      0.7348    529            225
40                  0.84       0.7352    454            300
50                  0.8643     0.76      376            376
Results For Job Event Dataset
Table: Variation in accuracies and F-scores using active learning (QBC approach) for job event data

Training data (%)   Accuracy   F-score   Testing data   Training data
30                  0.9054     0.6204    1967           842
40                  0.9116     0.6558    1686           1123
50                  0.9216     0.6758    1405           1404
Results For Acquisition Event Dataset
Table: Variation in accuracies and F-scores using active learning (QBC approach) for acquisition event data

Training data (%)   Accuracy   F-score   Testing data   Training data
30                  0.7855     0.7549    966            413
40                  0.812      0.7867    828            521
50                  0.82       0.7995    689            690
Ensemble Classifiers With Bag Of Words For Business Event Classification
• The classifiers used were the AdaBoost classifier [6], random forest classifier [7] and gradient boosting classifier [5].
• The final prediction was made by voting among these three classifiers.
• The base learners used were decision trees.
• The number of base learners used was 500.
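The voting step can be sketched as a plain majority vote over the three classifiers' predicted labels. Whether the original work used hard (label) voting or soft (probability) voting is not stated, so hard voting is an assumption here, and the prediction lists are made-up placeholders.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one list of labels per classifier, all of equal length."""
    voted = []
    for labels in zip(*predictions):
        # The most common label among the classifiers wins.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical per-sentence predictions (1 = business event, 0 = not).
ada_boost = [1, 0, 1, 0]
random_forest = [1, 1, 0, 0]
gradient_boost = [1, 0, 0, 1]
print(majority_vote([ada_boost, random_forest, gradient_boost]))  # [1, 0, 0, 0]
```

With three voters and two classes a tie cannot occur, which keeps the rule unambiguous.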
Results For Vendor-Supplier Data Set With Parameter As 500
Test score for vendor-supplier using voting of three ensemble classifiers with number of estimators as 500

Area under ROC   Accuracy   F-score   True positives   False positives   True negatives   False negatives
88%              91.97%     85.211%   196              16                583              52
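For readers who want to relate the confusion matrix values to the scores, the sketch below applies the standard definitions of accuracy, precision, recall and F1 to the vendor-supplier counts above. The function name `metrics` is hypothetical. The reported scores may have been averaged or rounded differently, so small deviations from the table are to be expected.

```python
def metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Vendor-supplier counts taken from the table above.
accuracy, precision, recall, f1 = metrics(tp=196, fp=16, tn=583, fn=52)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With these counts the accuracy comes out at about 0.9197, in line with the reported 91.97%, and the F1 score at about 0.852, close to the reported 85.211%.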
Results For Job Data Set With Parameter As 500
Test score for job using voting of three ensemble classifiers with number of estimators as 500

Area under ROC   Accuracy   F-score   True positives   False positives   True negatives   False negatives
87.56%           92.3%      83.88%    149              16                486              36
Results For Acquisition Data Set With Parameter As 500
Test score for acquisition using voting of three ensemble classifiers with number of estimators as 500

Area under ROC   Accuracy   F-score   True positives   False positives   True negatives   False negatives
92%              94.21%     91.10%    245              8                 591              34
Performance Measure Analysis For Vendor-Supplier On The Whole Dataset
Average accuracy and F1-score

Classifier       Accuracy   F-score
Gradient-Boost   0.9063     0.8277
Ada-boost        0.8968     0.8154
Random forest    0.9057     0.8254
Performance Measure Analysis For Acquisition On The Whole Dataset
Average accuracy and F1-score

Classifier       Accuracy   F-score
Gradient-Boost   0.9338     0.8883
Ada-boost        0.9398     0.9021
Random forest    0.94054    0.90602
Performance Measure Analysis For Job On The Whole Dataset
Average accuracy and F1-score

Classifier       Accuracy   F-score
Gradient-Boost   0.90962    0.81014
Ada-boost        0.9006     0.8088
Random forest    0.90236    0.80322
Multilayer Feed Forward Network With Word Embedding
• Each word was initialized with a U[-1,1] variate of 100 dimensions.
• For each sentence a word-embedding matrix was built.
• A window approach with max-pooling was applied to this matrix to convert it into a sentence vector.
• The sentence vector was fed into the multilayer feed-forward network (MFN) for classification.
• The performance of this method was satisfactory.
Table: Variation in test score for MFN with word embedding
Test score for MFN with word embedding on vendor-supplier dataset

Accuracy   F-score   True negatives   True positives   False positives   False negatives
0.65       0.39      140              13               3                 69
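One way to read the window approach with max-pooling is sketched below: neighbouring word vectors are concatenated in a sliding window, and the element-wise maximum over all windows gives a fixed-length sentence vector. The window size, the zero padding for short sentences and the omission of any learned projection layer are assumptions made for this illustration; `sentence_vector` is a hypothetical name.

```python
def sentence_vector(matrix, window=3):
    """matrix: list of word vectors, one row per word in the sentence."""
    dim = len(matrix[0])
    # Pad short sentences with zero rows so at least one window exists.
    rows = matrix + [[0.0] * dim] * max(0, window - len(matrix))
    # Each window is the concatenation of `window` consecutive word vectors.
    windows = [
        [value for row in rows[i:i + window] for value in row]
        for i in range(len(rows) - window + 1)
    ]
    # Max-pooling: keep the largest value per position across all windows.
    return [max(column) for column in zip(*windows)]


# Toy 4-word sentence with 2-dimensional embeddings.
matrix = [[0.1, -0.5], [0.7, 0.2], [-0.3, 0.9], [0.4, 0.0]]
print(sentence_vector(matrix, window=2))  # [0.7, 0.9, 0.7, 0.9]
```

The output length is `window * dim` regardless of sentence length, which is what lets a feed-forward network consume it.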
CNN Used For Sentence Modeling With Word-Embedding Approach [4]
Figure: Architecture of the convolutional neural network for sentence modeling.
Experimental Setup For CNN Sentence Modeling
• The shape of the input matrix for vendor-supplier was 2515×300.
• For job event it was 1192×300 and for acquisition 580×300.
• The filter shapes used to extract features were 3×300, 4×300 and 5×300.
• The hidden units had dimension 100×2.
• The activation function used was ReLU.
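The filter shapes above can be illustrated with a pure-Python sketch of one convolution per filter height, followed by ReLU and max-over-time pooling, in the spirit of [4]. It is a simplified stand-in, not the trained network: the 12-word sentence, the random weights and the use of a single filter per height are assumptions, and all function names are hypothetical.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def conv_max_over_time(sentence, filt, bias=0.0):
    """Slide one h x d filter over an n x d sentence matrix, apply ReLU, take the max."""
    h = len(filt)
    feature_map = []
    for i in range(len(sentence) - h + 1):
        total = bias
        for r in range(h):
            total += sum(w * x for w, x in zip(filt[r], sentence[i + r]))
        feature_map.append(relu(total))
    return max(feature_map)


def cnn_features(sentence, filters):
    """One pooled feature per filter; these would feed the hidden layer."""
    return [conv_max_over_time(sentence, f) for f in filters]


rng = random.Random(0)
d = 300  # embedding dimension, matching the 3x300, 4x300 and 5x300 filters
sentence = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(12)]
filters = [
    [[rng.uniform(-0.01, 0.01) for _ in range(d)] for _ in range(h)]
    for h in (3, 4, 5)
]
features = cnn_features(sentence, filters)
print(len(features))  # 3
```

Because each filter spans the full embedding width, it only slides along the word axis, so a filter of height h sees h consecutive words at a time.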
Results For CNN
• For the vendor-supplier data, the overall average accuracy was 0.9081 for CNN-rand and 0.9167 for CNN-word2vec.
• For the acquisition data, the overall average accuracy was 0.9359 for CNN-rand and 0.9657 for CNN-word2vec.
• For the job event data, the overall average accuracy was 0.8046 for CNN-rand and 0.8108 for CNN-word2vec.
Challenges Encountered
• Uncertainty in data extraction.
• Business event datasets were unstructured.
• Bag of words vectorizers fail to capture the exact meaning of a word.
• Applying active learning methods was time consuming.
Summary
• An automated model for recognizing business events in the respective business domains was developed.
• Tf-idf vectorizers performed better than count vectorizers.
• All three ensemble classifiers showed good performance.
• CNN-word2vec models performed better than CNN-rand models.
Summary
• On the acquisition dataset, the CNN models perform better than the ensemble classifiers.
• On the vendor-supplier dataset, the CNN models perform slightly better than the ensemble classifiers.
• On the job event dataset, the ensemble classifiers perform better than the CNN models.
Future work
• Addressing the problem of co-reference resolution.
• Applying hidden Markov models (HMMs).
• Extending business event classification to more domains.
References
[1] Marujo, Luis, Wang Ling, Anatole Gershman, Jaime Carbonell, João P. Neto, and David Matos. Recognition of Named-Event Passages in News Articles. In 24th International Conference on Computational Linguistics, pp. 321-329. 2012.
[2] Marujo, Luis, Anatole Gershman, Jaime Carbonell, Robert Frederking, and João P. Neto. Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), pp. 156-162. 2012.
[3] Nigam, Kamal, Andrew McCallum, and Tom Mitchell. Semi-supervised text classification using EM. Semi-Supervised Learning, pp. 33-56. 2006.
[4] Kim, Yoon. Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751. 2014.
[5] Friedman, Jerome H. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pp. 1189-1232. 2001.
[6] Freund, Yoav, and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pp. 23-37. Springer Berlin Heidelberg, 1995.
[7] Breiman, Leo. Random forests. Machine Learning 45, no. 1, pp. 5-32. 2001.
[8] Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[9] Harris, Zellig S. Distributional structure. Word, Vol. 10, pp. 146-162. 1954.
[10] Abe, N., and Mamitsuka, H. Query learning strategies using boosting and bagging. Proceedings of the 15th International Conference on Machine Learning (ICML-98), pp. 1-10. 1998.
Appendix
AdaBoost
In AdaBoost we assign non-negative weights to the points in the data set, normalized so that they form a
distribution. In each iteration we generate a training set by sampling from the data using the weights, i.e. the data
point $(X_i, y_i)$ is chosen with probability $w_i$, where $w_i$ is the current weight of that data point. The
training set is generated by such repeated independent sampling. After learning the current classifier, we increase
the (relative) weights of the data points that it misclassifies. We then generate a fresh training set
using the modified weights, and so on. The final classifier is essentially a weighted majority vote over all the
classifiers. The algorithm, as described in (Freund et al., 1995), is given below:
Input: $n$ examples $(X_1, y_1), \dots, (X_n, y_n)$, with $X_i \in H \subseteq \mathbb{R}^n$ and $y_i \in \{-1, 1\}$.
1 Initialize: $w_i(1) = \frac{1}{n}$, $\forall i$. Each data point starts with equal weight, so when data points are
sampled from the probability distribution every point is equally likely to enter the training set.
2 We assume that there are $M$ classifiers in the ensemble.
For $m = 1$ to $M$ do
1 Generate a training set by sampling with $w_i(m)$.
2 Learn classifier $h_m$ using this training set.
3 Let $\xi_m = \sum_{i=1}^{n} w_i(m)\, I[y_i \neq h_m(X_i)]$, where $I_A$ is the indicator function of $A$, defined as
$I_A = 1$ if $y_i \neq h_m(X_i)$
$I_A = 0$ if $y_i = h_m(X_i)$
so $\xi_m$ is the weighted error of the $m$-th classifier.
4 Set $\alpha_m = \log\left(\frac{1 - \xi_m}{\xi_m}\right)$, the hypothesis weight; $\alpha_m > 0$ because of the
assumption that $\xi_m < 0.5$.
5 Update the weight distribution over the training set as
$w_i(m+1) = w_i(m) \exp\left(\alpha_m I[y_i \neq h_m(X_i)]\right)$
and normalize the updated weights so that $w_i(m+1)$ is a distribution:
$w_i(m+1) \leftarrow \frac{w_i(m+1)}{\sum_{i} w_i(m+1)}$
end for
The output is the final vote $h(X) = \mathrm{sgn}\left(\sum_{m=1}^{M} \alpha_m h_m(X)\right)$, the sign of the
weighted sum of all classifiers in the ensemble.
In the AdaBoost algorithm $M$ is a parameter. Because of the sampling
with weights, the procedure can be continued for an arbitrary number
of iterations. The loss function used in AdaBoost is the
exponential loss, which for a particular data point is defined
as $\exp(-y_i f(X_i))$.
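The resampling variant of AdaBoost described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the weak learner is a one-dimensional threshold stump, the toy data set is invented, and the names `train_stump`, `adaboost` and `predict` are hypothetical. Two details go beyond the description: a learner with weighted error of 0.5 or more is skipped, and the error is clamped away from zero so that the hypothesis weight stays finite.

```python
import math
import random


def train_stump(sample):
    """Fit a 1-D stump h(x) = s if x > t else -s with the fewest errors on `sample`."""
    xs = sorted({x for x, _ in sample})
    thresholds = [xs[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    best = None
    for t in thresholds:
        for s in (1, -1):
            errors = sum(1 for x, y in sample if (s if x > t else -s) != y)
            if best is None or errors < best[0]:
                best = (errors, t, s)
    _, t, s = best
    return lambda x: s if x > t else -s


def adaboost(data, M, rng):
    """Run M boosting rounds; return the list of (alpha_m, h_m) and the final weights."""
    n = len(data)
    w = [1.0 / n] * n                                # w_i(1) = 1/n
    ensemble = []
    for _ in range(M):
        sample = rng.choices(data, weights=w, k=n)   # step 1: sample with w_i(m)
        h = train_stump(sample)                      # step 2: learn h_m
        miss = [1 if h(x) != y else 0 for x, y in data]
        xi = sum(wi * mi for wi, mi in zip(w, miss)) # step 3: weighted error
        if xi >= 0.5:
            continue  # assumption: discard learners that violate xi_m < 0.5
        xi = max(xi, 1e-10)  # assumption: clamp so alpha_m stays finite
        alpha = math.log((1.0 - xi) / xi)            # step 4: hypothesis weight
        ensemble.append((alpha, h))
        w = [wi * math.exp(alpha * mi) for wi, mi in zip(w, miss)]  # step 5
        total = sum(w)
        w = [wi / total for wi in w]                 # renormalize to a distribution
    return ensemble, w


def predict(ensemble, x):
    """Final vote: sign of the alpha-weighted sum of the weak classifiers."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Invented toy data: positives on both ends, negatives in the middle,
# so no single stump can classify every point correctly.
data = [(float(x), 1 if x < 3 or x > 6 else -1) for x in range(10)]
ensemble, weights = adaboost(data, M=20, rng=random.Random(0))
accuracy = sum(predict(ensemble, x) == y for x, y in data) / len(data)
print(f"{len(ensemble)} weak classifiers, training accuracy {accuracy:.2f}")
```

Because the training sets are drawn at random, the result depends on the seed passed to `random.Random`.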
Random forest classifier
Input: $n$ examples $(X_1, y_1), \dots, (X_n, y_n) = D$, with $X_i \in \mathbb{R}^n$, where $D$ is
the whole dataset.
For $i = 1, \dots, B$:
1 Choose a bootstrap sample $D_i$ from $D$.
2 Construct a decision tree $T_i$ from the bootstrap sample $D_i$
such that, at each node, a random subset of $m$ features is chosen
and only splits on those features are considered.
Finally, given the test data $X_t$, take the majority vote of the trees for
classification. Here $B$ is the number of bootstrap data sets
generated from the original data set $D$.
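The procedure above can be sketched in plain Python as follows. This is an illustrative sketch rather than the classifier used in this work: the Gini impurity split criterion, the depth limit, the toy data and all function names (`build_tree`, `random_forest`, `forest_predict`, and so on) are assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, m, rng, depth=0, max_depth=5):
    """Grow a tree on rows of (features, label); a leaf is a bare label."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None
    # At each node, only a random subset of m features is considered.
    for f in rng.sample(range(len(rows[0][0])), m):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r in rows if r[0][f] <= t], m, rng, depth + 1, max_depth),
        "right": build_tree([r for r in rows if r[0][f] > t], m, rng, depth + 1, max_depth),
    }


def tree_predict(tree, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def random_forest(data, B, m, rng):
    """Train B trees, each on a bootstrap sample D_i drawn with replacement from D."""
    return [build_tree([rng.choice(data) for _ in data], m, rng) for _ in range(B)]


def forest_predict(forest, x):
    """Majority vote over all trees."""
    return Counter(tree_predict(t, x) for t in forest).most_common(1)[0][0]


# Invented toy data: two well-separated clusters in two dimensions.
data = [((0.1 * i, 0.2 * i), -1) for i in range(5)]
data += [((5.0 + 0.1 * i, 5.0 + 0.2 * i), 1) for i in range(5)]
forest = random_forest(data, B=25, m=1, rng=random.Random(0))
print(forest_predict(forest, (-1.0, -1.0)), forest_predict(forest, (6.0, 6.0)))
```

Choosing `m` smaller than the number of features decorrelates the trees, which is what distinguishes a random forest from plain bagging of decision trees.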
Gradient boosting classifier
Boosting algorithms are a set of machine learning algorithms that build a strong classifier from a set of weak
classifiers, typically decision trees. Gradient boosting is one such algorithm: it builds the model in a stage-wise
fashion and generalizes boosting by allowing optimization of an arbitrary differentiable loss function. The
differentiable loss function in our case is the binomial deviance. The algorithm, as described in
(Friedman, 2001), is implemented as follows.
Input: training set $(X_i, y_i)$, $i = 1, \dots, n$, with $X_i \in H \subseteq \mathbb{R}^n$ and $y_i \in \{-1, 1\}$; a
differentiable loss function $L(y, F(X))$, which in our case is the binomial deviance
$\log(1 + \exp(-2yF(X)))$; and the number of iterations $M$.
1 Initialize the model with a constant value:
$F_0(X) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$.
2 For $m = 1$ to $M$:
1 Compute the pseudo-responses:
$r_{im} = -\left[\frac{\partial L(y_i, F(X_i))}{\partial F(X_i)}\right]_{F(X) = F_{m-1}(X)}$ for $i = 1, \dots, n$.
2 Fit a base learner $h_m(X)$ to the pseudo-responses, i.e. train it
on the set $\{(X_i, r_{im})\}_{i=1}^{n}$.
3 Compute the multiplier $\gamma_m$ by solving the optimization problem:
$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i, F_{m-1}(X_i) + \gamma h_m(X_i)\right)$.
4 Update the model: $F_m(X) = F_{m-1}(X) + \gamma_m h_m(X)$.
3 Output $F_M(X) = F_0(X) + \sum_{m=1}^{M} \gamma_m h_m(X)$.
The value of the weight $\gamma_m$ is found by an approximate Newton-Raphson solution, given as
$\gamma_m = \frac{\sum_{X_i \in h_m} r_{im}}{\sum_{X_i \in h_m} |r_{im}| (2 - |r_{im}|)}$
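For the binomial deviance, differentiating $\log(1 + \exp(-2yF))$ gives the pseudo-response $r_{im} = 2y_i / (1 + \exp(2 y_i F_{m-1}(X_i)))$, and the constant minimizing the initial loss is $F_0 = \frac{1}{2}\log\frac{1 + \bar{y}}{1 - \bar{y}}$, as in (Friedman, 2001). The sketch below puts these pieces together in plain Python. It is an illustration under assumptions, not the implementation used in this work: the base learner is a two-leaf regression stump on one-dimensional inputs, the Newton-Raphson step is computed separately for each leaf of the stump, the toy data is invented, and the function names are hypothetical.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(xs, r):
    """Threshold of the two-leaf regression stump with least squared error on r."""
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2.0
        left = [ri for xi, ri in zip(xs, r) if xi <= t]
        right = [ri for xi, ri in zip(xs, r) if xi > t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t)
    return best[1]


def newton_step(r_leaf):
    """Approximate Newton-Raphson step: sum(r) / sum(|r| * (2 - |r|))."""
    den = sum(abs(v) * (2.0 - abs(v)) for v in r_leaf)
    return sum(r_leaf) / den if den > 1e-12 else 0.0


def gradient_boost(data, M):
    """Fit F_M for labels in {-1, 1}; needs both classes and at least two distinct x."""
    xs = [x for x, _ in data]
    mean_y = sum(y for _, y in data) / len(data)
    f0 = 0.5 * math.log((1.0 + mean_y) / (1.0 - mean_y))  # constant initial model F_0
    stages = []  # one (threshold, left step, right step) triple per iteration

    def F(x):
        return f0 + sum(gl if x <= t else gr for t, gl, gr in stages)

    for _ in range(M):
        # Pseudo-responses: negative gradient of the binomial deviance at F_{m-1}.
        r = [2.0 * y / (1.0 + math.exp(2.0 * y * F(x))) for x, y in data]
        t = fit_stump(xs, r)
        gl = newton_step([ri for xi, ri in zip(xs, r) if xi <= t])
        gr = newton_step([ri for xi, ri in zip(xs, r) if xi > t])
        stages.append((t, gl, gr))  # F_m = F_{m-1} + step of the matching leaf
    return F


# Invented toy data: positives on both ends, negatives in the middle.
data = [(float(x), 1 if x < 3 or x > 6 else -1) for x in range(10)]
F = gradient_boost(data, M=4)
predictions = [1 if F(x) >= 0 else -1 for x, _ in data]
accuracy = sum(p == y for p, (_, y) in zip(predictions, data)) / len(data)
print(f"training accuracy after 4 stages: {accuracy:.2f}")
```

A single stump cannot separate this data, but the sum of two stumps with different thresholds can, which is the stage-wise effect the algorithm relies on.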
CNN
Let $N$ be the number of sentences and $n$ the number of words in a particular sentence, and let
$x_i \in \mathbb{R}^k$ be the $k$-dimensional word vector corresponding to the $i$-th word in the sentence. A sentence
of length $n$ (padded where necessary) is represented as
$x_{1:n} = x_1 \oplus x_2 \oplus \dots \oplus x_n$
where $\oplus$ is the concatenation operator. In general, let $x_{i:i+j}$ refer to the concatenation of words
$x_i, x_{i+1}, \dots, x_{i+j}$. A convolution operation involves a filter weight matrix $w \in \mathbb{R}^{h \times k}$,
initialized with uniformly distributed random values, which is applied to a window of $h$ words of a particular
sentence to produce a new feature. For example, a feature $c_i$ is generated from a window of words $x_{i:i+h-1}$ by
$c_i = f(w \cdot x_{i:i+h-1} + b)$.
Here $b \in \mathbb{R}$ is a bias term and $f$ is a non-linear function such as the hyperbolic tangent. This filter is
applied to each possible window of words in the sentence, $\{x_{1:h}, x_{2:h+1}, \dots, x_{n-h+1:n}\}$, to produce a
feature map
$c = [c_1, c_2, \dots, c_{n-h+1}]$
with $c \in \mathbb{R}^{n-h+1}$. We then apply a max-pooling operation over the feature map and take the maximum value
$\hat{c} = \max\{c\}$ as the feature corresponding to this particular filter. The idea is to capture the most important
feature, the one with the highest value, for each feature map. This pooling scheme naturally deals with variable
sentence lengths. We have described the process by which one feature is extracted from one filter. The model uses
multiple filters (with varying window sizes) to obtain multiple features. These features are also called unsupervised
features, because they are obtained by applying different filters with varying window sizes at random. These
features form the penultimate layer and are passed to a fully connected softmax layer whose output is the
probability distribution over labels.
To avoid overfitting of the CNN models, dropout is adopted.
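The convolution and max-pooling steps for a single filter can be written out directly. The sketch below is a plain-Python illustration under assumptions, not the model used in this work: the word vectors are tiny invented examples (k = 2 rather than 300), the non-linearity is ReLU, and the function names `convolve`, `max_over_time` and `relu` are hypothetical.

```python
import random


def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def convolve(sentence, w, b, f):
    """Feature map c = [c_1, ..., c_{n-h+1}] with c_i = f(w . x_{i:i+h-1} + b).

    `sentence` is a list of n word vectors of length k; `w` is an h x k filter.
    The sentence must already be padded so that n >= h.
    """
    h, n = len(w), len(sentence)
    feature_map = []
    for i in range(n - h + 1):
        window = sentence[i:i + h]
        total = sum(w[r][j] * window[r][j] for r in range(h) for j in range(len(w[r])))
        feature_map.append(f(total + b))
    return feature_map


def max_over_time(feature_map):
    """Max-pooling: keep the largest value of the feature map for this filter."""
    return max(feature_map)


# Hand-checkable example: n = 4 words, k = 2 dimensions, one filter of height h = 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
feature_map = convolve(sentence, w, b=0.0, f=relu)
pooled = max_over_time(feature_map)
print(feature_map, pooled)

# Several randomly initialized filters with window sizes 3, 4 and 5 on a longer
# sentence give one pooled feature each, regardless of the sentence length.
rng = random.Random(0)
long_sentence = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(7)]
pooled_features = []
for h in (3, 4, 5):
    filt = [[rng.uniform(-0.25, 0.25) for _ in range(4)] for _ in range(h)]
    pooled_features.append(max_over_time(convolve(long_sentence, filt, 0.0, relu)))
print(len(pooled_features))
```

In the full model these pooled values would be concatenated across all filters and fed to the softmax layer; that step and dropout are omitted here.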