2. Contents
• Introduction
• When Is the Neural Network Trained?
• Controlling the Training Process with Learning Parameters
• Iterative Development Process
• Avoiding Over-training
• Automating the Process
3. Introduction (1)
• Training a neural network
– to perform a specific processing function
1) Which parameters are involved?
2) How are they used to control the training process?
3) How does management of the training data affect the training process?
– Development Process
• 1) Data preparation
• 2) Selection of the neural network model & architecture
• 3) Training the neural network
– determined by the structure of the neural network and its function
– Application of the “trained” network
4. Introduction (2)
• Learning parameters for neural networks
• Disciplined approach to iterative neural network development
6. When Is the Neural Network Trained?
• When is the network trained? It depends on:
– the type of neural network
– the function it performs
• classification
• clustering data
• building a model or time-series forecast
– the acceptance criteria
• meets the specified accuracy
– once trained, the connection weights are “locked”
– they cannot be adjusted
7. When Is the Neural Network Trained?
Classification (1)
• Measure of success: percentage of correct classifications
– incorrect classification
– no classification: unknown, undecided
• threshold limit (see the sketch below)
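A minimal NumPy sketch of the threshold idea (the outputs, class labels, and threshold value are made up for illustration): outputs whose highest activation falls below the threshold are counted as "no classification".

import numpy as np

# Softmax-style outputs for 4 patterns over categories A, B, C (made-up numbers).
outputs = np.array([
    [0.80, 0.15, 0.05],   # confident -> A
    [0.40, 0.35, 0.25],   # low confidence -> unknown
    [0.10, 0.75, 0.15],   # confident -> B
    [0.30, 0.34, 0.36],   # low confidence -> unknown
])
targets = np.array([0, 0, 1, 2])   # true category indices
threshold = 0.5                    # assumed threshold limit

best = outputs.argmax(axis=1)
confident = outputs.max(axis=1) >= threshold

correct   = np.mean(confident & (best == targets))
incorrect = np.mean(confident & (best != targets))
unknown   = np.mean(~confident)
print(f"correct={correct:.2f} incorrect={incorrect:.2f} unclassified={unknown:.2f}")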
8. When Is the Neural Network Trained?
Classification (2)
• Confusion matrix: the possible output categories and the corresponding percentage of correct and incorrect classifications (a sketch of how to compute one follows the table)
              Category A   Category B   Category C
Category A       0.60         0.25         0.15
Category B       0.25         0.45         0.30
Category C       0.15         0.30         0.55
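A short NumPy sketch (hypothetical labels and predictions) of how such a row-normalized confusion matrix can be computed from actual and predicted categories:

import numpy as np

actual    = np.array([0, 0, 0, 1, 1, 2, 2, 2, 1, 0])  # true categories (A=0, B=1, C=2)
predicted = np.array([0, 1, 0, 1, 2, 2, 0, 2, 1, 0])  # network's classifications

n = 3
cm = np.zeros((n, n))
for a, p in zip(actual, predicted):
    cm[a, p] += 1

cm /= cm.sum(axis=1, keepdims=True)  # row-normalize: each row sums to 1
print(np.round(cm, 2))               # diagonal = fraction classified correctly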
9. When Is the Neural Network Trained?
Clustering (1)
• Output of a clustering network
– open to analysis by the user
• Training regimen is determined by:
– the number of times the data is presented to the neural network
– how fast the learning rate and the neighborhood decay
• Adaptive resonance (ART) network training
– vigilance training parameter
– learning rate
10. When Is the Neural Network Trained?
Clustering (2)
• Lock the ART network weights
– disadvantage: gives up online learning
• ART networks are sensitive to the order of the training data
11. When Is the Neural Network Trained?
Modeling (1)
• Modeling or regression problems
• Usual error measure
– RMS (root mean square) error
• Measures of prediction accuracy (see the sketch below)
– average error
– MSE (mean square error)
– RMS (root mean square) error
• The expected behavior
– the RMS error is very high initially, but gradually settles to a stable minimum
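As a concrete reference, a small NumPy sketch (made-up target and output values) of the error measures mentioned above:

import numpy as np

desired   = np.array([1.0, 0.5, 0.2, 0.8])   # target outputs
predicted = np.array([0.9, 0.6, 0.1, 0.7])   # network outputs

errors = desired - predicted
mse = np.mean(errors ** 2)          # mean square error
rms = np.sqrt(mse)                  # root mean square error
avg = np.mean(np.abs(errors))       # average (absolute) error
print(f"MSE={mse:.4f} RMS={rms:.4f} avg={avg:.4f}")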
12. When Is the Neural Network Trained?
Modeling (2)
13. When Is the Neural Network Trained?
Modeling (3)
• When the error does not stabilize
– the network falls into a local minimum
• the prediction error does not fall
• it oscillates up and down
– remedies
• reset (randomize) the weights and start again
• adjust the training parameters
• change the data representation
• change the model architecture
14. When Is the Neural Network Trained?
Forecasting (1)
• Forecasting
– a prediction problem
– RMS (root mean square) error
– visualize: time plot of the actual and desired network output
• Time-series forecasting (see the sketch below)
– long-term trend
• influenced by cyclical factors, etc.
– random component
• variability and uncertainty
– neural networks are excellent tools for modeling complex time-series problems
• recurrent neural networks: nonlinear dynamic systems
– no self-feedback loops & no hidden neurons
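A minimal sketch (NumPy, with an assumed window size and a toy series) of how a time series is typically turned into input/target patterns for a forecasting network:

import numpy as np

series = np.sin(np.linspace(0, 10, 200)) + 0.05 * np.random.randn(200)  # toy series
window = 5  # assumed number of past values fed to the network

# Each input pattern is `window` consecutive values; the target is the next value.
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
print(X.shape, y.shape)  # (195, 5) (195,)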
15. When Is the Neural Network Trained?
Forecasting (2)
16. Controlling the Training Process with Learning Parameters (1)
• Learning parameters depend on
– Type of learning algorithm
– Type of neural network
17. Controlling the Training Process with Learning Parameters (2)
- Supervised training
[Diagram: a training pattern is fed to the neural network; the network's prediction is compared with the desired output]
1) How the error is computed
2) How big a step we take when adjusting the connection weights (both sketched below)
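A minimal sketch of a single supervised weight update (a simple delta rule on one linear output unit; the pattern, target, and learning rate are made up), showing the two points above:

import numpy as np

pattern = np.array([0.2, 0.7, 1.0])         # one training pattern
weights = np.random.randn(3) * 0.1          # connection weights
desired = 0.5                               # desired output
learning_rate = 0.1                         # step size

prediction = weights @ pattern              # network prediction
error = desired - prediction                # 1) how the error is computed
weights += learning_rate * error * pattern  # 2) how big a step we take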
18. Controlling the Training Process with Learning Parameters (3)
- Supervised training
• Learning rate (see the sketch below)
– magnitude of the change made when adjusting the connection weights
– based on the error for the current training pattern and desired output
• large rate
– giant oscillations
• small rate
– learns the major features of the problem
• generalizes to new patterns
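The effect of the learning rate can be seen on a toy one-parameter problem (a minimal sketch; the quadratic error surface and the two rates are assumptions, not from the slides): a too-large rate oscillates wildly, a small rate creeps toward the minimum.

import numpy as np

def train(learning_rate, steps=20):
    w = 5.0                      # start far from the minimum at w = 0
    for _ in range(steps):
        grad = 2 * w             # gradient of the error E(w) = w**2
        w -= learning_rate * grad
    return w

print(train(1.05))   # large rate: overshoots, oscillates, and diverges
print(train(0.01))   # small rate: moves toward 0, but very slowly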
19. Controlling the Training Process with Learning Parameters (4)
- Supervised training
• Momentum (sketched below)
– filters out high-frequency changes in the weight values
– prevents oscillating around a set of values
– past errors keep influencing the weight updates for a long time
• Error tolerance
– how close is close enough
– often set to 0.1
– why it is needed
• to drive an output all the way to 0 or 1, the net input would have to be quite large
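A minimal sketch of a momentum update (gradient-descent form; the sizes, learning rate, and momentum value are made up): the velocity term low-pass filters the weight changes, so past errors keep contributing for a while.

import numpy as np

weights  = np.zeros(3)
velocity = np.zeros(3)
learning_rate, momentum = 0.1, 0.9

for step in range(100):
    gradient = np.random.randn(3)              # stand-in for the error gradient
    velocity = momentum * velocity - learning_rate * gradient
    weights += velocity                        # smoothed step instead of the raw one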
20. Controlling the Training Process with Learning Parameters (5)
- Unsupervised learning
• Parameters
– selection of the number of outputs
• granularity of the segmentation (clustering, segmentation)
– learning parameters (once the architecture is set)
• neighborhood parameter : Kohonen maps
• vigilance parameter : ART
21. Controlling the Training Process with Learning Parameters (6)
- Unsupervised learning
• Neighborhood
– the area around the winning unit, where the non-winning units will also be modified
– roughly half the size of the maximum dimension of the output layer
– 2 methods for controlling it (the second is sketched below)
• square neighborhood function, linear decrease in the learning rate
• Gaussian-shaped neighborhood, exponential decay of the learning rate
– the number of epochs parameter
– important in keeping the locality of the topographic maps
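A minimal sketch of the Gaussian neighborhood with exponentially decaying learning rate on a tiny 1-D Kohonen map (map size, initial values, and decay constants are all assumptions):

import numpy as np

n_units = 10
weights = np.random.rand(n_units, 3)          # 1-D map, 3-dimensional inputs
sigma0, lr0, epochs = n_units / 2, 0.5, 100   # neighborhood starts at half the map size

for epoch in range(epochs):
    sigma = sigma0 * np.exp(-epoch / epochs)  # neighborhood shrinks over time
    lr = lr0 * np.exp(-epoch / epochs)        # learning rate decays exponentially
    x = np.random.rand(3)                     # stand-in for one training pattern
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    dist = np.arange(n_units) - winner
    h = np.exp(-dist**2 / (2 * sigma**2))     # Gaussian neighborhood around the winner
    weights += lr * h[:, None] * (x - weights)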
22. Controlling the Training Process with Learning Parameters (7)
- Unsupervised learning
• Vigilance (see the sketch below)
– controls how picky the neural network is going to be when clustering data
– how discriminating it is when evaluating the differences between two patterns
– defines what counts as close enough
– too-high vigilance
• uses up all of the output units
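A minimal sketch of the vigilance test in an ART-1 style network (binary patterns; the patterns and vigilance value are illustrative assumptions): the match score is compared against the vigilance parameter to decide whether a pattern is close enough to an existing cluster.

import numpy as np

pattern   = np.array([1, 1, 0, 1, 0])   # binary input pattern
prototype = np.array([1, 1, 0, 0, 0])   # stored cluster prototype
vigilance = 0.7                         # assumed vigilance parameter

match = np.sum(pattern & prototype) / np.sum(pattern)  # fraction of the input matched
if match >= vigilance:
    print("close enough: update this cluster")
else:
    print("too different: try another cluster or create a new one")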
23. Iterative Development Process (1)
• Network convergence issues
– error falls quickly and then stays flat / reaches the global minimum
– oscillates up and down / trapped in a local minimum
– ways to resolve the problem
• add some random noise
• reset the network weights and start all over again
• revisit the design decisions
25. Iterative Development Process (3)
• Model selection
– an inappropriate neural network model for the function to perform
– add hidden units or another layer of hidden units
– strong temporal or time element embedded
• recurrent back propagation
• radial basis function network
• Data representation
– a key parameter is not scaled or coded appropriately
– a key parameter is missing from the training data
– experience
26. Iterative Development Process (4)
• Model architecture
– does not converge: the problem is too complex for the architecture
– adding some additional hidden units can help
– adding many more?
• the network just memorizes the training patterns
– keeping the hidden layers as thin as possible gives the best results
28. Automating the Process
• Automate the selection of the appropriate number of hidden layers and hidden units
– pruning out nodes and connections (see the sketch below)
– genetic algorithms: the opposite approach to pruning
– the use of intelligent agents
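One common form of automation, pruning out weak connections, can be sketched as simple magnitude-based pruning (NumPy; the layer size and the 20% pruning fraction are arbitrary choices for illustration):

import numpy as np

weights = np.random.randn(8, 4)                  # one layer's connection weights
threshold = np.quantile(np.abs(weights), 0.2)    # prune the weakest 20%

mask = np.abs(weights) >= threshold
pruned = weights * mask                          # weak connections removed (set to zero)
print(f"kept {mask.sum()} of {mask.size} connections")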