Despite the existence of data analysis tools such as R, SQL, and Excel, they remain insufficient for today's big-data analysis needs.
The author proposes a CUI (Character User Interface) toolset with dozens of functions for neatly handling tabular data in TSV (Tab-Separated Values) files.
It implements many basic and useful functions not found in existing software; each function follows the Unix philosophy and covers the most frequent pre-analysis tasks in the initial exploratory stage of data analysis projects.
It also greatly speeds up basic analysis tasks, such as drawing cross tables and Venn diagrams, for which existing software inevitably requires rather complicated programming and debugging.
Here, tabular data mainly means TSV files, as well as other CSV (Comma-Separated Values)-type files, all of which are widely used for storing data and are suitable for data analysis.
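As a point of reference for the kind of pre-analysis task described above, a cross table over TSV data can be sketched with pandas (the toolset's own commands are not part of this excerpt; the column names and data below are hypothetical):

```python
import io
import pandas as pd

# Hypothetical TSV data of the kind the toolset targets.
tsv = (
    "name\tcity\tproduct\n"
    "alice\tNY\tA\n"
    "bob\tLA\tB\n"
    "alice\tNY\tB\n"
    "bob\tLA\tA\n"
    "carol\tNY\tA\n"
)

df = pd.read_csv(io.StringIO(tsv), sep="\t")

# Cross table: count of rows for each (city, product) pair.
table = pd.crosstab(df["city"], df["product"])
print(table)
```

This is the kind of one-liner aggregation the abstract contrasts with the "complicated programming and debugging" that general-purpose tools often require for the same result.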
val acc 10.720%
| epoch 25 | iter 1 / 351 | time 0[s] | loss 0.73
| epoch 25 | iter 21 / 351 | time 0[s] | loss 0.75
| epoch 25 | iter 41 / 351 | time 1[s] | loss 0.80
| epoch 25 | iter 61 / 351 | time 2[s] | loss 0.78
| epoch 25 | iter 81 / 351 | time 2[s] | loss 0.78
| epoch 25 | iter 101 / 351 | time 3[s] | loss 0.78
| epoch 25 | iter 121 / 351 | time 3[s] | loss 0.78
| epoch 25 | iter 141 / 351 | time 4[s] | loss 0.79
| epoch 25 | iter 161 / 351 | time 5[s] | loss 0.76
| epoch 25 | iter 181 / 351 | time 5[s] | loss 0.76
| epoch 25 | iter 201 / 351 | time 6[s] | loss 0.77
| epoch 25 | iter 221 / 351 | time 7[s] | loss 0.79
| epoch 25 | iter 241 / 351 | time 8[s] | loss 0.81
| epoch 25 | iter 261 / 351 | time 8[s] | loss 0.79
| epoch 25 | iter 281 / 351 | time 9[s] | loss 0.80
| epoch 25 | iter 301 / 351 | time 10[s] | loss 0.80
| epoch 25 | iter 321 / 351 | time 10[s] | loss 0.76
| epoch 25 | iter 341 / 351 | time 11[s] | loss 0.77
Q 77+85
T 162
X 164
---
Q 975+164
T 1139
X 1129
---
Q 582+84
T 666
X 672
---
Q 8+155
T 163
O 163
---
Q 367+55
T 422
X 429
---
Q 600+257
T 857
X 859
---
Q 761+292
T 1053
X 1049
---
Q 830+597
T 1427
X 1441
---
Q 26+838
T 864
X 858
---
Q 143+93
T 236
X 239
---
val acc 9.560%
# coding: utf-8
import sys
sys.path.append('..')
import numpy as np
import matplotlib.pyplot as plt
from dataset import sequence
from common.optimizer import Adam
from common.trainer import Trainer
from common.util import eval_seq2seq
from attention_seq2seq import AttentionSeq2seq
from seq2seq import Seq2seq
from peeky_seq2seq import PeekySeq2seq
# Load the data
(x_train, t_train), (x_test, t_test) = sequence.load_data('date.txt')
char_to_id, id_to_char = sequence.get_vocab()
# Reverse the input sequences
x_train, x_test = x_train[:, ::-1], x_test[:, ::-1]
# Set the hyperparameters
vocab_size = len(char_to_id)
wordvec_size = 16
hidden_size = 256
batch_size = 128
max_epoch = 10
max_grad = 5.0
model = AttentionSeq2seq(vocab_size, wordvec_size, hidden_size)
# model = Seq2seq(vocab_size, wordvec_size, hidden_size)
# model = PeekySeq2seq(vocab_size, wordvec_size, hidden_size)
optimizer = Adam()
trainer = Trainer(model, optimizer)
acc_list = []
for epoch in range(max_epoch):
    trainer.fit(x_train, t_train, max_epoch=1,
                batch_size=batch_size, max_grad=max_grad)

    correct_num = 0
    for i in range(len(x_test)):
        question, correct = x_test[[i]], t_test[[i]]
        verbose = i < 10
        correct_num += eval_seq2seq(model, question, correct,
                                    id_to_char, verbose, is_reverse=True)

    acc = float(correct_num) / len(x_test)
    acc_list.append(acc)
    print('val acc %.3f%%' % (acc * 100))

model.save_params()
# Plot the accuracy curve
x = np.arange(len(acc_list))
plt.plot(x, acc_list, marker='o')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.ylim(-0.05, 1.05)
plt.show()
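The input-reversal step in the script above (`x_train[:, ::-1]`) flips each sequence along the time axis while keeping the batch order, a trick known to help seq2seq training. A minimal illustration on a toy batch:

```python
import numpy as np

# Toy batch of two sequences, three time steps each.
x = np.array([[1, 2, 3],
              [4, 5, 6]])

# Reverse each row (the time axis); rows (the batch axis) stay in place.
rev = x[:, ::-1]
print(rev)  # [[3 2 1]
            #  [6 5 4]]
```

Note that `eval_seq2seq` is called with `is_reverse=True` so the printed questions appear in their original, un-reversed order.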
Epoch 3 is shown as a sample; by this point the loss is already very small and every answer is
correct, so learning appears to have converged at quite an early stage.
---
val acc 90.620%
| epoch 3 | iter 1 / 351 | time 0[s] | loss 0.12
| epoch 3 | iter 21 / 351 | time 11[s] | loss 0.10
| epoch 3 | iter 41 / 351 | time 23[s] | loss 0.06
| epoch 3 | iter 61 / 351 | time 34[s] | loss 0.05
| epoch 3 | iter 81 / 351 | time 45[s] | loss 0.04
| epoch 3 | iter 101 / 351 | time 57[s] | loss 0.03
| epoch 3 | iter 121 / 351 | time 68[s] | loss 0.02
| epoch 3 | iter 141 / 351 | time 79[s] | loss 0.02
| epoch 3 | iter 161 / 351 | time 90[s] | loss 0.02
| epoch 3 | iter 181 / 351 | time 102[s] | loss 0.02
| epoch 3 | iter 201 / 351 | time 113[s] | loss 0.01
| epoch 3 | iter 221 / 351 | time 126[s] | loss 0.01
| epoch 3 | iter 241 / 351 | time 138[s] | loss 0.01
| epoch 3 | iter 261 / 351 | time 151[s] | loss 0.01
| epoch 3 | iter 281 / 351 | time 164[s] | loss 0.01
| epoch 3 | iter 301 / 351 | time 178[s] | loss 0.01
| epoch 3 | iter 321 / 351 | time 189[s] | loss 0.01
| epoch 3 | iter 341 / 351 | time 202[s] | loss 0.01
Q 10/15/94
T 1994-10-15
O 1994-10-15
---
Q thursday, november 13, 2008
T 2008-11-13
O 2008-11-13
---
Q Mar 25, 2003
T 2003-03-25
O 2003-03-25
---
Q Tuesday, November 22, 2016
T 2016-11-22
O 2016-11-22
---
Q Saturday, July 18, 1970
T 1970-07-18
O 1970-07-18
---
Q october 6, 1992
T 1992-10-06
O 1992-10-06
---
Q 8/23/08
T 2008-08-23
O 2008-08-23
---
Q 8/30/07
T 2007-08-30
O 2007-08-30
---
Q 10/28/13
T 2013-10-28
O 2013-10-28
---
Q sunday, november 6, 2016
T 2016-11-06
O 2016-11-06