Behind the Scenes of Buzztter and Its Surrounding Technologies

Published in: Technology

  1. For each word w: TF_tgt, DF_tgt (term frequency and document frequency in the target set) and TF_ref, DF_ref (the same in the reference set).
  2. >> t = Time.parse("2007-11-3")
     => Sat Nov 03 00:00:00 +0900 2007
     >> Status.count(:conditions=>["created_at BETWEEN ? AND ?", t, t.tomorrow])
     => 125626
  3. Tue Nov 06 15:17:40 +0900 2007 - received 8 / 20, 5793 tuples
     Tue Nov 06 15:17:45 +0900 2007 - received 10 / 20, 5794 tuples
     Tue Nov 06 15:17:51 +0900 2007 - received 10 / 20, 5798 tuples
     Tue Nov 06 15:17:55 +0900 2007 - received 4 / 20, 5797 tuples
     Tue Nov 06 15:18:00 +0900 2007 - received 5 / 20, 5797 tuples
     Tue Nov 06 15:18:05 +0900 2007 - received 11 / 20, 5797 tuples
     Tue Nov 06 15:18:12 +0900 2007 - received 8 / 20, 5802 tuples
     Tue Nov 06 15:18:16 +0900 2007 - received 9 / 20, 5807 tuples
     Tue Nov 06 15:18:21 +0900 2007 - received 8 / 20, 5809 tuples
     Tue Nov 06 15:18:25 +0900 2007 - received 12 / 20, 5810 tuples
     Tue Nov 06 15:18:30 +0900 2007 - received 10 / 20, 5812 tuples
     Tue Nov 06 15:18:35 +0900 2007 - received 13 / 20, 5817 tuples
     Tue Nov 06 15:18:40 +0900 2007 - received 3 / 20, 5811 tuples
     Tue Nov 06 15:18:45 +0900 2007 - received 5 / 20, 5811 tuples
     Tue Nov 06 15:18:50 +0900 2007 - received 15 / 20, 5820 tuples
     Tue Nov 06 15:18:55 +0900 2007 - received 14 / 20, 5826 tuples
     Tue Nov 06 15:19:01 +0900 2007 - received 3 / 20, 5823 tuples
     Tue Nov 06 15:19:08 +0900 2007 - received 8 / 20, 5814 tuples
     Tue Nov 06 15:19:12 +0900 2007 - received 8 / 20, 5822 tuples
     Tue Nov 06 15:19:18 +0900 2007 - received 10 / 20, 5818 tuples
  4. For each word w: TF_tgt, DF_tgt (term frequency and document frequency in the target set) and TF_ref, DF_ref (the same in the reference set).
  5. k
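  6. For token positions i and j, the connection score of the span i..j:
     $C_{i,j} = \prod_{k=i}^{j} P(t_{k-1} \mid t_k)\, P(t_{k+1} \mid t_k)$
     Threshold on the span i..j: $C_{i,j} < 0.75$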
  7. count_by_sql ["SELECT COUNT(DISTINCT(user_id)) FROM statuses WHERE #{IGNORE_COND} AND language = ? AND (created_at BETWEEN ? AND ?) AND text @@ ?", language, t.ago(ago), t, add_pragma(word)]
  8. 2007-11-06 13:19:45 ANALYZER-ng(22499) begin for japanese-utf8
     2007-11-06 13:19:46 ANALYZER-ng(22499) extracted 3120 sentences
     2007-11-06 13:20:12 ANALYZER-ng(22499) 6006 keywords extracted from 3120 sentences
     2007-11-06 13:20:12 ANALYZER-ng(22499) deleting stopwords ...
     2007-11-06 13:20:19 ANALYZER-ng(22499) odd terms removed (5902 terms)
     2007-11-06 13:20:19 ANALYZER-ng(22499) ignore case (5895 terms)
     2007-11-06 13:20:19 ANALYZER-ng(22499) trivial terms are removed (1796 terms)
     2007-11-06 13:21:38 ANALYZER-ng(22499) occurrence calculated (72.738133 s)
     2007-11-06 13:23:35 ANALYZER-ng(22499) modified DDFs calculated
     2007-11-06 13:23:35 ANALYZER-ng(22499) scores calculated (1563 terms)
     2007-11-06 13:23:40 ANALYZER-ng(22499) redundant terms removed (1151 terms)
     2007-11-06 13:23:42 ANALYZER-ng(22499) end for japanese-utf8 (237.531316 s)
     2007-11-06 13:23:42 ANALYZER-ng(22499) begin for english
     2007-11-06 13:23:43 ANALYZER-ng(22499) extracted 6181 sentences
     2007-11-06 13:24:20 ANALYZER-ng(22499) 10168 keywords extracted from 6181 sentences
     2007-11-06 13:24:20 ANALYZER-ng(22499) deleting stopwords ...
     2007-11-06 13:24:33 ANALYZER-ng(22499) odd terms removed (9808 terms)
     2007-11-06 13:24:33 ANALYZER-ng(22499) ignore case (9444 terms)
     2007-11-06 13:24:33 ANALYZER-ng(22499) trivial terms are removed (2738 terms)
     2007-11-06 13:26:18 ANALYZER-ng(22499) occurrence calculated (96.306258 s)
     2007-11-06 13:27:59 ANALYZER-ng(22499) modified DDFs calculated
     2007-11-06 13:27:59 ANALYZER-ng(22499) scores calculated (2109 terms)
     2007-11-06 13:28:10 ANALYZER-ng(22499) redundant terms removed (1643 terms)
     2007-11-06 13:28:13 ANALYZER-ng(22499) end for english (270.044345 s)
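
The connection score C(i,j) on slide 6 can be sketched in Ruby roughly as follows. This is an illustration under stated assumptions, not Buzztter's actual code: the operator over k = i..j is read as a product, the conditional probabilities are estimated from raw unigram and bigram counts, and the helper names (build_counts, connection_score) and the sentence boundary markers are hypothetical.

```ruby
# Hypothetical sentence boundary markers, so that t(k-1) and t(k+1)
# are defined at the edges of a sentence.
BOS = "<s>"
EOS = "</s>"

# Count unigrams and adjacent-token bigrams over tokenized sentences.
def build_counts(sentences)
  unigrams = Hash.new(0)
  bigrams = Hash.new(0)
  sentences.each do |tokens|
    padded = [BOS, *tokens, EOS]
    padded.each { |t| unigrams[t] += 1 }
    padded.each_cons(2) { |a, b| bigrams[[a, b]] += 1 }
  end
  [unigrams, bigrams]
end

# C(i,j) = product over k = i..j of P(t[k-1] | t[k]) * P(t[k+1] | t[k]).
# Assumption: P(x | t) is estimated as count(x adjacent to t) / count(t).
def connection_score(tokens, i, j, unigrams, bigrams)
  padded = [BOS, *tokens, EOS]
  (i..j).reduce(1.0) do |score, k|
    prev, cur, nxt = padded[k], padded[k + 1], padded[k + 2] # +1 shift for BOS
    n = unigrams[cur].to_f
    return 0.0 if n.zero? # unseen token: no evidence of any connection
    score * (bigrams[[prev, cur]] / n) * (bigrams[[cur, nxt]] / n)
  end
end

unigrams, bigrams = build_counts([%w[a b], %w[a c]])
score = connection_score(%w[a b], 0, 0, unigrams, bigrams)
puts score
```

In this toy corpus "a" always starts a sentence but is followed by "b" only half the time, so the score for the single-token span is 1.0 * 0.5. A caller would compare the result against the 0.75 threshold from the slide.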
