A Corpus Linguistics-Based Approach for Estimating Arabic Online Content

  1. A Corpus Linguistics-Based Approach for Estimating Arabic Online Content
  2. 5,340,000
  3. 1,950,000
  4. 0.5 %
  5. 1 %
  6. 1.4 %
  7. 3 %
  8. 0.5 % 1.4 % % 1
  9. Zipf's Law
  10. Corpora Building
  11. Dmoz corpus: 75,560 pages, 530.1 MB, 659,756 unique words
  12. Wikipedia corpus: 95,140 pages, 213.3 MB, 760,690 unique words
  13. CCA corpus: 377 pages, 82,878 unique words
  14. Common
  15. Word, Document, Frequency

      | Word | Document | Frequency |
      |---|---|---|
      | في | 60,218 | 1,077,288 |
      | من | 61,949 | 860,052 |
      | على | 56,846 | 498,694 |
      | إلى | 48,599 | 278,315 |
      | أن | 40,439 | 277,465 |
      | عن | 50,736 | 241,824 |
      | التي | 35,437 | 166,002 |
      | لا | 40,122 | 153,887 |
      | مع | 38,797 | 130,157 |
      | ما | 33,637 | 129,304 |
      | هذا | 31,363 | 109,125 |
      | الذي | 32,474 | 108,844 |
      | أو | 26,769 | 105,754 |
      | هذه | 29,289 | 97,964 |
      | بين | 32,662 | 84,535 |
      | الله | 26,803 | 84,216 |
      | أخبار | 30,010 | 81,894 |
      | كل | 30,277 | 81,224 |
      | الرئيسية | 41,000 | 80,161 |
      | بعد | 32,370 | 78,317 |
      | الصفحة | 27,837 | 66,944 |
      | لم | 25,403 | 64,251 |
      | كان | 23,316 | 63,318 |
      | العالم | 23,287 | 60,468 |
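The table on slide 15 reports two counts per word: how many documents contain it ("Document") and how many times it occurs overall ("Frequency"). The slides do not show how these counts were produced, so the following is only a minimal Python sketch of one way to compute them, together with the rank × frequency product that Zipf's law (slide 9) predicts to be roughly constant. The function names, the whitespace tokenizer, and the toy corpus are all assumptions made for illustration; they are not taken from the presentation.

```python
from collections import Counter


def count_frequencies(documents):
    """Return (term_freq, doc_freq) Counters for a list of text documents.

    term_freq[w] = total occurrences of w across all documents
    doc_freq[w]  = number of documents that contain w at least once
    """
    term_freq = Counter()
    doc_freq = Counter()
    for text in documents:
        tokens = text.split()          # naive whitespace tokenizer (assumption)
        term_freq.update(tokens)       # count every occurrence
        doc_freq.update(set(tokens))   # count each word once per document
    return term_freq, doc_freq


def zipf_table(term_freq, top=5):
    """List (rank, word, frequency, rank * frequency) for the most frequent words."""
    ranked = term_freq.most_common(top)
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


# Hypothetical three-document toy corpus, not data from the slides.
toy_corpus = [
    "في البيت",
    "من البيت إلى البيت",
    "في المدرسة",
]

term_freq, doc_freq = count_frequencies(toy_corpus)
for row in zipf_table(term_freq, top=3):
    print(row)
```

In the toy corpus the word "البيت" occurs three times but in only two documents, which is the same distinction the "Frequency" and "Document" columns make. A real pipeline for Arabic would likely need a proper tokenizer and normalization step, which this sketch deliberately leaves out.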
