Data Analysis in the Hebrew Bible

Joint work with Martijn Naaijer (VU University).
With the Hebrew Bible encoded in the Linguistic Annotation Framework (LAF, ISO 24612:2012) and a new LAF processing tool, we demonstrate how to do practical data analysis. The tool, LAF-Fabric, integrates with the IPython notebook approach. Our example here is a lexeme cooccurrence analysis of the books of the Bible. For now, the road from data to visualization is more important than the exact visualization.

Published in: Education, Spiritual, Technology

  1. DATA ANALYSIS IN THE HEBREW BIBLE. CLIN, 2014-01-17. Dirk Roorda (DANS/TLA), Martijn Naaijer and Gino Kalkman (VU ETCBC)
  2. RESEARCH @ just started
  3. EXEGESIS. Preaching the word of God; the devil is in the details; meanings of specific words
  4. DISTANT READING. Scan large quantities of text; find patterns (signals in the noise); study aspects other than meaning: text transmission, linguistic variation, literary form
  5. VARIATION IN BIBLICAL HEBREW. Timespan of Hebrew Bible writing: ~1000 years. Assumption: we can divide the books into 2 groups, EBH (Early Biblical Hebrew) and LBH (Late Biblical Hebrew)
  6. "PROOF". Select some features that differ for EBH and LBH. Risk of circularity. We need data analysis that is comprehensive (not eclectic) and critical (not everything is a signal)
  7. SYNTACTIC VARIATION. Syntactic features; drivers of change; phrase, clause, text; diachrony; variation; large units; chapters; books; geography; demography
  8. THE HEBREW BIBLE AS DATA
  9. THE HEBREW BIBLE IN LAF. LAF ISO 24612:2012; SHEBANQ (github); 2.27 GB; 1.5 M nodes; 1.5 M edges; 40 M features; 400 K words; 13 M XML ids
  10. PROCESSING LAF. It is XML, but not document-like (not as TEI) and not database-like (not nice for XQuery); it is graph-like
  11. PROCESSING LAF. eXist (>30 min loading time, simple queries >60 min); indexes needed, but which ones? Tried POIO (>60 min loading time, needs >20 GB RAM); straightforward object-oriented in Python; scripting language overhead
  12. LAF-FABRIC. LAF-Fabric: also Python; loads in a few seconds; uses C-like arrays; executes in a few seconds on a laptop; can run in a Terminal; as an IPython notebook
  13. gender notebook
  14. COOCCURRENCES. 1: Common Nouns; 2: Proper Nouns. Nodes are books. Edges are cooccurrences of lexemes (1 or 2)
  15. WEIGHTED EDGES. S(lex): number of books containing lex; C(b1, b2): intersection of lexemes of b1 and b2; L(b1, b2): union of lexemes of b1 and b2
  16. cooccurrences notebook
  17. cooccurrences notebook
  18. cooccurrences notebook
  19. cooccurrences notebook
  20. cooccurrences notebook
  21. cooccurrences notebook
  22. cooccurrences notebook
  23. Common Nouns, no weight
  24. Common Nouns, with weight
  25. Proper Nouns, no weight
  26. Proper Nouns, with weight
  27. DATA-DRIVEN THEOLOGY. m.naaijer@vu.nl, g.j.kalkman@vu.nl, dirk.roorda@dans.knaw.nl. Thank You
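
Slides 10 to 12 describe the LAF data as graph-like and say that LAF-Fabric "uses C-like arrays" instead of one Python object per node or edge. The sketch below illustrates that general idea only; it is not LAF-Fabric's actual data layout or API. The function names (build_edge_arrays, neighbours) and the toy edges are hypothetical, and the flat offsets/targets layout is an assumption chosen for illustration.

```python
from array import array


def build_edge_arrays(edges, n_nodes):
    """Pack (source, target) pairs into two flat integer arrays.

    ASSUMPTION: this offsets/targets layout is illustrative only and is
    not taken from LAF-Fabric. Nodes are integers 0 .. n_nodes - 1.
    """
    # offsets[n] .. offsets[n + 1] will delimit the targets of node n.
    offsets = array("l", [0] * (n_nodes + 1))
    for src, _ in edges:
        offsets[src + 1] += 1          # count outgoing edges per node
    for i in range(n_nodes):
        offsets[i + 1] += offsets[i]   # turn counts into start positions

    targets = array("l", [0] * len(edges))
    cursor = offsets[:n_nodes]         # slicing copies: next free slot per node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets, targets, node):
    """Return the targets of all edges leaving `node`."""
    return list(targets[offsets[node]:offsets[node + 1]])


# Hypothetical toy graph with 4 nodes and 5 edges.
toy_edges = [(0, 1), (0, 2), (2, 3), (1, 3), (0, 3)]
offsets, targets = build_edge_arrays(toy_edges, n_nodes=4)
print(neighbours(offsets, targets, 0))
```

With millions of nodes and edges, two flat arrays of machine integers take far less memory than millions of Python objects, which is one plausible reading of the memory and load-time contrast on slides 11 and 12.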

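Slides 14 and 15 define a graph whose nodes are books and whose edges are lexeme cooccurrences, using S(lex), C(b1, b2) and L(b1, b2). The transcript does not preserve the weighting formula itself, so the following is only a sketch under stated assumptions: the unweighted edge value is taken to be |C| / |L|, and the weighted variant counts each lexeme as 1 / S(lex) so that lexemes found in many books contribute less. Both formulas, the function names, and the toy books and lexemes are hypothetical and are not taken from the notebooks shown in the talk.

```python
from itertools import combinations


def support(book_lexemes):
    """S(lex): the number of books that contain each lexeme."""
    counts = {}
    for lexemes in book_lexemes.values():
        for lex in lexemes:
            counts[lex] = counts.get(lex, 0) + 1
    return counts


def cooccurrence_edges(book_lexemes, weighted=False):
    """Return {(b1, b2): edge value} for every pair of books sharing a lexeme.

    ASSUMPTION: the edge value is mass(C) / mass(L), where mass counts each
    lexeme as 1 (unweighted) or as 1 / S(lex) (weighted). The slides give
    only the definitions of S, C and L, not this formula.
    """
    s = support(book_lexemes)

    def mass(lexemes):
        if weighted:
            return sum(1.0 / s[lex] for lex in lexemes)
        return float(len(lexemes))

    edges = {}
    for b1, b2 in combinations(sorted(book_lexemes), 2):
        common = book_lexemes[b1] & book_lexemes[b2]   # C(b1, b2)
        union = book_lexemes[b1] | book_lexemes[b2]    # L(b1, b2)
        if common:
            edges[(b1, b2)] = mass(common) / mass(union)
    return edges


# Hypothetical toy data: three "books", each a set of noun lexemes.
toy_books = {
    "A": {"king", "house", "land"},
    "B": {"king", "house", "sea"},
    "C": {"king", "vine"},
}
plain = cooccurrence_edges(toy_books)
weighted = cooccurrence_edges(toy_books, weighted=True)
for pair in sorted(plain):
    print(pair, round(plain[pair], 3), round(weighted[pair], 3))
```

In the toy data, "king" occurs in every book and so carries little information; under the assumed weighting it pulls the edge between A and C down further than the unweighted value does, which is the kind of difference slides 23 to 26 compare visually.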