10gen and users
A course is a (really long) document
Allowed OCW Search to get new features seamlessly
PHP script
Only fields to be indexed
Use course metadata in a single algorithm that always produces the same output given the same inputs.
Need a way to work with all kinds of input
Uses regexes. Ugly, but works.
PHP crashes with regexes matching really long strings.
Split up the string into an array and loop, detecting the encoding and reacting accordingly.
It’s probably wrong for cases I’ve yet to see.
Uses CloudFusion library
Object name = unique ID.
MongoDB Full Text Search with Sphinx
MongoDB Full Text Search with Sphinx<br />Pierre Far, PhD<br />Twitter: @ocwsearch<br />Web: www.ocwsearch.com<br />Email: firstname.lastname@example.org<br />
About<br />A search engine of the full text of OpenCourseWare course materials.<br />2600+ courses, 10 universities, 11 OCW collections<br />Courses in English, Japanese, Spanish, Dutch<br />
xmlpipe2<br />XML documents as input to Sphinx<br />Any XML source so...<br />Read courses from MongoDB and stream as XML<br />sphinxsearch.com/wiki/doku.php?id=sphinx_xmlpipe2_tutorial<br />
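A minimal sketch of the streaming step, in Python rather than the PHP script the talk used. It assumes each course is a dict-like record (a plain list stands in here for a MongoDB cursor) and that the field names `title` and `body` and the key `sphinx_id` exist; all three are hypothetical and not taken from the slides. Only the `sphinx:docset`, `sphinx:schema`, `sphinx:field` and `sphinx:document` elements come from the xmlpipe2 format itself.

```python
import sys
from xml.sax.saxutils import escape, quoteattr

# Hypothetical names of the full-text fields to index.
FIELDS = ("title", "body")


def xmlpipe2_stream(courses, fields=FIELDS):
    """Yield an xmlpipe2 document set piece by piece, one course at a time."""
    yield '<?xml version="1.0" encoding="utf-8"?>\n'
    yield "<sphinx:docset>\n"

    # Declare the full-text fields up front.
    yield "<sphinx:schema>\n"
    for name in fields:
        yield "<sphinx:field name=%s/>\n" % quoteattr(name)
    yield "</sphinx:schema>\n"

    for course in courses:
        # The id must be a unique, non-zero, unsigned integer.
        yield '<sphinx:document id="%d">\n' % course["sphinx_id"]
        for name in fields:
            # Escape &, < and > so course text cannot break the XML.
            yield "<%s>%s</%s>\n" % (name, escape(course.get(name, "")), name)
        yield "</sphinx:document>\n"

    yield "</sphinx:docset>\n"


if __name__ == "__main__":
    # Stand-in data; a real run would iterate over a MongoDB cursor instead.
    sample = [
        {"sphinx_id": 1000000001, "title": "Algorithms & Data", "body": "Week 1 notes"},
    ]
    sys.stdout.writelines(xmlpipe2_stream(sample))
```

Because the function is a generator, only one course is held in memory at a time, which suits courses that are really long documents.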
Pitfall 1: Document ID<br />“ALL DOCUMENT IDS MUST BE UNIQUE UNSIGNED NON-ZERO INTEGER NUMBERS”<br />Generate a unique 10-digit numeric ID for each course.<br />Must be deterministic<br />Unique index on field.<br />
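One way to get a deterministic 10-digit ID is to hash a stable piece of course metadata and fold the hash into the 10-digit range. The sketch below is an assumption about how this could be done, not the algorithm used by OCW Search; the function name `course_doc_id` and the choice of the course URL as the stable key are hypothetical.

```python
import hashlib


def course_doc_id(stable_key: str) -> int:
    """Map a stable course key (assumed here: its URL) to a 10-digit integer."""
    digest = hashlib.sha1(stable_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an unsigned integer.
    n = int.from_bytes(digest[:8], "big")
    # Fold into 1000000000..9999999999: always 10 digits, never zero.
    return 1_000_000_000 + n % 9_000_000_000


if __name__ == "__main__":
    key = "http://example.org/courses/intro-to-algorithms"  # hypothetical key
    print(course_doc_id(key))
    print(course_doc_id(key) == course_doc_id(key))  # same input, same ID
```

Hashing makes the ID repeatable across indexing runs, but two different courses can still collide, which is why a unique index on the ID field is worth having: a collision then fails loudly at insert time instead of silently overwriting a document in the search index.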