Boy meets Girl Story
How we built it
EmberJS Single Page Search App
GPSN UI (Bootstrap CSS)
Solr as a NoSQL
• Used “atomic updates” to merge three
source datasets into a single final dataset.
• All text displayed in application stored in
• Dynamic schema supports many languages;
en and cn right now.
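
A minimal sketch of how an atomic-update merge might look, assuming records from each source dataset share a document id. The field names (`title_en`, `image_count`), the id `patent-1`, and the helper `atomic_update` are hypothetical illustrations, not taken from the project. Solr's atomic-update JSON wraps each changed field in a modifier such as `set`, so each source can contribute its own fields to the same document.

```python
import json


def atomic_update(doc_id, fields, id_field="id"):
    """Build one Solr atomic-update document.

    Each field value is wrapped in {"set": value}, so only the listed
    fields change and the rest of the stored document is kept.
    """
    doc = {id_field: doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}
    return doc


# Hypothetical records from two source datasets, keyed by the same id.
bibliographic = {"title_en": "Example title"}
images = {"image_count": 3}

payload = [
    atomic_update("patent-1", bibliographic),
    atomic_update("patent-1", images),
]

# JSON body that would be POSTed to the collection's /update handler.
body = json.dumps(payload)
```

Atomic updates generally require the affected fields to be stored (or to have docValues), since Solr rebuilds the document internally before reindexing it.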
Think about Data Volume
• Started with an older dataset, and tasks like TIFF -> PNG
conversion became progressively harder. Map/Reduce is nice,
but we needed more visibility into progress.
• Should have sharded our Search Index from the beginning
just to make indexing a faster and cheaper process (500 GB
• 8 shards dropped time from 12 hours to 2 hours.
Merging took 5!
• We had too many steps in our pipeline
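
One way to picture the sharding point above is to split the document set into independent buckets so that each shard can be indexed in parallel. This is only a sketch under stated assumptions: the hash routing here is a simple stand-in (it is not SolrCloud's own router), and `shard_for`, `partition`, and the sample ids are hypothetical.

```python
import hashlib


def shard_for(doc_id, num_shards=8):
    """Map a document id to a shard number with a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(doc_ids, num_shards=8):
    """Group document ids into one list per shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Hypothetical ids; each resulting bucket could be fed to its own indexer.
doc_ids = [f"patent-{n}" for n in range(1000)]
shards = partition(doc_ids)
```

Because the hash is stable, the same id always lands on the same shard, which keeps re-indexing runs consistent.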
5 days -> 3 days -> 30 minutes
Detector to pick File
Telling some stories
• How to inject “Discovery” into your app
• The Cloud to the Rescue (sorta!)
• Parsers and Parsers and Parsers
➡Don’t be Afraid to Share!
Your BigData solution
• Allow users to export data
• Most business users want to work in Excel.
• Allow other applications to build on top
of your application.
• Lots of easy “Print to
• Data stored in S3 as:
• individual patent files
• chunky downloads.
• Filtering to expand or
select specific data sets.
• Permalinks: simple, very
• Underlying Solr service
is exposed to public via
proxy. You can query
• Need advanced querying?
Use Lucene syntax in
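
As an illustration of querying a proxied Solr service with Lucene syntax, here is a small sketch that builds a request URL for Solr's `/select` handler. The base URL, the field names (`title_en`, `year`), and the helper `solr_select_url` are assumptions made for the example; the actual proxy address and schema are not given in the slides.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, rows=10, wt="json"):
    """Build a URL for Solr's /select handler with an encoded query."""
    params = {"q": query, "rows": rows, "wt": wt}
    return f"{base_url}/select?{urlencode(params)}"


# Lucene syntax: a phrase match combined with an inclusive range.
lucene_query = 'title_en:"solar cell" AND year:[2000 TO 2010]'

# Hypothetical base URL standing in for the public proxy.
url = solr_select_url("https://example.org/solr/patents", lucene_query)
```

Setting `wt="csv"` asks Solr for CSV output, which may suit users who want to open results in Excel.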