In this talk, I propose a technique that combines human input with data programming and weak supervision to create a high-quality model that evolves with feedback. We apply Snorkel, a dark data extraction framework developed at Stanford (https://hazyresearch.github.io/snorkel/), to build an honor code violation detector (HCVD). Snorkel takes input from subject matter experts (SMEs) and business partners and encodes it as noisy heuristic rules (labeling functions). It then combines these rules with a generative model that estimates how accurate each rule is, and outputs a high-accuracy training set labeled by the combined rules.
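To make the combining step concrete, here is a minimal sketch in plain Python, so it runs without Snorkel installed. It is a simplified stand-in, not Snorkel's generative model: each rule's accuracy is approximated by how often it agrees with the unweighted majority vote, and votes are then weighted by the log-odds of that accuracy. Snorkel's `LabelModel` instead learns rule accuracies from the pattern of agreements and disagreements among the rules. The example rules, phrases, and helper names (`apply_lfs`, `majority`, `estimate_accuracies`, `weighted_labels`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional

# Label values (hypothetical encoding): a rule votes VIOLATION, OK, or abstains.
ABSTAIN: Optional[int] = None
VIOLATION, OK = 1, 0
LF = Callable[[str], Optional[int]]


def apply_lfs(texts: List[str], lfs: List[LF]) -> List[List[Optional[int]]]:
    """Build the label matrix: one row per text, one column per rule."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority(row: List[Optional[int]]) -> Optional[int]:
    """Unweighted majority of non-abstaining votes; None on a tie or no votes."""
    votes = Counter(v for v in row if v is not None)
    if not votes:
        return None
    (top, n), *rest = votes.most_common()
    if rest and rest[0][1] == n:
        return None
    return top


def estimate_accuracies(matrix, smoothing: float = 1.0) -> List[float]:
    """Crude proxy (NOT Snorkel's method): agreement rate with the majority vote.

    Smoothing keeps every estimate strictly between 0 and 1.
    """
    n_lfs = len(matrix[0])
    agree = [smoothing] * n_lfs
    total = [2 * smoothing] * n_lfs
    for row in matrix:
        m = majority(row)
        if m is None:
            continue
        for j, v in enumerate(row):
            if v is not None:
                total[j] += 1
                agree[j] += v == m
    return [a / t for a, t in zip(agree, total)]


def weighted_labels(matrix, accs: List[float]) -> List[Optional[int]]:
    """Combine votes with log-odds weights so more accurate rules count more."""
    labels: List[Optional[int]] = []
    for row in matrix:
        score = 0.0
        for v, a in zip(row, accs):
            if v is None:
                continue
            w = math.log(a / (1 - a))
            score += w if v == VIOLATION else -w
        labels.append(None if score == 0 else int(score > 0))
    return labels


# Usage with hypothetical rules and texts.
lfs: List[LF] = [
    lambda t: VIOLATION if "do my" in t.lower() else ABSTAIN,
    lambda t: VIOLATION if "quiz" in t.lower() and "pay" in t.lower() else ABSTAIN,
    lambda t: OK if "study group" in t.lower() else ABSTAIN,
]
texts = [
    "Can someone do my online quiz? I will pay",
    "Join our study group for the quiz",
    "Please do my homework",
    "Lecture notes posted",
]
matrix = apply_lfs(texts, lfs)
accs = estimate_accuracies(matrix)
labels = weighted_labels(matrix, accs)
```

Texts on which no rule votes stay unlabeled (`None`) and would simply be left out of the training set.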
HCVD detects key phrases (for example, "do my online quiz") that indicate an honor code violation.
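The key-phrase rules can be sketched as regular-expression labeling functions, one per phrase an SME supplies. This sketch assumes each rule votes VIOLATION when its phrase appears and abstains otherwise. Only "do my online quiz" comes from the talk; the other patterns and the helpers `phrase_lf` and `flag` are hypothetical.

```python
import re
from typing import Callable, List, Optional

VIOLATION: int = 1
ABSTAIN: Optional[int] = None


def phrase_lf(pattern: str) -> Callable[[str], Optional[int]]:
    """Turn one key-phrase pattern into a labeling function (hypothetical helper)."""
    regex = re.compile(pattern, re.IGNORECASE)

    def lf(text: str) -> Optional[int]:
        return VIOLATION if regex.search(text) else ABSTAIN

    lf.__name__ = f"lf_{pattern}"
    return lf


# Hypothetical phrase list; only the first pattern is taken from the talk.
KEY_PHRASES = [
    r"\bdo my (online )?quiz\b",
    r"\btake my (exam|test)\b",
    r"\bpay (someone|you) to\b",
]
LFS = [phrase_lf(p) for p in KEY_PHRASES]


def flag(text: str) -> bool:
    """Quick rule-level check: True if any key-phrase rule fires."""
    return any(lf(text) == VIOLATION for lf in LFS)
```

Abstaining, rather than voting "not a violation", lets each rule speak only about texts it recognizes, which is the usual convention for labeling functions in data programming.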
We run this model daily and route the texts that HCVD flags (around 2%) to human reviewers. Their feedback is checked periodically, and the rules are edited to update the weak supervision, which then produces a fresh training set for modeling. This is an ongoing, iterative process that uses interactive machine learning to evolve the Natural Language Comprehension model as new data is collected.
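One way the feedback step could look, as a hypothetical sketch: reviewer verdicts on flagged texts are used to measure each rule's precision, rules that reviewers mostly reject are dropped, and the surviving rules relabel the corpus into a fresh training set. The precision threshold (0.7), the rule names, and the "any surviving rule fires" labeling are assumptions; in the pipeline described above, the generative model rather than a simple OR would produce the final labels.

```python
import re
from typing import Dict, List, Tuple


def rule_precision(rules: Dict[str, str],
                   reviewed: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Precision of each rule on human-reviewed (text, is_violation) pairs."""
    stats: Dict[str, float] = {}
    for name, pattern in rules.items():
        rx = re.compile(pattern, re.IGNORECASE)
        fired = [label for text, label in reviewed if rx.search(text)]
        if fired:  # rules never reviewed get no estimate
            stats[name] = sum(fired) / len(fired)
    return stats


def revise_rules(rules: Dict[str, str], reviewed: List[Tuple[str, bool]],
                 min_precision: float = 0.7) -> Dict[str, str]:
    """Drop rules whose flags reviewers mostly rejected; keep unreviewed rules."""
    prec = rule_precision(rules, reviewed)
    return {n: p for n, p in rules.items() if prec.get(n, 1.0) >= min_precision}


def build_training_set(rules: Dict[str, str],
                       texts: List[str]) -> List[Tuple[str, int]]:
    """Simplified relabeling (assumption): 1 if any surviving rule fires, else 0."""
    compiled = [re.compile(p, re.IGNORECASE) for p in rules.values()]
    return [(t, int(any(rx.search(t) for rx in compiled))) for t in texts]


# Usage with hypothetical rules and reviewer verdicts.
rules = {"quiz": r"\bdo my (online )?quiz\b", "help": r"\bhelp me with\b"}
reviewed = [
    ("do my online quiz please", True),
    ("can you help me with calculus concepts", False),
    ("help me with my essay outline", False),
    ("do my quiz for $20", True),
]
revised = revise_rules(rules, reviewed)
training = build_training_set(revised, ["do my online quiz now", "help me with my essay"])
```

Here the overly broad "help" rule is dropped after reviewers reject everything it flagged, so the next training set no longer inherits its false positives.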