AN UPDATE Prepared by Nadia Millington & Luis Rosenthal
Quality of phone • Ideally a Nokia 6300 (or above), whose resolution and screen size allow appropriate visualisation of the image. • If microworkers do not have an appropriate phone, they can obtain one via a microfinance loan, or we can develop a scheme whereby refurbished high-end phones from the first world (which have been fully depreciated) are sent to the BOP at a fraction of the cost (some as low as USD 20), giving good image visualisation and screen size.
Data transmission costs The money that the microworkers earn is expected to be significantly higher than the data costs, based on our quick and dirty review of phone costs in 3 developing countries. Assuming each job pays 20 US cents, we see data charges, their only cost, as a small percentage of their earnings (2-15%). We expect even these percentages to fall after a thorough review of all the available packages.
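As a rough illustration of the arithmetic above, the sketch below computes data charges as a share of what one job pays. The 20 US cent pay figure is from the text; the per-job data costs of 0.4 and 3 US cents are hypothetical values, chosen only because they reproduce the 2-15% range, and the function name is our own.

```python
def data_cost_share(pay_cents: float, data_cost_cents: float) -> float:
    """Return data charges as a percentage of what one job pays."""
    if pay_cents <= 0:
        raise ValueError("pay_cents must be positive")
    return 100.0 * data_cost_cents / pay_cents


PAY_PER_JOB_CENTS = 20.0  # from the text: each job pays 20 US cents

# Hypothetical per-job data costs in US cents (not taken from the review).
for cost in (0.4, 3.0):
    share = data_cost_share(PAY_PER_JOB_CENTS, cost)
    print(f"data cost {cost} cents -> {share:.0f}% of earnings")
```

Real per-job costs would depend on image size and the tariff in each country, so these inputs should be replaced once the package review is complete.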
Can the services be automated by a computer? High accuracy OCR software can read more than 400 characters/second. However:

OCR software is not efficient at recognizing handwriting or at distinguishing between fonts that closely resemble handwriting. In such cases manual entry performs better than the OCR process.

Data entry provides complete flexibility, allowing micro operators to prepare digital documents from multiple formats; even audio recordings of spending can be included, as can notes on partial payments scribbled on the receipts, etc.

OCR may be efficient during the initial level of data entry service but cannot be a substitute for it, because recognition of typewritten text is still not 100% accurate even where clear imaging is available. OCR software accuracy ranges from 71% to 95%, but total accuracy can be achieved only by human review. Errors occur because of:
• Distinguishing noise from text: dots and accents may be mistaken for noise, and vice versa.
• Mistaking graphics or geometry for text: this leads to non-text being sent to recognition.
• Mistaking text for graphics or geometry: in this case the text will not be passed to the recognition stage. This often happens if characters are connected to graphics.

The accuracy of OCR systems is, in practice, directly dependent upon the quality of the input documents. Unlike human readers, OCR is not very tolerant of bad picture quality. As such, OCR used on receipts is expected to have higher error rates. The main difficulties that receipts, invoices etc. pose for OCR are:
• Variations in shape, due to serifs and style variations.
• Deformations, caused by broken characters, smudged characters and speckle.
• Variations in spacing, due to subscripts, superscripts, skew and variable spacing.
• Mixture of text and graphics.

Common OCR issues include mistaking an “ni” for an “m” (ni = m).
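To see why the 71% to 95% accuracy range quoted above still implies human review, a minimal sketch follows. It assumes, which the source does not state, that the figures are per-character accuracies and that errors are independent; the 200-character receipt length is a hypothetical example.

```python
def expected_errors(num_chars: int, accuracy: float) -> float:
    """Expected number of misread characters, given per-character accuracy."""
    return num_chars * (1.0 - accuracy)


def prob_error_free(num_chars: int, accuracy: float) -> float:
    """Probability that every character is read correctly (independence assumed)."""
    return accuracy ** num_chars


RECEIPT_CHARS = 200  # hypothetical receipt length, not from the source

for acc in (0.71, 0.95):  # accuracy range quoted in the text
    print(
        f"accuracy {acc:.0%}: "
        f"~{expected_errors(RECEIPT_CHARS, acc):.0f} expected errors, "
        f"P(no errors) = {prob_error_free(RECEIPT_CHARS, acc):.2e}"
    )
```

Under these assumptions, even the best case leaves roughly ten characters to correct on such a receipt.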
When OCR doesn’t work These imperfections may cause problems in different parts of the recognition process of an OCR system, resulting in misclassifications.
Finally Most OCR has some human interaction. Modern optical character recognition software relies on human interaction to correct misrecognized characters. Even though the software often reliably identifies low-confidence output, the simple language and vocabulary models employed are insufficient to correct mistakes automatically. A developer of the software lemon.com confirms this; he states “Whenever the machine learning system or the OCR system have a low confidence result, it can ask for human assistance, usually with a multiple choice answer or a request to edit an entry”.

In models where OCR does not use human intervention, the consumer is expected to correct their own errors, which is not a value proposition AskMom would ever employ, as we are selling convenience.

It is possible to enhance the AskMom business model with OCR technology on the front end, utilising microworkers for quality assurance and low-confidence results. The use of microworkers would still mean that we operate at costs below other players. However, the human element is the key, as it differentiates us. It allows AskMom greater flexibility to record complex, ill-printed receipts accurately from all parts of the world (offering a global solution), as opposed to other options such as Lemon, which only works within US jurisdiction.
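The front-end OCR plus microworker arrangement described above could be sketched as follows. This is an illustration only, not AskMom's or Lemon's implementation: the `OcrField` type, the `route_fields` function, the 0.90 threshold and the sample receipt values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OcrField:
    name: str          # e.g. "total" or "merchant"
    text: str          # what the OCR engine read
    confidence: float  # engine's confidence, 0.0 to 1.0


def route_fields(fields, threshold=0.90):
    """Split OCR output into auto-accepted fields and fields for human review."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


# Made-up OCR output for one receipt (hypothetical values).
receipt = [
    OcrField("merchant", "Corner Shop", 0.97),
    OcrField("total", "12.5O", 0.62),  # letter O read in place of a zero
    OcrField("date", "2012-03-14", 0.93),
]

accepted, needs_review = route_fields(receipt)
print("auto-accepted:", [f.name for f in accepted])
print("sent to microworker:", [f.name for f in needs_review])
```

Only the low-confidence field reaches a microworker, so human effort is spent where the OCR is least reliable; the threshold would need tuning against real receipts.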