8. Job CV & cover letter
Job <-> Applicant analyses empowering the recruiter
Large graphs connect jobs, educations & skills
Augment job descriptions and profiles
Relevant skills, job experience and education for the job
Probable skills with confidence on profiles
15. Interesting challenge: Given a batch of J job descriptions
Score 5M profiles (all jobs simultaneously)
For each job (sequentially):
Top N in each partition
Merge Top N from each partition
Processing jobs sequentially is a nuisance
Collect on driver
Results are not distributed
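The per-job loop above can be sketched in plain Python, with lists standing in for Spark partitions. This is only an illustration under assumptions: the data layout (a profile id plus one score per job), the helper name top_n_per_job and the sizes are all hypothetical, not taken from the talk.

```python
import heapq
import random

def top_n_per_job(partitions, job, n):
    """Top n (score, profile_id) pairs for one job across all partitions."""
    # Step 1: each partition keeps only its local top n (executor side).
    local_tops = [
        heapq.nlargest(n, ((scores[job], pid) for pid, scores in part))
        for part in partitions
    ]
    # Step 2: the per-partition candidates are merged in one place (driver side).
    return heapq.nlargest(n, (cand for top in local_tops for cand in top))

random.seed(0)
J, N = 3, 5  # hypothetical batch size and Top N
# Hypothetical data: 4 partitions of 1000 profiles, each with J scores.
partitions = [
    [(p * 1000 + i, [random.random() for _ in range(J)]) for i in range(1000)]
    for p in range(4)
]

# One full pass per job: this loop is the sequential part.
results = {job: top_n_per_job(partitions, job, N) for job in range(J)}
```

Every entry of results ends up in a single process, which mirrors the two drawbacks listed above: one pass per job, and results that are no longer distributed.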
16. t-Digests
[1] Dunning, T. "Computing Extremely Accurate Quantiles Using t-Digests".
https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
Images shamelessly copied from here (thanks Cam Davidson-Pilon!):
[2] https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest
Compressing the CDF
Estimate quantiles or percentiles with low error
Associative and commutative
“I streamed 8mb of pareto-distributed data into a t-Digest. The resulting size was 5kb, and I could estimate any percentile or quantile desired. Accuracy was on the order of 0.002%.”
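To make "associative and commutative" concrete, here is a small sketch that uses a fixed-bin histogram as a stand-in for a t-digest. It is not a t-digest: the bins are uniform instead of adaptive, and values are assumed to lie in [0, 1]. The class and method names are hypothetical. What it shares with a t-digest is the property that matters here: a compressed CDF that can be merged in any order.

```python
import random

class HistogramSketch:
    """Fixed-bin histogram over [0, 1]: a simple stand-in for a t-digest."""

    def __init__(self, bins=1000):
        self.counts = [0] * bins

    def add(self, x):
        # Clamp so that x == 1.0 falls into the last bin.
        i = min(int(x * len(self.counts)), len(self.counts) - 1)
        self.counts[i] += 1

    def merge(self, other):
        # Adding counts bin by bin is associative and commutative.
        merged = HistogramSketch(len(self.counts))
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged

    def quantile(self, q):
        # Walk the cumulative counts until a fraction q of the data is covered.
        target = q * sum(self.counts)
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return (i + 1) / len(self.counts)  # upper edge of bin i
        return 1.0

def sketch_of(values):
    sketch = HistogramSketch()
    for v in values:
        sketch.add(v)
    return sketch

random.seed(1)
chunks = [[random.random() for _ in range(10_000)] for _ in range(3)]
a, b, c = (sketch_of(chunk) for chunk in chunks)

left = a.merge(b).merge(c)   # (a + b) + c
right = c.merge(b.merge(a))  # c + (b + a)
median = left.quantile(0.5)  # close to 0.5 for uniform data
```

Because left and right hold identical counts, partial sketches can be combined in whatever order the executors finish. A real t-digest trades the fixed bins for adaptive centroids that are finer near the tails.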
17. Given a batch of J job descriptions
Score 5M profiles (all jobs simultaneously)
Compute t-Digests locally on executors
Merge (sum) t-Digests
Broadcast t-Digests
Filter within each partition, keeping profiles where score >= percentile threshold
Approximate Top N remain in partitions
t-Digests are small
Collect on driver is comparatively small
Results remain distributed
Approximates Top N
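The steps on this slide can be sketched end to end in plain Python. As before, lists stand in for Spark partitions and a fixed-bin histogram stands in for the t-digest. The names, the [0, 1) score range and all sizes are assumptions made for illustration.

```python
import random
from functools import reduce

BINS = 1000  # assumed sketch resolution; scores are assumed to lie in [0, 1)

def bin_of(score):
    return min(int(score * BINS), BINS - 1)

def local_sketches(partition, num_jobs):
    """One histogram per job, built in a single pass over the partition."""
    counts = [[0] * BINS for _ in range(num_jobs)]
    for _pid, scores in partition:
        for job, s in enumerate(scores):
            counts[job][bin_of(s)] += 1
    return counts

def merge(a, b):
    """Sum two lists of per-job histograms bin by bin."""
    return [[x + y for x, y in zip(ha, hb)] for ha, hb in zip(a, b)]

def cutoff_bin(counts, n):
    """Highest bin such that at least n scores fall in it or above it."""
    seen = 0
    for i in range(BINS - 1, -1, -1):
        seen += counts[i]
        if seen >= n:
            return i
    return 0

random.seed(2)
J, N = 3, 50  # hypothetical batch size and Top N
partitions = [
    [(p * 5000 + i, [random.random() for _ in range(J)]) for i in range(5000)]
    for p in range(4)
]

# Executors: all J jobs are sketched at once, one small result per partition.
per_partition = [local_sketches(part, J) for part in partitions]
# Driver: only the sketches are collected and merged.
total = reduce(merge, per_partition)
# These J small integers are what would be broadcast back.
cutoffs = [cutoff_bin(hist, N) for hist in total]

# Executors: filter locally, so survivors stay in their partitions.
# Comparing bins is the same as score >= lower edge of the cutoff bin.
survivors = [
    [(pid, job, sc[job]) for pid, sc in part for job in range(J)
     if bin_of(sc[job]) >= cutoffs[job]]
    for part in partitions
]
```

Each job keeps at least N profiles and usually a few more, because the cutoff is rounded down to a bin edge. That surplus is the "approximate" part of the Top N; an exact Top N would need one more small selection over the survivors.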