11. Jerry Prawiharjo
PhD in Optoelectronics from Southampton, England
◦ Distributed computation on Beowulf cluster (MPI)
Product Development Engineer at NeoPhotonics
◦ Test software development and data analysis
Senior Test Development Engineer at Cisco
◦ Test station development (hardware and software) for 100G transceiver modules
13. Challenges
Sheer amount of data: >1 TB
◦ Scoping the project: monthly time buckets (as opposed to daily or weekly)
◦ Filter out foreign-language subreddits
◦ Spark tuning
S3 rate limit: process data on a file-by-file basis
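
The monthly bucketing and file-by-file processing could look roughly like the sketch below. It is plain Python rather than the Spark job used in the project, and it makes several assumptions: records carry a Unix timestamp in seconds (called `created_utc` here), and `read_timestamps` is a hypothetical callback that returns the timestamps stored in one file. Neither name comes from the slides.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Callable, Iterable


def month_bucket(created_utc: int) -> str:
    """Map a Unix timestamp (seconds, UTC) to a 'YYYY-MM' bucket label."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")


def count_by_month(
    file_keys: Iterable[str],
    read_timestamps: Callable[[str], Iterable[int]],
) -> Counter:
    """Count records per month, reading one file at a time.

    read_timestamps is a hypothetical reader (an assumption of this sketch);
    in practice it would fetch a single object from storage, so only one
    file is requested at any moment.
    """
    totals: Counter = Counter()
    for key in file_keys:
        # Aggregate within the file first, then merge into the running total.
        per_file = Counter(month_bucket(ts) for ts in read_timestamps(key))
        totals.update(per_file)
    return totals
```

Aggregating per file and merging afterwards keeps only one file's records in flight at a time, and monthly buckets keep the number of distinct keys small compared with daily or weekly ones.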