The document loads a text file of people data and splits the records into two groups by age: A (25 and under) and B (over 25). It maps values within each group, applies a reduce-by-key operation to field 3 of A and of B to sum the values, and saves the results as text files. It also shows the Spark stages involved in the distributed processing.
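The dataflow described above can be sketched locally in plain Python, without Spark, to make the filter, map, and reduce-by-key steps concrete. This is only an illustration under stated assumptions: the record layout (`name,age,key,amount`), the sample rows, and the helper names `parse`, `reduce_by_key`, and `summarize` are all hypothetical, since the summary does not give the file format or say which field serves as the key. In Spark itself the corresponding RDD operations would be `filter`, `map`, `reduceByKey`, and `saveAsTextFile`.

```python
from collections import defaultdict

# Hypothetical sample records; the real file layout is not given in the source.
# Assumed layout: name,age,key,amount
LINES = [
    "alice,23,x,10",
    "bob,31,x,5",
    "carol,25,y,7",
    "dave,40,y,2",
    "erin,19,x,4",
]


def parse(line):
    """Split one comma-separated record into typed fields (assumed layout)."""
    name, age, key, amount = line.split(",")
    return name, int(age), key, int(amount)


def reduce_by_key(pairs):
    """Sum values per key: a local stand-in for reduceByKey with addition."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def summarize(lines):
    people = [parse(line) for line in lines]
    # Filter step: A is aged 25 and under, B is over 25.
    group_a = [p for p in people if p[1] <= 25]
    group_b = [p for p in people if p[1] > 25]
    # Map step: emit (key, amount) pairs; reduce step: sum amounts per key.
    sums_a = reduce_by_key((p[2], p[3]) for p in group_a)
    sums_b = reduce_by_key((p[2], p[3]) for p in group_b)
    return sums_a, sums_b


sums_a, sums_b = summarize(LINES)
print(sums_a)
print(sums_b)
```

In a real Spark job the reduce-by-key step requires a shuffle, so it typically marks a stage boundary, which is likely why the presentation shows several stages for this pipeline.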




