Something really exciting and largely unnoticed is going on in the Spark ecosystem. As data scientists and engineers learn Spark, they’re actually all implicitly learning a much older, more general topic: typed functional programming. While Spark itself was built on an accumulation of powerful computer science concepts from functional programming and other areas, developers are often encountering these ideas in the context of Spark for the first time. It turns out that Spark makes an excellent platform for learning concepts like immutability, higher order and anonymous functions, laziness, and monadic operators.
This talk will discuss how Spark can be used as teaching tool, to build skills in areas like typed functional programming. We’ll explore a skill-building curriculum that can be used with a data scientist or engineer who only has experience in imperative, dynamically-typed languages like Python. This curriculum introduces the core concepts of functional programming and type theory, while providing learners the opportunity to immediately apply their skills at massive scale, using the power of Spark’s painless scalability and resilience.
Based on the experience of building machine learning teams at x.ai and other data-centric startups, this curriculum is the foundation of building poly-skilled, highly autonomous team members who can build scalable intelligent systems. We’ll work from foundational concepts of Scala and functional programming towards a fully implemented machine learning pipeline, all using Spark and MLlib. Unique new features of Spark like Datasets and Structured Streaming will be particularly useful in this effort. Using this approach, teams can help members in all roles learn how to use sophisticated programming techniques that ensure correctness at scale. With these skills in their toolbox, data scientists and engineers often find that building powerful machine learning systems is intuitive, easy, and even fun.