The document discusses why Scala is an interesting language to learn, especially for working with data. It notes that Scala combines functional programming and object-oriented concepts, runs on the JVM, and its features like immutability and pattern matching make it well-suited for data processing. The document also highlights trends around data engineering and increasing amounts of data that will need to be managed.
1. Why should I care
about Scala?
December 15th, 2017
Evaldas Miliauskas
@evaldasw
Co-Founder @ StackTome
2. Want to make it clear
I'm not a priest, and no, Scala is not better than
Python, Java, JavaScript, C# and many others...
Though, having said that, I do enjoy this video:
5. A bit about myself
● Evaldas Miliauskas (from Lithuania) - in software development for 10+ years,
at big corporations like IBM and at small companies whose names you wouldn’t
recognize even if I told you
● Currently co-founder of a startup, StackTome, working on eCommerce
data problems/apps
● We use Scala for data-related work, such as data pipelines and
micro-services
6. How I came to know Scala?
● First heard about it while listening to podcasts, from 2010 onwards
● You could attribute it to FOMO (fear of missing out) - "oh,
there are these cool people doing some cool stuff"
● Had to start another company to start using it
8. Why Scala to begin with?
● Social proof - companies like LinkedIn, Twitter and even a
newspaper company like "The Guardian"
● Large "cool" open source projects - Spark, Kafka, Akka
● Salary - Scala ranks 1st in the US by highest-paid jobs (tied for 1st-2nd),
based on the 2017 Stack Overflow survey
● TIOBE index - moved from 50+ to 23rd place
10. What's the catch?
- It's not easy
- 2 words - learning curve
- Main thing - there is no single way of doing things (the reason
enterprises are reluctant to adopt it)
- Jargon - immutability, traits, "pure functions", monads,
algebraic data types, implicit class params, tail recursion,
futures, actors, event/reactive streams (nothing to do with
React.js), throttling
13. So why am I interested in FP, and why should you be
too?
● How well does shared state scale in a distributed environment?
● Have you ever had a method/function that you pass an object to,
it "does something", and then you spend 2 hours debugging what
that something was? Like
var res = optimize(configObj); // not sure how it works
displayResults(res);
● FP matches the nature of data processing well, especially events,
as they are immutable by design
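A minimal sketch of the two points above, assuming a hypothetical `Config` case class, a hypothetical `optimize` function and made-up `ItemAdded`/`ItemRemoved` events (none of these come from a real library or from the talk). With an immutable case class, `optimize` cannot silently change its argument; it has to return a new value. Events are modelled as immutable facts, and state is derived by folding over them:

```scala
object PureOptimize {
  // Hypothetical configuration type, for illustration only.
  final case class Config(batchSize: Int, retries: Int)

  // Pure function: the result depends only on the input,
  // and the input itself is never modified.
  def optimize(config: Config): Config =
    config.copy(batchSize = config.batchSize * 2)

  // Hypothetical events: immutable records of something that happened.
  sealed trait Event
  final case class ItemAdded(price: BigDecimal) extends Event
  final case class ItemRemoved(price: BigDecimal) extends Event

  // Current state is computed from the events; nothing is updated in place.
  def total(events: List[Event]): BigDecimal =
    events.foldLeft(BigDecimal(0)) {
      case (sum, ItemAdded(price))   => sum + price
      case (sum, ItemRemoved(price)) => sum - price
    }
}
```

After calling `PureOptimize.optimize(config)`, the original `config` is guaranteed to be unchanged, so there is no hidden "something" to debug.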
14. Pure functions
A pure function is a contract which a compiler can verify. The
contract doesn't specify everything about the function, but it can help you to
resolve the majority of the really boring problems at compile time.
https://dev.to/kspeakman/what-is-the-benefit-of-functional-programming
15. So what are the benefits of the Scala language?
● Immutability by design
● Both FP & OO concepts
● Statically typed, but type annotations are often optional - the compiler
infers them for you
● Can be as concise as Python
● Runs on the JVM
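As a rough illustration of the conciseness, immutability and type inference points, here is a small word-count sketch. The `WordCount` object and its `count` method are made up for this example; only standard library collections are used:

```scala
object WordCount {
  // The default List and Map are immutable, and `val` bindings cannot be reassigned.
  // The local value needs no type annotation: the compiler infers List[String].
  def count(lines: List[String]): Map[String, Int] = {
    val words = lines.flatMap(_.toLowerCase.split("\\s+")).filter(_.nonEmpty)
    words.groupBy(identity).map { case (word, occurrences) => word -> occurrences.size }
  }
}
```

Everything is still checked at compile time: passing a `List[Int]` to `count` is a compile error, not a runtime surprise.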
16. Some language feature highlights
● Immutable collections - no need to think about state changes
● Tail recursion - doesn’t blow up the stack
● Object decomposition with pattern matching - switch/case on
steroids
● Implicit class/function params - dependencies get passed without a DI framework
● Traits - multiple inheritance of specific pieces of functionality
● Type inference - type annotations can be omitted without losing compile-time safety
● Options - eliminate null pointer errors through explicit handling
● Futures - make it easier to compose async functions
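The highlights above can be sketched in a few lines each. All names below (`FeatureHighlights`, `Shape`, `Currency`, `parsePrice` and so on) are invented for illustration; only the Scala standard library is assumed:

```scala
import scala.annotation.tailrec
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object FeatureHighlights {
  // Tail recursion: @tailrec makes the compiler reject the method unless the
  // recursive call is in tail position, so it is compiled into a loop.
  @tailrec
  def sum(numbers: List[Long], acc: Long = 0L): Long = numbers match {
    case Nil          => acc
    case head :: tail => sum(tail, acc + head)
  }

  // Pattern matching: decompose case classes directly in the match.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rect(width: Double, height: Double) extends Shape

  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // Traits: small units of behaviour that can be mixed into a class.
  trait Named { def name: String }
  trait Greeting extends Named { def greet: String = s"Hello, $name" }

  // Implicit parameters: the compiler supplies `currency` from the implicit scope.
  final case class Currency(code: String)
  def format(amount: BigDecimal)(implicit currency: Currency): String =
    s"$amount ${currency.code}"

  // Option: a missing value is part of the return type instead of a null.
  def parsePrice(raw: String): Option[BigDecimal] =
    Try(BigDecimal(raw.trim)).toOption

  // Futures: compose two asynchronous results with a for-comprehension.
  def totalPrice(a: Future[BigDecimal], b: Future[BigDecimal]): Future[BigDecimal] =
    for {
      x <- a
      y <- b
    } yield x + y
}
```

A caller of `parsePrice` has to handle the empty case explicitly, for example with `parsePrice(raw).getOrElse(BigDecimal(0))`, and the `Shape` match is checked for exhaustiveness by the compiler because the trait is sealed.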
19. Data engineering
“Data engineers build tools, infrastructure, frameworks, and
services.” - The Rise of the Data Engineer by Maxime Beauchemin
(creator of Airflow)
As data becomes more and more central to every company, it is
becoming critical to treat data management and all related
infrastructure in the same fashion as code and the applications
implementing it.
20. Data engineering
● 80-90% of the time is spent on data cleansing
● Data is hard and messy, and you cannot debug SQL! But one thing is for
sure: we will have a lot more of it coming in the future
22. Share of data
- I still remember the days, 17 years ago, when the 13GB hard drive in my
first PC felt enormous; now I can buy a flash drive ~20
times the size for less - there is a hopeful trend here when you think
about it
- In 2021, global consumer IP traffic is expected to reach 232,655
petabytes per month, at a 24 percent growth rate
- Now it’s around 52,678 PB per month