Why should I care
about Scala?
December 15th, 2017
Evaldas Miliauskas
@evaldasw
Co-Founder @ StackTome
I want to make it clear
I'm not a priest, and no, Scala is not better than
Python, Java, JavaScript, C# and many others...
Though, having said that, I do enjoy this video:
A bit about myself
● Evaldas Miliauskas (from Lithuania) - in software development for 10+ years,
in big corporations like IBM and in small ones whose names you wouldn't know
even if I told you
● Right now co-founder of the startup StackTome, working on eCommerce
data problems/apps
● We use Scala for data-related work, such as data pipelines and
microservices
How did I come to know Scala?
● First heard about it while listening to podcasts from 2010 onwards
● You could attribute it to FOMO - fear of missing out - "oh,
there are these cool people doing some cool stuff"
● Had to start another company to start using it
Why Scala to begin with?
● Social proof - companies like LinkedIn, Twitter and even a
newspaper company like "The Guardian"
● Large "cool" open source projects - Spark, Kafka, Akka
● Salary - Scala is 1st in US by highest paid jobs and 1-2 tied in US
based on the 2017 Stack Overflow survey
● TIOBE index - moved from 50+ to 23
Stack Overflow survey 2017
What's the catch?
- It's not easy
- Two words - learning curve
- Main thing - there is no one way of doing things (a reason
enterprises are reluctant to adopt it)
- Jargon - immutability, traits, "pure functions", monads,
algebraic data types, implicit class params, tail recursion,
futures, actors, event/reactive streams (nothing to do with
React.js), throttling
And of course the code...
def getRecommendations[A, B <: GetRecommendationListRequest[A]](
    targetKey: A,
    reqApply: (A, Map[String, String]) => B,
    getRecs: B => Future[Option[PagedResult[RecommendationRecord]]])(
    implicit validator: Validator[B]): Route =
  prepParameterMap { map =>
    complete {
      validate(reqApply(targetKey, map)) { req =>
        ToResponseMarshallable {
          getRecs(req).map { recs =>
            handleEmpty(recs) { res =>
              EntityListResponse[GetRecommendationResponse, GetRecommendationListResponse](
                EntityPagedResult(res, (v: RecommendationRecord) => GetRecommendationResponse(v)),
                GetRecommendationListResponse.apply,
                req.eventId
              )
            }
          }
        }
      }
    }
  }
...you can end up like this
So why am I interested in FP, and why should you
be too?
● How well does shared state scale in a distributed environment?
● Have you ever had a method/function that you pass an object to,
it "does something", and then you spend 2 hours debugging what
that something was? Like
var res = optimize(configObj); // not sure how it works
displayResults(res);
● FP matches the nature of data processing well, especially events,
as they are immutable by design
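A minimal sketch of the debugging problem above. The `MutableConfig`, `Config`, `optimizeInPlace` and `optimize` names are hypothetical, made up for illustration and not taken from the talk. The first version changes the object it is given; the second returns a new value and leaves its input alone, so the caller can see every change in the result.

```scala
object PureVsImpure {
  // Hypothetical mutable config: the function below changes it silently.
  final class MutableConfig(var batchSize: Int, var retries: Int)

  def optimizeInPlace(cfg: MutableConfig): Int = {
    cfg.batchSize = cfg.batchSize * 2 // hidden side effect on the caller's object
    cfg.batchSize * cfg.retries
  }

  // Immutable alternative: every change is carried by the returned value.
  final case class Config(batchSize: Int, retries: Int)

  def optimize(cfg: Config): Config =
    cfg.copy(batchSize = cfg.batchSize * 2)

  def main(args: Array[String]): Unit = {
    val mutable = new MutableConfig(batchSize = 10, retries = 3)
    optimizeInPlace(mutable)
    println(mutable.batchSize) // no longer the value the caller set

    val original = Config(batchSize = 10, retries = 3)
    val tuned = optimize(original)
    println(original) // still the value the caller created
    println(tuned)
  }
}
```

With the immutable version, "what did it do?" is answered by comparing `original` and `tuned` rather than by stepping through the function.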
Pure functions
A pure function is a contract which a compiler can verify. The
contract doesn't specify everything about the function, but it can help you to
resolve the majority of the really boring problems at compile time.
https://dev.to/kspeakman/what-is-the-benefit-of-functional-programming
So what are the benefits of the Scala language?
● Immutability by design
● Both FP & OO concepts
● Statically typed, but type annotations are often optional - the compiler helps you here
● Can be as concise as Python
● Runs on the JVM
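As a small illustration of "statically typed yet concise", here is a sketch of a word count. The `wordCounts` function and its input are invented for this example. None of the lambdas or intermediate values carry type annotations, yet the compiler checks that the pipeline really produces a `Map[String, Int]`.

```scala
object Concise {
  // Only the method signature is annotated; everything inside is inferred.
  def wordCounts(text: String): Map[String, Int] =
    text.toLowerCase
      .split("\\s+")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, hits) => word -> hits.length }

  def main(args: Array[String]): Unit = {
    val counts = wordCounts("Scala runs on the JVM and Scala is concise")
    println(counts("scala"))
  }
}
```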
Some language feature highlights
● Immutable collections - no need to think about state changes
● Tail recursion - doesn't blow up the stack
● Object decomposition with pattern matching - switch/case on
steroids
● Implicit class/function params - often no need for a DI framework
● Traits - multiple inheritance for a specific type of functionality
● Type inference - omit type annotations without losing compile-time safety
● Options - eliminate null pointer errors through explicit handling
● Futures - make it easier to compose async functions
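The highlights above can be sketched in one short file. Everything here (the `Event` model, `describe`, `sum`, `firstPurchaseSku`, the `fetch*` functions and their fake values) is hypothetical and only meant to show the language features, not code from StackTome.

```scala
import scala.annotation.tailrec
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FeatureTour {
  // A sealed trait with case classes: a small algebraic data type.
  sealed trait Event
  final case class PageView(url: String) extends Event
  final case class Purchase(sku: String, amount: Int) extends Event

  // Pattern matching decomposes each case and supports guards.
  def describe(event: Event): String = event match {
    case PageView(url)                         => s"viewed $url"
    case Purchase(sku, amount) if amount > 100 => s"big purchase of $sku"
    case Purchase(sku, _)                      => s"purchase of $sku"
  }

  // @tailrec makes the compiler reject this method unless the recursive
  // call is in tail position, so it runs as a loop with constant stack.
  @tailrec
  def sum(values: List[Int], acc: Long = 0L): Long = values match {
    case Nil          => acc
    case head :: tail => sum(tail, acc + head)
  }

  // Option instead of null: "no purchase" is an explicit None.
  def firstPurchaseSku(events: List[Event]): Option[String] =
    events.collectFirst { case Purchase(sku, _) => sku }

  // The ExecutionContext is an implicit parameter: callers supply it once
  // in scope instead of wiring it through every call by hand.
  def fetchPrice(sku: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(sku.length * 10) // fake price, stands in for a remote call

  def fetchStock(sku: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(2) // fake stock level

  // Futures compose with a for-comprehension.
  def stockValue(sku: String)(implicit ec: ExecutionContext): Future[Int] =
    for {
      price <- fetchPrice(sku)
      stock <- fetchStock(sku)
    } yield price * stock

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val events: List[Event] = List(PageView("/home"), Purchase("sku-1", 250))
    val more = events :+ PageView("/cart") // a new list; `events` is untouched
    more.map(describe).foreach(println)
    println(sum((1 to 100000).toList))
    println(firstPurchaseSku(events).getOrElse("no purchase"))
    println(Await.result(stockValue("abc"), 5.seconds))
  }
}
```

`Await.result` appears only so the demo can print a value before exiting; in a service you would normally keep composing the `Future` instead of blocking on it.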
Trends - 2 keywords with data
Data engineering
“Data engineers build tools, infrastructure, frameworks, and
services.” - The Rise of the Data Engineer by Maxime Beauchemin
(founder of Airflow)
As data becomes more and more central to every company, it's
becoming critical to account for data management and all related
infrastructure in the same fashion as code and the applications
built from it.
Data engineering
● 80-90% of the time is spent on data cleansing
● Data is hard and messy, and you cannot debug SQL! But one thing is for
sure: we will have a lot more of it in the future
When it comes to data there is always more
to come
Share of data
- I still remember when the 13GB hard drive in my first PC, 17 years
ago, felt enormous; now I can buy a flash disk ~20 times the size
for less - a hopeful trend when you think
about it
- In 2021, global consumer IP traffic is expected to reach 232,655
petabytes per month, growing at 24 percent a year
- Now it's around 52,678 PB per month
Questions?
We’re Hiring
Contact HR@stacktome.com
Coding time!

Why Scala is ideal for data engineering
