SlideShare a Scribd company logo
TIM: Large-scale Forecasting
for Energy in Julia
Ján Dolinský, Tangent Works #PyDataBA19. 2. 2018
(PyData Bratislava Meetup #6, Nervosa)
The Two Language Problem
The Philosofical Part
Ján Dolinský
jan.dolinsky@tangent.works
Math & Stats
Information Geometry
Computer
Science
Domain
Expertise
Machine
Learning
Traditional
Software
Traditional
Research
VALUE
Math & Stats
Information Geometry
Computer
Science
Domain
Expertise
Machine
Learning
Traditional
Software
Traditional
Research
VALUE
Data Science Stack:
R, Matlab, Python, Ruby, ...
Production Stack:
C++, Java, Fortran, ...
The Two Language Problem
Prototyping Production
Mathematicians & Data Scientists
Getting the applied math right.
Performance engineers & Architects.
The Two Language Problem: Tangent Works in 2014
Prototyping Production
The Data Science Stack:
R, Matlab, Octave, Python, Ruby, ….
Production stack:
C++, Fortran, Java, ...
A lot of effort ...
The Two Language Problem: Tangent Works in 2018
Prototyping Production
Julia
Julia
Mathematicians
Getting the applied math right.
Performance engineers & Architects.
TIM: Automatic Model Building for
Time-series in Energy
The Product Part
Ján Dolinský
jan.dolinsky@tangent.works
TIM
Tangent Information Modeller
Ján Dolinský
jan.dolinsky@tangent.works
Content
• Time-series Problems in Energy Industry
• Data Science Process and Automatic Model Building
• Live Modeling Demonstration
• Large-scale Forecasting Systems & Why Automation Matters
• Julia Language
• GefCom 2014, 2017 results & Summary
Time-series Problems in Energy Industry
• Electricity Load (aggregated and individual)
• Technical Losses
• System Imbalance
• Gas Consumption (District Heating)
• District Cooling
• Wind Production
• Solar Production
• Electricity Price
Consumption side Production side
1. Collect 2. Store 3. Transform 4. Model 5. Predict 6. Present
Scada Systems, Databases, Time Series Management
Oracle, MySQL, SAP HANA, Hadoop, ...
SAS, SPSS, R, ML libraries, Matlab Power BI, SAP,
QlickView etc ..
Automation Using TIM
Data Science Process
4. Model 5. Predict
SAS, SPSS, R, ML libraries, Matlab
Automation Using TIM
Data Science Process
●
feature engineering
●
model selection
●
tedious
●
costs money
Data Science Process
y(k)=f (Temp(k−3),Temp(k−22)∗Wind(k−1), y(k−24))
NN, SVM, MARS, LASSO, RF, … ?
Which features and how many ?
// optional
Data Science Process
y(k)=f (Temp(k−3),Temp(k−22)∗Wind(k−1), y(k−24))
Temp(k-1), Temp(k-2), …, Temp(k-24)
Wind(k-1), Wind(k-2), …, Wind(k-24)
y(k-1), y(k-2), …, y(k-24)
Temp(k-1), Temp(k-2), …, Temp(k-96)
Wind(k-1), Wind(k-2), …, Wind(k-96)
y(k-1), y(k-2), …, y(k-96)
24 + 24 + 24 + 24*24 = 24*27 = 648 96 + 96 + 96 + 96*96 = 96*99 = 9504
// optional
TIM: As a Large-Scale Model Building Engine
TIM: As a Data Science Tool
Business User Mode produces high-quality models.
Advanced User Mode allows to explore new scenarios quickly.
Fine-tuning possible for critical assets.
Live Demonstration of TIM
GefCom 2017 ex-ante results
• Qualifying Match: 1st
place out of 177 teams
• Final Match: 2nd
place out of 12 pre-selected teams
Andritz Hackathon 2017
1st
place out of 7 ML companies
Large-scale Forecasting Systems
Platforms where thousands of different consumption and production models are produced.
This is not possible without full automation.
TIM Worker 1
CAPI
JAPI
TIM
WS
REST/SOAP
API
TIM ConnectorsData Source
fs2TIM_WS
fs2TIM_CAPI*
fs2TIM_JAPI*
JuliaDB2TIM_WS
...
File System
HAKOM TSM
JuliaDB
InfluxDB
MySQL
PosgreSQL
Oracle
TIM Worker N
CAPI
JAPI
TIM
....
Asset data &
model management
Weather data
Visualization & UI
Grafana
Genie
...
Julia Language
Walks like Python runs like C
●
A single platform for prototyping and production → significant gains in efficiency
of product development
●
Vectorized code runs equally fast as de-vectorized
●
Vectorization on a single thread level
●
SIMD out of the box
●
Direct calls to BLAS
●
Distributed parallel computation (multiple threads)
Summary
• Fully automated model building
• Accuracy often outperforms manually build models
• Quick insights into a time-series of interest
• Tedious model building is an option not necessity
Thank you.
jan.dolinsky@tangent.works
Tangent Works
Na Slavíne 1
81106 Bratislava
www.tangent.works
The Key Concepts in Julia
Julia Lectures for Tangent Works Employees
Ján Dolinský
jan.dolinsky@tangent.works
Lectures Block 1: Vectorization vs. De-vectorization
• Iterators: The Storage Concept
• On Demand Iteration: Generators
• MapReduce principle
• Broadcasting
• Syntactic Loop Fusion
 Why is vectorized code fast ?
 Why it is not as fast as it could be ?
Lectures Block 2: Linear Algebra
• Matrices storage and its consequences
• Column-major & SIMD
• BLAS calls
Lectures Block 3: Language Design Elements
• Multiple-dispatch
• Struct
• Mutable & Immutable concept
• Modules & Packages generation
• JIT and AOT compilation
Julia Language
Walks like Python runs like C
●
A single platform for prototyping and production → significant gains in efficiency
of product development
●
Vectorized code runs equally fast as de-vectorized
●
Vectorization on a single thread level
●
SIMD out of the box
●
Direct calls to BLAS
●
Distributed parallel computation (multiple threads)
2 Years of Using Julia
• It is a real programming language
• In v0.2 to v0.4 still in heavy development
• v0.5 and v0.6 are very useful; static compilation possible
• Prototyping and production development is done using the same platform
• IDE not fully matured but already quite useful
• Debugger

More Related Content

What's hot

Fyp presentation final
Fyp presentation finalFyp presentation final
Fyp presentation final
home
 
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...
Riccardo Tommasini
 
single source shorest path
single source shorest pathsingle source shorest path
single source shorest path
sowfi
 
presentation
presentationpresentation
presentation
jie ren
 

What's hot (20)

Fyp presentation final
Fyp presentation finalFyp presentation final
Fyp presentation final
 
A calculus of mobile Real-Time processes
A calculus of mobile Real-Time processesA calculus of mobile Real-Time processes
A calculus of mobile Real-Time processes
 
Information security Seminar #3
Information security Seminar #3 Information security Seminar #3
Information security Seminar #3
 
Inductive Triple Graphs: A purely functional approach to represent RDF
Inductive Triple Graphs: A purely functional approach to represent RDFInductive Triple Graphs: A purely functional approach to represent RDF
Inductive Triple Graphs: A purely functional approach to represent RDF
 
Data science : R Basics Harvard University
Data science : R Basics Harvard UniversityData science : R Basics Harvard University
Data science : R Basics Harvard University
 
Os Urner
Os UrnerOs Urner
Os Urner
 
Nor Implement
Nor ImplementNor Implement
Nor Implement
 
Topoloical sort
Topoloical sortTopoloical sort
Topoloical sort
 
Topological sort
Topological sortTopological sort
Topological sort
 
Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24Optimized Multi-agent Box-pushing - 2017-10-24
Optimized Multi-agent Box-pushing - 2017-10-24
 
MATLAB
MATLABMATLAB
MATLAB
 
CatBoost intro
CatBoost   introCatBoost   intro
CatBoost intro
 
Discrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
Discrete Logarithmic Problem- Basis of Elliptic Curve CryptosystemsDiscrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
Discrete Logarithmic Problem- Basis of Elliptic Curve Cryptosystems
 
HHHHHH
HHHHHHHHHHHH
HHHHHH
 
TMPA-2017: Static Checking of Array Objects in JavaScript
TMPA-2017: Static Checking of Array Objects in JavaScriptTMPA-2017: Static Checking of Array Objects in JavaScript
TMPA-2017: Static Checking of Array Objects in JavaScript
 
18103010 algorithm complexity (iterative)
18103010 algorithm complexity (iterative)18103010 algorithm complexity (iterative)
18103010 algorithm complexity (iterative)
 
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...
Heaven: Supporting Systematic Comparative Research of RDF Stream Processing E...
 
single source shorest path
single source shorest pathsingle source shorest path
single source shorest path
 
Temporal logic and functional reactive programming
Temporal logic and functional reactive programmingTemporal logic and functional reactive programming
Temporal logic and functional reactive programming
 
presentation
presentationpresentation
presentation
 

Similar to TIM: Large-scale Energy Forecasting in Julia

Automated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with DaskAutomated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with Dask
ASI Data Science
 
Intel realtime analytics_spark
Intel realtime analytics_sparkIntel realtime analytics_spark
Intel realtime analytics_spark
Geetanjali G
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
Yahoo Developer Network
 

Similar to TIM: Large-scale Energy Forecasting in Julia (20)

Linux and Open Source in Math, Science and Engineering
Linux and Open Source in Math, Science and EngineeringLinux and Open Source in Math, Science and Engineering
Linux and Open Source in Math, Science and Engineering
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Automated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with DaskAutomated Data Exploration: Building efficient analysis pipelines with Dask
Automated Data Exploration: Building efficient analysis pipelines with Dask
 
Intel realtime analytics_spark
Intel realtime analytics_sparkIntel realtime analytics_spark
Intel realtime analytics_spark
 
Lecture 1 (bce-7)
Lecture   1 (bce-7)Lecture   1 (bce-7)
Lecture 1 (bce-7)
 
MDR Corporation Description at AWS Summit Tokyo
MDR Corporation Description at AWS Summit TokyoMDR Corporation Description at AWS Summit Tokyo
MDR Corporation Description at AWS Summit Tokyo
 
computer architecture.
computer architecture.computer architecture.
computer architecture.
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
 
Mathematics and development of fast TLS handshakes
Mathematics and development of fast TLS handshakesMathematics and development of fast TLS handshakes
Mathematics and development of fast TLS handshakes
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 
Cv jeanlucbordessoule
Cv jeanlucbordessouleCv jeanlucbordessoule
Cv jeanlucbordessoule
 
Alresume2010.Pdf
Alresume2010.PdfAlresume2010.Pdf
Alresume2010.Pdf
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Warsaw Data Science - Factorization Machines Introduction
Warsaw Data Science -  Factorization Machines IntroductionWarsaw Data Science -  Factorization Machines Introduction
Warsaw Data Science - Factorization Machines Introduction
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
Resume
ResumeResume
Resume
 
Srinivas Muddana Resume
Srinivas Muddana ResumeSrinivas Muddana Resume
Srinivas Muddana Resume
 
Srinivas Muddana Resume
Srinivas Muddana ResumeSrinivas Muddana Resume
Srinivas Muddana Resume
 
Srinivas Muddana Resume
Srinivas Muddana ResumeSrinivas Muddana Resume
Srinivas Muddana Resume
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 

Recently uploaded

Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
MAQIB18
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
zahraomer517
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 

Recently uploaded (20)

Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Introduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxxIntroduction-to-Cybersecurit57hhfcbbcxxx
Introduction-to-Cybersecurit57hhfcbbcxxx
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 

TIM: Large-scale Energy Forecasting in Julia

  • 1. TIM: Large-scale Forecasting for Energy in Julia Ján Dolinský, Tangent Works #PyDataBA19. 2. 2018 (PyData Bratislava Meetup #6, Nervosa)
  • 2.
  • 3. The Two Language Problem The Philosofical Part Ján Dolinský jan.dolinsky@tangent.works
  • 4. Math & Stats Information Geometry Computer Science Domain Expertise Machine Learning Traditional Software Traditional Research VALUE
  • 5. Math & Stats Information Geometry Computer Science Domain Expertise Machine Learning Traditional Software Traditional Research VALUE Data Science Stack: R, Matlab, Python, Ruby, ... Production Stack: C++, Java, Fortran, ...
  • 6. The Two Language Problem Prototyping Production Mathematicians & Data Scientists Getting the applied math right. Performance engineers & Architects.
  • 7. The Two Language Problem: Tangent Works in 2014 Prototyping Production The Data Science Stack: R, Matlab, Octave, Python, Ruby, …. Production stack: C++, Fortran, Java, ... A lot of effort ...
  • 8. The Two Language Problem: Tangent Works in 2018 Prototyping Production Julia Julia Mathematicians Getting the applied math right. Performance engineers & Architects.
  • 9. TIM: Automatic Model Building for Time-series in Energy The Product Part Ján Dolinský jan.dolinsky@tangent.works
  • 10. TIM Tangent Information Modeller Ján Dolinský jan.dolinsky@tangent.works
  • 11. Content • Time-series Problems in Energy Industry • Data Science Process and Automatic Model Building • Live Modeling Demonstration • Large-scale Forecasting Systems & Why Automation Matters • Julia Language • GefCom 2014, 2017 results & Summary
  • 12. Time-series Problems in Energy Industry • Electricity Load (aggregated and individual) • Technical Losses • System Imbalance • Gas Consumption (District Heating) • District Cooling • Wind Production • Solar Production • Electricity Price Consumption side Production side
  • 13. 1. Collect 2. Store 3. Transform 4. Model 5. Predict 6. Present Scada Systems, Databases, Time Series Management Oracle, MySQL, SAP HANA, Hadoop, ... SAS, SPSS, R, ML libraries, Matlab Power BI, SAP, QlickView etc .. Automation Using TIM Data Science Process
  • 14. 4. Model 5. Predict SAS, SPSS, R, ML libraries, Matlab Automation Using TIM Data Science Process ● feature engineering ● model selection ● tedious ● costs money
  • 15. Data Science Process y(k)=f (Temp(k−3),Temp(k−22)∗Wind(k−1), y(k−24)) NN, SVM, MARS, LASSO, RF, … ? Which features and how many ? // optional
  • 16. Data Science Process y(k)=f (Temp(k−3),Temp(k−22)∗Wind(k−1), y(k−24)) Temp(k-1), Temp(k-2), …, Temp(k-24) Wind(k-1), Wind(k-2), …, Wind(k-24) y(k-1), y(k-2), …, y(k-24) Temp(k-1), Temp(k-2), …, Temp(k-96) Wind(k-1), Wind(k-2), …, Wind(k-96) y(k-1), y(k-2), …, y(k-96) 24 + 24 + 24 + 24*24 = 24*27 = 648 96 + 96 + 96 + 96*96 = 96*99 = 9504 // optional
  • 17. TIM: As a Large-Scale Model Building Engine TIM: As a Data Science Tool Business User Mode produces high-quality models. Advanced User Mode allows to explore new scenarios quickly. Fine-tuning possible for critical assets.
  • 19. GefCom 2017 ex-ante results • Qualifying Match: 1st place out of 177 teams • Final Match: 2nd place out of 12 pre-selected teams Andritz Hackathon 2017 1st place out of 7 ML companies
  • 20. Large-scale Forecasting Systems Platforms where thousands of different consumption and production models are produced. This is not possible without full automation.
  • 21. TIM Worker 1 CAPI JAPI TIM WS REST/SOAP API TIM ConnectorsData Source fs2TIM_WS fs2TIM_CAPI* fs2TIM_JAPI* JuliaDB2TIM_WS ... File System HAKOM TSM JuliaDB InfluxDB MySQL PosgreSQL Oracle TIM Worker N CAPI JAPI TIM .... Asset data & model management Weather data Visualization & UI Grafana Genie ...
  • 22. Julia Language Walks like Python runs like C ● A single platform for prototyping and production → significant gains in efficiency of product development ● Vectorized code runs equally fast as de-vectorized ● Vectorization on a single thread level ● SIMD out of the box ● Direct calls to BLAS ● Distributed parallel computation (multiple threads)
  • 23. Summary • Fully automated model building • Accuracy often outperforms manually build models • Quick insights into a time-series of interest • Tedious model building is an option not necessity
  • 24. Thank you. jan.dolinsky@tangent.works Tangent Works Na Slavíne 1 81106 Bratislava www.tangent.works
  • 25.
  • 26. The Key Concepts in Julia Julia Lectures for Tangent Works Employees Ján Dolinský jan.dolinsky@tangent.works
  • 27. Lectures Block 1: Vectorization vs. De-vectorization • Iterators: The Storage Concept • On Demand Iteration: Generators • MapReduce principle • Broadcasting • Syntactic Loop Fusion  Why is vectorized code fast ?  Why it is not as fast as it could be ?
  • 28. Lectures Block 2: Linear Algebra • Matrices storage and its consequences • Column-major & SIMD • BLAS calls
  • 29. Lectures Block 3: Language Design Elements • Multiple-dispatch • Struct • Mutable & Immutable concept • Modules & Packages generation • JIT and AOT compilation
  • 30. Julia Language Walks like Python runs like C ● A single platform for prototyping and production → significant gains in efficiency of product development ● Vectorized code runs equally fast as de-vectorized ● Vectorization on a single thread level ● SIMD out of the box ● Direct calls to BLAS ● Distributed parallel computation (multiple threads)
  • 31. 2 Years of Using Julia • It is a real programming language • In v0.2 to v0.4 still in heavy development • v0.5 and v0.6 are very useful; static compilation possible • Prototyping and production development is done using the same platform • IDE not fully matured but already quite useful • Debugger