File Imports w/
Elixir, PostgreSQL & file_fdw
Florian Kraft, Architect kloeckner.i
Hi, I am Florian!
@carhillion | floriank.github.io
Data.
CSV & Files / API Calls
How to import data?
(from external systems)
Strategies
tried out and employed
at kloeckner.i
to varying degrees of success
● APIs
● file_fdw
● Producer-Consumer Queueing
● GenStage / Flow
● Import scripts
Foreign data wrappers in PostgreSQL
FDW
Foreign (file) data wrappers in PostgreSQL
FDW
CSV
XML
pg_dump
JSON
● There is file_fdw
✓ and it reads CSV Files.
Mechanism Overview
ERP
System
Transformations*
* Clean up
Synchronisation of a data source … or importing?
Create an extension!
defmodule MyImporter.Repo.Migrations.AddFileFdwExtension do
  use Ecto.Migration

  def up do
    execute("CREATE extension file_fdw;")
  end

  def down do
    execute("DROP extension file_fdw;")
  end
end
Create a virtual file server!
defmodule MyImporter.Repo.Migrations.AddForeignFileServer do
  use Ecto.Migration

  @server_name "files"

  def up do
    execute("CREATE SERVER #{@server_name} FOREIGN DATA WRAPPER file_fdw;")
  end

  def down do
    execute("DROP SERVER #{@server_name};")
  end
end
Create one Table per file
defmodule MyImporter.Repo.Migrations.AddForeignCompaniesTable do
  use Ecto.Migration

  @up ~s"""
  CREATE FOREIGN TABLE companies (
    company_id text,
    name text
  ) SERVER files
  OPTIONS ( filename '/files/Company.csv', format 'csv');
  """

  def change do
    execute(@up, "DROP FOREIGN TABLE companies;")
  end
end
Oh boi, do I love me an RFC
We never get what we want
This is what we get
Give me some options!
defmodule MyImporter.Repo.Migrations.AddForeignCompaniesTable do
  use Ecto.Migration

  @up ~s"""
  CREATE FOREIGN TABLE companies (
    company_id text,
    name text
  ) SERVER files
  OPTIONS ( filename '/files/Company.csv', format 'csv', header 'on',
    delimiter '|');
  """

  def change do
    execute(@up, "DROP FOREIGN TABLE companies;")
  end
end
This is why we cannot have nice things
Give me some (more) options!
defmodule MyImporter.Repo.Migrations.AddForeignCompaniesTable do
  use Ecto.Migration

  @up ~s"""
  CREATE FOREIGN TABLE companies (
    company_id text,
    name text
  ) SERVER files
  OPTIONS ( filename '/files/Company.csv',
    format 'csv', header 'on', delimiter '|', quote E'\x01');
  """

  def change do
    execute(@up, "DROP FOREIGN TABLE companies;")
  end
end
And that works!
..but we still have not imported anything :(
Selecting from CSV Files directly
> SELECT name, company_id FROM companies
name              company_id
ACME Corp.        11012
Wonka Industries  22133
Stark Industries  55251
[...]             [...]
Using PostgreSQL’s built in functions
> SELECT TRIM(name), company_id, MD5(CONCAT(TRIM(name), company_id)) AS hash
FROM companies
name              company_id  hash
ACME Corp.        11012       b4fa5d3e03248e285c6cc57ac4f4862e
Wonka Industries  22133       9256bbfa403aee8a35bf3bb4c08f3500
Stark Industries  55251       91c113bef46e20ab167a8d4633bc0901
External CSV as table
name              company_id  hash
ACME Corp.1       11012       b4fa5dge03238e285c6cc57ac4f3822e
Wonka Industries  22133       9256bbfa403aee8a35bf3bb4c08f3500
Stark Industries  55251       91c113bef46e20ab167a8d4633bc0901
Internal
name              company_id  hash
ACME Corp.        11012       b4fa5d3e0348e285c6cc57ac4f4862e2
Wonka Industries  22133       9256bbfa403aee8a35bf3bb4c08f3500
Stark Industries  55251       91c113bef46e20ab167a8d4633bc0901
LEFT JOIN
name              company_id
ACME Corp.1       11012
Using JOINs
> SELECT external.company_id, external.name
FROM companies external
LEFT JOIN imported_companies imported
ON MD5(CONCAT(external.company_id, TRIM(external.name)))
= MD5(CONCAT(imported.external_id, TRIM(imported.name)))
WHERE imported.external_id IS NULL
name company_id
ACME Corp.1 11012
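The hash-based LEFT JOIN above can be mimicked on in-memory lists to see which rows it returns. This is only an illustrative sketch: the module HashDiff and its functions are hypothetical and not from the talk, and in the real setup PostgreSQL does this work inside the query.

```elixir
defmodule HashDiff do
  # Hypothetical helper; mirrors MD5(CONCAT(company_id, TRIM(name))) from the query.
  # Note: String.trim/1 strips all whitespace, SQL TRIM strips spaces by default.
  def row_hash(id, name) do
    :crypto.hash(:md5, id <> String.trim(name)) |> Base.encode16(case: :lower)
  end

  # Returns the external rows whose hash has no match among the internal rows,
  # i.e. rows that are new or whose name changed.
  def changed(external, internal) do
    known = MapSet.new(internal, fn %{external_id: id, name: name} -> row_hash(id, name) end)

    Enum.reject(external, fn %{company_id: id, name: name} ->
      MapSet.member?(known, row_hash(id, name))
    end)
  end
end

# Stand-in for the foreign table backed by the CSV file.
external = [
  %{company_id: "11012", name: "ACME Corp.1"},
  %{company_id: "22133", name: " Wonka Industries "},
  %{company_id: "55251", name: "Stark Industries"}
]

# Stand-in for the already imported rows.
internal = [
  %{external_id: "11012", name: "ACME Corp."},
  %{external_id: "22133", name: "Wonka Industries"},
  %{external_id: "55251", name: "Stark Industries"}
]

changed = HashDiff.changed(external, internal)
IO.inspect(changed)
```

Only the ACME row comes back, because its name differs; the Wonka row matches once the surrounding spaces are trimmed.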
SQL Module: runs SQL periodically
Import Module: imports the changed data
Data
Show me some Elixir, already!
defmodule Synchronize.Companies.SQLModule do
  def sync do
    find_companies() |> upsert()
  end

  def upsert(companies) do
    Importer.run(companies, &map/1, &import_batch/1)
  end
end
Show me some Elixir code, already!
defmodule Synchronize.Companies.SQLModule do
  # SQL is presumably an alias for Ecto.Adapters.SQL; the slide does not show it.
  def find_companies do
    SQL.stream(Repo,
      """
      SELECT external.company_id, external.name
      FROM companies external
      LEFT JOIN internal_companies internal
      ON MD5(CONCAT(external.company_id, TRIM(external.name)))
      = MD5(CONCAT(internal.external_id, TRIM(internal.name)))
      WHERE internal.external_id IS NULL
      """
    )
  end
end
Show me some elixir code - SQL Module (cont.)
defmodule MyImporter.Companies.SQLModule do
  defp map([external_id, name]) do
    %{
      name: String.trim(name),
      external_id: external_id,
      inserted_at: DateTime.utc_now(),
      updated_at: DateTime.utc_now()
    }
  end

  defp import_batch(batch) do
    Repo.insert_all(Company, batch, on_conflict: :replace_all,
      conflict_target: :external_id)
  end
end
Show me some elixir code - Import Module 2
defmodule MyImporter.Companies.ImportModule do
  defp sync(source, item_mapper, batch_processor) do
    # Run the batch for its side effect, then pass the rows on so they can be counted.
    processor = fn batch ->
      batch_processor.(batch)
      batch
    end

    # The SQL stream has to be consumed inside a transaction.
    Repo.transaction(fn ->
      source
      |> Stream.flat_map(fn %{rows: rows} -> rows end)
      |> Stream.map(item_mapper)
      |> Stream.chunk_every(2000)
      |> Stream.flat_map(processor)
      |> Enum.count()
    end)
  end
end
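To see what the pipeline in the import module does to its input, here is a minimal sketch that runs the same stream steps on in-memory data. PipelineSketch, the sample rows and the batch size of 2 are assumptions made for illustration; Repo.transaction and Repo.insert_all are left out so the sketch runs without a database.

```elixir
defmodule PipelineSketch do
  # Same shape as the import pipeline, minus the transaction:
  # unwrap rows, map each row, batch, process each batch, count the rows.
  def run(source, item_mapper, batch_processor, batch_size \\ 2000) do
    processor = fn batch ->
      batch_processor.(batch)
      batch
    end

    source
    |> Stream.flat_map(fn %{rows: rows} -> rows end)
    |> Stream.map(item_mapper)
    |> Stream.chunk_every(batch_size)
    |> Stream.flat_map(processor)
    |> Enum.count()
  end
end

# Stand-in for the stream of query results: each element carries a list of rows.
source = [
  %{rows: [["11012", " ACME Corp.1 "], ["22133", "Wonka Industries"]]},
  %{rows: [["55251", "Stark Industries"]]}
]

mapper = fn [external_id, name] ->
  %{external_id: external_id, name: String.trim(name)}
end

# Instead of inserting, print the size of each batch.
batch_processor = fn batch -> IO.puts("batch of #{length(batch)}") end

count = PipelineSketch.run(source, mapper, batch_processor, 2)
IO.inspect(count)
```

Because the processor returns the batch it was given, the final flat_map flattens the batches back into rows, and the count is the number of imported rows rather than the number of batches.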
Trigger mechanism
SQL Module: runs SQL periodically
Import Module: imports the changed data
Data
Trigger: check a timestamp, save the state in a GenServer
Supervisor
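The slides do not show the trigger itself, so the following is a hedged sketch of one way "check a timestamp, save the state in a GenServer" could look. TriggerSketch and its options :mtime_fun and :on_change are hypothetical names. A real version would probably read the file's modification time (for example via File.stat!/1), schedule the check periodically, and run under a supervisor.

```elixir
defmodule TriggerSketch do
  use GenServer

  # opts: :mtime_fun returns the current timestamp, :on_change runs the sync.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Compares timestamps now; returns :synced or :unchanged.
  def check(pid), do: GenServer.call(pid, :check)

  @impl true
  def init(opts) do
    state = %{
      mtime_fun: Keyword.fetch!(opts, :mtime_fun),
      on_change: Keyword.fetch!(opts, :on_change),
      last_seen: nil
    }

    {:ok, state}
  end

  @impl true
  def handle_call(:check, _from, state) do
    mtime = state.mtime_fun.()

    if mtime == state.last_seen do
      {:reply, :unchanged, state}
    else
      # Timestamp moved: run the sync and remember the new timestamp.
      state.on_change.()
      {:reply, :synced, %{state | last_seen: mtime}}
    end
  end
end

# Fake timestamp source and run counter, so the sketch works without a real file.
{:ok, clock} = Agent.start_link(fn -> 1 end)
{:ok, runs} = Agent.start_link(fn -> 0 end)

{:ok, pid} =
  TriggerSketch.start_link(
    mtime_fun: fn -> Agent.get(clock, & &1) end,
    on_change: fn -> Agent.update(runs, &(&1 + 1)) end
  )

first = TriggerSketch.check(pid)
second = TriggerSketch.check(pid)
# Simulate the file being replaced.
Agent.update(clock, &(&1 + 1))
third = TriggerSketch.check(pid)
IO.inspect({first, second, third, Agent.get(runs, & &1)})
```

Keeping the last seen timestamp in the process state means an unchanged file costs one comparison and no query at all.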
Why Elixir?
Why not to use this strategy?
Like… all the time?
● Business logic lives in the database
○ Harder to change
○ Database Server needs to know about the files
● Foreign Table is tied to the file
● Requires SQL knowledge*
● It’s actually a bit harder to test than a single import script
○ It doesn’t make sense for single imports
Why to use this strategy?
Like… some of the time?
● It’s fast
○ The database itself is the limiting factor
○ We effectively run COPY on query
● It’s really nice for comparing state (thank you, SQL!)
○ Easy to get diffs
○ Easy to join files in memory to create a substate
● It’s straightforward to implement a synching mechanism
30x faster
By switching strategies for importing customer data
(Full disclosure: by switching from a queue-based strategy to a synching strategy)
Learnings
● Do not treat your database as dumb storage
○ Leverage its capabilities!
○ Read the docs
● There is more than one way to do things
● “Making it go fast” == “Not doing a lot of things”
Special thanks
We’re hiring!
Thanks!
...also for letting me sneak in a talk about PostgreSQL at an Elixir meetup
