SlideShare a Scribd company logo
1 of 23
Download to read offline
Data ingestion
with PyEXASOL and UDFs
Vitaly Markov
What is Badoo?
❖ Dating-focused
❖ Social network
❖ Multiple brands
❖ ~ 190 countries
❖ ~ 47 languages
❖ ~ 400,000,000 users
Exasol setup
❖ Bare-metal hardware
❖ 20 (+1) nodes:
➢ ~ 480 CPU cores total
➢ ~ 14 Tb RAM total
➢ ~ 350 Tb raw hot data
❖ ~ 20,000,000,000 events daily
❖ ~ 98% cache hit rate
Python sadness
❖ Network-bound (WiFi, VPN)
❖ Benefits from compression
❖ CPU-bound, but many cores
❖ Benefits from multiprocessing
Person with laptop Server in datacenter
PyEXASOL
❖ Websocket protocol
❖ Python 3.6+
❖ HTTP transport, compression
❖ Multiprocessing
❖ Feature-rich
❖ For analysts with laptops
❖ For parallel machine learning
What's wrong with ODBC?
ODBC
driver
Python
objects
Pandas
DataFrame
slow
HTTP Transport
Pandas
DataFrame
(C reader)
➢ compression
➢ encryption
Parallel HTTP Transport
200M users are scored in 3 minutes
100% CPU usage on all cores
Progress tracking:
How it looks in Exasol
EXPORT my_table INTO CSV
AT 'http://27.1.0.30:33601' FILE '000.csv'
AT 'http://27.1.0.31:41733' FILE '001.csv'
AT 'http://27.1.0.32:45014' FILE '002.csv'
AT 'http://27.1.0.33:42071' FILE '003.csv'
AT 'http://27.1.0.34:36669' FILE '004.csv'
AT 'http://27.1.0.35:36794' FILE '005.csv'
Other features
❖ Multi-host connection strings
❖ SQL formatter
❖ SSL encryption
❖ Local config (.ini file)
❖ Metadata requests
❖ Profiling of last query
❖ Built-in UDF script output server
GitHub: https://github.com/badoo/pyexasol/
PyPI: https://pypi.org/project/pyexasol/
> pip install pyexasol
Available now
Take a breather
UDF Framework
❖ Run anything in Exasol
❖ In massively parallel manner
❖ Very flexible
❖ Great for IMPORT / EXPORT
❖ But:
➢ Requires some skill
➢ Lack of practical manual
Parallel import from MySQL
❖ MariaDB JDBC driver
❖ Protocol compression
❖ ~ 200 MySQL hosts
❖ ~ 376,400 shards
❖ Up to 800 parallel threads
❖ 500,000,000 rows in 10m
Hadoop ORC support
❖ Columnar format
❖ Native Java API
❖ VectorizedRowBatch (fast!)
❖ Exasol can pre-sort data
❖ Optional bloom filters
❖ Many custom settings
Logging best practices
INF VM:139966036683480 B:1 I:16 2018-09-22 18:30:57.718 [Start BATCH]
{"total_memory":61,"db_host":"dbs156.mlan","max_memory":494}
INF VM:139966036683480 B:1 I:16 2018-09-22 18:30:57.939 [Finish spot]
{"spot_id":"189607","db_host":"dbs156.mlan","row_count":16}
INF VM:139966036683480 B:1 I:16 2018-09-22 18:30:57.976 [Finish spot]
{"spot_id":"189614","db_host":"dbs156.mlan","row_count":24}
INF VM:139966036683480 B:1 I:16 2018-09-22 18:31:00.788 [Finish BATCH]
{"total_memory":61,"total_duration":3.067,"row_count_total":2924,"spot_count_total
":174,"fetch_duration":0.695,"db_host":"dbs156.mlan","statement_duration":1.99,"co
mpression":"ON","bytes_sent":"797967","max_memory":494}
PyEXASOL function: execute_udf_output()
Visual execution graph
Memory pitfall
Exasol DB RAM Cluster OS UDF
Pre-allocated during startup
Set in ExaOperations: DB RAM, Huge Pages
Required for
OS
Tiny!
❖ Always monitor memory usage
❖ Exasol may lose parallelism if >4Gb RAM used per node
❖ Increase this limit if you need it (contact Customer Support)
Long transaction pitfall
Process 2
Process 1
Process 3
Process 4
❖ Some groups are slow
❖ Not obvious
❖ Hard to profile
❖ Holding locks
❖ Holding resources
❖ QUERY_TIMEOUT helps
Gists
❖ Exasol UDF: Import from MySQL
❖ Exasol UDF: Import from Hadoop ORC
❖ Exasol UDF: Export to Hadoop ORC
Icons from the Noun Project
Thank you!
https://tinyurl.com/badoo-exa
v.markov@corp.badoo.com

More Related Content

What's hot

Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Netronome
 
SPDY быстрее на 146% (Валентин Бартенев)
SPDY быстрее на 146% (Валентин Бартенев)SPDY быстрее на 146% (Валентин Бартенев)
SPDY быстрее на 146% (Валентин Бартенев)Ontico
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureNetronome
 
Like loggly using open source
Like loggly using open sourceLike loggly using open source
Like loggly using open sourceThomas Alrin
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
Cocoa + HTTP
Cocoa + HTTPCocoa + HTTP
Cocoa + HTTPrentzsch
 
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014N Masahiro
 
Mongo db3.0 wired_tiger_storage_engine
Mongo db3.0 wired_tiger_storage_engineMongo db3.0 wired_tiger_storage_engine
Mongo db3.0 wired_tiger_storage_engineKenny Gorman
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...Hisham Mardam-Bey
 
FFMEET: running a non-profit conference system
FFMEET: running a non-profit conference systemFFMEET: running a non-profit conference system
FFMEET: running a non-profit conference systemAnnika Wickert
 
system requirement for network simulator projects
 system requirement for network simulator projects system requirement for network simulator projects
system requirement for network simulator projectsparry prabhu
 
AS201701 - Building an Internet backbone with pure 1he servers and Linux
AS201701 - Building an Internet backbone with pure 1he servers and LinuxAS201701 - Building an Internet backbone with pure 1he servers and Linux
AS201701 - Building an Internet backbone with pure 1he servers and LinuxMaximilan Wilhelm
 
WUG #009 - OpenVNet 0.7 presentation
WUG #009 - OpenVNet 0.7 presentationWUG #009 - OpenVNet 0.7 presentation
WUG #009 - OpenVNet 0.7 presentationAxsh Co. LTD
 
TRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch HaimTRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch Haimharryvanhaaren
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsSpeedment, Inc.
 

What's hot (20)

Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports
 
SPDY быстрее на 146% (Валентин Бартенев)
SPDY быстрее на 146% (Валентин Бартенев)SPDY быстрее на 146% (Валентин Бартенев)
SPDY быстрее на 146% (Валентин Бартенев)
 
OpenStreetMap Belarus Tile Server
OpenStreetMap Belarus Tile ServerOpenStreetMap Belarus Tile Server
OpenStreetMap Belarus Tile Server
 
RouterOS v6
RouterOS v6RouterOS v6
RouterOS v6
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
 
Like loggly using open source
Like loggly using open sourceLike loggly using open source
Like loggly using open source
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Cocoa + HTTP
Cocoa + HTTPCocoa + HTTP
Cocoa + HTTP
 
Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014Fluentd: Unified Logging Layer at CWT2014
Fluentd: Unified Logging Layer at CWT2014
 
Mongo db3.0 wired_tiger_storage_engine
Mongo db3.0 wired_tiger_storage_engineMongo db3.0 wired_tiger_storage_engine
Mongo db3.0 wired_tiger_storage_engine
 
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...Type safe, versioned, and rewindable stream processing  with  Apache {Avro, K...
Type safe, versioned, and rewindable stream processing with Apache {Avro, K...
 
FFMEET: running a non-profit conference system
FFMEET: running a non-profit conference systemFFMEET: running a non-profit conference system
FFMEET: running a non-profit conference system
 
system requirement for network simulator projects
 system requirement for network simulator projects system requirement for network simulator projects
system requirement for network simulator projects
 
Fluentd meetup
Fluentd meetupFluentd meetup
Fluentd meetup
 
EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)
 
IoT Project postmortem
IoT Project postmortemIoT Project postmortem
IoT Project postmortem
 
AS201701 - Building an Internet backbone with pure 1he servers and Linux
AS201701 - Building an Internet backbone with pure 1he servers and LinuxAS201701 - Building an Internet backbone with pure 1he servers and Linux
AS201701 - Building an Internet backbone with pure 1he servers and Linux
 
WUG #009 - OpenVNet 0.7 presentation
WUG #009 - OpenVNet 0.7 presentationWUG #009 - OpenVNet 0.7 presentation
WUG #009 - OpenVNet 0.7 presentation
 
TRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch HaimTRex Traffic Generator - Hanoch Haim
TRex Traffic Generator - Hanoch Haim
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
 

Similar to Exasol User Group, 8 Oct 2018: PyEXASOL and UDF

QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIB Solutions
 
Newsql 2015-150213024325-conversion-gate01
Newsql 2015-150213024325-conversion-gate01Newsql 2015-150213024325-conversion-gate01
Newsql 2015-150213024325-conversion-gate01Jagadeesha DG
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015Ivan Glushkov
 
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Ontico
 
Hadoop Networking at Datasift
Hadoop Networking at DatasiftHadoop Networking at Datasift
Hadoop Networking at Datasifthuguk
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
Experiences with Microservices at Tuenti
Experiences with Microservices at TuentiExperiences with Microservices at Tuenti
Experiences with Microservices at TuentiAndrés Viedma Peláez
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] IO Visor Project
 
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangCeph Community
 
Stupid Boot Tricks: using ipxe and chef to get to boot management bliss
Stupid Boot Tricks: using ipxe and chef to get to boot management blissStupid Boot Tricks: using ipxe and chef to get to boot management bliss
Stupid Boot Tricks: using ipxe and chef to get to boot management blissmacslide
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good? Alkin Tezuysal
 
ODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Workgroup
 
Recent Developments in Donard
Recent Developments in DonardRecent Developments in Donard
Recent Developments in DonardPMC-Sierra Inc.
 
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...ScyllaDB
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBScott Mansfield
 
Sharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual MachinesSharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual Machinesinside-BigData.com
 

Similar to Exasol User Group, 8 Oct 2018: PyEXASOL and UDF (20)

QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...
 
Newsql 2015-150213024325-conversion-gate01
Newsql 2015-150213024325-conversion-gate01Newsql 2015-150213024325-conversion-gate01
Newsql 2015-150213024325-conversion-gate01
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015
 
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
 
Hadoop Networking at Datasift
Hadoop Networking at DatasiftHadoop Networking at Datasift
Hadoop Networking at Datasift
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
Experiences with Microservices at Tuenti
Experiences with Microservices at TuentiExperiences with Microservices at Tuenti
Experiences with Microservices at Tuenti
 
Design review
Design reviewDesign review
Design review
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
 
XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
 
Stupid Boot Tricks: using ipxe and chef to get to boot management bliss
Stupid Boot Tricks: using ipxe and chef to get to boot management blissStupid Boot Tricks: using ipxe and chef to get to boot management bliss
Stupid Boot Tricks: using ipxe and chef to get to boot management bliss
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good?
 
ODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & Feeds
 
Recent Developments in Donard
Recent Developments in DonardRecent Developments in Donard
Recent Developments in Donard
 
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
Scylla Summit 2022: Operating at Monstrous Scales: Benchmarking Petabyte Work...
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
 
Sharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual MachinesSharing High-Performance Interconnects Across Multiple Virtual Machines
Sharing High-Performance Interconnects Across Multiple Virtual Machines
 

Recently uploaded

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 

Recently uploaded (20)

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 

Exasol User Group, 8 Oct 2018: PyEXASOL and UDF

  • 1. Data ingestion with PyEXASOL and UDFs Vitaly Markov
  • 2. What is Badoo? ❖ Dating-focused ❖ Social network ❖ Multiple brands ❖ ~ 190 countries ❖ ~ 47 languages ❖ ~ 400,000,000 users
  • 3. Exasol setup ❖ Bare-metal hardware ❖ 20 (+1) nodes: ➢ ~ 480 CPU cores total ➢ ~ 14 Tb RAM total ➢ ~ 350 Tb raw hot data ❖ ~ 20,000,000,000 events daily ❖ ~ 98% cache hit rate
  • 5. ❖ Network-bound (WiFi, VPN) ❖ Benefits from compression ❖ CPU-bound, but many cores ❖ Benefits from multiprocessing Person with laptop Server in datacenter
  • 6. PyEXASOL ❖ Websocket protocol ❖ Python 3.6+ ❖ HTTP transport, compression ❖ Multiprocessing ❖ Feature-rich ❖ For analysts with laptops ❖ For parallel machine learning
  • 7. What's wrong with ODBC? ODBC driver Python objects Pandas DataFrame slow
  • 9. Parallel HTTP Transport 200M users are scored in 3 minutes
  • 10. 100% CPU usage on all cores Progress tracking:
  • 11. How it looks in Exasol EXPORT my_table INTO CSV AT 'http://27.1.0.30:33601' FILE '000.csv' AT 'http://27.1.0.31:41733' FILE '001.csv' AT 'http://27.1.0.32:45014' FILE '002.csv' AT 'http://27.1.0.33:42071' FILE '003.csv' AT 'http://27.1.0.34:36669' FILE '004.csv' AT 'http://27.1.0.35:36794' FILE '005.csv'
  • 12. Other features ❖ Multi-host connection strings ❖ SQL formatter ❖ SSL encryption ❖ Local config (.ini file) ❖ Metadata requests ❖ Profiling of last query ❖ Built-in UDF script output server
  • 15. UDF Framework ❖ Run anything in Exasol ❖ In massively parallel manner ❖ Very flexible ❖ Great for IMPORT / EXPORT ❖ But: ➢ Requires some skill ➢ Lack of practical manual
  • 16. Parallel import from MySQL ❖ MariaDB JDBC driver ❖ Protocol compression ❖ ~ 200 MySQL hosts ❖ ~ 376,400 shards ❖ Up to 800 parallel threads ❖ 500,000,000 rows in 10m
  • 17. Hadoop ORC support ❖ Columnar format ❖ Native Java API ❖ VectorizedRowBatch (fast!) ❖ Exasol can pre-sort data ❖ Optional bloom filters ❖ Many custom settings
  • 18. Logging best practices INF VM:139966036683480 B:1 I:16 2018-09-22 18:30:57.718 [Start BATCH] {"total_memory":61,"db_host":"dbs156.mlan","max_memory":494} INF VM:139966036683480 B:1 I:16 2018-09-22 18:30:57.939 [Finish spot] {"spot_id":"189607","db_host":"dbs156.mlan","row_count":16} INF VM:139966036683480 B:1 I:16 2018-09-22 18:30:57.976 [Finish spot] {"spot_id":"189614","db_host":"dbs156.mlan","row_count":24} INF VM:139966036683480 B:1 I:16 2018-09-22 18:31:00.788 [Finish BATCH] {"total_memory":61,"total_duration":3.067,"row_count_total":2924,"spot_count_total ":174,"fetch_duration":0.695,"db_host":"dbs156.mlan","statement_duration":1.99,"co mpression":"ON","bytes_sent":"797967","max_memory":494} PyEXASOL function: execute_udf_output()
  • 20. Memory pitfall Exasol DB RAM Cluster OS UDF Pre-allocated during startup Set in ExaOperations: DB RAM, Huge Pages Required for OS Tiny! ❖ Always monitor memory usage ❖ Exasol may lose parallelism if >4Gb RAM used per node ❖ Increase this limit if you need it (contact Customer Support)
  • 21. Long transaction pitfall Process 2 Process 1 Process 3 Process 4 ❖ Some groups are slow ❖ Not obvious ❖ Hard to profile ❖ Holding locks ❖ Holding resources ❖ QUERY_TIMEOUT helps
  • 22. Gists ❖ Exasol UDF: Import from MySQL ❖ Exasol UDF: Import from Hadoop ORC ❖ Exasol UDF: Export to Hadoop ORC Icons from the Noun Project Thank you!