The document outlines PgLoader, a parallel ETL tool for PostgreSQL. It discusses PgLoader's architecture which includes main components like file readers, line processors and COPY handlers. It loads data in parallel by using multiple threads or processes that read files, process lines and COPY data to PostgreSQL simultaneously. The document also covers configuration, handling errors, logging and the benefits of a parallel loading approach.
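The reader/processor/COPY pipeline described above can be sketched as a toy producer-consumer program. This is an illustrative stand-in, not PgLoader's actual code: the queue, worker count, and the "COPY sink" (here just a counter) are assumptions for the sake of the example.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Toy sketch of a PgLoader-style pipeline: one reader thread queues raw lines,
// several workers transform them and hand rows to a mock COPY sink (a counter).
public class ParallelLoadSketch {
    static final String POISON = "\u0000EOF"; // sentinel marking end of input

    public static int load(List<String> lines, int workers) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger copied = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    String line;
                    while (!(line = queue.take()).equals(POISON)) {
                        // "process" the line (here: trim), then "COPY" it (here: count)
                        String row = line.trim();
                        if (!row.isEmpty()) copied.incrementAndGet();
                    }
                    queue.put(POISON); // pass the sentinel on so other workers stop too
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        for (String l : lines) queue.put(l); // the "file reader"
        queue.put(POISON);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return copied.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load(List.of(" a ", "b", "", "c"), 3)); // 3 non-empty rows
    }
}
```

In the real tool the sink would issue a PostgreSQL COPY over a database connection rather than increment a counter; the structural point is only that reading, transforming, and copying proceed concurrently.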
This document provides an overview of new features being introduced in Java 8, with a focus on lambda expressions, default methods, and bulk data operations. It discusses the syntax and usage of lambda expressions, how they are represented at runtime using functional interfaces, and how variable capturing works. New functional interfaces being added to Java 8 are presented, including examples like Consumer and Function. The document also explores how lambda expressions are compiled, showing how invokedynamic bytecode instructions are used to dispatch lambda calls at runtime. In summary, the document serves as an introduction to key new language features in Java 8 that improve support for functional programming and parallel operations.
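The lambda syntax, functional interfaces, and variable capturing mentioned above can be shown in a few lines. This is a minimal sketch using the Consumer and Function interfaces the summary names; the variable names are illustrative.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Java 8 lambdas bound to functional interfaces, plus variable capture.
public class LambdaDemo {
    public static void main(String[] args) {
        Function<Integer, Integer> square = x -> x * x;        // single abstract method: apply
        Consumer<String> printer = s -> System.out.println(s); // single abstract method: accept

        int captured = 10; // effectively final, so the lambda below may capture it
        Function<Integer, Integer> addCaptured = x -> x + captured;

        printer.accept("square(4) = " + square.apply(4));         // square(4) = 16
        printer.accept("4 + captured = " + addCaptured.apply(4)); // 4 + captured = 14
    }
}
```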
The document discusses developing a benchmark suite called Benchmark::Perl::Formance to measure and compare the performance of different versions and configurations of Perl. It outlines challenges in building the benchmark infrastructure, running automated tests on different Perls over time, and presenting results through a Tapper test reporting system and Codespeed graphing application.
This document provides an overview of Java 8 including:
- Java has approximately 9 million developers, and Oracle supports versions 6 through 8.
- New features include default methods, lambda expressions, streams, and parallel processing capabilities.
- JavaScript integration allows JavaScript code to be run from Java, enabling database and other connections from JavaScript.
- Potential issues with Java 8 include more complex debugging due to lambda expressions and increased abstraction.
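The streams and parallel-processing capabilities listed in the bullets can be illustrated with one pipeline run both sequentially and in parallel; both give the same result. The method and class names here are illustrative.

```java
import java.util.stream.IntStream;

// The same stream pipeline, sequential and parallel (fork/join under the hood).
public class StreamDemo {
    static long sumOfSquares(boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, 1000);
        if (parallel) s = s.parallel();
        return s.mapToLong(x -> (long) x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(false)); // 333833500
        System.out.println(sumOfSquares(true));  // same result, computed in parallel
    }
}
```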
Java 11 and Java 12 include new features that improve performance and developer productivity. Java 11 is the long-term support (LTS) release and focuses on stability, while Java 12 is a short-term release containing newer, partly experimental features. Key additions in Java 11 include local-variable syntax for lambda parameters (var), nest-based access control, and new String methods. Java 12 features include switch expressions (as a preview feature), default class data sharing (CDS) archives, and garbage collection improvements. While Java 12 carries the latest features, Java 11 is generally the better choice for production deployments because of its long-term support and stability. Both releases aim to enhance the Java development experience.
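The Java 11 additions named above can be demonstrated directly: the new String helpers and var in lambda parameters. A minimal sketch, requiring JDK 11 or later:

```java
import java.util.function.BinaryOperator;

// Java 11 String methods and local-variable syntax for lambda parameters.
public class Java11Demo {
    public static void main(String[] args) {
        System.out.println("  ".isBlank());  // true (whitespace-only check)
        System.out.println(" hi ".strip());  // "hi" (Unicode-aware trim)
        System.out.println("ab".repeat(3));  // ababab

        BinaryOperator<Integer> add = (var a, var b) -> a + b; // var in lambda params
        System.out.println(add.apply(2, 3)); // 5
    }
}
```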
This document provides an overview of OpenMP, a programming model for parallel programming on shared memory architectures. It discusses key OpenMP concepts like parallel regions, data sharing attributes, worksharing constructs like parallel for loops and sections, tasks, and synchronization. The document outlines an agenda and provides examples to illustrate OpenMP directives and clauses for parallelizing loops and sections of code.
Updates on the current status of Graal VM, a platform dedicated to running multiple programming languages with excellent performance. Experimental binaries are available from http://www.oracle.com/technetwork/oracle-labs/program-languages/overview/index.html.
This document provides an overview of Graal, a high-performance dynamic compiler for Java written in Java. It discusses key features such as support for speculative optimizations and deoptimization. It also covers Graal's intermediate representation, optimization phases, and how it can be used for custom compilations and static analysis. The document aims to provide insight into Graal and its capabilities as a research compiler.
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ... (Thomas Wuerthinger)
Multi-language runtimes providing simultaneously high performance for several programming languages still remain an illusion. Industrial-strength managed language runtimes are built with a focus on one language (e.g., Java or C#). Other languages may compile to the bytecode formats of those managed language runtimes. However, the performance characteristics of the bytecode generation approach are often lagging behind compared to language runtimes specialized for a specific language. The performance of JavaScript is for example still orders of magnitude better on specialized runtimes (e.g., V8 or SpiderMonkey).
We present a solution to this problem by providing guest languages with a new way of interfacing with the host runtime. The semantics of the guest language is communicated to the host runtime not via generating bytecodes, but via an interpreter written in the host language. This gives guest languages a simple way to express the semantics of their operations including language-specific mechanisms for collecting profiling feedback. The efficient machine code is derived from the interpreter via automatic partial evaluation. The main components reused from the underlying runtime are the compiler and the garbage collector. They are both agnostic to the executed guest languages.
The host compiler derives the optimized machine code for hot parts of the guest language application via partial evaluation of the guest language interpreter. The interpreter definition can guide the host compiler to generate deoptimization points, i.e., exits from the compiled code. This allows guest language operations to use speculations: An operation could for example speculate that the type of an incoming parameter is constant. Furthermore, the guest language interpreter can use global assumptions about the system state that are registered with the compiled code. Finally, part of the interpreter's code can be excluded from the partial evaluation and remain shared across the system. This is useful for avoiding code explosion and appropriate for infrequently executed paths of an operation. These basic mechanisms are provided by the underlying language-agnostic host runtime and allow separation of concerns between guest and host runtime.
We implemented Truffle, the guest language runtime framework, on top of the Graal compiler and the HotSpot virtual machine. So far, there are prototypes for C, J, Python, JavaScript, R, Ruby, and Smalltalk running on top of the Truffle framework. The prototypes are still incomplete with respect to language semantics. However, most of them can run non-trivial benchmarks to demonstrate the core promise of the Truffle system: Multiple languages within one runtime system at competitive performance.
Measuring the time spent on small individual fractions of program code is a common technique for analysing performance behavior and detecting performance bottlenecks. The benefits of the approach include a detailed individual attribution of performance and understandable feedback loops when experimenting with different code versions. There are however severe pitfalls when following this approach that can lead to vastly misleading results. Modern dynamic compilers use complex optimisation techniques that take a large part of the program into account. There can be therefore unexpected side-effects when combining different code snippets or even when running a presumably unrelated part of the code. This talk will present performance paradoxes with examples from the domain of dynamic compilation of Java programs. Furthermore, it will discuss an alternative approach to modelling code performance characteristics that takes the challenges of complex optimising compilers into account.
The document discusses Apache Avro, a data serialization framework. It provides an overview of Avro's history and capabilities. Key points include that Avro supports schema evolution, multiple languages, and interoperability with other formats like Protobuf and Thrift. The document also covers implementing Avro, including using the generic, specific and reflect data types, and examples of writing and reading data. Performance is addressed, finding that Avro size is competitive while speed is in the top half.
Graal is a dynamic meta-circular research compiler for Java that is designed for extensibility and modularity. One of its main distinguishing elements is the handling of optimistic assumptions obtained via profiling feedback and the representation of deoptimization guards in the compiled code. Truffle is a self-optimizing runtime system on top of Graal that uses partial evaluation to derive compiled code from interpreters. Truffle is suitable for creating high-performance implementations of dynamic languages with only moderate effort. The presentation includes a description of the Truffle multi-language API and performance comparisons of the current prototype Truffle language implementations (JavaScript, Ruby, and R) against industry runtimes. Both Graal and Truffle are open source and serve as research platforms in the area of virtual machine and programming language implementation (http://openjdk.java.net/projects/graal/).
The document discusses various CPU benchmarks used to evaluate performance, including their pros and cons. It notes that synthetic benchmarks like Dhrystone and Whetstone have limitations and are outdated. Better benchmarks measure real applications or standardized workloads like CoreMark, which aims to replace Dhrystone by testing common algorithms like linked lists and matrices. The document also cautions that benchmarks can be manipulated and advocates for transparency in benchmarking methodology and results.
The document discusses Java bytecode and the Java Virtual Machine (JVM). It provides details on:
- Bytecode is machine language for the JVM and is stored in class files. Each method has its own bytecode stream.
- Bytecode instructions consist of opcodes and operands that are executed by the JVM. Common opcodes include iconst_0, istore_0, iinc, iload_0, etc.
- The JVM has various components like the class loader, runtime data areas (method area, heap, stacks), and execution engine that interprets or compiles bytecode to machine code.
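The opcodes listed in the bullets (iconst_0, istore_0, iinc, iload_0) all appear in the bytecode of a simple counting loop. A small sketch; the bytecode noted in the comments is typical javac output for this source, though exact offsets and branch shapes vary by compiler.

```java
// Inspect the compiled bytecode with: javap -c CountDemo
public class CountDemo {
    static int countToFive() {
        int i = 0;          // iconst_0, istore_0
        while (i < 5) {     // iload_0, iconst_5, comparison branch
            i++;            // iinc 0, 1
        }
        return i;           // iload_0, ireturn
    }

    public static void main(String[] args) {
        System.out.println(countToFive()); // 5
    }
}
```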
This is an intermediate conversion course for C++, suitable for second year computing students who may have learned Java or another language in first year.
UFT: New Features and Comparison with QTP (Rita Singh)
UFT is an advanced version of QTP that provides unified functional testing of both graphical user interfaces and application programming interfaces. It combines the features of QTP and HP Service Test into a single tool. Some key new features of UFT include support for additional browsers, operating systems, and programming languages. It also features faster installation, improved debugging tools, and the ability to perform GUI, API, and business process testing within one interface.
The document outlines a CEO creed and coaching exercises for artists and entrepreneurs. The creed includes treating yourself as a CEO by taking responsibility, having clear goals and accountability. The exercises guide setting SMART goals, assessing strengths/weaknesses, and creating 3-6 month action plans to achieve the goals. Participants are encouraged to evaluate progress regularly and adjust plans accordingly.
The document summarizes a presentation about a day in the life of a cybercrime boss. It describes his daily schedule, which includes checking emails, analyzing log files, meeting with associates to discuss operations and rewards, handling disciplinary issues, exploring new business opportunities, and reconciling bank statements and log files. It also discusses how to disrupt such operations, such as by looking for absent information, following the money trail, encouraging public-private partnerships and information sharing, and focusing on prevention over prosecution.
1) The document discusses the importance of reading for children and proposes holding a book drive to excite children about reading. It notes that many school-age children are no longer interested in reading.
2) It includes quotes about the importance of reading, results from a children's survey about reading preferences, and statistics about the impacts of poor reading skills.
3) The book drive was held on April 18, 2009 to benefit a church children's ministry and collected a variety of books including some for a one year old.
During the Spanish Civil War, propaganda posters appealed to different political ideologies such as socialism, nationalism, and the left-wing Frente Popular, as well as to the struggle of the Spanish people and to women's participation in the conflict.
This document summarizes a white paper on cyber crime within the South African government. It discusses the growing problem of identity theft and hacking within government systems. It outlines the Cool Frog Cyber Project, which was authorized to target and disrupt international crime syndicates hijacking identities to enable financial crimes. The project profiled 4 linked syndicates, analyzed threats, and achieved several arrests and convictions. It discusses investigative methods like searches, hardware key loggers, and working with partners like law enforcement. The document recommends improving security measures, vetting, and training to better address cyber crimes within government.
Cyber Crime 101: The Impact of Cyber Crime on Higher Education in South Africa (Jacqueline Fick)
The document discusses cyber crime and its impact on higher education in South Africa. It introduces a hypothetical case study of "Jack le Hack", a university student who engages in various cyber crimes like unauthorized access of data, computer-related fraud, and intimidation. The document then defines cyber crime and common types seen in South Africa. It outlines strategies for organizations to protect their data and implement proactive cyber security practices. The document concludes with practical cyber security tips for users.
Spring Batch is a lightweight framework for batch processing that provides reusable functions for transaction management, error handling, parallel processing, and more, letting developers focus on business logic rather than infrastructure concerns. It can be easily embedded into existing applications, uses a POJO-based programming model with dependency injection, and supports reading from and writing to various data sources. Jobs in Spring Batch are composed of steps that can run sequentially or conditionally and process data in chunks for improved performance.
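The chunk-oriented step model described above (read, process, then write in chunks at commit boundaries) can be sketched in plain Java. Note this deliberately uses no Spring types: the Reader/Processor/Writer interfaces below are simplified, hypothetical stand-ins for Spring Batch's ItemReader, ItemProcessor, and ItemWriter.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of chunk-oriented processing, in the style of Spring Batch.
public class ChunkStepSketch {
    interface Reader<T> { T read(); }            // returns null when the input is exhausted
    interface Processor<I, O> { O process(I item); }
    interface Writer<T> { void write(List<T> chunk); }

    static <I, O> int runStep(Reader<I> reader, Processor<I, O> processor,
                              Writer<O> writer, int chunkSize) {
        List<O> chunk = new ArrayList<>();
        int written = 0;
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {     // a commit boundary in Spring Batch
                writer.write(chunk);
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { writer.write(chunk); written += chunk.size(); }
        return written;
    }

    public static void main(String[] args) {
        var it = List.of(1, 2, 3, 4, 5).iterator();
        int n = runStep(() -> it.hasNext() ? it.next() : null, // read
                        x -> x * 10,                           // process
                        chunk -> System.out.println("write " + chunk), // write
                        2);                                    // chunk size
        System.out.println(n); // 5 items written, in chunks of at most 2
    }
}
```

In the real framework each chunk write is wrapped in a transaction, which is what makes restart and error handling tractable.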
This document outlines the challenges of porting Oracle applications to PostgreSQL. It discusses porting SQL syntax, data types, functions, sequences, outer joins, and default parameters. Many changes are required such as modifying data types, functions like DECODE, sequence syntax, outer join syntax, and handling default parameters. The document emphasizes that porting projects are difficult and compatibility is limited, but success can be rewarding.
This document discusses an experiment to use PostgreSQL as the sole data access layer for a web application by replacing a traditional RESTful API server with PostgREST. PostgREST is a framework that provides a RESTful interface to any PostgreSQL database without requiring additional code or configuration. The document demonstrates PostgREST by connecting to a sample Pagila database and allowing full CRUD operations and filtering through SQL queries alone. It also shows how PostgREST handles authentication, authorization, relations, and versioning directly through PostgreSQL features.
Skype uses PostgreSQL for over 100 database servers and 200 databases to support its online services. It partitions data across multiple databases for scalability and uses tools like pgBouncer, PL/Proxy, SkyTools, PgQ, and Londiste to connect, load balance, and replicate data between databases. This complex database architecture has allowed Skype to grow significantly while remaining easy to manage.
A noETL Parallel Streaming Transformation Loader Using Spark, Kafka & Vertica (Data Con LA)
ETL, ELT, and Lambda architectures have evolved into a [non]streaming, general-purpose data ingestion pipeline that scales through distributed processing, supporting Big Data analytics over hybrid data warehouses in Hadoop and MPP columnar stores such as HPE-Vertica.
Bio: Jack Gudenkauf (https://www.linkedin.com/in/jackglinkedin) has over twenty-nine years of experience designing and implementing Internet-scale distributed systems. He is currently CEO and founder of the startup BigDataInfra. Previously he was VP of Big Data at Playtika and a hands-on manager of the Twitter Analytics Data Warehouse team, spent 15 years at Microsoft shipping 15 products, and before Microsoft ran his own consulting company, having begun his career as an MIS Director at several startups.
This document describes a Parallel Streaming Transformation Loader (PSTL) that uses Kafka, Spark, and Vertica for real-time data ingestion and analytics. It summarizes the PSTL as follows:
1. The PSTL ingests streaming data from Kafka into Spark RDDs in parallel.
2. Spark is used to transform the data, including assigning IDs and hashing records to partitions.
3. The transformed data is written in parallel from the Spark partitions directly to Vertica for analytics and querying.
4. Vertica demonstrated impressive parallel copy performance of 2.42 billion rows in under 8 minutes using this approach.
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ...Thomas Wuerthinger
Multi-language runtimes providing simultaneously high performance for several programming languages still remain an illusion. Industrial-strength managed language runtimes are built with a focus on one language (e.g., Java or C#). Other languages may compile to the bytecode formats of those managed language runtimes. However, the performance characteristics of the bytecode generation approach are often lagging behind compared to language runtimes specialized for a specific language. The performance of JavaScript is for example still orders of magnitude better on specialized runtimes (e.g., V8 or SpiderMonkey).
We present a solution to this problem by providing guest languages with a new way of interfacing with the host runtime. The semantics of the guest language is communicated to the host runtime not via generating bytecodes, but via an interpreter written in the host language. This gives guest languages a simple way to express the semantics of their operations including language-specific mechanisms for collecting profiling feedback. The efficient machine code is derived from the interpreter via automatic partial evaluation. The main components reused from the underlying runtime are the compiler and the garbage collector. They are both agnostic to the executed guest languages.
The host compiler derives the optimized machine code for hot parts of the guest language application via partial evaluation of the guest language interpreter. The interpreter definition can guide the host compiler to generate deoptimization points, i.e., exits from the compiled code. This allows guest language operations to use speculations: An operation could for example speculate that the type of an incoming parameter is constant. Furthermore, the guest language interpreter can use global assumptions about the system state that are registered with the compiled code. Finally, part of the interpreter's code can be excluded from the partial evaluation and remain shared across the system. This is useful for avoiding code explosion and appropriate for infrequently executed paths of an operation. These basic mechanisms are provided by the underlying language-agnostic host runtime and allow separation of concerns between guest and host runtime.
We implemented Truffle, the guest language runtime framework, on top of the Graal compiler and the HotSpot virtual machine. So far, there are prototypes for C, J, Python, JavaScript, R, Ruby, and Smalltalk running on top of the Truffle framework. The prototypes are still incomplete with respect to language semantics. However, most of them can run non-trivial benchmarks to demonstrate the core promise of the Truffle system: Multiple languages within one runtime system at competitive performance.
Measuring the time spent on small individual fractions of program code is a common technique for analysing performance behavior and detecting performance bottlenecks. The benefits of the approach include a detailed individual attribution of performance and understandable feedback loops when experimenting with different code versions. There are however severe pitfalls when following this approach that can lead to vastly misleading results. Modern dynamic compilers use complex optimisation techniques that take a large part of the program into account. There can be therefore unexpected side-effects when combining different code snippets or even when running a presumably unrelated part of the code. This talk will present performance paradoxes with examples from the domain of dynamic compilation of Java programs. Furthermore, it will discuss an alternative approach to modelling code performance characteristics that takes the challenges of complex optimising compilers into account.
The document discusses Apache Avro, a data serialization framework. It provides an overview of Avro's history and capabilities. Key points include that Avro supports schema evolution, multiple languages, and interoperability with other formats like Protobuf and Thrift. The document also covers implementing Avro, including using the generic, specific and reflect data types, and examples of writing and reading data. Performance is addressed, finding that Avro size is competitive while speed is in the top half.
Graal is a dynamic meta-circular research compiler for Java that is designed for extensibility and modularity. One of its main distinguishing elements is the handling of optimistic assumptions obtained via profiling feedback and the representation of deoptimization guards in the compiled code. Truffle is a self-optimizing runtime system on top of Graal that uses partial evaluation to derive compiled code from interpreters. Truffle is suitable for creating high-performance implementations for dynamic languages with only moderate effort. The presentation includes a description of the Truffle multi-language API and performance comparisons within the industry of current prototype Truffle language implementations (JavaScript, Ruby, and R). Both Graal and Truffle are open source and form themselves research platforms in the area of virtual machine and programming language implementation (http://openjdk.java.net/projects/graal/).
The document discusses various CPU benchmarks used to evaluate performance, including their pros and cons. It notes that synthetic benchmarks like Dhrystone and Whetstone have limitations and are outdated. Better benchmarks measure real applications or standardized workloads like CoreMark, which aims to replace Dhrystone by testing common algorithms like linked lists and matrices. The document also cautions that benchmarks can be manipulated and advocates for transparency in benchmarking methodology and results.
The document discusses Java bytecode and the Java Virtual Machine (JVM). It provides details on:
- Bytecode is machine language for the JVM and is stored in class files. Each method has its own bytecode stream.
- Bytecode instructions consist of opcodes and operands that are executed by the JVM. Common opcodes include iconst_0, istore_0, iinc, iload_0, etc.
- The JVM has various components like the class loader, runtime data areas (method area, heap, stacks), and execution engine that interprets or compiles bytecode to machine code.
This is an intermediate conversion course for C++, suitable for second year computing students who may have learned Java or another language in first year.
UFT- New features and comparison with QTPRita Singh
UFT is an advanced version of QTP that provides unified functional testing of both graphical user interfaces and application programming interfaces. It combines the features of QTP and HP Service Test into a single tool. Some key new features of UFT include support for additional browsers, operating systems, and programming languages. It also features faster installation, improved debugging tools, and the ability to perform GUI, API, and business process testing within one interface.
The document outlines a CEO creed and coaching exercises for artists and entrepreneurs. The creed includes treating yourself as a CEO by taking responsibility, having clear goals and accountability. The exercises guide setting SMART goals, assessing strengths/weaknesses, and creating 3-6 month action plans to achieve the goals. Participants are encouraged to evaluate progress regularly and adjust plans accordingly.
The document summarizes a presentation about a day in the life of a cybercrime boss. It describes his daily schedule, which includes checking emails, analyzing log files, meeting with associates to discuss operations and rewards, handling disciplinary issues, exploring new business opportunities, and reconciling bank statements and log files. It also discusses how to disrupt such operations, such as by looking for absent information, following the money trail, encouraging public-private partnerships and information sharing, and focusing on prevention over prosecution.
1) The document discusses the importance of reading for children and proposes holding a book drive to excite children about reading. It notes that many school-age children are no longer interested in reading.
2) It includes quotes about the importance of reading, results from a children's survey about reading preferences, and statistics about the impacts of poor reading skills.
3) The book drive was held on April 18, 2009 to benefit a church children's ministry and collected a variety of books including some for a one year old.
Durante la Guerra Civil Española, los carteles propagandísticos apelaban a diferentes ideologías políticas como el socialismo, el nacionalismo y el Frente Popular de izquierda, así como a la lucha del pueblo español y a la participación de las mujeres en el conflicto.
This document summarizes a white paper on cyber crime within the South African government. It discusses the growing problem of identity theft and hacking within government systems. It outlines the Cool Frog Cyber Project, which was authorized to target and disrupt international crime syndicates hijacking identities to enable financial crimes. The project profiled 4 linked syndicates, analyzed threats, and achieved several arrests and convictions. It discusses investigative methods like searches, hardware key loggers, and working with partners like law enforcement. The document recommends improving security measures, vetting, and training to better address cyber crimes within government.
Cyber Crime 101: The Impact of Cyber Crime on Higher Education in South AfricaJacqueline Fick
The document discusses cyber crime and its impact on higher education in South Africa. It introduces a hypothetical case study of "Jack le Hack", a university student who engages in various cyber crimes like unauthorized access of data, computer-related fraud, and intimidation. The document then defines cyber crime and common types seen in South Africa. It outlines strategies for organizations to protect their data and implement proactive cyber security practices. The document concludes with practical cyber security tips for users.
Spring Batch is a lightweight framework for batch processing that provides reusable functions for transaction management, error handling, parallel processing, and more. It allows developers to focus on the business logic rather than infrastructure concerns. Spring Batch is lightweight and can be easily embedded into existing applications. It uses a POJO-based programming model with dependency injection and supports reading from and writing to various data sources. Jobs in Spring Batch are comprised of steps that can run sequentially or conditionally and process data in chunks for improved performance.
This document outlines the challenges of porting Oracle applications to PostgreSQL. It discusses porting SQL syntax, data types, functions, sequences, outer joins, and default parameters. Many changes are required such as modifying data types, functions like DECODE, sequence syntax, outer join syntax, and handling default parameters. The document emphasizes that porting projects are difficult and compatibility is limited, but success can be rewarding.
This document discusses an experiment to use PostgreSQL as the sole data access layer for a web application by replacing a traditional RESTful API server with PostgREST. PostgREST is a framework that provides a RESTful interface to any PostgreSQL database without requiring additional code or configuration. The document demonstrates PostgREST by connecting to a sample Pagila database and allowing full CRUD operations and filtering through SQL queries alone. It also shows how PostgREST handles authentication, authorization, relations, and versioning directly through PostgreSQL features.
Skype uses PostgreSQL for over 100 database servers and 200 databases to support its online services. It partitions data across multiple databases for scalability and uses tools like pgBouncer, PL/Proxy, SkyTools, PgQ, and Londiste to connect, load balance, and replicate data between databases. This complex database architecture has allowed Skype to grow significantly while remaining easy to manage.
A noETL Parallel Streaming Transformation Loader using Spark, Kafka & Vertica (Data Con LA)
ETL, ELT and Lambda architectures have evolved into a [non]streaming, general-purpose data ingestion pipeline that is scalable through distributed processing, for Big Data analytics over hybrid data warehouses in Hadoop and MPP columnar stores like HPE Vertica.
Bio: Jack Gudenkauf (https://www.linkedin.com/in/jackglinkedin) has over twenty-nine years of experience designing and implementing Internet-scale distributed systems. Jack is currently the CEO & Founder of the startup BigDataInfra. He was previously VP of Big Data at Playtika and a hands-on manager of the Twitter Analytics Data Warehouse team, spent 15 years at Microsoft shipping 15 products, and before Microsoft managed his own consulting company, after beginning his career as an MIS Director of several startup companies.
This document describes a Parallel Streaming Transformation Loader (PSTL) that uses Kafka, Spark, and Vertica for real-time data ingestion and analytics. It summarizes the PSTL as follows:
1. The PSTL ingests streaming data from Kafka into Spark RDDs in parallel.
2. Spark is used to transform the data, including assigning IDs and hashing records to partitions.
3. The transformed data is written in parallel from the Spark partitions directly to Vertica for analytics and querying.
4. Vertica demonstrated impressive parallel copy performance of 2.42 billion rows in under 8 minutes using this approach.
Intro to Talend Open Studio for Data Integration (Philip Yurchuk)
An overview of Talend Open Studio for Data Integration, along with some tips learned from building production jobs and a list of resources. Feel free to contact me for more information.
Context-aware Fast Food Recommendation with Ray on Apache Spark at Burger King (Databricks)
For fast food recommendation use cases, user behavior sequences and context features (such as time, weather, and location) are both important factors to take into consideration. At Burger King, we have developed a new state-of-the-art recommendation model called Transformer Cross Transformer (TxT). It applies Transformer encoders to capture both user behavior sequences and complicated context features, and combines both transformers through the latent cross for joint context-aware fast food recommendations. Online A/B testing shows not only the superiority of TxT compared to existing methods, but also that TxT can be successfully applied to other fast food recommendation use cases outside of Burger King.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
Challenges of Angular in production (Tasos Bekos) - GreeceJS #17
Modern web applications have constantly growing requirements, and their APIs and complexity grow exponentially. In this session we'll look at a practical example of how to optimize solutions, like bundling, tree shaking, ahead-of-time compilation, lazy loading, etc. We will also get a glimpse of what it takes to switch a complex product to a modern stack, with Angular at its heart, and how the company's commitment is making it possible.
Prestogres is a PostgreSQL protocol gateway for Presto that allows Presto to be queried using standard BI tools through ODBC/JDBC. It works by rewriting queries at the pgpool-II middleware layer and executing the rewritten queries on Presto using PL/Python functions. This allows Presto to integrate with the existing BI tool ecosystem while avoiding the complexity of implementing the full PostgreSQL protocol. Key aspects of the Prestogres implementation include faking PostgreSQL system catalogs, handling multi-statement queries and errors, and security definition. Future work items include better supporting SQL syntax like casts and temporary tables.
PGDay Amsterdam 2018
Instead of using ETL tools, which consume tons of memory on their own systems, you will learn how to do ETL jobs directly in and with a database.
PostgreSQL Management of External Data (SQL/MED, http://www.iso.org/iso/catalogue_detail.htm?csnumber=38643) is also known as Foreign Data Wrapper (FDW). With FDW, there is nearly no limit to the external data that you can use directly inside a PostgreSQL database.
PGDay.Amsterdam 2018 - Stefanie Stoelting - PostgreSQL As Data Integration Tool
The document discusses parallel extensions to the .NET framework including the task parallel library (TPL), parallel LINQ (PLINQ), and new coordination data structures. The TPL allows developers to more easily write multithreaded and parallel code using tasks and higher-level constructs. PLINQ allows parallelizing LINQ queries. New collections like ConcurrentQueue and BlockingCollection help with thread-safe data access. The framework changes aim to simplify writing parallel and concurrent code as hardware increasingly uses multiple cores.
This presentation covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
This document provides an overview of advanced PostgreSQL administration topics covered in a presentation, including installation, initialization, configuration, starting and stopping the Postmaster, connections, authentication, security, data directories, shared memory sizing, the write-ahead log, and vacuum settings. The document includes configuration examples from postgresql.conf and discusses parameters for tuning memory usage, connections, authentication and security.
- PostgreSQL has many extension points including data types, functions, operators and more. This allows it to be extended with tools like Groonga.
- There are two main ways to integrate Groonga with PostgreSQL: textsearch_groonga for full text search and groonga_fdw, which allows Groonga to be accessed via foreign tables.
- textsearch_groonga adds Groonga's text search capabilities to PostgreSQL for features like morphological analysis and N-gram search. groonga_fdw treats a Groonga database as an external data source so it can be queried from PostgreSQL using standard SQL.
The document discusses database tools and techniques used by Skype to optimize database performance. It describes:
1) Skype's use of stored procedures to access databases for security and optimization reasons.
2) Tools like pgBouncer and PL/Proxy that minimize connections and allow remote queries to optimize resource usage and enable high availability.
3) The SkyTools framework for batch jobs that handles common tasks to simplify application development.
4) The PgQ queue system for asynchronous processing that provides transactional, efficient message handling between producers and consumers.
This document provides an overview of data pipelines and various technologies that can be used to build them. It begins with a brief history of pipelines and their origins in UNIX. It then discusses common pipeline concepts like decoupling of tasks, encapsulation of processing, and reuse of tasks. Several examples of graphical and programmatic pipeline solutions are presented, including Luigi, Piecepipe, Spring Batch, and workflow engines. Big data pipelines using Hadoop and technologies like Pig and Oozie are also covered. Finally, cloud-based pipeline technologies from AWS like Kinesis, Data Pipeline, Lambda, and EMR are described. Throughout the document, examples are provided to illustrate how different technologies can be used to specify and run data processing pipelines.
PgLoader, the parallel ETL for PostgreSQL
Dimitri Fontaine
October 17, 2008

Outline
Introduction
Architecture
Configuration examples & Usage
Current status & TODO
Table of contents
1 Introduction: pgloader, the what?
2 Architecture: Main components; Parallel Organisation
3 Configuration examples & Usage
4 Current status & TODO
ETL
Definition
An ETL processes data to load into the database from a flat file:
1 Extract
2 Transform
3 Load
pgloader’s features
PGLoader will:
Load CSV data
Load pretend-to-be CSV data
Continue loading when confronted with errors
Apply user-defined transformations to data, on the fly
Optionally have all your cores participate in processing
Configuration
We first parse the configuration, with a templating system.
Example
[simple]
use_template = simple_tmpl
table = simple
filename = simple/simple.data
columns = a:1, b:3, c:2
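As a rough illustration (not pgloader's actual parser), the `use_template` mechanism can be sketched with Python's stdlib ConfigParser: a section inherits every option from its template section unless it overrides it. The section and option contents below are hypothetical.

```python
from configparser import ConfigParser

CONF = """
[simple_tmpl]
format = text
field_sep = |

[simple]
use_template = simple_tmpl
table = simple
filename = simple/simple.data
columns = a:1, b:3, c:2
"""

def resolve_section(parser, name):
    """Merge a section with its template: template values first,
    then the section's own values override them."""
    opts = {}
    if parser.has_option(name, 'use_template'):
        tmpl = parser.get(name, 'use_template')
        opts.update(parser.items(tmpl))
    opts.update(parser.items(name))
    opts.pop('use_template', None)   # internal key, not a setting
    return opts

parser = ConfigParser()
parser.read_string(CONF)
settings = resolve_section(parser, 'simple')
# settings now holds template options (format, field_sep)
# together with the section's own (table, filename, columns)
```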
Loading: file reading
PGLoader supports many input formats, even if they all look like CSV; the hard part is parsing the data:
Read files one line at a time
Parse physical lines into logical lines
Supports several readers: textreader, csvreader, fixedreader
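The physical-to-logical line step can be sketched as follows (a simplified illustration, not pgloader's reader code): a physical line whose quoted field is still open continues on the next one, so several physical lines may form a single logical line.

```python
def logical_lines(physical_lines, quotechar='"'):
    """Join physical lines into logical lines: while the number of
    quote characters seen so far is odd, a quoted field is still
    open and the logical line continues on the next physical line."""
    buf = ''
    for line in physical_lines:
        buf += line
        if buf.count(quotechar) % 2 == 0:
            yield buf
            buf = ''
        else:
            buf += '\n'   # embedded newline inside a quoted field
    if buf:
        yield buf          # unterminated field: emit what we have

rows = list(logical_lines(['a,"start', 'end",b', 'c,d']))
# rows[0] spans two physical lines; rows[1] is a plain line
```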
Processing lines
Parsing data is the CPU-intensive part of the job. You could even have to guess where lines begin and end. Then you add:
columns restrictions
columns reordering
user-defined columns (constants)
user-defined reformatting modules
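Column reordering follows directly from the `columns = a:1, b:3, c:2` configuration syntax shown earlier. A minimal sketch (illustrative helper names, not pgloader's API):

```python
def parse_columns_spec(spec):
    """Parse a pgloader-style columns option such as 'a:1, b:3, c:2'
    into (name, position) pairs; positions are 1-based."""
    pairs = []
    for item in spec.split(','):
        name, pos = item.strip().split(':')
        pairs.append((name, int(pos)))
    return pairs

def reorder(fields, pairs):
    """Return the field values in the declared column order."""
    return [fields[pos - 1] for _, pos in pairs]

pairs = parse_columns_spec('a:1, b:3, c:2')
row = reorder(['x', 'y', 'z'], pairs)   # a=x, b=z, c=y
```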
COPYing to PostgreSQL
This is how we do it:
python cStringIO buffers
configurable size (copy every)
using copy_expert() when available (CVS)
dichotomic error search
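The dichotomic error search can be sketched as a pure function (a simplified reading of the idea, not pgloader's code): COPY the whole buffer when possible; if the server rejects it, split the batch in two and recurse, so single bad rows end up rejected while every good row still gets loaded.

```python
def copy_with_bisect(rows, try_copy, reject):
    """COPY rows in one shot when possible; on failure, bisect the
    batch to isolate offending rows. try_copy(rows) raises when any
    row is bad; reject(row) records a rejected row. Returns the
    number of rows successfully loaded."""
    if not rows:
        return 0
    try:
        try_copy(rows)
        return len(rows)
    except Exception:
        if len(rows) == 1:
            reject(rows[0])
            return 0
        mid = len(rows) // 2
        return (copy_with_bisect(rows[:mid], try_copy, reject) +
                copy_with_bisect(rows[mid:], try_copy, reject))

# toy harness: rows equal to 'BAD' make the fake COPY fail
rejected = []
def fake_copy(rows):
    if any(r == 'BAD' for r in rows):
        raise ValueError('invalid input row')

loaded = copy_with_bisect(['a', 'b', 'BAD', 'c'], fake_copy, rejected.append)
```

In a real loader, `try_copy` would write the buffered rows through psycopg's `copy_expert()`.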
Handling of erroneous data input
PGLoader will continue processing your input when it contains erroneous data:
reject data file
reject log file, containing error messages
errors count in summary
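A minimal sketch of the reject mechanism (class and file names are illustrative, not pgloader's API): raw bad lines go to a data file so they can be fixed and reloaded, while error messages go to a companion log file, and an error counter feeds the summary.

```python
import os
import tempfile

class Reject:
    """Record rejected input: raw lines to a .rej data file,
    human-readable messages to a .rej.log file."""
    def __init__(self, data_path, log_path):
        self.data_path, self.log_path = data_path, log_path
        self.errors = 0

    def log(self, message, line):
        self.errors += 1
        with open(self.data_path, 'a') as data:
            data.write(line + '\n')
        with open(self.log_path, 'a') as log:
            log.write('%s: %s\n' % (message, line))

tmp = tempfile.mkdtemp()
rej = Reject(os.path.join(tmp, 'simple.rej'),
             os.path.join(tmp, 'simple.rej.log'))
rej.log('invalid date', 'a|b|not-a-date')
# the data file now holds the raw line, the log file the message
```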
Error logging
PGLoader will continue processing your input when it contains erroneous data, and will make it so that you know about the failures:
log file
console log level: client_min_messages
logfile log level: log_min_messages
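The two-level scheme can be sketched with Python's stdlib logging (an illustration of the concept, not pgloader's logging code): chatty details go to the log file while only warnings and up reach the console, mirroring how the two minimum-message options work independently.

```python
import logging
import os
import tempfile

logfile = os.path.join(tempfile.mkdtemp(), 'pgloader.log')

logger = logging.getLogger('pgloader-demo')
logger.setLevel(logging.DEBUG)

to_file = logging.FileHandler(logfile)
to_file.setLevel(logging.DEBUG)        # logfile log level
to_console = logging.StreamHandler()
to_console.setLevel(logging.WARNING)   # console log level
logger.addHandler(to_file)
logger.addHandler(to_console)

logger.debug('parsed 10000 lines')     # reaches the file only
logger.warning('1 line rejected')      # reaches file and console
for handler in logger.handlers:
    handler.flush()
```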
Why going parallel?
Loading is IO bound, not CPU bound, right?
for large disk arrays, not so much
with complex parsing, not so much
with heavy user rewriting, not so much
Ok... How?
multi-threading is easy to start with in python
then you add in deques and semaphores (critical sections) and signals
Global Interpreter Lock
fork() based reimplementation could be of interest
Example
class PGLoader(threading.Thread):
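A minimal sketch of the threading pattern the slide hints at (simplified, not pgloader's actual class): each worker is a threading.Thread that drains logical lines from a shared queue; a real loader would buffer them and COPY to PostgreSQL.

```python
import queue
import threading

class Worker(threading.Thread):
    """Drain logical lines from a shared queue until a None
    sentinel arrives; collect them in a shared result list."""
    def __init__(self, lines, done):
        threading.Thread.__init__(self)
        self.lines, self.done = lines, done

    def run(self):
        while True:
            line = self.lines.get()
            if line is None:          # sentinel: no more work
                break
            self.done.append(line)    # list.append is thread-safe

lines, done = queue.Queue(), []
workers = [Worker(lines, done) for _ in range(3)]
for w in workers:
    w.start()
for i in range(100):
    lines.put('line %d' % i)
for _ in workers:
    lines.put(None)                   # one sentinel per worker
for w in workers:
    w.join()
```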
Parallelism choices
This has been asked for by some hackers; their use cases dictated two different modes.
The idea is to have a parallel pg_restore testbed, interesting with large input files (100GB to several TB). PGLoader can't compete with plain COPY, due to client/server round-trips compared to local file reading, but with some more CPUs feeding the disk array it should show nice improvements.
Testing and feedback more than welcome!
Round robin reader
Parsing is all done by a single thread for all the content.
N readers are started; each gets a queue filled with this round's data, and issues COPY while the main reader continues parsing.
Example
[rrr]
section_threads = 3
split_file_reading = False
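The dispatch side can be sketched like this (a per-line round-robin for brevity; pgloader hands out a round of data per queue, but the distribution idea is the same):

```python
import itertools
import queue

def round_robin_dispatch(lines, queues):
    """Single parser thread side: deal parsed lines out to the
    worker queues in round-robin fashion."""
    targets = itertools.cycle(queues)
    for line in lines:
        next(targets).put(line)

queues = [queue.Queue() for _ in range(3)]
round_robin_dispatch(['l%d' % i for i in range(7)], queues)
sizes = [q.qsize() for q in queues]   # 7 lines over 3 queues
```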
Split file reader
The file is split into N blocks, and there are as many pgloaders doing the same job in parallel as there are blocks.
Example
[rrr]
section_threads = 3
split_file_reading = True
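Splitting a file into N independently readable blocks requires aligning each boundary to a line break, so no reader starts mid-line. A sketch of that computation (illustrative, not pgloader's implementation):

```python
import os
import tempfile

def split_points(path, nblocks):
    """Compute (start, end) byte ranges covering the file, with each
    internal boundary pushed forward to the next newline so that no
    block starts in the middle of a line."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, 'rb') as f:
        for i in range(1, nblocks):
            f.seek(i * size // nblocks)
            f.readline()               # skip to the end of this line
            bounds.append(min(f.tell(), size))
    bounds.append(size)
    return list(zip(bounds, bounds[1:]))

# demo: 100 fixed-width lines of 8 bytes each
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb') as f:
    f.write(b''.join(b'row %03d\n' % i for i in range(100)))
blocks = split_points(path, 3)
# every block boundary lands on a line boundary
```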
Examples
The PGLoader distribution comes with diverse examples; don't forget to check them out.
simple
That simple:
Example
[simple]
table = simple
filename = simple/simple.data
format = text
datestyle = dmy
field_sep = |
trailing_sep = True
columns = *
User defined columns
Constant columns added at parsing time.
Use case: adding an origin server id field depending on the file being loaded, for data aggregation.
Example
[server_A]
file = imports/A.csv
columns = b:2, d:1, x:3, y:4
udc_c = A
copy_columns = b, c, d
User defined reformatting modules
The basic idea is to avoid any pre-processing done with another tool (sed, awk, you name it).
The file has ’12131415’, we want ’12:13:14.15’.
Example
[fixed]
table = fixed
format = fixed
filename = fixed/fixed.data
columns = *
fixed_specs = a:0:10, b:10:8, c:18:8, d:26:17
reformat = c:pgtime:time
Example
def time(reject, input):
    """ Reformat str as a PostgreSQL time """
    if len(input) != 8:
        reject.log('time: expected 8 characters', input)
        return None
    hour  = input[0:2]
    mins  = input[2:4]
    secs  = input[4:6]
    cents = input[6:8]
    return '%s:%s:%s.%s' % (hour, mins, secs, cents)
The fine manual says it all
At http://pgloader.projects.postgresql.org/ or man pgloader
Example
> pgloader --help
> pgloader --version
> pgloader -DTsc pgloader.conf
TODO
http://pgloader.projects.postgresql.org/dev/TODO.html
Constraint Exclusion support
Reject Behaviour
XML support with user defined XSLT StyleSheet
Facilities
Don't be shy and just ask for new features!
Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
62. Outline
Introduction
Architecture
Configuration examples & Usage
Current status & TODO
Resources and Users
pgfoundry, 1 developper, some users, no mailing list yet (no one
asking for one), some mails sometime, seldom bug reports (fixed)
Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
63. Outline
Introduction
Architecture
Configuration examples & Usage
Current status & TODO
Resources and Users
pgfoundry, 1 developper, some users, no mailing list yet (no one
asking for one), some mails sometime, seldom bug reports (fixed)
Support ongoing at #postgresql and #postgresqlfr
Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL
64. Outline
Introduction
Architecture
Configuration examples & Usage
Current status & TODO
Resources and Users
pgfoundry, 1 developper, some users, no mailing list yet (no one
asking for one), some mails sometime, seldom bug reports (fixed)
Support ongoing at #postgresql and #postgresqlfr
packages for debian, FreeBSD, OpenBSD, CentOS, RHEL and
Fedora.
Dimitri Fontaine PgLoader, the parallel ETL for PostgreSQL