The document discusses using PL/pgSQL functions and stored procedures to implement fantasy football scoring rules and statistics in PostgreSQL. It provides examples of functions that calculate scores for rushing, passing, receiving, and touchdowns, shows how to use those functions to compute players' average yearly scores, and demonstrates debugging the functions with RAISE notices. It closes with a discussion of the PL Profiler extension for monitoring and analyzing the performance of PL/pgSQL functions.
SQL vs NoSQL, an experiment with MongoDB (Marco Segato)
A simple experiment comparing MongoDB with a classic Oracle RDBMS: what NoSQL databases are, when to use them, why to choose MongoDB, and how we can play with it.
This presentation covers the differences between Elasticsearch and relational databases, along with a glossary of Elasticsearch terms and its basic operations.
Intro to MongoDB
Get a jumpstart on MongoDB, use cases, and next steps for building your first app with Buzz Moschetti, MongoDB Enterprise Architect.
@BuzzMoschetti
SRV405 Deep Dive Amazon Redshift & Redshift Spectrum at Cardinal Health (Amazon Web Services)
Get a technical deep dive into Amazon Redshift and Redshift Spectrum. Learn best practices for taking advantage of Amazon Redshift’s columnar technology and parallel processing capabilities, to improve overall database performance. This session will explain how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management and use Redshift Spectrum to query data directly in Amazon S3. The session will feature Jeff Battisti, Director Global Cloud BI&A Medical IT at Cardinal Health, and Greg Cantwell, Senior Consultant, Business Metrics / Analytics, who will provide lessons learned and best practices, from creating a new data warehouse to supporting Global Sales & Financial reporting in over 60 countries with Amazon Redshift.
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013 (Amazon Web Services)
Migrating data from the existing environments to AWS is a key part of the overall migration to Amazon RDS for most customers. Moving data into Amazon RDS from existing production systems in a reliable, synchronized manner with minimum downtime requires careful planning and the use of appropriate tools and technologies. Because each migration scenario is different, in terms of source and target systems, tools, and data sizes, you need to customize your data migration strategy to achieve the best outcome. In this session, we do a deep dive into various methods, tools, and technologies that you can put to use for a successful and timely data migration to Amazon RDS.
Conquering the Lambda architecture in LinkedIn metrics platform with Apache Calcite (Khai Tran)
Metrics play an important role in data-driven companies like LinkedIn, where we leverage them extensively for reporting, experimentation, and in-product applications. We built an offline platform that helps people define and produce metrics driven by their transformation code, mostly in Pig or Hive, and metadata-rich configurations. Many of our users would like to look at these metrics in real time. To support this, we recently built an extension to the platform that auto-generates a Samza real-time flow from existing offline transformation code with a single command. Combined with the existing offline platform, this delivered a Lambda architecture without maintaining multiple code bases.
In this talk, we will describe how we use Apache Calcite to translate our offline logic, served as the single source of truth, into both Samza code and configuration for real-time execution.
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data (DataWorks Summit)
When interacting with analytics dashboards, two key requirements for a smooth user experience are sub-second response time and data freshness. Cluster computing frameworks such as Hadoop or Hive/HBase work well for storing large volumes of data, but they are not optimized for ingesting streaming data and making it available for queries in real time. Long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
In this talk we will present Druid as a complementary solution to existing Hadoop-based technologies. Druid is an open-source analytics data store designed from scratch for OLAP and business intelligence queries over massive data streams. It provides low-latency real-time data ingestion and fast, flexible sub-second ad hoc data exploration.
Many large companies are switching to Druid for analytics, and we will cover how Druid is able to handle massive data streams and why it is a good fit for BI use cases.
Agenda -
1) Introduction and Ideal Use cases for Druid
2) Data Architecture
3) Streaming Ingestion with Kafka
4) Demo using Druid, Kafka and Superset.
5) Recent Improvements in Druid: moving from a Lambda architecture to exactly-once ingestion
6) Future Work
by Darin Briskman, Technical Evangelist, AWS
Database Freedom means being able to use the database engine that’s right for you as your needs evolve. Being locked into a specific technology can prevent you from achieving your mission. Fortunately, AWS Database Migration Service makes it easy to switch between different database engines. We’ll look at how to use the AWS Schema Conversion Tool with DMS to switch from a commercial database to open source. You’ll need a laptop with a Firefox or Chrome browser.
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ... (Simplilearn)
This presentation about Apache Spark covers all the basics a beginner needs to know to get started with Spark. It covers the history of Apache Spark, what Spark is, and the difference between Hadoop and Spark. You will learn the different components in Spark and how Spark works with the help of its architecture. You will understand the different cluster managers on which Spark can run. Finally, you will see the various applications of Spark and a use case on Conviva. Now, let's get started with what Apache Spark is.
Below topics are explained in this Spark presentation:
1. History of Spark
2. What is Spark
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark architecture
6. Applications of Spark
7. Spark use case
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
Simplilearn’s Apache Spark and Scala certification training is designed to:
1. Advance your expertise in the Big Data Hadoop Ecosystem
2. Help you master essential Apache Spark skills, such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming and Spark shell scripting
3. Help you land a Hadoop developer job requiring Apache Spark expertise by giving you a real-life industry project coupled with 30 demos
What skills will you learn?
By completing this Apache Spark and Scala course you will be able to:
1. Understand the limitations of MapReduce and the role of Spark in overcoming these limitations
2. Understand the fundamentals of the Scala programming language and its features
3. Explain and master the process of installing Spark as a standalone cluster
4. Develop expertise in using Resilient Distributed Datasets (RDD) for creating applications in Spark
5. Master Structured Query Language (SQL) using SparkSQL
6. Gain a thorough understanding of Spark streaming features
7. Master and describe the features of Spark ML programming and GraphX programming
Who should take this Scala course?
1. Professionals aspiring for a career in the field of real-time big data analytics
2. Analytics professionals
3. Research professionals
4. IT developers and testers
5. Data scientists
6. BI and reporting professionals
7. Students who wish to gain a thorough understanding of Apache Spark
Learn more at https://www.simplilearn.com/big-data-and-analytics/apache-spark-scala-certification-training
hbaseconasia2017: HBase Practice At XiaoMi (HBaseCon)
Zheng Hu
We'll share some HBase experience at XiaoMi:
1. How we tuned G1GC for our HBase clusters.
2. Development and performance of Async HBase Client.
Under The Hood Of A Shard-Per-Core Database Architecture (ScyllaDB)
Most databases are based on architectures that pre-date advances to modern hardware. This results in performance issues, the need to overprovision, and a high total cost of ownership. In this webinar, we will discuss the advances to modern server technology and take a deep dive into ScyllaDB’s shard-per-core architecture and our asynchronous engine, the Seastar framework.
Join us to learn how Seastar (and ScyllaDB):
- Avoid locks and contention on the CPU level
- Bypass kernel bottlenecks
- Implement its per-core shared-nothing autosharding mechanism
- Utilize modern storage hardware
- Leverage NUMA to get the best RAM performance
- Balance your data across CPUs and nodes for the best and smoothest performance
Plus we’ll cover the advantages of unlocking vertical scalability.
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2L4rPmM
This CloudxLab Basics of RDD tutorial helps you understand the basics of RDD in detail. Below are the topics covered in this tutorial:
1) What is RDD - Resilient Distributed Datasets
2) Creating RDD in Scala
3) RDD Operations - Transformations & Actions
4) RDD Transformations - map() & filter()
5) RDD Actions - take() & saveAsTextFile()
6) Lazy Evaluation & Instant Evaluation
7) Lineage Graph
8) flatMap and Union
9) Scala Transformations - Union
10) Scala Actions - saveAsTextFile(), collect(), take() and count()
11) More Actions - reduce()
12) Can We Use reduce() for Computing Average?
13) Solving Problems with Spark
14) Compute Average and Standard Deviation with Spark
15) Pick Random Samples From a Dataset using Spark
Searching big data is complex. Traditional keyword-based search methods are poorly suited for processing and analyzing unstructured data such as images, text, video, and audio that do not follow a predefined model.
To solve this problem, more and more people are adopting vector similarity search, which uses deep learning methods and ANN (approximate nearest neighbor) algorithms to make processing trillion-scale unstructured datasets faster and more accurate.
For those contemplating re-architecting or greenfield data lakes/data hubs/data warehouses in a cloud environment, talk to our Altis AWS Practice Lead, Guillaume Jaudouin, about why you should be considering the "tour de force" combination of AWS and Snowflake.
A brief presentation for an internship project at BEL on data visualization using Seaborn and Matplotlib.
Some sensitive information has been redacted.
Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677
We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.
If SQL is the universal language of data, why do we author our most important data applications (metrics, analytics, business intelligence) in languages other than SQL? Multidimensional databases and languages such as MDX, DAX and Tableau LOD solve these problems but introduce others: they require specialized knowledge, complicate the data pipeline and don’t integrate well. Is it possible to define and query business intelligence models in SQL?
Apache Calcite has extended SQL to support measures, evaluation context, and multidimensional expressions. With these concepts you can define data models that contain measures, use them in queries, and define new measures in queries.
A talk given by Julian Hyde to Apache Calcite's virtual meetup, 2023-03-15.
Faceted search is a powerful technique to let users easily navigate the search results. It can also be used to develop rich user interfaces, which give an analyst quick insights about the documents space. In this session I will introduce the Facets module, how to use it, under-the-hood details as well as optimizations and best practices. I will also describe advanced faceted search capabilities with Lucene Facets.
Slides for Data Syndrome one hour course on PySpark. Introduces basic operations, Spark SQL, Spark MLlib and exploratory data analysis with PySpark. Shows how to use pylab with Spark to create histograms.
Presto: Optimizing Performance of SQL-on-Anything Engine (DataWorks Summit)
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has in the last few years experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail, as well as discuss the best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
Building the Enterprise Data Lake - Important Considerations Before You Jump In (SnapLogic)
In this webinar, learn from industry analyst and big data thought leader Mark Madsen about the future of big data and importance of the new Enterprise Data Lake reference architecture.
This webinar also covers what’s important when building a modern, multi-use data infrastructure, the difference between a Hadoop application and a Data Lake infrastructure, and an enterprise data lake reference architecture to get you started.
To learn more, visit: www.snaplogic.com/big-data
Across the globe energy systems are changing, creating unprecedented challenges for the organisations tasked with ensuring the lights stay on. In the UK, National Grid is facing shrinking margins, looming capacity shortages and unpredictable peaks and troughs in energy supply caused by increasing levels of renewable penetration. Open Energi uses its IoT technology to unlock demand-side capacity - from industrial equipment, co-generation and battery storage systems - creating a smarter grid; one that is cleaner, cheaper, more secure and more efficient.
I'll talk about how we use Apache NiFi to orchestrate and coordinate machine learning microservices that operate on streams of data coming from IoT devices, providing a layer of fault tolerance and traceability. With built-in retry logic, backpressure and clustering, NiFi helps us keep hard problems away from our code. It comes with processors that integrate with our cloud provider of choice (Microsoft Azure), fitting seamlessly into our processing pipeline. Finally, its straightforward graphical interface makes it easy enough to use that any team member can step in and troubleshoot a flow with little training.
PL/pgSQL is a very robust development language that lets you write complex business logic. The downside: as the complexity of your functions grows, how do you debug them? We have all used RAISE statements to print out the progress of our functions, but they can quickly overwhelm your logs, becoming useless.
In this talk, we will:
- Walk through the setup of two key PostgreSQL extensions, the PL/pgSQL Debugger and the PL Profiler
- Demonstrate how the PL Profiler can identify problem areas in your functions
- Set breakpoints in functions and triggers
- Step through PL/pgSQL functions
- Discuss the performance impact of running the extensions in production environments
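For orientation, here is a minimal sketch of what the setup step might look like, assuming the community pldbgapi (PL/pgSQL Debugger) and plprofiler extension packages are already installed on the server, together with the kind of RAISE tracing the talk starts from:

-- Server-side setup (assumes the extensions' packages are installed):
CREATE EXTENSION IF NOT EXISTS pldbgapi;   -- PL/pgSQL Debugger API
CREATE EXTENSION IF NOT EXISTS plprofiler; -- PL Profiler

-- The quick-and-dirty alternative the talk moves beyond: RAISE tracing.
CREATE OR REPLACE FUNCTION trace_demo(p_val integer)
RETURNS integer AS $$
BEGIN
    RAISE NOTICE 'trace_demo called with %', p_val;
    RETURN p_val * 2;
END;
$$ LANGUAGE plpgsql;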
How to write SQL queries | pgDay Paris 2019 | Dimitri Fontaine (Citus Data)
Most of the time we see finished SQL queries, whether in code repositories, blog posts or talk slides. This talk focuses on the process of how to write an SQL query, from a problem statement expressed in English to code review and long-term maintenance of SQL code.
Delivering Winning Results with Sports Analytics and HPCC Systems (HPCC Systems)
The use of Big Data in the sports industry is increasing rapidly to uncover hidden insights into predicting game models, player data, and other attributes. In addition, sports clubs are collecting and analyzing game data to develop tactics and strategies, teams are tracking athlete data, fans are more interested in analytics in addition to statistics and individuals are tracking their own health via FitBit and similar devices.
In partnership with North Carolina State University, this presentation was from our meetup at Bleacher Report held on November 1, 2017.
Our speaker, Associate Professor Vincent Freeh from NCSU, presents various use cases related to sports analytics, discusses leveraging the HPCC Systems open source platform, and covers the benefits gleaned from the analysis.
Variants have been around in C++ for a long time and C++17 now has std::variant. We will compare inheritance and std::variant for their ability to model sum-types (a fancy name for tagged unions). We will visit std::visit and discuss how it helps us model the pattern matching idiom. Immutability is one of the core pillars of Functional Programming (FP). C++ now allows you to model deep immutability; we'll see a way to do that using the standard library. We'll explore if `return std::move(*this)` makes any sense in C++. Immutability may be a reason for that.
SCBCN17 - The road to declarative programming (Gerard Madorell)
We have all read a tutorial on declarative (aka functional) programming, only to arrive at real code and not know where to even begin applying those concepts. The same used to happen to us. After a lot of struggling, many iterations and plenty of help from people better than us, we have learned to smooth the road to declarative programming in a pragmatic way, step by step and without going off on tangents. At this point, we believe other people would benefit from this knowledge.
In this talk we will refactor an application with real use cases, starting from an imperative base. Our goal will be to progressively polish its logic until we reach a declarative implementation that is easy to understand and, at the same time, more robust against those pesky side effects.
By the end, we want attendees to:
- Know the advantages and disadvantages of using this style of programming.
- Understand that the learning curve at the logic level is lower, in exchange for greater implementation complexity.
- Lose their fear of mathematical concepts such as Monad Transformers, which we explain pragmatically.
- See that declarative programming can be used in any real use case, not just in toy projects.
Mark and Wes will talk about Cypher optimization techniques based on real queries as well as the theoretical underlying processes. They'll start from the basics of "what not to do", and how to take advantage of indexes, and continue to the subtle ways of ordering MATCH/WHERE/WITH clauses for optimal performance as of the 2.0.0 release.
Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015 (Sergii Khomenko)
Puppet is widely known in the DevOps community, but not so popular among data teams. Nevertheless, Puppet can easily empower your data teams. The talk presents hands-on experience of using Puppet for different data topics, from configuring a Windows machine for Business Intelligence to advanced ranking infrastructures based on Puppet.
The talk will walk you through the process of setting up a standalone Puppet configuration used for provisioning a Windows machine for Business Intelligence purposes, such as Tableau and Talend Big Data configurations, ETL scheduling, etc. The second part of the talk will cover a use case of Puppet for enabling a lean ranking infrastructure.
Abstract: Oracle released a feature in 10g Release 2 that they thought worthy of facilitating in previous versions via patch sets - so I thought it was worthy of a closer look.
Conditional compilation isn't a foreign concept in the programming world, and for the developer aficionado it's a wonderful paradigm to explore.
Conditional compilation was designed with the main intention of being able to create database version specific code. With the recent advent of 11g, developers can actually start adding 11g features to their 10g code today!
However, it also lets the savvy PL/SQL developer enhance their code in more ways than just gearing up for the next release… Dust off your software engineering hats and discover how to utilise conditional compilation to explore concepts such as latent self-tracing code, latent assertions, and enhanced prototyping for your unit tests.
This seminar will illustrate several examples of conditional compilation that will open your mind; ultimately benefit your users; and can be implemented as far back as 9.2!
Unit tests are not limited to application code: tests can also be run on the data and schemas of databases.
Talk given at the PostgreSQL meetup on 22 June 2016 in Nantes.
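As a flavor of what database-level unit tests can look like, here is a minimal sketch assuming the pgTAP extension (the table and column names are hypothetical, not taken from the talk):

CREATE EXTENSION IF NOT EXISTS pgtap;

BEGIN;
SELECT plan(2);

-- Schema assertions: the tables and columns the application relies on exist.
SELECT has_table('player');                 -- hypothetical table
SELECT has_column('player', 'birth_year');  -- hypothetical column

SELECT * FROM finish();
ROLLBACK;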
Top 10 Mistakes When Migrating From Oracle to PostgreSQL (Jim Mlodgenski)
As more and more people move to PostgreSQL from Oracle, a pattern of mistakes is emerging. They can be caused by the tools being used or by simply not understanding how PostgreSQL is different from Oracle. In this talk we will discuss the top mistakes people generally make when moving to PostgreSQL from Oracle and what the correct course of action is.
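One classic example of such a difference (offered here as an illustration, not necessarily one of the talk's top ten): Oracle treats the empty string as NULL, while PostgreSQL does not.

-- Oracle: '' IS NULL evaluates to true, so code ported verbatim can misbehave.
-- PostgreSQL: the empty string is a real value, distinct from NULL.
SELECT '' IS NULL;                 -- PostgreSQL returns false
SELECT coalesce('', 'was null');   -- PostgreSQL returns '' (the empty string)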
Triggers are those little bits of code running in your database that get executed when something you care about happens. Whether you are a developer who puts all of your business logic inside PL/pgSQL functions or someone who uses an ORM and wants to stay away from database code, you will likely end up using triggers at some point. The fact that the most recommended way of implementing table partitioning in PostgreSQL uses triggers underscores the importance of understanding them.
In this talk, we will step through examples of writing various types of triggers using practical use cases like partitioning and auditing:
The structure of a trigger
BEFORE vs AFTER triggers
Statement Level vs Row Level triggers
Conditional triggers
Event triggers
Debugging triggers
Performance overhead of triggers
All of the examples will be done using PL/pgSQL so in addition to getting an overview of triggers, you will also get a good understanding of how to code in PL/pgSQL.
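As a flavor of those examples, here is a minimal sketch of a row-level audit trigger (the audit table layout and the target table are made-up assumptions, not taken from the talk):

-- A generic audit table capturing old and new row images as jsonb.
CREATE TABLE audit_log (
    table_name text,
    changed_at timestamptz DEFAULT now(),
    old_row    jsonb,
    new_row    jsonb
);

CREATE OR REPLACE FUNCTION audit_trigger_fn() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, old_row, new_row)
    VALUES (TG_TABLE_NAME, to_jsonb(OLD), to_jsonb(NEW));
    RETURN NEW;  -- ignored by AFTER triggers, but returning NEW is conventional
END;
$$ LANGUAGE plpgsql;

-- Attach it as a row-level AFTER trigger on a hypothetical table.
CREATE TRIGGER player_audit
    AFTER UPDATE ON player
    FOR EACH ROW EXECUTE PROCEDURE audit_trigger_fn();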
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas (Jim Mlodgenski)
One of the most powerful features of PostgreSQL is its diversity of procedural languages, but with that diversity comes a lot of options.
Did you ever wonder:
- What all of those options are on the CREATE FUNCTION statement?
- How do they affect my application?
- Does my choice of procedural language affect the performance of my statements?
- Should I create a single trigger with IF statements or several simple triggers?
- How do I debug my code?
- Can I tell which line in my function is taking all of the time?
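To make the first question concrete, here is a hedged sketch of the planner-related options on CREATE FUNCTION (the function itself is a made-up example):

-- Illustrative only: a simple SQL function with its planner options spelled out.
CREATE OR REPLACE FUNCTION add_points(a integer, b integer)
RETURNS integer AS $$
    SELECT a + b;
$$ LANGUAGE sql
IMMUTABLE      -- result depends only on the arguments, so calls can be pre-evaluated
STRICT         -- returns NULL on NULL input without even calling the function
PARALLEL SAFE; -- may run in parallel workers (PostgreSQL 9.6 and later)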
This is an introduction to PostgreSQL that provides a brief overview of PostgreSQL's architecture, features and ecosystem. It was delivered at NYLUG on Nov 24, 2014.
http://www.meetup.com/nylug-meetings/events/180533472/
As more and more alternative data stores come into use, easily using and reporting on the data scattered across those stores becomes increasingly difficult. PostgreSQL has a feature called Foreign Data Wrappers that allows external data sources to be queried from PostgreSQL and look like a standard table. Using Foreign Data Wrappers, users can create a report that joins data residing in Oracle, Hadoop and MongoDB all in a single query.
In this talk, we'll discuss how to set up a Foreign Data Wrapper for various data sources and the pros and cons of using them. We'll also discuss the growing ecosystem of Foreign Data Wrappers and a little about how to write one.
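As one concrete setup of the kind the talk describes, here is a hedged sketch using the postgres_fdw wrapper that ships with PostgreSQL (the host, credentials and table definition are hypothetical):

CREATE EXTENSION postgres_fdw;

-- Point at the remote PostgreSQL instance.
CREATE SERVER stats_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'stats.example.com', dbname 'stats');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER stats_server
    OPTIONS (user 'reporting', password 'secret');

-- The remote table now looks like a standard local table.
CREATE FOREIGN TABLE remote_games (
    game_id integer,
    year    integer,
    week    integer
) SERVER stats_server;

SELECT count(*) FROM remote_games WHERE year = 1999;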
Leveraging Hadoop in your PostgreSQL Environment (Jim Mlodgenski)
This talk will begin with a discussion of the strengths of PostgreSQL and Hadoop. We will then lead into a high level overview of Hadoop and its community of projects like Hive, Flume and Sqoop. Finally, we will dig down into various use cases detailing how you can leverage Hadoop technologies for your PostgreSQL databases today. The use cases will range from using HDFS for simple database backups to using PostgreSQL and Foreign Data Wrappers to do low latency analytics on your Big Data.
One of the most sought-after features in PostgreSQL is a scalable multi-master replication solution. While there do exist some tools to create multi-master clusters, such as Bucardo and pgpool-II, they may not be the right fit for an application. In this session, you will learn some of the strengths and weaknesses of these more popular multi-master solutions for PostgreSQL and how they compare to using Slony for your multi-master needs. We will explore the types of deployments best suited for a Slony deployment and the steps necessary to configure a multi-master solution for PostgreSQL.
GridSQL is commonly thought of as a replication solution along the likes of Slony and Bucardo, but the open source GridSQL project actually allows PostgreSQL queries to be parallelized across many servers allowing performance to scale nearly linearly. In this session, we will discuss the advantages to using GridSQL for large multi-terabyte data warehouses and how to design your PostgreSQL schemas and queries to leverage GridSQL. We will dig into how GridSQL plans a query capable of spanning multiple PostgreSQL servers and executes across those nodes. We will delve into some performance expectations and where GridSQL should be deployed.
2. Who Am I?
● Jim Mlodgenski
– jimm@openscg.com
– @jim_mlodgenski
● Director
– United States PostgreSQL (www.postgresql.us)
● Co-organizer of
– Philly PUG (www.phlpug.org)
– NYC PUG (www.nycpug.org)
● CTO, OpenSCG
– www.openscg.com
3. Why?
● More web developers using PostgreSQL
– Use programming techniques inside the database
● Wrong assumptions
– Using stored procedures prevents SQL Injection attacks
● Migrations
– It's how Oracle has been advocating database development for years
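To make the "wrong assumption" concrete, here is a minimal sketch (the function names are hypothetical; the player table is the one used throughout this deck): a PL/pgSQL function that builds dynamic SQL by string concatenation is still injectable, while one that binds the value with USING is not.
-- Injectable: user input is concatenated straight into the query text
CREATE OR REPLACE FUNCTION find_player_unsafe(p_name VARCHAR)
RETURNS SETOF player AS
$$
BEGIN
RETURN QUERY EXECUTE
'SELECT * FROM player WHERE last_name = ''' || p_name || '''';
END;
$$ LANGUAGE plpgsql;
-- Safe: the value is passed as a bind parameter via USING
CREATE OR REPLACE FUNCTION find_player_safe(p_name VARCHAR)
RETURNS SETOF player AS
$$
BEGIN
RETURN QUERY EXECUTE
'SELECT * FROM player WHERE last_name = $1' USING p_name;
END;
$$ LANGUAGE plpgsql;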
4. Example
● Find the youngest rookie to rush for 100 yards against the Dallas Cowboys
first_name | last_name | p_year | p_week | p_age
------------+-----------+--------+--------+-------
Edgerrin | James | 1999 | 8 | 21
Jamal | Lewis | 2000 | 12 | 21
(2 rows)
5. Database Developer
CREATE OR REPLACE FUNCTION get_youngest_rookie_backs_against_team_min_yards_simple(
p_team VARCHAR, p_yards integer, OUT first_name VARCHAR, OUT last_name VARCHAR,
OUT p_year integer, OUT p_week integer, OUT p_age integer)
RETURNS SETOF record AS
$$
BEGIN
RETURN QUERY
WITH rookies AS (
SELECT p.first_name, p.last_name, r.player_key,
g.year, g.week, (g.year - p.birth_year) AS age,
first_value(g.year - p.birth_year) OVER w AS youngest
FROM games g, rushing r, player p
WHERE g.game_id = r.game_id
AND r.player_key = p.player_key
AND (g.team = p_team OR g.opponent = p_team)
AND r.yards >= p_yards
AND g.year = p.debut_year
AND r.player_key NOT IN (SELECT s.player_key
FROM seasons s
WHERE s.year = g.year
AND s.team = p_team)
WINDOW w AS (ORDER BY (g.year - p.birth_year)))
SELECT k.first_name, k.last_name, k.year, k.week, k.age
FROM rookies k
WHERE k.age = k.youngest;
END;
$$ LANGUAGE plpgsql;
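The whole answer then comes back from a single call (assuming 'DAL' is how this data set keys the Dallas Cowboys):
SELECT *
FROM get_youngest_rookie_backs_against_team_min_yards_simple('DAL', 100);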
6. Web Developer
CREATE OR REPLACE FUNCTION get_backs_against_team_min_yards(
p_team VARCHAR, p_yards integer, OUT p_player_key VARCHAR,
OUT p_year integer, OUT p_week integer)
RETURNS SETOF record AS
$BODY$
DECLARE
g record;
r record;
BEGIN
FOR g IN SELECT game_id, year, week
FROM games
WHERE team = p_team
OR opponent = p_team
LOOP
FOR r IN SELECT player_key, yards
FROM rushing
WHERE game_id = g.game_id
LOOP
IF r.yards >= p_yards AND
NOT is_on_team(p_team, r.player_key, g.year) THEN
p_player_key := r.player_key;
p_year := g.year;
p_week := g.week;
RETURN NEXT;
END IF;
END LOOP;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
7. Web Developer
CREATE OR REPLACE FUNCTION is_on_team(
p_team VARCHAR, p_player_key VARCHAR, p_year integer)
RETURNS boolean AS
$BODY$
DECLARE
r record;
BEGIN
FOR r IN SELECT team
FROM seasons
WHERE player_key = p_player_key
AND year = p_year
LOOP
IF r.team = p_team THEN
RETURN true;
END IF;
END LOOP;
RETURN false;
END;
$BODY$
LANGUAGE plpgsql;
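The loop above scans every season row for the player just to test membership. A sketch of how the same check could be written set-based, with a single EXISTS lookup:
CREATE OR REPLACE FUNCTION is_on_team(
p_team VARCHAR, p_player_key VARCHAR, p_year integer)
RETURNS boolean AS
$BODY$
BEGIN
-- one indexed probe instead of iterating over every row
RETURN EXISTS (SELECT 1
FROM seasons
WHERE player_key = p_player_key
AND year = p_year
AND team = p_team);
END;
$BODY$
LANGUAGE plpgsql;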
8. Web Developer
CREATE OR REPLACE FUNCTION get_rookie_backs_against_team_min_yards(
p_team VARCHAR, p_yards integer, OUT p1_player_key VARCHAR,
OUT p1_year integer, OUT p1_week integer)
RETURNS SETOF record AS
$BODY$
DECLARE
r record;
BEGIN
FOR r IN SELECT p_player_key, p_year, p_week
FROM get_backs_against_team_min_yards(p_team, p_yards)
LOOP
IF is_rookie_year(r.p_player_key, r.p_year) THEN
p1_player_key := r.p_player_key;
p1_year := r.p_year;
p1_week := r.p_week;
RETURN NEXT;
END IF;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
9. Web Developer
CREATE OR REPLACE FUNCTION is_rookie_year(
p_player_key VARCHAR, p_year integer)
RETURNS boolean AS
$BODY$
DECLARE
l_year integer;
BEGIN
SELECT debut_year
INTO l_year
FROM player
WHERE player_key = p_player_key;
IF l_year = p_year THEN
RETURN true;
END IF;
RETURN false;
END;
$BODY$
LANGUAGE plpgsql;
10. Web Developer
CREATE OR REPLACE FUNCTION get_youngest_rookie_backs_against_team_min_yards(
...
BEGIN
l_age := 99;
FOR r IN SELECT p1_player_key, p1_year, p1_week
FROM get_rookie_backs_against_team_min_yards(p_team, p_yards)
LOOP
IF get_player_age_in_year(r.p1_player_key, r.p1_year) < l_age THEN
l_age := get_player_age_in_year(r.p1_player_key, r.p1_year);
END IF;
END LOOP;
FOR r IN SELECT p1_player_key, p1_year, p1_week
FROM get_rookie_backs_against_team_min_yards(p_team, p_yards)
LOOP
IF get_player_age_in_year(r.p1_player_key, r.p1_year) = l_age THEN
SELECT firstname, lastname
INTO l_fn, l_ln
FROM get_player_name(r.p1_player_key);
first_name := l_fn;
last_name := l_ln;
p_year := r.p1_year;
p_week := r.p1_week;
p_age := l_age;
RETURN NEXT;
END IF;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
11. Web Developer
CREATE OR REPLACE FUNCTION get_player_age_in_year(
p_player_key character varying, p_year integer)
RETURNS integer AS
$BODY$
DECLARE
l_year integer;
BEGIN
SELECT birth_year
INTO l_year
FROM player
WHERE player_key = p_player_key;
RETURN p_year - l_year;
END;
$BODY$
LANGUAGE plpgsql;
12. Web Developer
CREATE OR REPLACE FUNCTION get_player_name(
p_player_key VARCHAR, OUT firstname VARCHAR,
OUT lastname VARCHAR)
RETURNS record AS
$BODY$
BEGIN
SELECT first_name, last_name
INTO firstname, lastname
FROM seasons
WHERE player_key = p_player_key
LIMIT 1;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
13. How?
● The code rarely starts out this complex, but growing functional requirements and short timelines contribute to shortcuts
● Developers know enough PostgreSQL to be dangerous
14. Fantasy Football
● Fantasy football is a statistical game in which players compete against each other by managing groups of real players or position units selected from American football teams.
17. Procedures
CREATE OR REPLACE FUNCTION rushing_score(p_player_key VARCHAR,
p_year int,
p_week int)
RETURNS INT AS
$$
DECLARE
score INT;
BEGIN
-- 1 point for every 10 yards rushing
SELECT r.yards/10
INTO score
FROM rushing r, games g
WHERE r.game_id = g.game_id
AND g.year = p_year
AND g.week = p_week
AND r.player_key = p_player_key;
IF score IS NULL THEN
RETURN 0;
END IF;
RETURN score;
END;
$$ LANGUAGE plpgsql;
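A quick sanity check from psql; the player key here is purely hypothetical:
nfl=# SELECT rushing_score('hypothetical_player_key', 2006, 1);
-- 87 rushing yards in that game would return 8 (integer division by 10);
-- no matching game row returns 0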
18. Procedures
CREATE OR REPLACE FUNCTION player_game_score(p_player_key VARCHAR, p_year int, p_week int)
...
BEGIN
score := 0;
-- Get the position of the player
SELECT position
INTO l_position
FROM player
WHERE player_key = p_player_key;
IF l_position = 'qb' THEN
score := score + passing_score(p_player_key, p_year, p_week);
score := score + rushing_score(p_player_key, p_year, p_week);
score := score + td_score(p_player_key, p_year, p_week);
ELSIF l_position = 'rb' THEN
score := score + rushing_score(p_player_key, p_year, p_week);
score := score + td_score(p_player_key, p_year, p_week);
ELSIF l_position = 'wr' THEN
score := score + receiving_score(p_player_key, p_year, p_week);
score := score + td_score(p_player_key, p_year, p_week);
ELSIF l_position = 'te' THEN
score := score + receiving_score(p_player_key, p_year, p_week);
score := score + td_score(p_player_key, p_year, p_week);
ELSE
return 0;
END IF;
return score;
END;
$$ LANGUAGE plpgsql;
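Every position branch ends with td_score(), and the 'wr' and 'te' branches are otherwise identical, so the dispatch can be condensed. A sketch of the same function (the RETURNS clause and declarations are assumptions, since the slide elides them):
CREATE OR REPLACE FUNCTION player_game_score(p_player_key VARCHAR, p_year int, p_week int)
RETURNS INT AS
$$
DECLARE
score INT := 0;
l_position VARCHAR;
BEGIN
-- Get the position of the player
SELECT position
INTO l_position
FROM player
WHERE player_key = p_player_key;
IF l_position = 'qb' THEN
score := passing_score(p_player_key, p_year, p_week)
+ rushing_score(p_player_key, p_year, p_week);
ELSIF l_position = 'rb' THEN
score := rushing_score(p_player_key, p_year, p_week);
ELSIF l_position IN ('wr', 'te') THEN
score := receiving_score(p_player_key, p_year, p_week);
ELSE
RETURN 0;
END IF;
RETURN score + td_score(p_player_key, p_year, p_week);
END;
$$ LANGUAGE plpgsql;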
19. Procedures
CREATE OR REPLACE FUNCTION avg_yearly_score(
p_player_key VARCHAR, p_year int)
RETURNS REAL AS
$$
DECLARE
score INT;
i INT;
BEGIN
score := 0;
-- 17 regular-season weeks, but each team plays 16 games (one bye week)
FOR i IN 1..17 LOOP
score := score + player_game_score(p_player_key,
p_year, i);
END LOOP;
RETURN score/16.0;
END;
$$ LANGUAGE plpgsql;
20. Debugging: RAISE
CREATE OR REPLACE FUNCTION passing_score(p_player_key VARCHAR, p_year int, p_week int)
RETURNS INT AS
$$
...
BEGIN
score := 0;
-- 1 point for every 25 yards passing
SELECT p.pass_yards/25
INTO yardage_score
FROM passing p, games g
WHERE p.game_id = g.game_id
AND g.year = p_year
AND g.week = p_week
AND p.player_key = p_player_key;
IF yardage_score IS NULL THEN
yardage_score := 0;
END IF;
RAISE NOTICE 'Passing Yards Score: %', yardage_score;
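-- The slide cuts off here. What follows is a plausible completion inferred
-- from the NOTICE output on the next slides (Passing TD Score in multiples
-- of 4, Interception Score in multiples of -2); the pass_tds/interceptions
-- column names and the l_* declarations are assumptions, not from the deck.
SELECT p.pass_tds * 4
INTO l_td_score
FROM passing p, games g
WHERE p.game_id = g.game_id
AND g.year = p_year
AND g.week = p_week
AND p.player_key = p_player_key;
IF l_td_score IS NULL THEN
l_td_score := 0;
END IF;
RAISE NOTICE 'Passing TD Score: %', l_td_score;
SELECT p.interceptions * -2
INTO l_int_score
FROM passing p, games g
WHERE p.game_id = g.game_id
AND g.year = p_year
AND g.week = p_week
AND p.player_key = p_player_key;
IF l_int_score IS NULL THEN
l_int_score := 0;
END IF;
RAISE NOTICE 'Interception Score: %', l_int_score;
score := yardage_score + l_td_score + l_int_score;
RETURN score;
END;
$$ LANGUAGE plpgsql;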
22. Debugging: RAISE
nfl=# SELECT p.first_name, p.last_name,
p.position,
avg_yearly_score(p.player_key, 2006) AS score
FROM player p
WHERE p.player_key IN (SELECT *
FROM yearly_player(2005))
AND avg_yearly_score(p.player_key, 2006) > 10
ORDER BY 4 DESC;
23. Debugging: RAISE
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing TD Score: 8
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Interception Score: -2
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing Yards Score: 9
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing TD Score: 4
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Interception Score: 0
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing Yards Score: 10
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing TD Score: 8
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Interception Score: -4
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing Yards Score: 7
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Passing TD Score: 4
CONTEXT: PL/pgSQL function player_game_score(character varying,integer,integer) line 15 at assignment
PL/pgSQL function avg_yearly_score(character varying,integer) line 9 at assignment
NOTICE: Interception Score: -2
...
26. Profiler
$ plprofiler help
usage: plprofiler COMMAND [OPTIONS]
plprofiler is a command line tool to control the plprofiler extension
for PostgreSQL.
The input of this utility is the call and execution statistics that the
plprofiler extension collects. The final output is an HTML report of
the statistics gathered. There are several ways to collect the data,
save the data permanently and even transport it from a production
system to a lab system for offline analysis.
Use
plprofiler COMMAND --help
for detailed information about one of the commands below.
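Before any of these commands are useful, the extension has to exist in the target database (this assumes the plprofiler server-side library is already built and installed on the server):
nfl=# CREATE EXTENSION plprofiler;
CREATE EXTENSION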
27. Profiler
$ plprofiler help
…
GENERAL OPTIONS:
All commands implement the following command line options to specify
the target database:
-h, --host=HOST The host name of the database server.
-p, --port=PORT The PostgreSQL port number.
-U, --user=USER The PostgreSQL user name to connect as.
-d, --dbname=DB The PostgreSQL database name or the DSN.
plprofiler currently uses psycopg2 to connect
to the target database. Since that is based
on libpq, all the above parameters can also
be specified in this option with the usual
conninfo string or URI formats.
--help Print the command specific help information
and exit.
28. Profiler
$ plprofiler help
…
TERMS:
The following terms are used in the text below and the help output of
individual commands:
in-memory-data The plprofiler extension collects run-time data in
per-backend hashtables (in-memory). This data is only
accessible in the current session and is lost when the
session ends or the hash tables are explicitly reset.
collected-data The plprofiler extension can copy the in-memory-data
into global tables, to make the statistics available
to other sessions. See the "monitor" command for
details. This data relies on the local database's
system catalog to resolve Oid values into object
definitions.
saved-dataset The in-memory-data as well as the collected-data can
be turned into a named, saved dataset. These sets
can be exported and imported onto other machines.
The saved datasets are independent of the system
catalog, so a report can be generated again later,
even on a different system.
29. Profiler
$ plprofiler help
…
COMMANDS:
run Runs one or more SQL statements with the plprofiler
extension enabled and creates a saved-dataset and/or
an HTML report from the in-memory-data.
monitor Monitors a running application for a requested time
and creates a saved-dataset and/or an HTML report from
the resulting collected-data.
reset-data Deletes the collected-data.
save Saves the current collected-data as a saved-dataset.
list Lists the available saved-datasets.
edit Edits the metadata of one saved-dataset. The metadata
is used in the generation of the HTML reports.
report Generates an HTML report from either a saved-dataset
or the collected-data.
delete Deletes a saved-dataset.
export Exports one or all saved-datasets into a JSON file.
import Imports the saved-datasets from a JSON file, created
with the export command.
30. Profiler
nfl=# SELECT p.first_name, p.last_name, p.position,
nfl-# avg_yearly_score(p.player_key, 2006) AS score
nfl-# FROM player p
nfl-# WHERE p.player_key IN (SELECT * FROM yearly_player(2005))
nfl-# AND avg_yearly_score(p.player_key, 2006) > 10
nfl-# ORDER BY 4 DESC;
first_name | last_name | position | score
------------+----------------+----------+---------
LaDainian | Tomlinson | rb | 22.5
Peyton | Manning | qb | 18.875
Larry | Johnson | rb | 17.875
Michael | Vick | qb | 15.875
Drew | Brees | qb | 15.75
Marc | Bulger | qb | 15.5625
Steven | Jackson | rb | 15.1875
Carson | Palmer | qb | 15.125
Willie | Parker | rb | 14.9375
31. Profiler
$ plprofiler run -h localhost -d nfl
-c "SELECT p.first_name, p.last_name, p.position,
avg_yearly_score(p.player_key, 2006) AS score
FROM player p WHERE p.player_key
IN (SELECT *
FROM yearly_player(2005))
AND avg_yearly_score(p.player_key, 2006) > 10
ORDER BY 4 DESC"
--name="Run 1" --title="PGDay"
--desc="PGDay Russia"
--output="run1.html"
37. Summary
● Be careful running these tools on
production systems
– They do have some performance impact
● Building the extensions is simple, but it still
requires a development environment