Presto Elasticsearch Connector at Presto Summit


This document contains confidential and proprietary information belonging to Uber Technologies. It is intended solely for the individual or entity to whom it is addressed, and all recipients are notified that it contains confidential Uber information. Recipients are not authorized to disseminate or disclose the document or the enclosed information without permission from Uber.

Thank you
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be
reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any
information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the
use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise
exempt from disclosure under applicable law. All recipients of this document are notified that the information contained
herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any
way disclose this document or any of the enclosed information to any person other than employees of addressee to the
extent necessary for consultations with authorized personnel of Uber.









More from Zhenxiao Luo
