A brief history of Instagram's adoption cycle of the open source distributed database Apache Cassandra, in addition to details about it's use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
Join the Altinity experts as we dig into ClickHouse sharding and replication, showing how they enable clusters that deliver fast queries over petabytes of data. We’ll start with basic definitions of each, then move to practical issues. This includes the setup of shards and replicas, defining schema, choosing sharding keys, loading data, and writing distributed queries. We’ll finish up with tips on performance optimization.
#ClickHouse #datasets #ClickHouseTutorial #opensource #ClickHouseCommunity #Altinity
-----------------
Join ClickHouse Meetups: https://www.meetup.com/San-Francisco-...
Check out more ClickHouse resources: https://altinity.com/resources/
Visit the Altinity Documentation site: https://docs.altinity.com/
Contribute to ClickHouse Knowledge Base: https://kb.altinity.com/
Join the ClickHouse Reddit community: https://www.reddit.com/r/Clickhouse/
----------------
Learn more about Altinity!
Site: https://www.altinity.com
LinkedIn: https://www.linkedin.com/company/alti...
Twitter: https://twitter.com/AltinityDB
A brief history of Instagram's adoption cycle of the open source distributed database Apache Cassandra, in addition to details about it's use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
Join the Altinity experts as we dig into ClickHouse sharding and replication, showing how they enable clusters that deliver fast queries over petabytes of data. We’ll start with basic definitions of each, then move to practical issues. This includes the setup of shards and replicas, defining schema, choosing sharding keys, loading data, and writing distributed queries. We’ll finish up with tips on performance optimization.
#ClickHouse #datasets #ClickHouseTutorial #opensource #ClickHouseCommunity #Altinity
-----------------
Join ClickHouse Meetups: https://www.meetup.com/San-Francisco-...
Check out more ClickHouse resources: https://altinity.com/resources/
Visit the Altinity Documentation site: https://docs.altinity.com/
Contribute to ClickHouse Knowledge Base: https://kb.altinity.com/
Join the ClickHouse Reddit community: https://www.reddit.com/r/Clickhouse/
----------------
Learn more about Altinity!
Site: https://www.altinity.com
LinkedIn: https://www.linkedin.com/company/alti...
Twitter: https://twitter.com/AltinityDB
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
ProxySQL is a popular database proxy for MySQL/MariaDB servers. This focuses on the possible High availability options for ProxySQL and operations of inbuilt clustering feature in ProxySQL. This tech talk was presented at Mydbops Database Meetup on 27-04-2019 by Aakash M, Database Administrator with Mydbops and Vignesh Prabhu, Database Administrator with Mydbops.
Debezium is a Kafka Connect plugin that performs Change Data Capture from your database into Kafka. This talk demonstrates how this can be leveraged to move your data from one database platform such as MySQL to PostgreSQL. A working example is available on GitHub (github.com/gh-mlfowler/debezium-demo).
This presentation will recount the story of Macys.com (and Bloomingdales.com)'s selection and migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra.
We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
MySQL Administrator
Basic course
- MySQL 개요
- MySQL 설치 / 설정
- MySQL 아키텍처 - MySQL 스토리지 엔진
- MySQL 관리
- MySQL 백업 / 복구
- MySQL 모니터링
Advanced course
- MySQL Optimization
- MariaDB / Percona
- MySQL HA (High Availability)
- MySQL troubleshooting
네오클로바
http://neoclova.co.kr/
A comprehensive walkthrough of how to manage infrastructure-as-code using Terraform. This presentation includes an introduction to Terraform, a discussion of how to manage Terraform state, how to use Terraform modules, an overview of best practices (e.g. isolation, versioning, loops, if-statements), and a list of gotchas to look out for.
For a written and more in-depth version of this presentation, check out the "Comprehensive Guide to Terraform" blog post series: https://blog.gruntwork.io/a-comprehensive-guide-to-terraform-b3d32832baca
Building Distributed Applications with AWS Step FunctionsAmazon Web Services
This talk covers how you can use the new AWS Step Functions service to coordinate different components of your application, maintain state, and build sophisticated serverless solutions.
고승범(peter.ko) / kakao corp.(인프라2팀)
---
카카오에서는 빅데이터 분석, 처리부터 모든 개발 플랫폼을 이어주는 솔루션으로 급부상한 카프카(kafka)를 전사 공용 서비스로 운영하고 있습니다. 전사 공용 카프카를 직접 운영하면서 경험한 트러블슈팅과 운영 노하우 등을 공유하고자 합니다. 특히 카프카를 처음 접하시는 분들이나 이미 사용 중이신 분들이 많이 궁금해하는 프로듀서와 컨슈머 사용 시의 주의점 등에 대해서도 설명합니다.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
ProxySQL is a popular database proxy for MySQL/MariaDB servers. This focuses on the possible High availability options for ProxySQL and operations of inbuilt clustering feature in ProxySQL. This tech talk was presented at Mydbops Database Meetup on 27-04-2019 by Aakash M, Database Administrator with Mydbops and Vignesh Prabhu, Database Administrator with Mydbops.
Debezium is a Kafka Connect plugin that performs Change Data Capture from your database into Kafka. This talk demonstrates how this can be leveraged to move your data from one database platform such as MySQL to PostgreSQL. A working example is available on GitHub (github.com/gh-mlfowler/debezium-demo).
This presentation will recount the story of Macys.com (and Bloomingdales.com)'s selection and migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax.
We'll start with a mercifully brief backgrounder on our website and our business. Then we will go over the various technologies that we considered, as well as our use case-based performance benchmarks that led to the decision to go with Cassandra.
We'll cover the various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks.
One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations.
And, finally, we will wrap up with our "lessons learned" and a brief look at our future plans.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
MySQL Administrator
Basic course
- MySQL 개요
- MySQL 설치 / 설정
- MySQL 아키텍처 - MySQL 스토리지 엔진
- MySQL 관리
- MySQL 백업 / 복구
- MySQL 모니터링
Advanced course
- MySQL Optimization
- MariaDB / Percona
- MySQL HA (High Availability)
- MySQL troubleshooting
네오클로바
http://neoclova.co.kr/
A comprehensive walkthrough of how to manage infrastructure-as-code using Terraform. This presentation includes an introduction to Terraform, a discussion of how to manage Terraform state, how to use Terraform modules, an overview of best practices (e.g. isolation, versioning, loops, if-statements), and a list of gotchas to look out for.
For a written and more in-depth version of this presentation, check out the "Comprehensive Guide to Terraform" blog post series: https://blog.gruntwork.io/a-comprehensive-guide-to-terraform-b3d32832baca
Building Distributed Applications with AWS Step FunctionsAmazon Web Services
This talk covers how you can use the new AWS Step Functions service to coordinate different components of your application, maintain state, and build sophisticated serverless solutions.
고승범(peter.ko) / kakao corp.(인프라2팀)
---
카카오에서는 빅데이터 분석, 처리부터 모든 개발 플랫폼을 이어주는 솔루션으로 급부상한 카프카(kafka)를 전사 공용 서비스로 운영하고 있습니다. 전사 공용 카프카를 직접 운영하면서 경험한 트러블슈팅과 운영 노하우 등을 공유하고자 합니다. 특히 카프카를 처음 접하시는 분들이나 이미 사용 중이신 분들이 많이 궁금해하는 프로듀서와 컨슈머 사용 시의 주의점 등에 대해서도 설명합니다.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Node has captured the attention of early adopters by clearly differentiating itself as being asynchronous from the ground up while remaining accessible. Now that server side JavaScript is at the cutting edge of the asynchronous, real time web, it is in a much better position to establish itself as the go to language for also making synchronous, CRUD webapps and gain a stronger foothold on the server.
This talk covers the current state of server side JavaScript beyond Node. It introduces Common Node, a synchronous CommonJS compatibility layer using node-fibers which bridges the gap between the different platforms. We look into Common Node's internals, compare its performance to that of other implementations such as RingoJS and go through some ideal use cases.
This talk was given at the Dutch PHP Conference 2011 and details the use of Comet (aka reverse ajax or ajax push) technologies and the importance of websockets and server-sent events. More information is available at http://joind.in/3237.
DSLing your System For Scalability Testing Using Gatling - Dublin Scala User ...Aman Kohli
The power of Gatling is the DSL it provides to allow writing meaningful and expressive tests. We provide an overview of the framework, a description of their development environment and goals, and present their test results.
Source code available https://github.com/lawlessc/random-response-time
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
Spark SQL enables Spark to perform efficient and fault-tolerant relational query processing with analytics database technologies. The relational queries are compiled to the executable physical plans consisting of transformations and actions on RDDs with the generated Java code. The code is compiled to Java bytecode, executed at runtime by JVM and optimized by JIT to native machine code at runtime. This talk will take a deep dive into Spark SQL execution engine. The talk includes pipelined execution, whole-stage code generation, UDF execution, memory management, vectorized readers, lineage based RDD transformation and action.
Leveraging Hadoop in your PostgreSQL EnvironmentJim Mlodgenski
This talk will begin with a discussion of the strengths of PostgreSQL and Hadoop. We will then lead into a high level overview of Hadoop and its community of projects like Hive, Flume and Sqoop. Finally, we will dig down into various use cases detailing how you can leverage Hadoop technologies for your PostgreSQL databases today. The use cases will range from using HDFS for simple database backups to using PostgreSQL and Foreign Data Wrappers to do low latency analytics on your Big Data.
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"DataStax Academy
The ColumnFamily data model and wide-row support provides the ability to store and access data efficiently in a de-normalized state. Recent enhancements for CQL's spare tables and built-in indexing provide the capability to store data in a manner similar to that of relational databases. For many use cases hybrid approaches are needed, because complete de-normalization is appropriate for some access patterns whereas more structured data is appropriate for others. At times a single logical event becomes multiple insertions across multiple column families. Likewise a user request might require a several reads across different column families. This talk describes some of these scenarios and demonstrates how advanced operations such multiple step procedures, filtering, intersection, and paging can be implemented client side or server side with the help of the IntraVert plugin.
CouchDB Mobile - From Couch to 5K in 1 HourPeter Friese
In this talk, I explain how to use CouchDB mobile to connect your iPhone or Android phone with a a remote ChouchDB to build a RunKeeper clone. The code for this talk is available at https://github.com/peterfriese/CouchTo5K
This is my presentation from TechBeats #3 hosted by Applause about Server-Side Swift framework called Vapor.
Swift is a great language and possibility of using it also in backend is a huge benefit for any iOS developer out there. Using Vapor is a seamless experience. With this framework creating advance APIs by iOS developer is as easy as writing simple iOS app.
https://www.meetup.com/TechBeats-hosted-by-Applause/events/254910023/
Continuous Application with Structured Streaming 2.0Anyscale
Introduction to Continuous Application with Apache Spark 2.0 Structured Streaming. This presentation is a culmination and curation from talks and meetups presented by Databricks engineers.
The notebooks on Structured Streaming demonstrates aspects of the Structured Streaming APIs
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Welocme to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
8. Who uses Presto - Airbnb
Airpal - a web-based, query execution tool
Presto is amazing. It's an order of magnitude
faster than Hive in most our use cases. It reads
directly from HDFS, so unlike Redshift, there
isn't a lot of ETL before you can use it. It just
works.
- Christopher Gutierrez, Manager of Online Analytics, Airbnb
12. Presto is
•Fast !!! (10x faster than Hive)
•Even faster with new Presto ORC reader
•Written in Java with a pluggable backend
•Not SQL-like, but ANSI-SQL
•Code generation like LLVM
•Not only source is open, but open sourced (No private branch)
15. Presto is
CREATE TABLE mysql.hello.order_item AS
SELECT o.*, i.*
FROM hive.world.orders o —― TABLESAMPLE SYSTEM (10)
JOIN mongo.deview.lineitem i —― TABLESAMPLE BERNOULLI (40)
ON o.orderkey = i.orderkey
WHERE conditions..
16. Coordinator
Presto - Planner
Fragmenter
Worker
SQL Analyzer
Analysis
Logical
Plan
Optimizer
Plan
Plan
Fragment
Distributed
Query
Scheduler
Stage
Execution
Plan
Worker
Worker
Local
Execution
Planner
TASK
TASK
17. Presto - Page
PAGE
- positionCount
VAR_WIDTH BLOCK
nulls
Offsets
Values
FIEXED_WIDTH BLOCK
nulls
Values
positionCount blockCount 11 F I X E D _ W I D T H po
sitionCount bit encoded nullFlags values length
values
14 V A R I A B L E _ W
I D T H positionCount offsets[0] offsets[1] offsets[2…]
offsets[pos-1] offsets[pos] bit encoded nullFlags
values
Page / Block serialization
18. Presto - Cluster Memory Manager
Coordinator
Worker
Worker
Worker
GET /v1/memory
@Config(“query.max-memory”) = 20G
@Config(“query.max-memory-per-node”) = 1G
@Config(“resources.reserved-system-memory”) = 40% of -Xmx
System
reserved-system-memory
Reserved
max-memory-per-node
General
Task
Block All tasks
26. Airlift - Distributed service framework
•https://github.com/airlift/airlift
•Core of Presto communication
•HTTP
•Bootstrap
•Node discovery
•RESTful API
•Dependency Injection
•Configuration
•Utilities
@Path("/v2/event")
public class EventResource {
@POST
public Response createQuery(EventRequests events) {
…
}
}
public class CollectorMainModule implements ConfigurationAwareModule {
@Override
public synchronized void configure(Binder binder) {
discoveryBinder(binder).bindHttpAnnouncement("collector");
jsonCodecBinder(binder).bindJsonCodec(EventRequest.class);
jaxrsBinder(binder).bind(EventResource.class);
}
public static void main(String[] args){
Bootstrap app = new Bootstrap(ImmutableList.of(
new NodeModule(),
new DiscoveryModule(),
new HttpServerModule(),
new JsonModule(), new JaxrsModule(true),
new EventModule(),
new CollectorMainModule()
));
Injector injector = app.strictConfig().initialize();
injector.getInstance(Announcer.class).start();
}
}
ex) https://github.com/miniway/presto-event-collector
27. Fastutil - Fast Java collection
•FastUtil 6.6.0 turned out to be consistently fast.
•Koloboke is getting second in many tests.
•GS implementation is good enough, but is slower than FastUtil and Koloboke.
http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/
28. ASM - Bytecode manipulation
package pkg;
public interface SumInterface {
long sum(long value);
}
public class MyClass implements SumInterface {
private long result = 0L;
public MyClass(long value) {
result = value;
}
@Override
public long sum(long value) {
result += value;
return result;
}
}
ClassWriter cw = new ClassWriter(0);
cw.visit(V1_7, ACC_PUBLIC,
"pkg/MyClass", null,
"java/lang/Object",
new String[] { "pkg/SumInterface" });
cw.visitField(ACC_PRIVATE,
"result", "J", null, new Long(0));
// constructor
MethodVisitor m = cw.visitMethod(ACC_PUBLIC,
"<init>", "(J)V", null, null);
m.visitCode();
// call super()
m.visitVarInsn(ALOAD, 0); // this
m.visitMethodInsn(INVOKESPECIAL,
"java/lang/Object", "<init>", “()V",
false);
29. ASM - Bytecode manipulation (Cont.)
// this.result = value
m.visitVarInsn(ALOAD, 0); // this
m.visitVarInsn(LLOAD, 1 ); // value
m.visitFieldInsn(PUTFIELD,
"pkg/MyClass", "result", "J");
m.visitInsn(RETURN);
m.visitMaxs(-1, -1).visitEnd();
// public long sum(long value)
m = cw.visitMethod(ACC_PUBLIC , "sum", "(J)J",
null, null);
m.visitCode();
m.visitVarInsn(ALOAD, 0); // this
m.visitVarInsn(ALOAD, 0); // this
m.visitFieldInsn(GETFIELD,
"pkg/MyClass", "result", "J");
m.visitVarInsn(LLOAD, 1); // value
// this.result + value
m.visitInsn(LADD);
m.visitFieldInsn(PUTFIELD,
"pkg/MyClass", "result", "J");
m.visitVarInsn(ALOAD, 0); // this
m.visitFieldInsn(GETFIELD,
"pkg/MyClass", "result", "J");
m.visitInsn(LRETURN);
m.visitMaxs(-1, -1).visitEnd();
cw.visitEnd();
byte[] bytes = cw.toByteArray();
ClassLoader.defindClass(bytes)
30. Library - Misc.
•JDK 8u40 +
•Guice - Lightweight dependency injection
•Guava - Replacing Java8 Stream, Optional and Lambda
•ANTLR4 - Parser generator, SQL parser
•Jetty - HTTP Server and Client
•Jackson - JSON
•Jersey - RESTful API
32. Code Generation
•ASM
•Runtime Java classes and methods generation base on SQL
•30% of performance gain
•Where
•Filter and Projection
•Join Lookup source
•Join Probe
•Order By
•Aggregation
33. Code Generation - Filter
SELECT * FROM lineitem
WHERE orderkey = 100 AND quantity = 200
AND
EQ
(#0,100)
EQ
(#1,200)
Logical Planner
class AndOperator extends Operator {
private Operator left = new EqualOperator(#1, 100);
private Operator right = new EqualOperator(#2, 200);
@Override
public boolean evaluate(Cursor cur)
{
if (!left.evaluate(cur)) {
return false;
}
return right.evaluate(cur);
}
}
class EqualOperator extends Operator {
@Override
public boolean evaluate(Cursor c)
{
return cur.getValue(position).equals(value);
}
}
34. Code Generation - Filter
// invoke MethodHandle( $operator$EQUAL(#0, 100) )
push cursor.getValue(#0)
push 100
$statck = invokeDynamic boostrap(0) $operator$EQUAL
if (!$stack) { goto end; }
push cursor.getValue(#1)
push 200
$stack = invokeDynamic boostrap(0) $operator$EQUAL
end:
return $stack
@ScalarOperator(EQUAL)
@SqlType(BOOLEAN)
public static boolean equal(@SqlType(BIGINT) long left,
@SqlType(BIGINT) long right){
return left == right;
}
=> MethodHandle(“$operator$EQUAL(long, long): boolean”)
AND
$op$EQ
(#0,100)
$op$EQ
(#1,200)
Local Execution Planner
37. Code Generation - PageHash (Cont.)
long hashRow (int position,
Block[] blocks) {
int result = 0;
for (int i = 0; i < hashChannels.size(); i++) {
int hashChannel = hashChannels.get(i);
Type type = types.get(hashChannel);
result = result * 31 +
type.hash(blocks[i], position);
}
return result;
}
long (Compiled)hashRow (int position,
Block[] blocks) {
int result = 0;
result = result * 31 +
type_colX.hash(block[0], position);
result = result * 31 +
type_colY.hash(block[1], position);
return result;
}
38. Code Generation - PageHash (Cont.)
boolean equalsRow (
int leftBlockIndex, int leftPosition,
int rightPosition, Block[] rightBlocks) {
for (int i = 0; i < hashChannels.size(); i++) {
int hashChannel = hashChannels.get(i);
Type type = types.get(hashChannel);
Block leftBlock =
channels.get(hashChannel)
.get(leftBlockIndex);
if (!type.equalTo(leftBlock, leftPosition,
rightBlocks[i], rightPosition)) {
return false;
}
}
return true;
}
boolean (Compiled)equalsRow (
int leftBlockIndex, int leftPosition,
int rightPosition, Block[] rightBlocks) {
Block leftBlock =
channels_colX.get(leftBlockIndex);
if (!type.equalTo(leftBlock, leftPosition,
rightBlocks[0], rightPosition)) {
return false;
}
leftBlock =
channels_colY.get(leftBlockIndex);
if (!type.equalTo(leftBlock, leftPosition,
rightBlocks[1], rightPosition)) {
return false;
}
return true;
}
39. Method Variable Binding
1. regexp_like(string, pattern) → boolean
2. regexp_like(string, cast(pattern as RegexType)) // OperatorType.CAST
3. regexp_like(string, new Regex(pattern))
4. MethodHandle handle = MethodHandles.insertArgument(1, new Regex(pattern))
5. handle.invoke (string)
@ScalarOperator(OperatorType.CAST)
@SqlType(“RegExp”)
public static Regex castToRegexp(@SqlType(VARCHAR) Slice pattern){
return new Regex(pattern.getBytes(), 0, pattern.length());
}
@ScalarFunction
@SqlType(BOOLEAN)
public static boolean regexpLike(@SqlType(VARCHAR) Slice source,
@SqlType(“RegExp”) Regex pattern){
Matcher m = pattern.matcher(source.getBytes());
int offset = m.search(0, source.length());
return offset != -1;
}
42. Plugin - Raptor
•Storage data in flash on the Presto machines in ORC format
•Metadata is stored in MySQL (Extendable)
•Near real-time loads (5 - 10mins)
•3TB / day, 80B rows/day , 5 secs query
•CREATE VIEW myview AS SELECT …
•DELETE FROM tab WHERE conditions…
•UPDATE (Future)
•Coarse grained Index : min / max value of all columns
•Compaction
•Backup Store (Extendable)
No more ?!
43. Plugin - How to write
•https://prestodb.io/docs/current/develop/spi-overview.html
•ConnectorFactory
•ConnectorMetadata
•ConnectorSplitManager
•ConnectorHandleResolver
•ConnectorRecordSetProvider (PageSourceProvider)
•ConnectorRecordSinkProvider (PageSinkProvider)
•Add new Type
•Add new Function (A.K.A UDF)
44. Plugin - MongoDB
•https://github.com/facebook/presto/pull/3337
•5 Non-business days
•Predicate Pushdown
•Add a Type (ObjectId)
•Add UDFs (objectid(), objectid(string))
public class MongoPlugin implements Plugin {
@Override
public <T> List<T> getServices(Class<T> type) {
if (type == ConnectorFactory.class) {
return ImmutableList.of(
new MongoConnectorFactory(…));
} else if (type == Type.class) {
return ImmutableList.of(OBJECT_ID);
} else if (type == FunctionFactory.class) {
return ImmutableList.of(
new MongoFunctionFactory(typeManager));
}
return ImmutableList.of();
}
}
45. Plugin - MongoDB
class MongoFactory implements ConnectorFactory {
@Override
public Connector create(String connectorId) {
Bootstrap app = new Bootstrap(new
MongoClientModule());
return app.initialize()
.getInstance(MongoConnector.class);
}
}
class MongoClientModule implements Module {
@Override
public void configure(Binder binder){
binder.bind(MongoConnector.class)
.in(SINGLETON);
…
configBinder(binder)
.bindConfig(MongoClientConfig.class);
}
}
class MongoConnector implements Connector {
@Inject
public MongoConnector(
MongoSession mongoSession,
MongoMetadata metadata,
MongoSplitManager splitManager,
MongoPageSourceProvider
pageSourceProvider,
MongoPageSinkProvider
pageSinkProvider,
MongoHandleResolver
handleResolver) {
…
}
}
46. Plugin - MongoDB UDF
public class MongoFunctionFactory
implements FunctionFactory {
@Override
public List<ParametricFunction> listFunctions()
{
return new FunctionListBuilder(typeManager)
.scalar(ObjectIdFunctions.class)
.getFunctions();
}
}
public class ObjectIdType
extends AbstractVariableWidthType {
ObjectIdType OBJECT_ID = new ObjectIdType();
@JsonCreator
public ObjectIdType() {
super(parseTypeSignature("ObjectId"),
Slice.class);
}
}
public class ObjectIdFunctions {
@ScalarFunction("objectid")
@SqlType("ObjectId")
public static Slice ObjectId() {
return Slices.wrappedBuffer(
new ObjectId().toByteArray());
}
@ScalarFunction("objectid")
@SqlType("ObjectId")
p.s Slice ObjectId(@SqlType(VARCHAR) Slice value) {
return Slices.wrappedBuffer(
new ObjectId(value.toStringUtf8()).toByteArray())
}
@ScalarOperator(EQUAL)
@SqlType(BOOLEAN)
p.s boolean equal(@SqlType("ObjectId") Slice left,
@SqlType("ObjectId") Slice right) {
return left.equals(right);
}
}