Vertica’s recommendations for AWS deployments.
What to look for and check when deploying Vertica on AWS.
You can see the video here:
https://youtu.be/sSkWJ_Afhs4
Introduction to Vertica (Architecture & More) | LivePerson
LivePersonDev is happy to host this meetup with Zvika Gutkin, an expert Oracle and Vertica DBA at LivePerson and a specialist in BI and Big Data.
At LivePerson, we handle enormous amounts of data. We use Vertica to analyze this data in real time.
In this lecture Zvika will cover the following:
1. Present the architecture of Vertica
2. Compare row store to column store
3. Explain how Vertica achieves fast query times
4. Show a few use cases
5. Explain what LivePerson does with Vertica and why we chose it
6. Talk about why we love Vertica and why we hate it
7. Ask whether Vertica is a SQL or NoSQL database, and whether it is consistent or eventually consistent
8. Explain how Vertica differs from other SQL and NoSQL technologies
8. Regular table
Continent | Country | City     | Size  | Size type | Population
Asia      | Israel  | Tel Aviv | 52000 | Acres     | 450000
N.America | USA     | Dallas   | 385   | Sq. miles | 1200000
Create Table …..
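The slide leaves the CREATE TABLE statement elided. As a minimal sketch only, here is what the DDL for this example table might look like in Vertica SQL; the table name, column names, and types are assumptions, not taken from the deck:

```sql
-- Hypothetical DDL for the example table above; names and types
-- are assumed from the sample rows, not from the deck.
CREATE TABLE cities (
    continent  VARCHAR(20),
    country    VARCHAR(20),
    city       VARCHAR(30),
    size       INTEGER,
    size_type  VARCHAR(10),  -- 'Acres' or 'Sq. miles'
    population INTEGER
);
```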
9. Rows vs. Column
[Row-store layout: values are written row after row, so each disk block mixes all six columns and a single row can spill across block boundaries.]
Block 1: Asia, Israel, Acres, Tel Aviv
Block 2: 52000, 450000, Asia
Block 3: Israel, Acres, Jerusalem
Block 4: 78000, 800000, N.America
Block 5: USA, Dallas, Sq. miles, 385
Block 6: 1200000, Asia, Israel
Block 7: Haifa, Acres, 63000
Block 8: 268000, America, USA
Block 9: New York, Sq. miles, 468, 8200000
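To make the contrast concrete: an analytic query like the sketch below touches only two of the six columns. In the row-store layout above every block must be read; a column store reads only the continent and population blocks. (The table name comes from the hypothetical DDL sketched earlier.)

```sql
-- Reads only 2 of 6 columns: a column store scans just the continent
-- and population blocks, while a row store scans every block.
SELECT continent, SUM(population) AS total_population
FROM cities
GROUP BY continent;
```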
10. Rows vs. Column
Column-store layout: each column is stored contiguously (and sorted), one column per run of blocks:
Continent:  Asia, Asia, Asia, N.America, N.America, N.America
Country:    Israel, Israel, Israel, USA, USA, USA
Size Type:  Acres, Acres, Acres, Sq. miles, Sq. miles, Sq. miles
City size:  52000, 78000, 63000, 385, 468, 8700
City Name:  Tel Aviv, Jerusalem, Haifa, Dallas, New York, New Jersey
Population: 450000, 800000, 268000, 1200000, 8200000, 8800000
The same columns after per-column encoding:
Continent  (RLE encoding):      Asia,3; N.America,3
Country    (RLE encoding):      Israel,3; USA,3
Size Type  (RLE encoding):      Acres,3; Sq. miles,3
City size  (DeltaVal encoding): 52000, 385, +83, +8315, +62615, +77615
City Name  (LZO encoding):      !@#$@a, $%##!, *&&^, !@#$
Population (DeltaVal encoding): 450000, 268000, +532000, +932000, +7932000, +8532000
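A hedged sketch of how encodings like these could be declared in Vertica: a projection sorted on the low-cardinality columns, so that RLE pays off on them, with DELTAVAL on the numeric columns. The projection name and segmentation clause are assumptions; LZO is not a CREATE PROJECTION encoding keyword, so the string column is left at AUTO:

```sql
-- Hypothetical projection over the cities table; sorting on the
-- low-cardinality columns is what makes RLE effective on them.
CREATE PROJECTION cities_sorted (
    continent  ENCODING RLE,
    country    ENCODING RLE,
    size_type  ENCODING RLE,
    size       ENCODING DELTAVAL,
    city       ENCODING AUTO,      -- AUTO typically applies LZO-style compression to strings
    population ENCODING DELTAVAL
) AS
SELECT continent, country, size_type, size, city, population
FROM cities
ORDER BY continent, country, size_type
SEGMENTED BY HASH(city) ALL NODES;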
11. Query Performance
• Out-of-the-box improvements
• Encoding & sorting
• Use the DBD (Database Designer)
• Check the query_events system table (see the sketch after this list)
• Denormalize
• Hydro
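For the query_events check above, a minimal sketch, assuming the v_monitor.query_events columns of recent Vertica versions:

```sql
-- Recent events the optimizer/executor flagged, newest first.
SELECT event_timestamp, event_type, event_description, suggested_action
FROM v_monitor.query_events
ORDER BY event_timestamp DESC
LIMIT 20;
```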
13. Load Performance
• Not real MPP (COPY)
• Big bulks (use DIRECT)
• Bring your files to your cluster
• Load from several nodes (see the sketch after this list)
• HDFS Connector vs. Hadoop Connector vs. Application Loader
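A sketch of the bulk-load advice above: COPY in DIRECT mode, reading source files on several nodes in parallel. The file paths and node names are hypothetical:

```sql
-- DIRECT writes straight to disk (ROS), which suits big bulk loads;
-- naming files ON different nodes parallelizes the read.
COPY cities
FROM '/data/cities_part1.csv' ON v_vertica_node0001,
     '/data/cities_part2.csv' ON v_vertica_node0002
DELIMITER ','
DIRECT;
```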
15. Delete/Update Performance
• Yes, we can, BUT please avoid it
• Looks very fast but ... it is NOT a real delete
• Performance issues:
– Queries
– Recovery
• Creating new projections (check them with evaluate_delete_performance; see the sketch after this list)
• Locks the entire table
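The projection check the slide mentions, sketched below. EVALUATE_DELETE_PERFORMANCE is a Vertica meta-function; the table name is the hypothetical one used earlier:

```sql
-- Flags projections whose design can make deletes (and replay
-- deletes during recovery) slow.
SELECT EVALUATE_DELETE_PERFORMANCE('public.cities');
```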
16. Vertica On AWS
• Create clusters in minutes
• Extend “ephemeral node” solutions for elastic scaling
• Nodes WILL fail
• Use the same placement group
• Monitor/test your throughput (I/O, network & CPU)
17. Let’s sum it up…
✓ Great database
✓ Even LeBron can’t do it all
✓ Understand it & your processes