Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third party solutions, and how they met their business intelligence needs in record time.
In this session, you get an overview of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service. We'll cover how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also discuss new features, architecture best practices, and share how customers are using Amazon Redshift for their Big Data workloads.
In this session, we will introduce Amazon RedShift, a new petabyte scale data warehouse service. We'll walk through the basics of the Redshift architecture, launching a new cluster and run SQL queries across a large scale, public dataset. After demonstrating how easy it is to get started with RedShift, we will show how to visualize and query large scale datasets, running queries, reports, and analytics against millions of rows of records in just a few seconds.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
אני עובד בהייטק כבר 20 שנה. ההתמחות שלי היא תיעוד טכני, אך הייתי גם מנהל מוצר ומנהל פרויקטים; ניהלתי מפתחים, אנשי QA ומהנדסי מערכות. אך איני מהנדס, ומשום כך מעולם לא סמכתי על עצמי שאוכל לראיין ולבחור אדם לתפקיד שקשור בהנדסה גרידא – פיתוח תוכנה, עיצוב אלגוריתם וכדומה – ללא המעורבות של בעל ניסיון בתחומים הספציפיים הללו. אני פשוט לא יודע מספיק (ואני גם לא יודע איך לפתור את חידוני השחמט).
A quick tour in 16 slides of Amazon's Redshift clustered, massively parallel database.
Find out what differentiates it from the other database products Amazon has, including SimpleDB, DynamoDB and RDS (MySQL, SQL Server and Oracle).
Learn how it stores data on disk in a columnar format and how this relates to performance and interesting compression techniques.
Contrast the difference between Redshift and a MySQL instance and discover how the clustered architecture may help to dramatically reduce query time.
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
Learn how Boingo Wireless and online media provider Edmunds gained substantial business insights and saved money and time by migrating to Amazon Redshift. Get an inside look into how they accomplished their migration from on-premises solutions. Learn how they tuned their schema and queries to take full advantage of the columnar MPP architecture in Amazon Redshift, how they leveraged third party solutions, and how they met their business intelligence needs in record time.
In this session, you get an overview of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service. We'll cover how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also discuss new features, architecture best practices, and share how customers are using Amazon Redshift for their Big Data workloads.
In this session, we will introduce Amazon RedShift, a new petabyte scale data warehouse service. We'll walk through the basics of the Redshift architecture, launching a new cluster and run SQL queries across a large scale, public dataset. After demonstrating how easy it is to get started with RedShift, we will show how to visualize and query large scale datasets, running queries, reports, and analytics against millions of rows of records in just a few seconds.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
אני עובד בהייטק כבר 20 שנה. ההתמחות שלי היא תיעוד טכני, אך הייתי גם מנהל מוצר ומנהל פרויקטים; ניהלתי מפתחים, אנשי QA ומהנדסי מערכות. אך איני מהנדס, ומשום כך מעולם לא סמכתי על עצמי שאוכל לראיין ולבחור אדם לתפקיד שקשור בהנדסה גרידא – פיתוח תוכנה, עיצוב אלגוריתם וכדומה – ללא המעורבות של בעל ניסיון בתחומים הספציפיים הללו. אני פשוט לא יודע מספיק (ואני גם לא יודע איך לפתור את חידוני השחמט).
A quick tour in 16 slides of Amazon's Redshift clustered, massively parallel database.
Find out what differentiates it from the other database products Amazon has, including SimpleDB, DynamoDB and RDS (MySQL, SQL Server and Oracle).
Learn how it stores data on disk in a columnar format and how this relates to performance and interesting compression techniques.
Contrast the difference between Redshift and a MySQL instance and discover how the clustered architecture may help to dramatically reduce query time.
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use work load management.
Learn best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your data warehouse performance.
by Dhanraj Pondicherry, Sr. Solutions Architecture Manager, AWS
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use work load management. Level: 300
Nesta segunda parte do tema Redshift, mostramos o case da Movile, líder em mobile commerce com 50 milhões de usuários, e analisamos tópicos avançados como compressão, macros SQL embutidas e índices multidimensionais para grandes bases de dados.
This talk at the Percona Live MySQL Conference and Expo describes open source column stores and compares their capabilities, correctness and performance.
This presentation was used in a Redis talk that took place in ALT.NET Melbourne at February 25th, 2014. It is aimed for .Net developers, though almost all of the slides solely discuss Redis as the data store server, without any relation to client libraries in general and .Net in particular.
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAmazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and tune query and database performance.
Learning Objectives:
Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use work load management.
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
Presentations from the Cloudera Impala meetup on Aug 20 2013:
- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
10. Zone Maps
Read
Min: 5
Max 45
Read
Min: 9
Max: 32
Min: 30
Max: 42
Read
Min: 22
Max : 80
Read
Min: 18
Max: 50
10
Min: 1
Max 10
Read
Min: 11
Max: 25
Min: 26
Max: 40
Min: 41
Max : 55
Min: 56
Max: 95
Select count(*) from customers where age = 24
Unsorted Sorted
11. Sort Key Options
11
Single Column Sort Key • Table is sorted by 1 column
• Queries that use 1st column (i.e. date) as primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM
12. Sort Key Options
12
Single Column Sort Key • Table is sorted by 1 column
• Queries that use 1st column (i.e. date) as primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM
Compound Sort Key • Table is sorted by 1st column , then 2nd column etc.
• Queries that use 1st column as primary filter, then other columnss
• Can speed up joins and group bys
• Slower to VACUUM
13. Sort Key Options
13
Single Column Sort Key • Table is sorted by 1 column
• Queries that use 1st column (i.e. date) as primary filter
• Can speed up joins and group-bys
• Quickest to VACUUM
Compound Sort Key • Table is sorted by 1st column , then 2nd column etc.
• Queries that use 1st column as primary filter, then other columnss
• Can speed up joins and group bys
• Slower to VACUUM
Interleaved Sort Key • Equal weight is given to each column
• Queries that use different columns in filter
• Queries get fasterthe more columns used in the filter (up to 8)
• Slowest to VACUUM
• More effective with large tables (> 100M+ rows)
15. Distribution Style Options
15
All
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
All data on every node
Key
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Same key to same location
Node 1
Slice
1
Slice
2
Node 2
Slice
3
Slice
4
Even
Round robin distribution
• Tables with no joins or
group-bys
• Small dimension tables
(<1000 rows)
• Medium dimension
tables (1K – 2M)
• Large fact tables
• Large dimension tables
17. How are they different?
17
u Primary and foreign key constraints are not enforced by Redshift
u Indexes are not created (only sort keys exist for indexing)
u They do help with query plan optimization though
19. Redshift Compression
19
u Each column can be compressed with most appropriatealgorithm for content
u Many algorithms supported
u Raw encoding, Byte-dictionary, Delta encoding, Mostly encoding, Runlength encoding, Text encoding,
LZO encoding
u Average of 2-4x compression rates are common
u Can cut query time as much as 50%
u Use analyze compression to get recommendations
23. Vacuum
23
u 4 modes:
u FULL – Reclaims space and re-sorts
u DELETE ONLY – Reclaims space but does not re-sort
u SORT ONLY – Re-sorts but does not reclaim space
u REINDEX – Used for INTERLEAVED sort keys. Re-Analyzes sort keys and then runs FULL VACUUM
u Vacuum is I/O intensive and can take time to run
u Run regularly to minimize impact
24. Analyze
24
u Updates statistics used by the query planner
u Run regularly to keep statistics up to date
u Especially after large data loads
26. Workload Management
26
u Workload management is about creating queues for different workloads
User Group A
Short-running queueLong-running queue
Short
Query Group
Long
Query Group