Сергей Ковалёв: Solutions Architect, Big Data/High-Performance Computing Expert at Altoros; Minsk
Talk: «Practical Steps to Improve Apache Hive Performance»
This deck presents best practices for getting good performance out of Apache Hive. It covers getting data into Hive, using the ORC file format, choosing a good layout of partitions and files based on query patterns, executing queries with Tez and YARN queues, configuring memory, and debugging common query performance issues. It also describes Hive bucketing and how to read Hive EXPLAIN query plans.
This document provides an overview of Hive and its performance capabilities. It discusses Hive's SQL interface for querying large datasets stored in Hadoop, its architecture, which compiles SQL queries into MapReduce jobs, and its support for SQL semantics and datatypes. It also covers techniques for optimizing Hive performance, including data abstractions such as partitions, buckets, and skews, and it describes Hive's join strategies (shuffle joins, broadcast joins, and sort-merge-bucket joins) and how each is implemented in MapReduce. The overall aim is to explain how Hive provides scalable SQL processing for big data.
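Of those strategies, the broadcast (map) join is the one most often enabled by hand. A minimal sketch, assuming the video and channel tables defined in the examples further down:

SET hive.auto.convert.join=true;  -- let Hive broadcast small tables automatically

-- Or request a map join explicitly with a hint:
SELECT /*+ MAPJOIN(ch) */ v.title
FROM video v JOIN channel ch ON v.channelId = ch.id;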
The document discusses various techniques for optimizing data organization and performance in Hive, including:
- Partitioning data by meaningful columns like customer ID or VIN to improve lookup performance.
- Using the right number and size of buckets to avoid performance issues from too many small files or skewed data distribution.
- Denormalizing data and optimizing JOIN queries through techniques like broadcast joins.
- Storing data in its natural types, like numbers instead of strings, to enable predicate pushdown and better performance (see the sketch after this list).
- Using temporary tables and in-memory storage to optimize queries involving data reorganization or distinct slices.
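A minimal sketch of the natural-types point, using a hypothetical clicks table: with userId declared as BIGINT and ts as TIMESTAMP, ORC can evaluate the predicates against stripe statistics and skip data, whereas STRING columns would reduce the same filters to string comparisons.

create table clicks (
  userId BIGINT,   -- not STRING: numeric predicates can be pushed down
  ts TIMESTAMP,    -- not STRING: range filters can skip ORC stripes
  url STRING
) STORED AS ORC;

SELECT count(*) FROM clicks WHERE userId = 12345 AND ts >= '2015-01-01';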
This document provides a summary of improvements made to Hive's performance through the use of Apache Tez and other optimizations. Some key points include:
- Hive was improved to use Apache Tez as its execution engine instead of MapReduce, reducing latency for interactive queries and improving throughput for batch queries (see the setting after this list).
- Statistics collection was optimized to gather column-level statistics from ORC file footers, speeding up statistics gathering.
- The cost-based optimizer Optiq was added to Hive, allowing it to choose better execution plans.
- Vectorized query processing, broadcast joins, dynamic partitioning, and other optimizations improved individual query performance by over 100x in some cases.
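Switching the execution engine is a single session setting (Tez must, of course, be installed on the cluster):

SET hive.execution.engine=tez;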
This document summarizes techniques for optimizing Hive queries, including recommendations around data layout, format, joins, and debugging. It discusses partitioning, bucketing, sort order, normalization, text format, sequence files, RCFiles, ORC format, compression, shuffle joins, map joins, sort merge bucket joins, count distinct queries, using explain plans, and dealing with skew.
This presentation describes how to efficiently load data into Hive. I cover partitioning, predicate pushdown, ORC file optimization, and different loading schemes.
If you’re already a SQL user then working with Hadoop may be a little easier than you think, thanks to Apache Hive. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
This cheat sheet covers:
-- Query
-- Metadata
-- SQL Compatibility
-- Command Line
-- Hive Shell
1. Use partitions whenever possible

create table video (
  id STRING,
  title STRING,
  description STRING,
  viewCount BIGINT
) PARTITIONED BY (uploadYear DATE)
STORED AS ORC;

-- A fully dynamic INSERT requires dynamic partitioning to be enabled,
-- and the partition column must come last in the SELECT list.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

insert into table video PARTITION (uploadYear) select * from video_external;
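The payoff is partition pruning: a filter on the partition column lets Hive scan only the matching directories. A sketch (the date literal is hypothetical):

-- Only the uploadYear=2015-01-01 partition is read
SELECT title FROM video WHERE uploadYear = '2015-01-01';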
2. Use bucketing

create table video (
  id STRING,
  channelId STRING,
  title STRING,
  description STRING
) CLUSTERED BY (channelId)
INTO 2 BUCKETS
STORED AS ORC;

create table channel (
  id STRING,
  title STRING,
  description STRING,
  viewCount BIGINT
) CLUSTERED BY (id)
INTO 2 BUCKETS
STORED AS ORC;

SELECT v.title FROM video v JOIN channel ch ON v.channelId = ch.id
WHERE ch.viewCount > 1000;
4. Use join optimizations

Sort-merge-bucket (SMB) join: when both tables are bucketed and sorted on the join key with matching bucket counts, Hive can merge the corresponding buckets directly in the mappers, with no shuffle.
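Enabling SMB joins takes a few session settings (a sketch; both tables must be bucketed and sorted on the join key, as above):

SET hive.auto.convert.sortmerge.join=true;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;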
5. Choose the right input format

(Diagram: row-oriented data layout vs. column store.)
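In practice this means preferring a columnar format such as ORC. A minimal sketch; the Snappy codec is an assumption (ZLIB is the ORC default):

create table video_orc (
  id STRING,
  title STRING,
  viewCount BIGINT
) STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");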
6. Other optimizations

Avoid highly normalized table structures.
Compress map/reduce output:
For map output compression, execute set mapred.compress.map.output = true.
For job output compression, execute set mapred.output.compress = true.
Use parallel execution:
SET hive.exec.parallel=true;
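Put together, a session preamble might look like this (a sketch; the codec choices are assumptions, and the mapred.* names follow the old MapReduce API used on the slides):

set mapred.compress.map.output=true;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapred.output.compress=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET hive.exec.parallel=true;  -- run independent stages of a query concurrently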
7. Use the 'explain' keyword to improve the query execution plan

EXPLAIN query...
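For example, run against the join from the bucketing section (a sketch):

EXPLAIN SELECT v.title FROM video v JOIN channel ch ON v.channelId = ch.id;
-- The output shows the stage DAG, table scans, and the chosen join strategy,
-- which is where a missed partition prune or join conversion shows up.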
8. Stinger Initiative
Use cost-based optimization
Use vectorization
Transactions with ACID semantics
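Switches for the first two items, as a sketch (vectorization applies to ORC data):

SET hive.cbo.enable=true;                    -- cost-based optimization
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
SET hive.vectorized.execution.enabled=true;  -- process rows in batches of 1024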