The document discusses various indexing strategies and data structures in MySQL for high performance, including:
1. Hash indexes use a hash table to store keys and row pointers for exact match queries in constant time, but don't support range queries.
2. B-tree indexes support range, prefix, and multicolumn queries in O(log n) time and are widely used by storage engines such as InnoDB.
3. Indexing strategies recommend isolating the indexed column in query expressions and using prefix indexes on long text columns, which keeps the index small while retaining good selectivity and query performance.
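The contrast between the two index types in the list above can be sketched in Python. Here a dict stands in for a hash index and a sorted list searched with bisect stands in for the ordered keys of a B-tree. The sample rows and the helper names are assumptions made for illustration; this is not how MySQL stores its indexes.

```python
import bisect

# Hypothetical sample data: (key, row_id) pairs.
rows = [(17, "r1"), (4, "r2"), (42, "r3"), (23, "r4"), (8, "r5")]

# Hash-style index: exact-match lookup in expected constant time, no ordering.
hash_index = {key: row_id for key, row_id in rows}

# Ordered index: keys kept sorted, so a range maps to a contiguous slice.
sorted_keys = sorted(key for key, _ in rows)


def exact_match(key):
    """Return the row id for key, or None if the key is absent."""
    return hash_index.get(key)


def range_query(low, high):
    """Return all keys k with low <= k <= high, located by binary search."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


print(exact_match(23))
print(range_query(5, 25))
```

The dict cannot answer the range query without scanning every key, which is the limitation the first list item describes.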
The document provides information about using IDLE to edit and run Python scripts under Windows. It discusses using IDLE as an interactive development environment with a graphical user interface. The startup message for IDLE is shown, indicating the version number and interactive mode. Basic variable types and operations in Python like integers, floats, strings, Boolean expressions, and lists are introduced.
In this session of the class we introduce data structures in the Python language, along with strings and numbers.
PySec101 Fall 2013 J2E1 By Mohammad Reza Kamalifard
Talk About
Python Data Structures, Strings, Numbers,...
This presentation is all about the various built-in
data structures that we have in Python.
List
Dictionary
Tuple
Set
and various methods present in each data structure
In this chapter we will explore strings. We are going to explain how they are implemented in Java and how we can process text content. Additionally, we will go through different methods for manipulating text: we will learn how to compare strings, how to search for substrings, how to extract substrings based on given parameters and, last but not least, how to split a string by separator characters. We will demonstrate how to correctly build strings with the StringBuilder class. We will also provide short but very useful information on the most commonly used regular expressions.
This document provides an overview of dictionaries, hash tables, and sets. It discusses the dictionary abstract data type and how it can be implemented using hash tables. It covers hashing, collision resolution strategies, and the .NET Dictionary<TKey, TValue> class. It also discusses sets and the HashSet<T> and SortedSet<T> classes, comparing their time complexities.
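To make the idea of hashing with collision resolution concrete, here is a minimal sketch of a dictionary built on separate chaining. The class name ChainedHashTable and its methods are hypothetical, and the sketch is written in Python rather than .NET; it does not reflect how Dictionary<TKey, TValue> is actually implemented.

```python
class ChainedHashTable:
    """Toy dictionary using separate chaining (an illustrative assumption)."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash value picks the bucket; different keys may share one.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key, possibly colliding in this bucket

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(bucket_count=2)  # tiny table to force collisions
for word, count in [("a", 1), ("b", 2), ("c", 3), ("a", 10)]:
    table.put(word, count)

print(table.get("a"), table.get("c"), table.get("z"))
```

With only two buckets, at least two of the three keys must collide, yet every lookup still returns the right value because each bucket keeps a list of its entries.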
Strings in Python are immutable sequences of Unicode characters (code points). Individual characters in a string can be accessed using indexes and slices. Because strings are immutable, their elements cannot be changed once created.
Various methods are available for string manipulation in Python. These include operations for accessing characters by index or slice, checking substrings, converting case, padding or stripping strings, and more. The built-in functions ord() and chr() convert between a character and its Unicode code point, which matches the ASCII value for ASCII characters.
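A short sketch of the operations described above, using an arbitrary sample string chosen for illustration:

```python
text = "Python"

first = text[0]        # indexing
middle = text[1:4]     # slicing: characters at positions 1, 2 and 3
last = text[-1]        # a negative index counts from the end

# Strings are immutable: item assignment raises TypeError.
try:
    text[0] = "J"
except TypeError:
    changed = "J" + text[1:]   # build a new string instead

code_point = ord("A")   # character to integer code point
letter = chr(97)        # integer code point to character

print(first, middle, last, changed, code_point, letter)
```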
Pointers allow programs to store and manipulate memory addresses. A pointer variable contains the address of another variable. Pointers are useful for passing data between functions, returning multiple values from functions, and dynamically allocating memory at runtime. Pointers can also be used to access elements of arrays indirectly and implement multidimensional arrays more efficiently. Pointer notation provides an alternative way to access array elements through pointer arithmetic rather than subscripts.
This document provides an overview of string fundamentals in Python including:
- Strings can be indexed and sliced to access individual characters or substrings
- Built-in functions like len() return the length of a string
- Strings are immutable so cannot be modified, but new strings can be created
- Common string methods like upper(), lower(), find(), strip() can manipulate strings
All data values in Python are encapsulated in object classes. Everything in Python is an object, and every object has an identity, a type, and a value. As in other object-oriented languages such as Java or C++, several data types are built into Python. Extension modules written in C, Java, or other languages can define additional types.
To determine a variable's type in Python you can use the type() function. The value of some objects can be changed. Objects whose value can be changed are called mutable and objects whose value is unchangeable (once they are created) are called immutable.
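The following sketch illustrates identity, type, and mutability with a few sample values chosen for this example:

```python
numbers = [1, 2, 3]              # list: mutable
identity_before = id(numbers)
numbers.append(4)                # changes the value in place
same_object = id(numbers) == identity_before   # identity is unchanged

point = (1, 2)                   # tuple: immutable
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False        # item assignment is rejected

type_names = [type(v).__name__ for v in (42, 3.14, "hi", numbers, point)]

print(same_object, tuple_changed, type_names)
```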
- Variables in PHP are prefixed with a $ sign and can contain any type of data value. Variable names are case-sensitive.
- PHP supports scalar data types like integers, floats, booleans, and strings as well as complex types like arrays and objects. Variables do not require explicit typing.
- Arrays allow storing multiple values in a single variable through numeric or associative indexes. Arrays can be nested to any level and PHP provides many functions for manipulating array values and structure.
The document discusses cryptographic hash functions, including an overview of their usage, properties, structures, attacks, and the need for a new secure hash standard. It describes how hash functions work by condensing arbitrary messages into fixed-size message digests. The properties of preimage resistance, second preimage resistance, and collision resistance are explained. Common hashing algorithms like MD5, SHA-1, and SHA-2 are outlined along with vulnerabilities like birthday attacks. The document concludes by noting the need to replace standards like MD5 and SHA-1 due to successful cryptanalysis attacks.
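The fixed-size digest property can be observed with Python's standard hashlib module. This is only an illustration using SHA-256 (a member of the SHA-2 family); the input messages are arbitrary examples.

```python
import hashlib

short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 100_000).hexdigest()

# The digest length is fixed regardless of input size:
# SHA-256 yields 256 bits, which is 64 hexadecimal characters.
print(len(short_digest), len(long_digest))

# A one-character change in the message yields a different digest.
digest_one = hashlib.sha256(b"hello").hexdigest()
digest_two = hashlib.sha256(b"hellp").hexdigest()
print(digest_one)
print(digest_two)
```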
This document provides an introduction to strings in Python. It explains that a string is a data type holding a sequence of characters, that the built-in string class is 'str', and that strings can be defined using single, double, or triple quotes. Strings support operations such as indexing, slicing, concatenation, and formatting. Common string methods such as upper(), lower(), split(), and join() are described, and the document also discusses comparing strings.
These are the slides of the second part of this multi-part series, from Learn Python Den Haag meetup group. It covers List comprehensions, Dictionary comprehensions and functions.
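As a brief illustration of the two comprehension forms named above (the sample values are assumptions, not taken from the slides):

```python
# List comprehension: build a list from an iterable in one expression.
squares = [n * n for n in range(5)]

# With a filter clause: keep only the even numbers before squaring.
even_squares = [n * n for n in range(10) if n % 2 == 0]

# Dictionary comprehension: map each word to its length.
lengths = {word: len(word) for word in ["list", "dict", "set"]}

print(squares, even_squares, lengths)
```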
This document provides an introduction to the Python language and discusses Python data types. It covers how to install Python, interact with the Python interpreter through command line and IDLE modes, and learn basic Python parts like data types, operators, functions, and control structures. The document discusses numeric, string, and other data types in Python and how to manipulate them using built-in functions and operators. It also introduces Python library modules and the arcpy package for geoprocessing in ArcGIS.
The document discusses various aspects of arrays in C programming language. It defines arrays as collections of similar data types stored in contiguous memory locations. It describes single dimensional and multi-dimensional arrays. It also discusses array declaration and initialization syntax. Some key points covered are: advantages of arrays over single variables, accessing array elements using indexes, passing arrays to functions, and two dimensional or 2D arrays also called matrices.
The document discusses various aspects of arrays in C programming, including:
- Declaring and initializing one-dimensional arrays
- Accessing array elements using pointers and indexes
- Declaring and initializing two-dimensional arrays
- Passing arrays to functions by passing the base address
- Declaring arrays of pointers where each element is a pointer variable
An array is a collection of variables of the same type that are referenced by a common name and stored in contiguous memory locations. One-dimensional arrays allow storing multiple values of the same type under a single variable name. Linear (sequential) search compares each element to the search key in turn, while binary search, which requires a sorted array, halves the search range at each step and so finds the key faster than linear search.
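The two search strategies can be sketched as follows. The sketch uses Python lists in place of C arrays, and the function names and sample data are chosen for illustration.

```python
def linear_search(items, target):
    """Compare each element to the target in turn; O(n) comparisons."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range at each step; requires sorted input, O(log n)."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return -1


data = [3, 8, 15, 23, 42, 57]   # already sorted
print(linear_search(data, 23), binary_search(data, 23))
```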
This document provides an introduction to the R programming language. It discusses that R was created in the 1990s and is based on the S language. R is an interpreted, high-level language that supports multiple programming paradigms. The document then covers getting started with R, choosing an integrated development environment, using R as a calculator, assigning variables, comments, getting help, basic data types, and various data structures in R including vectors, matrices, arrays, and lists.
Here are the function definitions and declarations that transfer the variables and arrays between main and each function, as specified:
(a)
float func1(float a, float b, int jstar[20]) {
    float x = 0.0f;  /* placeholder so the returned value is defined */
    /* function body */
    return x;
}
int main() {
    float a = 0.0f, b = 0.0f;  /* initialized before being passed */
    int jstar[20] = {0};
    float x = func1(a, b, jstar);
    return 0;
}
(b)
float func2(int n, char c, double values[50]) {
    float x = 0.0f;  /* placeholder so the returned value is defined */
    /* function body */
    return x;
}
int main() {
    int n;
    char c;
    double values[50
An array is a container that holds a fixed number of values of the same type. An array's length is determined when it is created and cannot be changed. The document then provides an example of creating an integer array called "scores" with 4 elements to store the scores of 4 cricket teams. It demonstrates accessing the elements of the array using indexes and printing the team scores.
1. This document discusses string operations and methods in Python. It covers topics like equality, numerical operations, containment, indexing, slicing, and various string methods such as capitalize(), count(), isalpha(), join(), find(), and replace().
2. Common string methods are explained including capitalize(), right/left/center justification, count(), checking string types, title case, swap case, joining strings, finding substrings, and replacing characters.
3. Examples are provided to demonstrate various string methods like capitalize(), center(), count(), isalpha(), join(), find(), and replace(). Length, indexing, and checking string types are also shown.
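A few of the methods listed above, applied to a sample string chosen for this illustration:

```python
title = "hello world"

capitalized = title.capitalize()             # first letter upper-cased
centered = "hi".center(6, "*")               # padded on both sides to width 6
l_count = title.count("l")                   # occurrences of a substring
only_letters = "hello".isalpha()             # True: letters only
has_space = title.isalpha()                  # False: the space is not a letter
joined = "-".join(["a", "b", "c"])           # join a list with a separator
position = title.find("world")               # index of the first match
missing = title.find("xyz")                  # -1 when there is no match
replaced = title.replace("world", "there")   # returns a new string

print(capitalized, centered, l_count, joined, position, missing, replaced)
```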
The document discusses strings in Python. It describes strings as immutable sequences of characters that can contain letters, numbers, and special characters. It covers built-in functions such as len(), max(), and min() for getting a string's length and its maximum and minimum character. It also discusses string slicing, concatenation, formatting, comparison, and various string methods for operations like conversion, formatting, searching, and stripping whitespace.
Dev Concepts: Data Structures and Algorithms by Svetlin Nakov
Brief overview of the "data structures" and "algorithms" concepts.
Watch the video lesson from Svetlin Nakov and learn more at: https://softuni.org/dev-concepts/what-are-data-structures-and-algorithms
The document discusses string manipulation techniques in Python: getting the length of a string with len(), extracting characters with indexes and slices, traversing strings with for and while loops, checking for substrings with the 'in' operator, and comparing strings. It also covers the immutable nature of strings.
This document discusses various string operations in Python including: finding the length of a string; accessing and slicing characters; the difference between strings and lists; converting case; checking character types; splitting strings; finding substrings; reading and printing strings; concatenation and repetition; iterating through strings with for loops; and common string methods like isalpha, isdigit, lower, upper, title, join, split, count, find, index. It also provides examples of problems involving anagrams, pangrams, unique characters, and removing duplicates from strings.
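The example problems mentioned above could be approached along these lines. The function names and the chosen techniques are assumptions for illustration and may differ from the solutions in the original document.

```python
import string


def is_anagram(first, second):
    """Two words are anagrams if their sorted letters match (case-insensitive)."""
    return sorted(first.lower()) == sorted(second.lower())


def is_pangram(sentence):
    """A pangram contains every letter of the English alphabet at least once."""
    return set(string.ascii_lowercase) <= set(sentence.lower())


def has_unique_characters(text):
    """True when no character occurs more than once."""
    return len(set(text)) == len(text)


def remove_duplicates(text):
    """Keep the first occurrence of each character, preserving order."""
    return "".join(dict.fromkeys(text))


print(is_anagram("Listen", "Silent"))
print(is_pangram("The quick brown fox jumps over the lazy dog"))
print(has_unique_characters("hello"))
print(remove_duplicates("banana"))
```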
Matt Ranney explains the Uber architecture overall, with a focus on the dispatch systems, the geospatial index, handling failure, and dealing with the distributed traveling salesman problem.
A B+ tree is a self-balancing search tree where all leaf nodes are at the same depth. It consists of index pages and data pages, with the leaf nodes containing the data entries. Searching, insertion, and deletion may require rebalancing the tree by splitting or merging nodes. Duplicates are allowed and can be retrieved by finding the left-most entry and following sequence pointers to additional leaf pages containing the same key.
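The duplicate-retrieval step described above can be sketched with a chain of leaf pages. This is a simplified illustration, not a full B+ tree: the index pages are omitted (a linear skip stands in for the descent from the root), every leaf is assumed to be non-empty, and the names LeafPage, link, and count_matches are hypothetical.

```python
import bisect


class LeafPage:
    """One leaf holding sorted keys and a sequence pointer to the next leaf."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def link(pages):
    """Chain leaf pages left to right and return the first one."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
    return pages[0]


def count_matches(first_leaf, key):
    """Count entries equal to key, starting from the left-most match."""
    leaf = first_leaf
    # Stand-in for the index descent: skip leaves whose largest key is too small.
    while leaf is not None and leaf.keys[-1] < key:
        leaf = leaf.next
    matches = 0
    while leaf is not None:
        start = bisect.bisect_left(leaf.keys, key)   # left-most entry >= key
        end = bisect.bisect_right(leaf.keys, key)
        matches += end - start
        if end < len(leaf.keys):   # a larger key follows: no more duplicates
            break
        leaf = leaf.next           # duplicates may continue on the next leaf
    return matches


leaves = link([LeafPage([1, 3, 5]), LeafPage([5, 5, 5]), LeafPage([5, 7, 9])])
print(count_matches(leaves, 5))
```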
This document discusses indexing in MySQL databases to improve query performance. It begins by defining an index as a data structure that speeds up data retrieval from databases. It then covers various types of indexes like primary keys, unique indexes, and different indexing algorithms like B-Tree, hash, and full text. The document discusses when to create indexes, such as on columns frequently used in queries like WHERE clauses. It also covers multi-column indexes, partial indexes, and indexes to support sorting, joining tables, and avoiding full table scans. The concepts of cardinality and selectivity are introduced. The document concludes with a discussion of index overhead and using EXPLAIN to view query execution plans and index usage.
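Selectivity is commonly defined as the number of distinct values in a column (its cardinality) divided by the total number of rows. The sketch below computes it for a whole column and for a prefix of each value, as a partial index would. The sample e-mail addresses and the function names are invented for this example.

```python
def selectivity(values):
    """Distinct values (cardinality) divided by total rows; 1.0 means all unique."""
    return len(set(values)) / len(values)


def prefix_selectivity(values, length):
    """Selectivity if only the first `length` characters were indexed."""
    return selectivity([value[:length] for value in values])


# Hypothetical column contents, invented for this example.
emails = ["alice@example.com", "alina@example.com",
          "bob@example.com", "bobby@example.com"]

for length in (2, 4):
    print(length, prefix_selectivity(emails, length))
```

On this sample a two-character prefix distinguishes only half of the rows, while a four-character prefix is already as selective as the full column.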
The document discusses B+ trees, which are self-balancing search trees used to store data in databases. It defines B+ trees, provides examples, and explains how to perform common operations like searching, insertion, and deletion on B+ trees in a way that maintains the tree's balanced structure. Key aspects are that B+ trees allow fast searching, maintain balance during operations, and improve performance over other tree structures for large databases.
The document discusses maintaining a dynamic dictionary on disk and describes the state of the art which includes B-trees and several variants. It then introduces the fractal tree, which is a replacement for traditional B-trees that can perform high entropy inserts and deletes up to 100 times faster without suffering from aging effects on range queries. Experimental results show the fractal tree implemented in TokuDB provides 10-100x faster index inserts and faster queries compared to traditional B-trees.
The document discusses full text search improvements in MySQL 5.1, including faster boolean search, custom plugins, and better unicode support. It provides examples of how to create and use full text indexes in MySQL, and tips for speeding up full text search such as fitting indexes in memory and partitioning data. Integrating Sphinx search with MySQL is also covered.
The document summarizes a presentation given to the AFCOM Melbourne Chapter in February 2012 titled "Through the Looking Glass 2012." It discusses trends in technology from 2012 to 2017 and their impact on data centers. Key points include:
1) CPUs were becoming faster, moving to more cores and lower power designs. Server density was increasing but also heat.
2) Storage trends included declining SSD prices, larger hard drives, and improvements in deduplication and tape storage.
3) Networking was shifting to 10 Gigabit and converged networks, but duplicate hardware would be needed for 5 years during the transition.
4) Emerging technologies like cloud, big data appliances, and the "Internet of
Power Notes Measurements and Dealing with Data by jmori1
1) This document provides instructions and notes for a chemistry class assignment on measurements and dealing with data.
2) It includes a table to log assignments, with due dates. It also lists materials and tools needed for the class.
3) The notes cover the International System of Units (SI system), units of measurement, accuracy vs precision, variables, types of relationships between variables, and how to analyze and interpret data through graphs.
The document lists the dates and subjects of several emails sent by Saida Yenifer López of the Institución Educativa Francisco José de Caldas. The emails include information about workshops for the eighth grades sent on February 14, a list of needs for the computer room on March 5, first-period grades on March 16, a form on April 11, remedial workshops for the first period on May 6, parent attendance for the sixth grade on
Presentation accompanying the report by Daniel A. Witt, President of the International Tax and Investment Center, at the Kazakhstan-American Investment Forum in New York on December 7, 2011.
What is Social Media and how can it work for a service professional by Rather Inventive
Social media can be used for business promotion, research, and connecting with potential customers. The document provides tips for using social media including understanding your audience and goals, regularly engaging with relevant posts to raise your profile, and measuring how social media drives people to desired actions. Business owners are advised to build relationships on Facebook by liking relevant pages and mentioning them, and to stand out on LinkedIn with complete profiles.
The document provides instructions for an assignment on science tools and note-taking. Students are asked to color illustrations of tools like thermometers and graduated cylinders. They also need to complete Cornell notes on tools, measurement, and note-taking techniques. The assignment emphasizes accurate use of tools like balances and measuring volume at the bottom of containers. Students should finish diagrams of tools for homework.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms for those who already suffer from conditions like anxiety and depression.
The document discusses Java interfaces. It explains that interfaces declare methods but do not provide implementations, and that classes implement interfaces to provide those method implementations. It provides examples of how to declare an interface and how a class implements an interface. It also discusses why interfaces are useful for allowing classes to have multiple roles and behave in standard ways.
This document discusses and summarizes 6 development tools that are useful for Mac development: Gitbox & GitHub for collaborating on code repositories, CloudApp for easily sharing screenshots, Kaleidoscope for comparing code changes, Sublime Text 2 as a full-featured text editor, YAJL for formatting and validating JSON, and Curl & HTTP Client for making API calls. It also provides links to download or learn more about each tool.
This document provides instructions for a chemistry lab involving testing unknown powder samples. Students are asked to:
1) Perform a series of tests (sensory observations, solubility, iodine, vinegar, heat) on 5 known powder samples and record results.
2) Use the evidence from the tests to determine the identities of 2 unknown powders in a sample bag.
3) Explain their reasoning for the identifications and turn in their lab sheet. Proper lab clean-up and binder/note submission are also required.
This two-sentence document refers to the first and second floors of a building, suggesting that it describes the layout of a multi-level structure. It gives no other context or details about the contents or purpose of the floors.
Implementing transparency and open government projects in Greece by Michael Psallidas
This document discusses challenges and opportunities for implementing transparency and open government projects in Greece. It outlines past open government projects in Greece from 2009-2012 that focused on proof of concepts, skills development, interoperability, and open standards. Current demands for transparency stem from the economic crisis. The Transparency Program aims to establish a new relationship between citizens and government based on accountability. It promotes open data through a multichannel platform and encourages third parties to create applications. Challenges include overcoming issues within institutions and culture, bridging access gaps, and adopting diverse technical solutions. Opportunities exist for both public and private sectors to improve governance and gain business value through openness, feedback, and knowledge-based applications.
The document discusses dictionaries in Python. It explains that a dictionary is a collection of unordered key-value pairs where keys must be unique. Keys can be strings, numbers, or tuples, while values can be any data type. Dictionaries allow accessing values using keys. Common dictionary operations include adding/updating items, checking if a key exists, getting values, and looping through keys and key-value pairs. The document also provides examples of using dictionaries to count word frequencies in a text file by parsing and removing punctuation.
14-Intermediate code generation - Variants of Syntax trees - Three Address Co...venkatapranaykumarGa
The document discusses intermediate code generation in compilers. It describes benefits of using an intermediate representation like retargetability and optimization. Common intermediate representations include syntax trees, postfix notation, and three-address code. Three-address code represents expressions as sequences of instructions with three operands. It is generated from syntax trees or DAGs through syntax-directed translation.
This document provides an overview of dictionaries in Python. It discusses how dictionaries are defined using curly braces {}, how keys can be any immutable object like strings or numbers while values can be any object, and how to access values using keys. It also covers adding and deleting key-value pairs, checking for keys, shallow and deep copying, and creating dictionaries from keys or sequences.
Arrays in C allow storing multiple values of the same data type in contiguous memory locations. An array is declared with a data type, array name, and size. Individual elements are accessed using the array name and index. Arrays are useful for storing lists of values, performing matrix operations, implementing algorithms like search and sort, and more. Strings in C are implemented as arrays of characters that are null-terminated. Functions like strcpy(), strcat(), strcmp() allow manipulating strings.
The document discusses intermediate code generation in compilers. It describes intermediate code as the output of the parser and input to the code generator. Three common types of intermediate representations are discussed: syntax trees, postfix notation, and three address code. Three address code represents statements in the form of X=Y op Z and is described as a linearized representation of a syntax tree that is easy to manipulate and optimize. The document provides examples of three address code generated from syntax trees and DAGs.
This document provides an introduction to Python programming concepts including data types, operators, control flow statements, functions and modules. It discusses the basic Python data types like integers, floats, booleans, strings, lists, tuples, dictionaries and sets. It also covers Python operators like arithmetic, assignment, comparison, logical and identity operators. Additionally, it describes control flow statements like if/else and for loops. Finally, it touches on functions, modules and input/output statements in Python.
The document discusses an R programming module that will cover getting started with R, data types and structures, control flow and functions, and scalability. It compares R to MATLAB and Python, describing their similarities as interactive shells for data manipulation but noting differences in popularity across fields and open-source availability. Base graphics and ggplot2 for data visualization are introduced. Sample datasets are also mentioned.
MATLAB can represent different types of numerical data including integers, floating point numbers, and complex numbers. Integers can be 8-bit, 16-bit, 32-bit or 64-bit. Floating point numbers can be single or double precision. Complex numbers contain real and imaginary parts which are double precision by default. MATLAB provides various mathematical functions and the output format of numerical data can be controlled using format commands. Common functions include trigonometric, rounding, absolute value and exponential functions which can operate on scalars, vectors and matrices. Prime numbers can be identified using the isprime function and the mathematical constant e is represented by exp(1).
Arrays in C are collections of similar data types stored in contiguous memory locations that can be accessed via indexes, they can be declared with a specified data type and size and initialized with values, and multi-dimensional arrays allow the storage of two-dimensional data structures like matrices through multiple subscripts denoting rows and columns.
An array is a contiguous block of memory that stores elements of the same data type. Arrays allow storing and accessing related data collectively under a single name. An array is declared with a data type, name, and size. Elements are accessed via indexes that range from 0 to size-1. Common array operations include initialization, accessing elements using loops, input/output, and finding highest/lowest values. Arrays can be single-dimensional or multi-dimensional. Multi-dimensional arrays represent matrices and elements are accessed using multiple indexes. Common array applications include storing student marks, employee salaries, and matrix operations.
Homework Assignment – Array Technical DocumentWrite a technical .pdfaroraopticals15
Homework Assignment – Array Technical Document
Write a technical document that describes the structure and use of arrays. The document should
be 3 to 5 pages and include an Introduction section, giving a brief synopsis of the document and
arrays, a Body section, describing arrays and giving an annotated example of their use as a
programming construct, and a conclusion to revisit important information about arrays described
in the Body of the document. Some suggested material to include:
Declaring arrays of various types
Array pointers
Printing and processing arrays
Sorting and searching arrays
Multidimensional arrays
Indexing arrays of various dimension
Array representation in memory by data type
Passing arrays as arguments
If you find any useful images on the Internet, you can use them as long as you cite the source in
end notes.
Solution
Array is a collection of variables of the same type that are referenced by a common name.
Specific elements or variables in the array are accessed by means of index into the array.
If taking about C, In C all arrays consist of contiguous memory locations. The lowest address
corresponds to the first element in the array while the largest address corresponds to the last
element in the array.
C supports both single and multi-dimensional arrays.
1) Single Dimension Arrays:-
Syntax:- type var_name[size];
where type is the type of each element in the array, var_name is any valid identifier, and size is
the number of elements in the array which has to be a constant value.
*Array always use zero as index to first element.
The valid indices for array above are 0 .. 4, i.e. 0 .. number of elements - 1
For Example :- To load an array with values 0 .. 99
int x[100] ;
int i ;
for ( i = 0; i < 100; i++ )
x[i] = i ;
To determine to size of an array at run time the sizeof operator is used. This returns the size in
bytes of its argument. The name of the array is given as the operand
size_of_array = sizeof ( array_name ) ;
2) Initialisg array:-
Arrays can be initialised at time of declaration in the following manner.
type array[ size ] = { value list };
For Example :-
int i[5] = {1, 2, 3, 4, 5 } ;
i[0] = 1, i[1] = 2, etc.
The size specification in the declaration may be omitted which causes the compiler to count the
number of elements in the value list and allocate appropriate storage.
For Example :- int i[ ] = { 1, 2, 3, 4, 5 } ;
3) Multidimensional array:-
Multidimensional arrays of any dimension are possible in C but in practice only two or three
dimensional arrays are workable. The most common multidimensional array is a two
dimensional array for example the computer display, board games, a mathematical matrix etc.
Syntax :type name [ rows ] [ columns ] ;
For Example :- 2D array of dimension 2 X 3.
int d[ 2 ] [ 3 ] ;
A two dimensional array is actually an array of arrays, in the above case an array of two integer
arrays (the rows) each with three elements, and is stored row-wise in memory.
For Example :- Program to fill .
This chapter discusses working with text and numbers in PHP. It covers defining and manipulating strings, including validating, formatting, and changing case. Functions for selecting, replacing, and exploding parts of strings are described. Working with numbers, math operators, variables, and number formatting functions are also summarized. Key string functions include substr(), str_replace(), printf(), and number functions include rand(), round(), pow(), and abs().
The document discusses different techniques for storing and searching data, including sequential search, binary search, and hashing. It provides details on open hashing and closed hashing, describing that closed hashing stores elements within buckets and can cause collisions when multiple elements are mapped to the same bucket. The document also outlines characteristics of good hash functions and different hashing methods like division, mid-square, folding, digit analysis, length dependent, algebraic coding, and multiplicative hashing.
The document describes 11 practical implementations of cryptographic techniques using C programming language. The techniques implemented include Caesar cipher, Playfair cipher, Hill cipher, Rail Fence cipher, Data Encryption Standard (DES), Rivest-Shamir-Adleman (RSA) algorithm, and Diffie-Hellman key exchange algorithm. For each practical, it provides the objective, description of the algorithm, example, steps of the algorithm, and the C program code with inputs and outputs. The document is a practical file submitted by a student to fulfill the requirements for a Bachelor of Technology degree.
presentation on important DAG,TRIE,Hashing.pptxjainaaru59
Directed acyclic graph (DAG) is used to represent the flow of values between basic blocks of code. A DAG is a directed graph with no cycles. It is generated during intermediate code generation. DAGs determine common subexpressions and the flow of names and computed values between blocks of code. An algorithm is described to construct a DAG by creating nodes for operands and adding edges between nodes and operator nodes. Examples show how expressions are represented by a DAG. The complexity of a DAG depends on its width and depth. Applications of DAGs include determining common subexpressions, names used in blocks, and which statements' values may be used outside blocks.
This document discusses compiler architecture and intermediate code generation. It begins by describing the typical phases of a compiler: parsing, static checking, and code generation. It then discusses intermediate code, which ties the front end and back end phases together and is language and machine independent. Various forms of intermediate code are described, including trees, postfix notation, and triple/quadruple intermediate code. The rest of the document focuses on triple/quadruple code, including how it represents expressions, statements, addressing of arrays, and the translation process from source code to triple/quadruple intermediate code.
This document provides an overview of Python object and data structure basics. It discusses basic data types like integers, floats, strings, lists, dictionaries, tuples, sets, and booleans. It covers numeric operators and variable assignments. Strings are described in detail including indexing, slicing, and common string methods. Lists support indexing, slicing, and methods like append, sort, and reverse. Dictionaries use key-value pairs to store data and have methods like keys and values. Tuples are like lists but immutable. Sets store unique elements. The input function allows user input in programs. Examples are given for calculating area and volume using user input.
This document provides an overview of some basic elements of SQL and MySQL, including literals, data types, null values, comments, and simple queries. It defines literals as fixed data values like numbers or character strings. It describes several numeric, date/time, and string data types. It also covers null values, comments, and how to write basic SELECT queries to retrieve and filter rows from a database table.
Similar to Mysql Performance Optimization Indexing Algorithms and Data Structures (20)
PageRank is an algorithm created by Google's founders to rank the importance of websites in the network of links on the internet. It uses a probability-based model to determine the likelihood that a random user would arrive at a given page. PageRank is calculated through an iterative process of evaluating the inbound links from other pages, with more weight given to pages that are already highly ranked. The example demonstrates how PageRank is computed for a simple network of four pages, with the highest ranking going to the page that receives a link from the page with the strongest inbound links.
Rotating Wave Approximation (RWA) breaks down for few-cycle pulses. Using Gaussian pulses, quantum logic gates like NOT and Hadamard can be implemented, but their effectiveness decreases for pulses with a small frequency compared to the transition frequency between levels. Attosecond pulses cannot be used to study ultrafast phenomena in biology due to limitations of RWA - the required x-ray or gamma ray frequencies would damage living cells. The document examines how RWA breakdown affects population dynamics in a two-level system interacting with femtosecond and attosecond pulses.
This document discusses various MySQL performance optimization techniques, including:
- Choosing between the InnoDB and MyISAM storage engines, with InnoDB generally recommended due to its transactional capabilities and row-level locking.
- Selecting optimal data types to minimize storage size and improve indexing and query performance.
- Considering whether to normalize or denormalize database schemas based on query patterns to reduce the need for joins or minimize data duplication respectively.
- Using summary/cache tables to pre-aggregate data and improve performance of analytical queries that involve expensive joins across multiple tables.
- Understanding the EXPLAIN output to analyze indexes used, table access methods, and ways to improve queries by adding appropriate indexes.
The document describes simulation and experimentation of pulsed light interacting with multilevel quantum systems. It presents two coupled differential equations modeling the interaction and applies the rotating wave approximation to neglect fast oscillating terms. It also only considers the case of exact resonance between the light frequency and the system energy levels.
The document summarizes four cryptographic protocols:
1. Needham-Schroeder protocol authenticates users (A and B) to each other over a network with the help of a trusted authority (Trent).
2. Kerberos protocol allows a client (A) to authenticate to a server (S) in two steps by first authenticating to the Kerberos server and then to the ticket granting service (TGS) of the target server.
3. Secret sharing protocol partitions a secret key (K) into shares and distributes them among trustees, requiring a minimum number of shares to reconstruct the key and preventing any single trustee from accessing the secret.
4. Zero knowledge proofs allow a
This document summarizes several public key cryptosystems including the Knapsack cryptosystem, RSA cryptosystem, ElGamal cryptosystem, and elliptic curve cryptography applied to ElGamal. For each cryptosystem, it describes the key generation process, encryption, and decryption algorithms. It also discusses security aspects such as the hard computational problems that the cryptosystems rely on like integer factorization and discrete logarithms. Finally, it provides code examples for implementing some of the cryptosystems in C.
This document summarizes several number theory concepts and algorithms including:
1. Mersenne primes which are of the form 2^p - 1 where p is prime. It proves some theorems about their properties.
2. Fermat's Little Theorem and Euler's Theorem which relate to exponents modulo a prime. It includes proofs and an algorithm for computing modular inverses.
3. The Chinese Remainder Theorem and its application to finding solutions to systems of congruences.
4. Polynomial arithmetic over finite fields including finding remainders, GCDs, inverses and doing operations in Fp[x]. It describes using these to construct finite fields.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Building RAG with self-deployed Milvus vector database and Snowpark Container...Zilliz
This talk will give hands-on advice on building RAG applications with an open-source Milvus database deployed as a docker container. We will also introduce the integration of Milvus with Snowpark Container Services.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
3. Hash Indexes
● A hash index is built on a hash table and is useful only for exact lookups that use
every column in the index. For each row, the storage engine computes a hash code
of the indexed columns, which is a small value that will probably differ from the
hash codes computed for other rows with different key values. It stores the hash
codes in the index and stores a pointer to each row in a hash table.
● CREATE TABLE user_info (user_id int not null primary key auto_increment,
username varchar(50), password char(32), KEY USING HASH(username,
password)) ENGINE=MEMORY;
● Suppose the hash function is f(), i.e. f : (username, password) -> Integer. Each row
then gets a hash value, e.g. f('john','abc123') = 2789, and the index's data
structure holds a pointer from slot 2789 to the row which has username 'john'
and password 'abc123'.
● If the function f() is very selective, i.e. it returns a different integer for each
combination of username and password, then lookups take O(1) (constant) time.
For a query such as SELECT * from user_info where
username='john' and password='abc123', MySQL does not scan the table; it computes
f('john','abc123')=2789 and picks up the row directly from slot 2789.
4. Hash Indexes
● ORDER BY queries on the Memory engine cannot take advantage of hash indexes, as
the index entries are not stored in sorted order.
● Queries such as SELECT * from user_info where username='john'; will not use the
hash index, because computing f() needs both username and
password.
● Range queries don't use hash indexes, because computing f() needs exact values
for its parameters.
● If the function f() is not selective, i.e. it returns the same integer for more than one
username, password pair, e.g. f('john','abc123')=2789 and
f('mary','25qwer')=2789 and so on for 5 other pairs, then slot 2789 points to a
linked list of row pointers, where each row pointer in the linked list refers to a username,
password pair that gives the same output when f() is applied to it. Resolving
collisions this way is termed chaining.
● With hash collisions, the worst-case performance for a query like SELECT *
from user_info where username='mary' and password='25qwer'; can amount to the
equivalent of a full table scan, if all username, password pairs in the table have the
same hash value.
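The behaviour described on slides 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not how the MEMORY engine is implemented: the slot count `NUM_SLOTS`, the class `HashIndex`, and the use of `zlib.crc32` as a stand-in for f() are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_SLOTS = 8  # hypothetical slot count, kept tiny so collisions are likely


def slot_for(username, password):
    """Stand-in for f(): needs BOTH columns to compute a slot number."""
    data = (username + "\x00" + password).encode("utf-8")
    return zlib.crc32(data) % NUM_SLOTS


class HashIndex:
    """Toy hash index with chaining: slot -> list of row pointers."""

    def __init__(self, rows):
        self.rows = rows
        self.slots = defaultdict(list)
        for row_id, (username, password) in enumerate(rows):
            self.slots[slot_for(username, password)].append(row_id)

    def lookup(self, username, password):
        # Walk the chain for this slot and re-check the real column values,
        # because different pairs may share the same slot (a collision).
        chain = self.slots.get(slot_for(username, password), [])
        return [rid for rid in chain if self.rows[rid] == (username, password)]


rows = [("john", "abc123"), ("mary", "25qwer"), ("johnny", "derp123")]
index = HashIndex(rows)
print(index.lookup("mary", "25qwer"))   # row pointers for an exact match
print(index.lookup("john", "wrong"))    # no matching row
```

Note that `lookup` has no way to answer a username-only or range condition: without the full key there is no slot number to start from.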
5. Hash Indexes
● Analysis of hashing with chaining :
1. How long does it take to return the output of the query SELECT * from
user_info where username='johnny' and password='derp123' ?
2. Assuming simple uniform hashing, if there are 'm' slots in the index and a
total of 'n' rows, then the expected number of rows each slot points to is a=n/m (the
average length of the linked list for each slot is n/m).
3. For a query such as SELECT * from user_info where username='johnny' and
password='derp123' the average number of lookups is Θ(1+a).
Proof : Suppose the username-password combination we are searching for does not
exist. MySQL computes f('johnny','derp123') = x, then searches
the linked list of pointers in slot 'x'. Since the pair is not there, it has to search to the end of the
linked list, whose expected length is a. Adding the constant cost of computing the
hash gives Θ(1+a).
If the particular username-password combination is present, then the number of
lookups is equal to 1 + #(row pointers before ('johnny','derp123') in the linked list).
For large values of n (number of rows in the table) we can assume that the
expected number of row pointers before ('johnny','derp123') in its linked list is a/2.
Thus the average number of lookups = 1+a/2 = Θ(1+a).
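The Θ(1+a) estimate can be checked numerically. The sketch below is a small simulation under the slide's assumption of simple uniform hashing; the function name `chain_stats`, the key range, and the sizes n = 20000 and m = 1000 are arbitrary choices for illustration.

```python
import random


def chain_stats(keys, m, hash_fn):
    """Return (load factor, avg. probes for a miss, avg. probes for a hit)."""
    chains = [[] for _ in range(m)]
    for key in keys:
        chains[hash_fn(key) % m].append(key)
    n = len(keys)
    load_factor = n / m
    # A miss scans a whole chain; averaged over slots this is exactly n/m.
    avg_miss = sum(len(chain) for chain in chains) / m
    # A hit stops at the key's position in its chain (1-based), averaged over keys.
    avg_hit = sum(pos for chain in chains
                  for pos in range(1, len(chain) + 1)) / n
    return load_factor, avg_miss, avg_hit


rng = random.Random(42)                     # fixed seed for repeatability
keys = rng.sample(range(10**9), 20000)      # n = 20000 distinct integer keys
load, miss, hit = chain_stats(keys, 1000, lambda k: k)
print(load, miss, hit)                      # hit should be close to 1 + load/2
```

With these sizes the load factor is a = 20, so a miss costs about 20 pointer checks and a hit about 1 + a/2 = 11, matching the analysis.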
6. Hash Indexes
● Hash Indexes for InnoDB engine : The InnoDB storage engine has a special feature
called adaptive hash indexes. When InnoDB notices that some index values are
being accessed very frequently, it builds a hash index for them in memory on top of
B-Tree indexes.
● A 'Good' Hash function f() : Each row is equally likely to hash to any of the 'm' slots,
independently of where any other row has hashed to, i.e. f('john','abc123') should be
independent of f('johnny','derp123').
● InnoDB does not support explicit hash indexes, so there is no inbuilt hash function
that we can take advantage of for “explicit” indexing. Instead we can maintain one
column in the table for our hash values and put an ordinary index on it:
ALTER TABLE user_info ADD COLUMN hash CHAR(32), ADD KEY (hash);
● Collision analysis using the 16 byte (32 hexadecimal digits) MD5() hash function :
1. MD5() hash lookups are time consuming, as the algorithm takes time to
compute the value, and since the value is a 32 digit hexadecimal string, the
comparison also takes time.
2. SELECT * from user_info where username='johnny' and password='derp123'
and hash='690cdca9655043e9d087a1d50cd74e02'; we need the check on the username
and password fields as well, so that a single row is returned in case of collisions.
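As a rough illustration of what goes into such a hash column, the sketch below computes an MD5 value for a username-password pair in Python. The slides do not say how the two columns are combined before hashing, so the `:` separator and the helper name `md5_hash_column` are assumptions, and the result is not expected to reproduce the hex string quoted above.

```python
import hashlib


def md5_hash_column(username, password):
    """Hex MD5 of the combined key, suitable for a CHAR(32) column.

    Assumption: the two fields are joined with ':' before hashing.
    """
    combined = (username + ":" + password).encode("utf-8")
    return hashlib.md5(combined).hexdigest()


value = md5_hash_column("johnny", "derp123")
raw = hashlib.md5(b"johnny:derp123").digest()
print(len(value))  # 32 hexadecimal characters to store and compare
print(len(raw))    # the same digest is only 16 bytes in raw form
```

The 32-character string is what makes comparisons relatively slow; the application computes it once per lookup and passes it in the WHERE clause alongside the username and password checks.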
7. Hash Indexes
● Method 2 : Using CRC32(), another builtin hash function, is a better choice than
MD5() since it returns a 32-bit unsigned integer (at most 10 decimal digits), which
makes comparisons much cheaper.
SELECT * from user_info where username='johnny' and password='derp123' and
hash=3682452828;
● Method 3 : Using column prefixes as the hash index. We can use fixed length prefixes
of our username and password values. For example, for username 'johnny' and
password 'derp123' we can choose our hash to be the (4+3) character long 'johnder'.
1. SELECT * from user_info where username='johnny' and password='derp123'
and hash='johnder';
2. Less comparison overhead compared to indexing the whole username and
password values.
3. Less selectivity. Define selectivity s1 = (# of distinct username-password pairs)/
(# of rows in user_info) and s2 = (# of distinct hash values)/(# of rows in user_info).
Choose the shortest prefix length L for our hash values for which s2 ≈ s1; the number
of collisions is then kept close to the minimum.
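Methods 2 and 3 can be compared with a short Python sketch. The sample rows, the `:` separator, and the helper names (`crc32_hash`, `prefix_hash`, `selectivity`) are hypothetical; `zlib.crc32` is used here as a convenient 32-bit CRC, with no claim that it reproduces the value in the query above.

```python
import zlib


def crc32_hash(username, password):
    """32-bit integer hash of the combined key (assumed ':' separator)."""
    return zlib.crc32((username + ":" + password).encode("utf-8"))


def prefix_hash(username, password, user_len=4, pass_len=3):
    """Method 3: fixed-length prefixes of both columns, concatenated."""
    return username[:user_len] + password[:pass_len]


def selectivity(values):
    """Distinct values divided by total rows (1.0 means no duplicates)."""
    values = list(values)
    return len(set(values)) / len(values)


rows = [("johnny", "derp123"), ("john", "derp123"), ("johnson", "derby99"),
        ("mary", "25qwer"), ("maria", "25qwer")]

s1 = selectivity(rows)                                       # full pairs
s2_short = selectivity(prefix_hash(u, p) for u, p in rows)   # (4+3) prefix
s2_long = selectivity(prefix_hash(u, p, 6, 5) for u, p in rows)
print(prefix_hash("johnny", "derp123"))
print(s1, s2_short, s2_long)
```

On this toy data the (4+3) prefix collapses three rows onto 'johnder', while a (6+5) prefix is as selective as the full pair, which is the trade-off point 3 asks us to tune.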
8. Hash Indexes
● Method 4 : Using a universal class of hash functions. Convert our username and
password strings to an integer by summing up their ASCII character values,
under the following assumptions :
1. The ASCII character values for usernames and passwords lie between 0 and
255.
2. The maximum length of a username is 10 and of a password is 10. Thus the maximum
integer value for a username is 255*10 and for a password is 255*10; adding them gives the
maximum integer value for our key = 5100.
3. Assuming there are 1000 distinct username-password pairs in our database,
choose a prime p > 5100, say p=5101, and choose 2 integers 1 <= a <= p-1 and
0 <= b <= p-1; let a=19 and b=21.
● Let the sum of the ASCII values of username and password be k. Then our universal
hash function becomes f(k) = ((ak+b) mod p) mod m, where p=5101, m = number of
slots (taken equal to the number of distinct username-password pairs, 1000 in our case),
a=19 and b=21.
So f(k) = ((19k+21) mod 5101) mod 1000.
● For username 'johnny' and password 'derp123', k = 106 + 111 + 104 + 110 + 110 +
121 + 100 + 101 + 114 + 112 + 49 + 50 + 51 = 1239. Thus f(1239) = (23562 mod 5101)
mod 1000 = 3158 mod 1000 = 158. Thus our
hash value for ('johnny','derp123') is 158.
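The worked example above can be reproduced with a minimal Python sketch. The constants come straight from the slide; the function names `ascii_sum_key` and `universal_hash` are chosen here for illustration.

```python
P = 5101   # prime larger than the maximum key value 5100
M = 1000   # number of slots
A = 19     # 1 <= a <= p-1
B = 21     # 0 <= b <= p-1


def ascii_sum_key(username, password):
    """k: sum of the character codes of both strings."""
    return sum(ord(ch) for ch in username + password)


def universal_hash(k, a=A, b=B, p=P, m=M):
    """f(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


k = ascii_sum_key("johnny", "derp123")
print(k)                  # 1239
print(universal_hash(k))  # 158
```

One design caveat worth keeping in mind: summing character codes maps any two pairs with the same characters in a different order to the same k, and the universal hashing guarantee only covers distinct values of k.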
9. Hash Indexes
● With a universal class of hash functions, for any two distinct keys k ≠ l the collision
probability satisfies Pr(f(k)=f(l)) <= 1/m, taken over the random choice of (a,b).
Hence in our case the probability that f(k)=f(l) is at most 1/1000 = 0.001.
● Proof:
Let r = (ak+b) mod p and s = (al+b) mod p, then r-s = a(k-l) (mod p).
Since 1 <= a < p, 0 < |k-l| < p and p is prime, a(k-l) is not divisible by p, hence r≠s.
There are p(p-1) choices for (a,b) and p(p-1) pairs (r,s) with r≠s, and there is a one-
to-one correspondence between (a,b) and (r,s).
Thus a collision can occur only when r = s (mod m).
For a given value of s with 0 <= s < p, the number of values r≠s for which r = s (mod
m) is at most (p-1)/m. Since r is equally likely to be any of the p-1 values other than s,
the probability that r = s (mod m) is at most ((p-1)/m)/(p-1) = 1/m.
● Thus we compute f(k) programmatically for lookups and then use the query :
SELECT * from user_info where username='johnny' and password='derp123' and
hash=158; so that the index lookup compares a small integer rather than the full strings.
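The 1/m bound can be verified by brute force for a small case. The sketch below enumerates every (a,b) pair for a deliberately small prime so that the loop stays cheap; the values p = 101, m = 10 and the two keys 17 and 58 are arbitrary choices, not taken from the slides.

```python
def collision_fraction(k, l, p, m):
    """Fraction of (a, b) choices for which keys k and l land in the same slot."""
    collisions = 0
    total = 0
    for a in range(1, p):        # 1 <= a <= p-1
        for b in range(p):       # 0 <= b <= p-1
            total += 1
            if ((a * k + b) % p) % m == ((a * l + b) % p) % m:
                collisions += 1
    return collisions / total


frac = collision_fraction(k=17, l=58, p=101, m=10)
print(frac, frac <= 1 / 10)
```

The same function works for p = 5101 and m = 1000, but the double loop then covers about 26 million pairs, so expect it to take noticeably longer.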
10. B-Tree Indexes
● B-trees are balanced search trees: height = O(log n) in the worst case.
● They were designed to work well on Direct Access secondary storage devices
(magnetic disks).
● B-trees (and variants like B+ and B* trees ) are widely used in database systems.
11. B-Tree Indexes
● A B-tree T is a rooted tree (with root root[T]) with the following properties:
Every node x has four fields:
1. The number of keys currently stored in node x, n[x].
2. The n[x] keys themselves, stored in nondecreasing order:
key_1[x] ≤ key_2[x] ≤ · · · ≤ key_n[x][x].
3. leaf[x] = “True” if x is a leaf, else “False”.
4. n[x] + 1 pointers, c_1[x], c_2[x], . . . , c_(n[x]+1)[x], to its children.
● The keys key_i[x] separate the ranges of keys stored in each subtree: if k_i is any key
stored in the subtree with root c_i[x], then:
k_1 ≤ key_1[x] ≤ k_2 ≤ key_2[x] ≤ . . . ≤ key_n[x][x] ≤ k_(n[x]+1).
12. B-Tree Indexes
● All leaves have the same depth, which is the tree’s height h.
● There are upper and lower bounds on the number of keys in a node. To specify these
bounds we use a fixed integer t ≥ 2, the minimum degree of the B-tree:
lower bound: every node other than the root must have at least t − 1 keys, i.e. every
internal node other than the root has at least t children.
upper bound: every node can contain at most 2t − 1 keys, i.e. every internal node
has at most 2t children.
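The node fields and the degree bounds translate directly into a small Python structure. This is a sketch for checking the properties on a hand-built tree; the class `BTreeNode` and the helpers `check_node` and `leaf_depths` are names invented here, and integer keys are used only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)              # n[x] = len(keys)
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self):
        return not self.children


def check_node(node, t, is_root=False):
    """Check key order, key-count bounds and child count for one node."""
    n = len(node.keys)
    if node.keys != sorted(node.keys):
        return False                       # keys must be nondecreasing
    if n > 2 * t - 1:
        return False                       # upper bound: at most 2t - 1 keys
    if not is_root and n < t - 1:
        return False                       # lower bound: at least t - 1 keys
    if not node.leaf and len(node.children) != n + 1:
        return False                       # internal node: n + 1 children
    return True


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (a single value in a valid B-tree)."""
    if node.leaf:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


# A hand-built tree with minimum degree t = 2 (1 to 3 keys per node).
root = BTreeNode([10, 20], [BTreeNode([5]), BTreeNode([12, 15]),
                            BTreeNode([25, 30, 35])])
print(check_node(root, t=2, is_root=True), leaf_depths(root))
```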
13. B-Tree Indexes
● SELECT * from user_info where firstname='johnny' and lastname='derp' and
dob='1981-08-14'; (InnoDB engine; index on (firstname,lastname,dob));
● Search Algorithm : (x : pointer to the root node of some subtree)
BTree-MySQL-Search (x, firstname, lastname, dob)
i = 1;
while ( i ≤ n[x] and (firstname,lastname,dob) > key_i[x] )
i = i+1;
if ( i ≤ n[x] and (firstname,lastname,dob) = key_i[x] ) then
return key_i[x] -> rows;
else if ( leaf[x] ) then
return null;
else
Disk-Read(c_i[x]);
return BTree-MySQL-Search(c_i[x], firstname, lastname, dob );
● Number of disk pages accessed by BTree-MySQL-Search is O(h) = O(log_t n), where n
is the number of keys in the index.
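The search can be sketched in Python as follows. This is an in-memory illustration, not how InnoDB is implemented: the Node class, the row-id lists and the sample tree are assumptions, and the Disk-Read step is only a comment. Python compares tuples column by column, which mirrors how a multicolumn key is ordered.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Key = Tuple[str, str, str]  # (firstname, lastname, dob)


@dataclass
class Node:
    keys: List[Key] = field(default_factory=list)
    rows: List[List[int]] = field(default_factory=list)   # rows[i] belongs to keys[i]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search(x: Node, k: Key) -> Optional[List[int]]:
    """Return the row ids stored under key k, or None if k is not in the tree."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x.rows[i]
    if x.leaf:
        return None
    # A storage engine would read the child page from disk at this point.
    return btree_search(x.children[i], k)


# Hypothetical three-node tree used only to exercise the function.
left = Node(keys=[("alice", "smith", "1990-01-01")], rows=[[1]])
right = Node(keys=[("zoe", "young", "1975-05-05")], rows=[[3]])
root = Node(keys=[("johnny", "derp", "1981-08-14")], rows=[[2]], children=[left, right])
```

Each recursive call descends one level, so the number of nodes visited is bounded by the height of the tree.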
14. B-Tree Indexes
● INSERT, DELETE and UPDATE queries are much more involved. Let's briefly discuss
only INSERT.
● INSERT into user_info (firstname,lastname,dob) values ('johnny', 'derp',
'1981-08-14');
● Insert algorithm :
1. Let k = (firstname,lastname,dob). First find the leaf node x where k will be
inserted.
a. If x is not full, insert k into x at the appropriate position (in
ascending order of keys).
b. If x is full, compute the median of the keys in x and split the node into
2 nodes about the median. Then insert k into one of the split nodes at the
appropriate position. The median key is then inserted into the parent node of x,
and the process is repeated recursively. If, moving up the tree, the current root
node needs to be split, it is split into 2 and the new root is a single-key node
holding the median from the last split.
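The insert steps above can be sketched as follows. This is a simplified in-memory version under stated assumptions: it inserts the key first and splits a node only once it overflows (holds 2t keys), which gives the same bottom-up behaviour as splitting a full node before inserting. The Node class, T = 2 and the helper names are illustrative.

```python
from bisect import insort
from typing import List, Optional, Set, Tuple

T = 2  # minimum degree; a node may hold at most 2*T - 1 keys


class Node:
    def __init__(self, keys=None, children=None):
        self.keys: List[tuple] = keys or []
        self.children: List["Node"] = children or []  # empty for a leaf


def _insert(x: Node, k: tuple) -> Optional[Tuple[tuple, Node]]:
    """Insert k below x; return (median, new right sibling) if x had to split."""
    if not x.children:                          # leaf: place k in sorted position
        insort(x.keys, k)
    else:
        i = sum(1 for key in x.keys if key < k)  # child subtree that should hold k
        split = _insert(x.children[i], k)
        if split:                               # child was split: absorb its median
            median, right = split
            x.keys.insert(i, median)
            x.children.insert(i + 1, right)
    if len(x.keys) <= 2 * T - 1:
        return None
    mid = len(x.keys) // 2                      # overflow: split about the median
    median = x.keys[mid]
    right = Node(x.keys[mid + 1:], x.children[mid + 1:])
    x.keys, x.children = x.keys[:mid], x.children[:mid + 1]
    return median, right


def insert(root: Node, k: tuple) -> Node:
    """Insert k and return the (possibly new) root."""
    split = _insert(root, k)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])        # tree grows by one level at the root


def in_order(x: Node) -> List[tuple]:
    if not x.children:
        return list(x.keys)
    out: List[tuple] = []
    for i, key in enumerate(x.keys):
        out += in_order(x.children[i]) + [key]
    return out + in_order(x.children[-1])


def leaf_depths(x: Node, depth: int = 0) -> Set[int]:
    if not x.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in x.children))


names = ["johnny", "alice", "zoe", "bob", "mia", "carl", "dave", "eve", "fay", "gus"]
root = Node()
for name in names:
    root = insert(root, (name, "derp", "1981-08-14"))
```

Because the tree only ever grows at the root, all leaves stay at the same depth after every insert.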
15. B-Tree Indexes
● B-Tree insertion demonstration :
● The key is always inserted in a leaf node
● Requires O(h) = O(log_t n) disk accesses.
16. B-Tree Indexes
● B+ Trees are B-Trees with the modification that internal nodes store only the keys
used for indexing, while the leaf nodes contain both the keys and the rows
corresponding to each key.
● Types of queries that can use a B-Tree index :
1. Match the full value – SELECT * from user_info where firstname='johnny'
and lastname='derp' and dob='1981-08-14';
2. Match a leftmost prefix – SELECT * from user_info where
firstname='johnny';
3. Match a column prefix - SELECT * from user_info where firstname like
'john%';
4. Match a range of values - SELECT * from user_info where firstname
between 'john' and 'johnny';
5. Match one part exactly and match a range on another part – SELECT * from
user_info where firstname='johnny' and lastname like 'de%';
6. InnoDB uses B+Tree indexes, so to take advantage of index-only queries,
where rows are returned directly from the index, select only columns which are
indexed - SELECT firstname, lastname from user_info where firstname like 'john%';
17. B-Tree Indexes
● Types of queries that can't use a B-Tree index :
1. The index is not useful if the lookup does not start from the leftmost side of
the indexed columns – SELECT * from user_info where lastname='derp';
SELECT * from user_info where firstname like '%p';
2. You can’t skip columns in the index – SELECT * from user_info where
firstname='johnny' and dob='1981-08-14'; uses the index only for firstname.
3. The storage engine can’t optimize accesses with any columns to the right of
the first range condition – SELECT * from user_info where firstname='johnny' and
lastname like 'de%' and dob='1981-08-14'; uses the index only for firstname and
lastname.
18. Indexing Strategies for High Performance
● Isolating the Column : “Isolating” the column means it should not be part of an
expression or be inside a function in the query.
SELECT * from user_info where user_id + 1 = 5; or
SELECT * from user_info where TO_DAYS(CURRENT_DATE) -
TO_DAYS(dob) <= 365; can't use the indexes on user_id or dob in MySQL.
Rewriting the conditions as user_id = 4 and
dob >= DATE_SUB(CURRENT_DATE, INTERVAL 365 DAY) leaves the columns isolated.
● Prefix Indexes and Index Selectivity : For BLOB and TEXT columns, instead of
indexing a very long string, the alternative is to index a prefix of the string. But
index selectivity must also be taken care of. Index selectivity is the ratio of the
number of distinct indexed values to the total number of rows. The prefix length
depends on index selectivity.
For e.g. if there are 1000 rows in our user_info table and 435 distinct values of
city, then the selectivity of the full column is 435/1000 = 0.435. If we choose a
prefix length of 3, the number of distinct prefixes is lower than 435, since many
cities share the same prefix, so selectivity drops. Increasing the prefix length
never lowers selectivity, but choosing an optimal value (selectivity close to 0.435
with a length that is not too high) is important. In our case a prefix length of 7
gives a selectivity close to that of the full column. Thus we choose 7 as prefix
length.
ALTER TABLE user_info ADD KEY (city(7));
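A rough way to pick the prefix length is to measure selectivity for increasing lengths on a sample of the column. The sketch below does this in memory. The city list, the function names and the tolerance parameter are illustrative assumptions.

```python
from typing import Optional, Sequence


def selectivity(values: Sequence[str], prefix_len: Optional[int] = None) -> float:
    """Distinct (prefixes of) values divided by the total number of rows."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


def shortest_prefix(values: Sequence[str], tolerance: float = 0.0) -> int:
    """Smallest prefix length whose selectivity is within tolerance of the full column."""
    target = selectivity(values) - tolerance
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if selectivity(values, n) >= target:
            return n
    return longest


# Hypothetical sample of the city column.
cities = ["kolkata", "kolhapur", "kollam", "delhi", "dehradun", "mumbai", "mumbai", "delhi"]
```

On this sample the full-column selectivity is 0.75, a prefix of 3 gives 0.5, and a prefix of 4 already matches the full column.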
19. Indexing Strategies for High Performance
● Choosing a good column order (For multicolumn indexes) :
1. If ORDER BY or GROUP BY is not required then index the columns from
left to right in order of selectivity, i.e. the most selective column should be the
leftmost so that the leftmost column filters out as many rows as possible. For e.g.
the indexing order for the columns country and city should be (city, country)
because more users belong to the same country than to the same city, i.e. the
selectivity of city is higher than that of country. Note that it is the column order
in the index that matters, not the order of the conditions in the WHERE clause:
“where city='kolkata' and country='india' ” and “where country='india' and
city='kolkata' ” are the same query.
2. In case of ORDER BY or GROUP BY, put the columns from the normal where
clauses first, then the GROUP BY columns, and the ORDER BY columns rightmost in
the index. For e.g. for “where firstname='johnny' GROUP BY city,country ORDER BY
country” the index order should be (firstname,city,country).
● Clustered Indexes : InnoDB’s clustered indexes actually store a B-Tree index and
the rows together in the same structure. When a table has a clustered index, its rows
are actually stored in the index’s leaf pages. The term “clustered” refers to the fact
that rows with adjacent key values are stored close to each other.
20. Indexing Strategies for High Performance
● Clustered Indexes : (contd.) InnoDB clusters the data by the primary key. If you
don’t define a primary key, InnoDB will try to use a unique non-nullable index
instead. If there’s no such index, InnoDB will define a hidden primary key for you
and then cluster on that. InnoDB clusters records together only within a page. Pages
with adjacent key values might be distant from each other.
21. Indexing Strategies for High Performance
● Clustered Indexes : (contd.) Example : SELECT * from user_info ORDER BY
username. If our primary key is username then this query is very fast, because all
the columns are read from the leaf pages of the B-tree index without a separate
lookup into the table. Also, since the table is clustered on username, rows are
stored in a page in alphabetical order of the usernames, so ORDER BY does not
need to sort within a single page.
● If clustering on the primary key is not needed, i.e. if we do not need to order by
the primary key while returning almost all the columns, it is better not to derive
the primary key from any of the column values. For e.g. if we do not require
queries like the one above, then instead of defining the primary key on username,
define it on an auto_increment user_id. With a username primary key there will be
lots of random I/O on insertions (since insertions do not arrive in any order of
username), which is inefficient, whereas auto-increment insertions follow
sequential order and avoid the random I/O.
● MyISAM engine does not use clustering.
22. Indexing Strategies for High Performance
● Covering Indexes : An index that contains all the data needed to satisfy a query is
called a covering index. Consider the query :
SELECT firstname, lastname from user_info where firstname='johnny' and lastname
like 'de%'; The query is index covered since all the columns that are returned are
part of the index (firstname, lastname, dob).
● Index covered queries are very fast since no row lookups (random I/O on disk) are
required; instead all values are returned from the index.
● Hash, spatial, and full-text indexes don’t store the indexed column values, so
MySQL can use only B-Tree indexes to cover queries.
● When you issue a query that is covered by an index (an index-covered query), you’ll
see “Using index” in the Extra column in EXPLAIN.
● Because InnoDB secondary indexes store the primary key values in their leaf
nodes, a query that fetches the primary key column together with the secondary
indexed columns is also an index covered query. For e.g. SELECT user_id, firstname,
lastname from user_info where firstname='johnny' and lastname like 'de%'; here
user_id is not part of the index (firstname, lastname, dob) but it is the primary
key, so the query is index covered as well.
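As a quick check of the covering rule, the sketch below tests whether every column a query references is available in a secondary index, counting the primary key columns that InnoDB appends to secondary index leaves. The function and its arguments are illustrative and do not inspect a real schema.

```python
from typing import Iterable


def is_index_covered(
    referenced_cols: Iterable[str],
    index_cols: Iterable[str],
    primary_key_cols: Iterable[str] = (),
) -> bool:
    """True if all referenced columns can be read from the index alone.

    referenced_cols should list every column used in SELECT and WHERE.
    primary_key_cols models InnoDB storing the primary key in secondary indexes.
    """
    available = set(index_cols) | set(primary_key_cols)
    return set(referenced_cols) <= available


index = ("firstname", "lastname", "dob")
```

For instance, `is_index_covered(["user_id", "firstname", "lastname"], index, ["user_id"])` is true, while adding a column such as city to the select list makes it false.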
23. Full-Text Searching
● Most of the queries you’ll write will probably have WHERE clauses that compare
values for equality, filter out ranges of rows, and so on. However, you might also
need to perform keyword searches, which are based on relevance instead of
comparing values to each other. Full-text search systems are designed for this
purpose.
● Full-Text search is based on finding words (terms) in documents instead of patterns.
● For example we want to find all matching rows in the reviews table for which the
review contains some or all words of the phrase “good excellent exciting”.
ALTER TABLE reviews add FULLTEXT KEY(review);
SELECT review, MATCH(review) AGAINST('good excellent exciting') as
relevancy from reviews where MATCH(review) AGAINST('good excellent
exciting');
● On MyISAM, Boolean Mode searches can also be run without a full-text index,
although they are then much slower.
● There are two different modes for Full-Text Searching : Natural Language Mode
and Boolean Mode.
● Before MySQL 5.6 only the MyISAM engine supported Full-Text searching and
indexing; InnoDB supports FULLTEXT indexes from MySQL 5.6 onwards.
24. Full-Text Searching
● Natural Language Mode of Full-Text Searching : The relevancy of a query with a
particular row in the table is calculated as follows -
1. Compute the weight of each word/term in the fulltext indexed columns in
each row. The weight of a word in a row increases with the number of times it
occurs in that row and decreases with the number of rows it occurs in, i.e. a
query word that exists in only a few rows matters more for ordering the search
results. For example in the query “we had an exciting adventure”, words such as
“we”, “had” and “an” are pretty common terms in holiday reviews, so they exist in
more than 75% of the rows in our database, but words such as “exciting” and
“adventure” are less common and occur in less than 10% of our database. So
“naturally we are looking for” rows which contain words like “exciting” and
“adventure”, and such rows should be ranked higher. In fact words such as “we”,
“an”, “the” etc. are called stopwords and they are not even considered while
calculating weights.
2. Mathematically the formula for weight of a term ti in given row is given as :
w[ti]= (log(dtf[ti])+1)/sumdtf * U/(1+0.0115*U) * log((N-nf[ti])/nf[ti])
25. Full-Text Searching
● Natural Language Mode of Full-Text Searching : contd.
2. w[ti]= (log(dtf[ti])+1)/sumdtf * U/(1+0.0115*U) * log((N-nf[ti])/nf[ti])
where dtf[ti] : number of times term ti appears in the row.
sumdtf : sum of (log(dtf)+1)'s for all terms in the same row.
U : number of unique terms in the row.
N : Total number of rows.
nf[ti] : number of rows that contain the term ti .
The middle term signifies that if the length of the indexed columns in the row
(= number of unique words) is shorter than the average length then the weight for
that row increases (i.e. a “short and sweet” row, so to say).
3. Then the Rank of that row is computed as R = ∑i w[ti]*qnf[ti], where qnf[ti]
is the number of times the term ti occurs in the query. This is the value returned
by MATCH() AGAINST() in the SELECT list.
● Structure of the index : The index is a B-Tree structure with 2 levels. In the first
level the nodes store the terms as keys, and in the second level, for each first level
term, pointers to the rows that contain the term. This is similar to an inverted index.
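The weight and rank formulas above can be tried out on a toy inverted index. This is a sketch under stated assumptions: natural logarithms, a tiny hand-picked stopword list, whitespace tokenization and made-up reviews. It is not MySQL's implementation, and it simply skips terms that occur in every row, where the formula is undefined.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

STOPWORDS = {"we", "had", "an", "a", "the"}  # illustrative, not MySQL's list


def tokenize(text: str) -> List[str]:
    return [w for w in text.lower().split() if w not in STOPWORDS]


def build_index(rows: Dict[int, str]) -> Dict[str, Dict[int, float]]:
    """Two-level structure: term -> {row id -> weight w[t]}."""
    counts = {rid: Counter(tokenize(text)) for rid, text in rows.items()}
    n_rows = len(rows)
    nf = Counter(term for c in counts.values() for term in c)  # rows containing term
    index: Dict[str, Dict[int, float]] = defaultdict(dict)
    for rid, c in counts.items():
        sumdtf = sum(math.log(dtf) + 1 for dtf in c.values())
        u = len(c)                                   # unique terms in the row
        for term, dtf in c.items():
            if nf[term] == n_rows:                   # log(0) is undefined: skip
                continue
            local = (math.log(dtf) + 1) / sumdtf
            length_norm = u / (1 + 0.0115 * u)
            rarity = math.log((n_rows - nf[term]) / nf[term])
            index[term][rid] = local * length_norm * rarity
    return index


def rank(index: Dict[str, Dict[int, float]], query: str) -> Dict[int, float]:
    """R = sum over query terms of w[t] * qnf[t], per matching row."""
    scores: Dict[int, float] = defaultdict(float)
    for term, qnf in Counter(tokenize(query)).items():
        for rid, weight in index.get(term, {}).items():
            scores[rid] += weight * qnf
    return dict(scores)


rows = {
    1: "we had an exciting adventure",
    2: "good food good hotel",
    3: "good beach",
    4: "boring hotel",
    5: "nice pool",
}
index = build_index(rows)
```

Calling `rank(index, "good")` scores rows 2 and 3, with row 2 ahead because the term occurs twice there. Note that the last factor turns negative once a term appears in more than half of the rows.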
26. Full-Text Searching
● The MATCH AGAINST clause can't be told to regard words from a particular column
as more important than words from other columns. For example, you might want
search results to appear first when the keywords appear in a review's title.
● Alternative solution to give the title of the review twice the importance of the
review itself :
ALTER TABLE reviews ADD FULLTEXT KEY(title, review);
ALTER TABLE reviews ADD FULLTEXT KEY(title);
SELECT title, review, ROUND(MATCH(title, review) AGAINST('good
excellent exciting'), 3) AS full_rel, ROUND(MATCH(title) AGAINST('good
excellent exciting'), 3) AS title_rel FROM reviews WHERE MATCH(title, review)
AGAINST('good excellent exciting') ORDER BY (2 * MATCH(title)
AGAINST('good excellent exciting')) + MATCH(title, review) AGAINST('good
excellent exciting') DESC;
27. Full-Text Searching
● Boolean Mode of Full-Text Searching : In Boolean searches, the query itself
specifies the relative relevance of each word in a match. When constructing a
Boolean search query, you can use prefixes to modify the relative ranking of each
keyword in the search string.
● Examples :
1. SELECT * from reviews where MATCH(title,review) AGAINST ('good
~bad +adventure' IN BOOLEAN MODE); i.e. rows must contain the word
'adventure', rows with the word 'good' should be ranked higher and rows with
the word 'bad' should be ranked lower.
2. SELECT * from reviews where MATCH(title,review) AGAINST ('good
-bad +adventure' IN BOOLEAN MODE); i.e. rows must contain the word
'adventure', rows with the word 'good' should be ranked higher and the rows
must not contain the word 'bad'.
3. SELECT * from reviews where MATCH(title,review) AGAINST ('"good
adventure"' IN BOOLEAN MODE); i.e. rows must contain the phrase “good
adventure”.
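The effect of the +, - and ~ prefixes in the first two examples can be imitated with a toy matcher. The scoring constants (1.0 and -0.5) are arbitrary assumptions chosen only to show the direction of each operator. MySQL computes Boolean relevance differently, and quoted phrases are not handled here.

```python
from typing import Optional


def boolean_match(text: str, query: str) -> Optional[float]:
    """Return None if the row is filtered out, otherwise a toy relevance score."""
    words = set(text.lower().split())
    score = 0.0
    for token in query.lower().split():
        if token[0] in "+-~":
            op, term = token[0], token[1:]
        else:
            op, term = "", token
        present = term in words
        if op == "+" and not present:   # required word is missing
            return None
        if op == "-" and present:       # excluded word is present
            return None
        if present:
            score += -0.5 if op == "~" else 1.0
    return score
```

With the query `good ~bad +adventure`, a row "good adventure" scores higher than "bad adventure", and a row without "adventure" is dropped.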
28. Full-Text Searching
● Phrase searches tend to be quite slow. The full-text index alone can’t answer such
queries, because it doesn’t record where words are located relative to each other in
the original full-text collection. Consequently, the server actually has to look inside
the rows to do a phrase search.
To execute such a search, the server will find all documents that contain both
“good” and “adventure”. It will then fetch the rows from which the documents were
built, and check for the exact phrase in the collection.
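The two-step phrase search described above can be sketched as follows: the inverted index narrows the candidates to rows containing every word, and the stored text is then checked for the exact phrase. The sample rows and function names are assumptions, and the substring test is deliberately naive.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(rows: Dict[int, str]) -> Dict[str, Set[int]]:
    """term -> ids of the rows containing it (no word positions are stored)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for rid, text in rows.items():
        for word in text.lower().split():
            index[word].add(rid)
    return index


def phrase_search(rows: Dict[int, str], index: Dict[str, Set[int]], phrase: str) -> List[int]:
    words = phrase.lower().split()      # assumes a non-empty phrase
    # Step 1: the index can only tell us which rows contain all the words.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # Step 2: the rows themselves must be read to verify the word order.
    return sorted(rid for rid in candidates if phrase.lower() in rows[rid].lower())


rows = {1: "a good adventure", 2: "adventure was good", 3: "good food"}
index = build_inverted_index(rows)
```

Here rows 1 and 2 both survive step 1, but only row 1 passes the phrase check, which is the extra row access that makes phrase searches slow.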
● Disadvantages of Full-Text Indexing and Searching :
1. The index doesn’t record the indexed word’s position in the string, so
proximity doesn’t contribute to relevance.
2. MySQL’s full-text indexing performs well when the index fits in memory,
but if the index is not in memory it can be very slow, especially when the fields are
large.
3. Modifying a piece of text with 100 words requires not 1 but up to 100 index
operations.
29. Full-Text Searching
● Disadvantages of Full-Text Indexing and Searching : (contd.)
4. The field length doesn’t usually affect other index types much, but with full-
text indexing, text with 3 words and text with 10,000 words will have performance
profiles that differ by orders of magnitude.
5. If there’s a full-text index and the query has a MATCH AGAINST clause
that can use it, MySQL will use the full-text index to process the query. It will not
compare the full-text index to the other indexes that might be used for the query.
6. The full-text search index can perform only full-text matches. Any other
criteria in the query, such as WHERE clauses, must be applied after MySQL reads
the row from the table.
7. Full-text indexes don’t store the actual text they index. Thus, you can never
use a full-text index as a covering index.
8. Full-text indexes cannot be used for any type of sorting, other than sorting by
relevance in natural-language mode. If you need to sort by something other than
relevance, MySQL will use a filesort.
30. Full-Text Searching
● Disadvantages of Full-Text Indexing and Searching : (contd.)
SELECT * from reviews where MATCH(title, review) AGAINST ('good
exciting adventure') and review_id= 879;
The query will not use the index on review_id since the preference for the
fulltext index is higher. So it will look into the fulltext index to find all
matching rows and only then apply the review_id condition from the WHERE clause.
Solution:
Index the review_id column in the fulltext index as well, by converting its
values into some string format such as 'review_id_is_879'.
ALTER TABLE reviews add FULLTEXT KEY(review_id, title, review);
SELECT * from reviews where MATCH(review_id, title, review) AGAINST
('+review_id_is_879 good exciting adventure' IN BOOLEAN MODE);
31. References
● Introduction to Algorithms, CLRS, 3rd Edition.
● High Performance MySQL by Baron Schwartz, Peter Zaitsev and Vadim
Tkachenko.
Thank You