Glossary of Terms
Term Definition Significance
10GbE (Ethernet)
Networking
An Ethernet networking standard capable of
supporting the transmission of data at a rate of
up to 10 gigabits (10bn bits) per second
As Kognitio unifies the resources of multiple nodes
and randomly distributes the data, heavy use is made
of networking in the execution of queries, so the
higher the network bandwidth the better. Our
preferred standard is (dual) 10GbE, as opposed to
the more commonly available 1GbE
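As a rough illustration of why bandwidth matters, the sketch below estimates the ideal time to move a volume of data over 1GbE and 10GbE links. It ignores protocol overhead and contention, and the function name and the 1 TB figure are illustrative assumptions, not Kognitio measurements.

```python
def transfer_seconds(data_bytes: float, link_gbits_per_sec: float) -> float:
    """Ideal time to move data_bytes over a link, ignoring protocol overhead."""
    bits = data_bytes * 8
    return bits / (link_gbits_per_sec * 1e9)


one_terabyte = 1e12  # bytes (decimal terabyte, chosen only for illustration)
t_1gbe = transfer_seconds(one_terabyte, 1)
t_10gbe = transfer_seconds(one_terabyte, 10)
print(f"1GbE:  {t_1gbe:.0f} s")
print(f"10GbE: {t_10gbe:.0f} s")
```

In this idealised model the 10GbE link moves the same data in one tenth of the time; real links will deliver somewhat less than their nominal rate.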
ACID ACID (Atomicity, Consistency, Isolation,
Durability) is a set of properties that guarantee a
database transaction is processed reliably. For
example, a transfer of funds from one bank
account to another must either complete in full
or not take place at all
Kognitio is ACID compliant. As a result, even though it
has been designed to carry out analytical workloads,
it can also carry out transactional workloads
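The funds transfer example can be sketched as follows. This is a minimal illustration of atomicity using Python's built-in sqlite3 module as a stand-in database; the table, column and function names are assumptions made for the example and are not Kognitio syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)            # both updates are committed together
overdraw_ok = transfer(conn, 1, 2, 500)  # debit violates the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, overdraw_ok, balances)
```

The second transfer credits the destination account before the debit fails, yet neither change survives, which is the behaviour atomicity guarantees.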
Amazon Web
Services (AWS)
A provider of public cloud infrastructure as a
service (IaaS), enabling the provisioning and
hardware management of appliances on-
demand based on an hourly charge
Enables applications to be considered that were not
previously possible by increasing flexibility and
considerably reducing short term costs and need for
capital expenditure
Analytical Platform A database platform that is specifically designed
and built to manage analytical workloads rather
than transactional workloads
Kognitio provides a scalable analytical platform to
support complex analytical applications
Analytical
Workloads
An analytical, as opposed to transactional,
workload is one associated with the reporting
and analysis of information. Typically analytical
workloads will involve a relatively small number
(compared with transactional workloads) of
querying tasks on all or large subsets of the
entire data set. As such, query performance is
essential
Kognitio has been designed to support analytical
rather than transactional workloads
Blade Servers A small form factor server that enables high
density compute power. Units do not carry their
own power supply, cooling, networking, etc. so
cannot be run independently of blade
enclosures
Kognitio provides high performance computing and
requires a number of servers (scale-out) to achieve
this. As the performance is achieved through holding
data in RAM rather than on disk, compute density is
essential
Blade Enclosures Supplies the power, cooling, networking, etc. for
blade servers. Can contain several blades to
provide high density compute power
Kognitio benefits from the compute density offered by
the blade server form factor.
Cores Each core is an independent processing unit
(CPU). CPU chips now include multiple cores that
are capable of processing multiple tasks in
parallel
Multiple cores facilitate the parallel processing of
data which is a key driver of Kognitio’s performance.
Kognitio can drive cores at 100% as part of providing
linear scalability
CPU ‘Central Processing Unit’ – the part of the
computer that executes instructions and
processes data
CPUs/cores are the driver of Kognitio’s performance
capabilities
Cube The name given to a multidimensional (hence
‘cube’) structure built within an OLAP engine
Cubes can be designed and published, without
building, within the MDX designer associated with
Kognitio
Data Warehouse A central repository of information, created by
integrating data from one or more source
systems, that is used to support reporting and
analysis within an organisation
Kognitio’s target markets are closely associated with
data warehousing
Database Appliance A group of servers/nodes that are combined to
form a pre-built and pre-configured MPP
database environment that can be used ‘out of
the box’
Appliances have an advantage over software as they
can be brought into service quickly, ensuring a faster
return on investment. Kognitio can be delivered as an
appliance
Dimension A group of related attributes, typically defined in
one or more hierarchies, that enable the filtering
and grouping of associated measures in a data
warehouse
Data warehousing is a key application within
Kognitio’s target markets
Disk Data storage device – the common format for
storing data for processing. Typically a Hard Disk
Drive (HDD) but may be a Solid State Drive (SSD)
Provides a persistence layer for data held within or
associated with a Kognitio instance. As Kognitio
usually provides multiple disks in an appliance, RAID
methods can be used to improve resilience
Elastic Block Store
(EBS)
An area of persistent block storage available on
AWS infrastructure that can be attached to a
server. Typically used in database applications
Provides the facility to persist a Kognitio platform thus
enabling instances to be stopped and restarted which
considerably reduces on-demand infrastructure costs
ETL (Extract
Transform and Load)
A process for taking data from operational
systems, transforming it into information by
applying pre-defined processes to provide
context and loading it into, typically, a data
warehouse environment. A class of tools, such as
Informatica, has grown up to provide
sophisticated capabilities to carry out this
functionality.
This is a standard process within the data
warehousing space, an area that is closely associated
with Kognitio’s target markets. Tools such as
Informatica (a Kognitio strategic partner) work
effectively with Kognitio.
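The three ETL stages can be sketched in a few lines. This is a toy illustration only: an in-memory CSV stands in for the operational system, SQLite stands in for the data warehouse, and all field and table names are hypothetical. Dedicated ETL tools provide far richer capabilities.

```python
import csv
import io
import sqlite3

# Extract: read rows from an operational export (an in-memory CSV stands in for the source).
source = io.StringIO(
    "order_id,amount_pence,country\n1,1250,gb\n2,399,fr\n3,10000,gb\n"
)
rows = list(csv.DictReader(source))

# Transform: convert types and units, and normalise codes to give the data context.
transformed = [
    (int(r["order_id"]), int(r["amount_pence"]) / 100, r["country"].upper())
    for r in rows
]

# Load: write the cleaned rows into a warehouse table (SQLite stands in for the warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
warehouse.commit()

loaded = warehouse.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(loaded)
```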
External Scripting A Kognitio version 8 capability that enables any
code capable of running under Linux to be
executed in parallel within a SQL framework on
the Kognitio Analytical Platform. Examples
include R, Python and Perl.
Enables very high performance execution of complex
analytical processes by removing the bottlenecks
traditionally associated with this workload, such as
moving the data to a single application server for
processing. Note that some processes cannot be
parallelized and, as such, will not be accelerated
External Tables A Kognitio version 8 capability that enables a
table to be mapped onto an external data source
before pulling the data into RAM. Each data
source requires a connector to be defined, with
initial connectors provided for Hadoop, S3 and
other Kognitio instances
Provides a very flexible and powerful way to access
external data sources without the need for ETL tools
or scripting
Flash Memory/SSD
(Solid State Disks)
SSDs use flash memory to provide faster access
than Hard Disk Drives (HDDs) to persistent data
without using moving parts
(spinning disks/heads). Unlike RAM, data is
preserved after power loss. Access is still
considerably slower than RAM. As such, SSDs are
NOT a direct replacement for RAM
Kognitio’s disk based environment can benefit from
the provision of SSDs. However, as SSDs are generally
considerably more expensive than HDDs, it is
recommended that systems employ RAM rather than
SSDs as this will provide significantly greater
performance benefits. Disk based competitors
generally benefit more from the inclusion of SSDs
Hyperthreading Intel’s technology solution for increasing the
parallelization capabilities of CPU cores. Each
hyperthread is ‘seen’, by operating systems that
support hyperthreading, as a separate core,
enabling the workload to be shared between
them
Kognitio can effectively utilise hyperthreading to
increase the parallelization of processing, thus
enhancing performance and throughput
In-memory database A database specifically designed to operate
within RAM rather than one that is designed for
disk and utilises RAM to process data retrieved
from disk blocks (caching)
Kognitio has its roots as an in-memory database and
gets its performance by storing data in RAM. This has
an advantage over caching: with a cache, if specific
data values or query results are not available, there
will be a ‘cache miss’, which will result in further
(expensive) disk reads to acquire the data
JDBC Java DataBase Connectivity is a standard API for
accessing relational database management
systems (RDBMS) for the Java programming
language
Kognitio supports the JDBC standard via a JDBC to
ODBC bridge provided by Simba Technologies
Latency Time delay between initiating a request and any
actions associated with the request being
completed. Typically this will be the time taken
for a query to run. However, it could also be
associated with disk access times, load times,
network transmission and time to insight
Kognitio holds data in RAM to make sure that it is as
close to the CPUs as possible, thus reducing the
latency associated with moving data and reducing
query times. In many use cases, it may not be
necessary to write to disk, thus reducing latency
associated with data loading. Time to insight is also
key to the value proposition associated with the
Kognitio analytical platform
Linear scalability The capability to improve performance in line
with system size. For example, doubling the
power of a system will result in the same query
time on twice the volume of data (NOTE: this is
not the same as saying that doubling the power
will halve the query time on the same data)
As Kognitio has focused on reducing bottlenecks, it
provides linear scalability for both query and bulk load
performance (insert rather than update – referential
integrity has a significant impact)
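The distinction drawn in the note can be illustrated with a simple timing model. This is a sketch under stated assumptions: every query pays a small fixed overhead, the remaining work divides evenly across nodes, and the overhead and per-node rate are invented figures, not Kognitio measurements.

```python
def query_time(data_volume, nodes, fixed_overhead=2.0, per_node_rate=10.0):
    """Idealised query time: a fixed overhead plus work shared evenly across nodes."""
    return fixed_overhead + data_volume / (nodes * per_node_rate)


base = query_time(data_volume=1000, nodes=10)
scaled = query_time(data_volume=2000, nodes=20)       # double the data and the power
same_data = query_time(data_volume=1000, nodes=20)    # double the power only

print(base, scaled, same_data)
```

In this model, doubling both data and power leaves the query time unchanged (linear scalability as defined above), whereas doubling the power on the same data gives a time that is shorter but more than half of the original, because the fixed overhead does not shrink.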
Massively Parallel
Processing (MPP)
Parallel processing on a large scale, typically
achieved through combining the processing
capabilities of a number of nodes
Kognitio combines the compute power of multiple
nodes and CPUs to provide MPP capabilities to
analytical workloads
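The pattern of distributing data across nodes, processing each portion independently and combining the results can be sketched as below. This is an illustration of the general scatter-gather idea only: lists stand in for nodes, Python threads stand in for per-node processing, and the function names are hypothetical. It does not represent Kognitio's internals.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count, seed=0):
    """Randomly assign each row to one of node_count partitions."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        partitions[rng.randrange(node_count)].append(row)
    return partitions


def node_sum(partition):
    """Work done independently on one node: a partial aggregate."""
    return sum(partition)


rows = list(range(1, 1001))
partitions = distribute(rows, node_count=4)

# Each partition is aggregated separately, then the partial results are combined.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(node_sum, partitions))
total = sum(partials)
print(total)
```

Because a sum can be split into partial sums, the combined result matches what a single node would compute over all rows.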
MDX
(MultiDimensional
eXpressions)
MDX is a language developed by Microsoft to
enable querying of multidimensional data stores
(OLAP) in much the same way that Structured
Query Language (SQL) is used for relational data
stores.
MDX is a supported language for querying the
Kognitio Analytical Platform. It requires that a model
is in place that defines the relationships between
dimension and fact tables and a provider that
converts the MDX code into SQL. A tool to design and
build the model is available for Kognitio
Measure In data warehousing, a measure is a property
that can be aggregated (sum, count, average,
etc.). For example, the number of units for a
product in a retail basket is a measure.
Data warehousing is a key application in Kognitio’s
markets
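To make the retail basket example concrete, the sketch below aggregates a measure (units) grouped by a dimension attribute (product group). SQLite is used purely as a convenient stand-in, and the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE basket_line (basket_id INTEGER, product_group TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO basket_line VALUES (?, ?, ?)",
    [(1, "Dairy", 2), (1, "Bakery", 1), (2, "Dairy", 6), (2, "Bakery", 4)],
)

# 'units' is the measure; 'product_group' is the dimension attribute used for grouping.
units_by_group = dict(
    conn.execute(
        "SELECT product_group, SUM(units) FROM basket_line GROUP BY product_group"
    )
)
print(units_by_group)
```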
Memory (RAM) Random Access Memory (RAM) is referred to
simply as ‘memory’ by Kognitio and is a form of
memory that provides random access to data.
Data does not persist in RAM when power is lost
Kognitio is an in-memory (RAM) analytical platform.
As such Kognitio gains its performance advantage over
disk based environments when tables or images are
stored in RAM as the data is kept close to the CPUs to
reduce query and loading latency
Node A modular unit of an MPP architecture, equivalent
to a server (physical or virtual)
Nodes form the basic units for constructing a Kognitio
MPP instance
NoSQL Databases Originally indicating that SQL was not used to
query the environment, this has since been
modified to become “Not Only SQL”. NoSQL
databases were designed to handle Big Data
‘volumes, velocities and varieties’ and, as such,
tend to provide less rigorous integrity and
metadata handling than relational database
management systems. Built for scale out, they
are schemaless and ‘eventually consistent’
(BASE) rather than ACID compliant.
Kognitio is NOT a NoSQL database but is incorporating
additional scripting languages to provide NoSQL
capabilities. For business intelligence and ‘repeatable’
analytics on a defined dataset, a schema is considered
to be a positive asset
ODBC Open DataBase Connectivity is a standard API for
accessing relational database management
systems (RDBMS)
ODBC is the standard approach for connecting to a
Kognitio instance. The majority of BI tools will support
generic ODBC connectivity and, hence, will likely be
able to connect to a Kognitio instance. The exceptions
tend to be OLAP clients, which will typically connect
via ODBO or XML/A, or tools that utilise JDBC or REST
interfaces
ODBO OLE DB for OLAP is a Microsoft published
standard mechanism for connecting to OLAP
data sources via the MDX language. OLAP
sources and clients may only adopt part of the
standard which can lead to connectivity and
processing issues. ODBO is a two tier
architecture (client and server)
Kognitio, via its partner Simba, has an MDX provider
interface that can support ODBO connectivity.
However, note that not all OLAP clients may
necessarily be supported owing to the variability with
which tools have incorporated the standard.
OLAP OnLine Analytical Processing is a representation
of a business intelligence model suitable for
consumption by non-technical users. Typically
data would be stored in ‘cubes’ that contain
measures and hierarchical dimensions which are
logically grouped in the manner that businesses
reference them (e.g. a product hierarchy
consisting of product group, sub-group, family,
sub-family and product).
Traditional cubes would be pre-calculated at
intervals with aggregated measures stored at the
various levels and combinations of the
dimensions to facilitate very fast access. The
cubes would be accessed by purpose built clients
and, typically, by the specially defined MDX
language
Kognitio provides the facility to view the Analytical
Platform via an OLAP model utilising connectivity
software provided by Simba Technologies. Rather
than pre-calculating OLAP cubes Kognitio utilises the
performance characteristics of the platform to
provide virtual cubes which eliminates the lengthy
build times associated with OLAP
OLTP (OnLine
Transaction
Processing)
A class of system designed to manage
transaction oriented workloads. An OLTP
database will be specifically designed to manage
data entered, produced or processed by a
transactional system and, hence, is designed for
the rapid insertion and updating of records
within a table
Whilst Kognitio can support OLTP associated
workloads, it was designed for analytical workloads
and, hence, is suboptimal for OLTP environments
Parallel Processing The simultaneous use of more than one CPU or
core to execute a program. Operations that can
be performed in parallel will execute faster
within a parallel computing framework
(potentially proportionate to the number of
cores/CPUs available). The overall effectiveness
of the parallelism may be limited by tasks that
are executed serially
Kognitio has a strong parallel architecture and
achieves its performance through parallelism across
multiple nodes, multiple CPUs and associated cores.
This enables Kognitio to provide linear scalability in
line with increasing memory size and core counts
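The limit imposed by serially executed tasks is commonly expressed by Amdahl's law, which is not named in this glossary but describes the same effect. The sketch below computes the theoretical speed-up for a given parallel fraction; the 95% figure is an arbitrary illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speed-up when only parallel_fraction of the work can be parallelised."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for cores in (1, 8, 64):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

With 5% of the work executed serially, the speed-up can never exceed 20 however many cores are added, which is why reducing serial bottlenecks matters as much as adding cores.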
Persistence Layer An area provided to ensure that data is
maintained when a server/appliance is powered
down, typically hard disk based.
Data in RAM does not persist when the hardware is
powered down so if data is required to persist it
should be stored within this layer. For physical devices
this will typically be local disk based. However, for
AWS based instances this has to be managed in a
different way as local storage is ephemeral, meaning
that the disk drives are wiped when a server is
terminated. EBS or S3 storage are typically used to
provide persistence in AWS
Private Cloud Provision of non-publicly available infrastructure
on-demand – see public cloud. Private clouds
typically provide additional certified standards
compared with public clouds
Kognitio provides its own infrastructure to clients,
which is referred to as a ‘private cloud’. Provisioning is
done on a term basis rather than on-demand but is
maintained by Kognitio or its partners offsite for
customers’ use rather than on-premise. Provides a
facility to customers to get environments up and
running quickly without up-front capital expenditure
Public Cloud The publicly available provisioning of shared
computing infrastructure. Typically this is
achieved through virtualization and is generally
provided on-demand with no upfront capital
costs
Enables Kognitio to provide access to a pre-configured
appliance on-demand rather than in days (private
cloud) or weeks/months (on-premise appliance).
Kognitio uses Amazon Web Services (AWS) to provide
this facility but, in principle, any provider could be
used
R language R is a language and software environment
for statistical computation and
graphics. It is open source and has grown to
become a standard for statistical processing with
particularly high penetration in the academic
world and, increasingly, the data science
community
Kognitio has recently added support for the R
language via the external scripting capability in v8
RAID Redundant Array of Inexpensive/Independent
Disks is a storage mechanism to combine
multiple disks into a single logical unit. Data is
distributed across the disks for the purposes of
improved performance or resilience. There are
several levels of RAID available which provide
different performance and resilience
characteristics.
A Kognitio appliance uses RAID 1 (mirroring) to ensure
that the appliance does not lose data should a node
become unavailable.
Racks Physical frameworks for holding an array of
servers or blade enclosures specifically designed
to be mounted within the framework
Kognitio appliances utilise racks
Rackmounts Independent, fully self-contained servers. These
servers are generally larger than blades and can
have more RAM, CPU, Disk, etc. Whilst they are
typically housed in a rack, rackmounts provide
flexibility over blades in that a limited number
(up to three practically) can be stacked
independently (with switching) to form an
appliance without the need for rack
infrastructure
Kognitio appliances can be based on rackmounts as
well as blade servers. For certain applications,
rackmounts can provide cost advantages over blade
servers (e.g. small appliances up to 768GB RAM)
RAM Only
Temporary Tables
(ROTT)
A table in Kognitio RAM with no associated
storage (protection) in the persistence layer.
This means that, whilst the structure is
persistent, the data is ephemeral
ROTTs are used for non-persistent workloads. For
example, they provide the highest potential load
speeds for data that needs to be processed before it is
persisted. The alternative, tables and table images,
would involve writing to disk with the resultant delay.
Failure of an appliance will result in loss of data held
in ROTTs
Referential Integrity This is the process of ensuring that the data
entered in a column is valid. For example, in a
relational table, a column may be specified as a
foreign key (i.e. the data must exist in another
table) in which case, at data load time, this
constraint will be checked before the data is
entered. Failure of the constraint will result in
the data not being entered
Kognitio is fully ACID compliant and supports
referential integrity. However, tables need to be in
RAM to perform this task and the process has a
severe impact on load performance since it results in a
full table scan for each referential integrity check.
Careful consideration needs to be given to application
design implications
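A foreign key check at load time can be sketched as follows. SQLite is used as a stand-in database, and the table, column and function names are illustrative assumptions. The example shows the constraint behaviour only and says nothing about how Kognitio performs the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE sale ("
    "sale_id INTEGER PRIMARY KEY, "
    "product_id INTEGER NOT NULL REFERENCES product(product_id))"
)
conn.execute("INSERT INTO product VALUES (1)")
conn.commit()


def load_sale(sale_id, product_id):
    """Insert a sale row; return False if the foreign key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO sale VALUES (?, ?)", (sale_id, product_id))
        return True
    except sqlite3.IntegrityError:
        return False


valid_loaded = load_sale(10, 1)     # product 1 exists, so the row is entered
invalid_loaded = load_sale(11, 99)  # product 99 does not exist, so the row is rejected
row_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(valid_loaded, invalid_loaded, row_count)
```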
Scale-up To increase the size of a server through the
addition of new resources (CPU/memory)
Many databases can only utilise single servers, so the
ability to incorporate greater resources is necessary
for them to address larger data sets. However, there
are cost implications to scaling-up (e.g. larger memory
DIMMs tend to be considerably more expensive) and
limitations to the data set sizes that can be addressed.
Kognitio can fully utilise the resources available in a
scaled-up environment
Scale-out To increase the size of an appliance through the
addition of more nodes
Whilst utilising scaling-up to increase data sizes
addressable by databases is common, it is less
common to be able to do this by scaling-out. Kognitio
addresses larger data sizes through scaling-out. It can
often involve less capital outlay to have several
smaller nodes than one very large server and the size
limitations of a single server are removed
S3 (Simple Storage
Service)
A cost effective, secure and highly available file
storage area available on AWS cloud
infrastructure.
Provides the facility to stage data files ready to load
into a Kognitio Analytical Platform. Also provides an
environment to store readily available backups and
associated files. Kognitio, in v8, has a connector that
can map external tables onto S3 and load the entire
file into RAM
SQL (Structured
Query Language)
SQL is a language designed to manage and query
data held in a relational data store
SQL is the standard used for querying the Kognitio
Analytical Platform.
Switch A network switch enables the linking of multiple
network devices
Kognitio appliances require the cooperative
processing of multiple nodes. As such, switches are
required to facilitate the flow of data/message
passing between nodes. Note: for appliances
involving two nodes, no switching is required as the
nodes are linked peer to peer.
Table Image A Kognitio table that is simultaneously available
in RAM and on disk. The table may be
completely or partially (only selected columns or
rows) represented in RAM
Table images enable both performant queries and
persistence
Time to Insight The time taken from the point at which the data
of interest is generated in an operational system
to the point at which it has been analysed. This
involves several aspects:
• Volume of data
• Velocity of data
• Network speed
• Need to move data
• Load speed
• Query speed
Kognitio has the ability to both ingest and query data
very quickly. As such, Kognitio’s time to insight is
considerably lower than that of many competing
products, such as those which rely on accelerative
structures (OLAP, indexes, columnar) to provide
acceptable query performance, as building these
structures impacts on load speed
Transactional
Workloads
A transactional, as opposed to analytical,
workload is one that involves a large number
(compared to analytical workloads) of small
processes that may involve locating, inserting,
updating or deleting rather than querying data.
Transaction speed and referential integrity are
critical to this workload
Whilst Kognitio can support transactional workloads,
it has been designed to manage analytical workloads.
As such, for transactional environments, it is highly
likely that OLTP databases will more appropriately
fulfil the requirement
View Image An in-memory instantiation (copy of results) of a
view in Kognitio. At the point of instantiation,
processing (such as joins, group-bys, etc.)
associated with the view is undertaken and the
results physically stored in RAM
View images considerably enhance the performance
of queries where the views are used repeatedly as the
processing in the view only needs to be carried out
once. Allows different representations of common
underlying data
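The idea of instantiating a view once and reusing the stored results can be sketched by analogy. SQLite has no view images, so this example copies a view's results into a table; the names are hypothetical and this is not Kognitio syntax. The stored copy here is a simple snapshot, and how Kognitio keeps images current is not covered by this glossary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)", [("North", 10), ("North", 5), ("South", 7)]
)
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sale GROUP BY region"
)

# Instantiate: run the view's grouping once and physically store the results.
conn.execute("CREATE TABLE sales_by_region_image AS SELECT * FROM sales_by_region")

# A later change to the base table is seen by the view but not by the stored copy.
conn.execute("INSERT INTO sale VALUES ('South', 100)")
live = dict(conn.execute("SELECT region, total FROM sales_by_region"))
image = dict(conn.execute("SELECT region, total FROM sales_by_region_image"))
print(live, image)
```

Queries against the stored copy avoid repeating the grouping work each time, which is the performance benefit described above.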
XML/A XML for Analysis is a published standard
mechanism for connecting to analytical data
sources such as OLAP (via the MDX language)
and data mining. XML/A is a three tier
architecture (client, mid-tier and server) enabling
the caching of results to be incorporated which
can considerably increase the speed of satisfying
common user community queries
Kognitio, via its Simba Technologies developed MDX
provider, can support XML/A connectivity to OLAP
objects. However, note that not all OLAP clients may
necessarily be supported owing to the variability with
which tools have incorporated the standard.
Kognitio’s implementation incorporates a caching tier
that can enhance query and concurrency
performance.

More Related Content

What's hot

Netezza vs teradata
Netezza vs teradataNetezza vs teradata
Netezza vs teradataAsis Mohanty
 
Netezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataNetezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataAsis Mohanty
 
Netezza Deep Dives
Netezza Deep DivesNetezza Deep Dives
Netezza Deep DivesRush Shah
 
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...Alluxio, Inc.
 
SSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-ProcessingSSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-ProcessingSamsung Business USA
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFSDataWorks Summit
 
The Importance of Fast, Scalable Storage for Today’s HPC
The Importance of Fast, Scalable Storage for Today’s HPCThe Importance of Fast, Scalable Storage for Today’s HPC
The Importance of Fast, Scalable Storage for Today’s HPCIntel IT Center
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 
Hitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi Vantara
 
Hadoop and Netezza - Co-existence or Competition?
Hadoop and Netezza - Co-existence or Competition?Hadoop and Netezza - Co-existence or Competition?
Hadoop and Netezza - Co-existence or Competition?Krishnan Parasuraman
 
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01Lenovo Data Center
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Ed Kohlwey
 
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...inside-BigData.com
 
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...Principled Technologies
 
Blazing Fast Lustre Storage
Blazing Fast Lustre StorageBlazing Fast Lustre Storage
Blazing Fast Lustre StorageIntel IT Center
 
Make sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instancesMake sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instancesPrincipled Technologies
 

What's hot (20)

Netezza vs teradata
Netezza vs teradataNetezza vs teradata
Netezza vs teradata
 
Netezza vs Teradata vs Exadata
Netezza vs Teradata vs ExadataNetezza vs Teradata vs Exadata
Netezza vs Teradata vs Exadata
 
Netezza Deep Dives
Netezza Deep DivesNetezza Deep Dives
Netezza Deep Dives
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
Accelerating analytics workloads with Alluxio data orchestration and Intel® O...
 
SSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-ProcessingSSDs Deliver More at the Point-of-Processing
SSDs Deliver More at the Point-of-Processing
 
Netezza All labs
Netezza All labsNetezza All labs
Netezza All labs
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
 
The Importance of Fast, Scalable Storage for Today’s HPC
The Importance of Fast, Scalable Storage for Today’s HPCThe Importance of Fast, Scalable Storage for Today’s HPC
The Importance of Fast, Scalable Storage for Today’s HPC
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 
Netezza pure data
Netezza pure dataNetezza pure data
Netezza pure data
 
Hitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-familyHitachi overview-brochure-hus-hnas-family
Hitachi overview-brochure-hus-hnas-family
 
Hadoop and Netezza - Co-existence or Competition?
Hadoop and Netezza - Co-existence or Competition?Hadoop and Netezza - Co-existence or Competition?
Hadoop and Netezza - Co-existence or Competition?
 
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
 
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
Performance Comparison of Intel Enterprise Edition Lustre and HDFS for MapRed...
 
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
Scalability: Lenovo ThinkServer RD540 system and Lenovo ThinkServer SA120 sto...
 
Blazing Fast Lustre Storage
Blazing Fast Lustre StorageBlazing Fast Lustre Storage
Blazing Fast Lustre Storage
 
Make sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instancesMake sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instances
 
IBM Netezza
IBM NetezzaIBM Netezza
IBM Netezza
 

Viewers also liked

Alfabetización informática
Alfabetización informáticaAlfabetización informática
Alfabetización informática98vicmoviglia
 
Ab35 competition terms and conditions
Ab35 competition terms and conditionsAb35 competition terms and conditions
Ab35 competition terms and conditionsFreek Clinckemaillie
 
Manualwindows8 130523085145-phpapp01
Manualwindows8 130523085145-phpapp01Manualwindows8 130523085145-phpapp01
Manualwindows8 130523085145-phpapp01JuirleyLopezZambrano
 
Debugging IBM Connections for the Impatient Admin - Social Connections VII
Debugging IBM Connections for the Impatient Admin - Social Connections VIIDebugging IBM Connections for the Impatient Admin - Social Connections VII
Debugging IBM Connections for the Impatient Admin - Social Connections VIIMartin Leyrer
 
A Case Study: Pizza Hut
A Case Study: Pizza HutA Case Study: Pizza Hut
A Case Study: Pizza HutIggi Vargas
 
Parker Kittiwake ThrusterSCAN Brochure
Parker Kittiwake ThrusterSCAN BrochureParker Kittiwake ThrusterSCAN Brochure
Parker Kittiwake ThrusterSCAN BrochureParker Kittiwake
 
Seguridad más inteligente y analítica
Seguridad más inteligente y analíticaSeguridad más inteligente y analítica
Seguridad más inteligente y analíticanoticiascac
 

Big Data Glossary of terms

  • 1. Glossary of Terms Term Definition Significance 10GbE (Ethernet) Networking Network cabling capable of supporting the transmission of data at a rate of up to 10 gigabits (10bn bits) per second As Kognitio unifies the resources of multiple nodes and randomly distributes the data, heavy use is made of networking in the execution of queries so the higher the network bandwidth the better with (dual) 10GbE, as opposed to the more commonly available 1GbE, being our preferred standard ACID ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee a database transaction is processed reliably. For example, a transfer of funds from one bank account to another Kognitio is ACID compliant. As a result, even though it has been designed to carry out analytical workloads, it can also carry out transactional workloads Amazon Web Services (AWS) A provider of public cloud infrastructure as a service (IaaS), enabling the provisioning and hardware management of appliances on-demand based on an hourly charge Enables applications to be considered that were not previously possible by increasing flexibility and considerably reducing short term costs and need for capital expenditure Analytical Platform A database platform that is specifically designed and built to manage analytical workloads rather than transactional workloads Kognitio provides a scalable analytical platform to support complex analytical applications Analytical Workloads An analytical, as opposed to transactional, workload is one associated with the reporting and analysis of information. Typically analytical workloads will involve a relatively small number (compared with transactional workloads) of querying tasks on all or large subsets of the entire data set. As such, query performance is essential Kognitio has been designed to support analytical rather than transactional workloads Blade Servers A small form factor of server that enables high density compute power. 
Units do not carry their own power supply, cooling, networking, etc. so cannot be run independently of blade enclosures Kognitio provides high performance computing and requires a number of servers (scale-out) to achieve this. As the performance is achieved through holding data in RAM rather than on disk, compute density is essential Blade Enclosures Supplies the power, cooling, networking, etc. for blade servers. Can contain several blades to provide high density compute power Kognitio benefits from the compute density offered by the blade server form factor. Cores Each core is an independent processing unit (CPU). CPU chips now include multiple cores that are capable of processing multiple tasks in parallel Multiple cores facilitate the parallel processing of data which is a key driver of Kognitio’s performance. Kognitio can drive cores at 100% as part of providing linear scalability CPU ’Central Processing Unit’ – the area of the computer that executes instructions and processes CPUs/cores are the driver of Kognitio’s performance capabilities Cube The name given to a multidimensional (hence ‘cube’) structure built within an OLAP engine Cubes can be designed and published, without building, within the MDX designer associated with Kognitio Data Warehouse A central repository of information, created by integrating data from one or more source systems, that is used to support reporting and Kognitio’s target markets are closely associated with data warehousing
  • 2. Glossary of Terms analysis within an organisation Database Appliance A group of servers/nodes that are combined to form a pre-built and pre-configured MPP database environment that can be used ‘out of the box’ Appliances have an advantage over software as they can be brought into service quickly, ensuring a faster return on investment. Kognitio can be delivered as an appliance Dimension A group of related attributes, typically defined in one or more hierarchies, that enable the filtering and grouping of associated measures in a data warehouse Data warehousing is a key application within Kognitio’s target markets Disk Data storage device – the common format for storing data for processing. Typically a Hard Disk Drive (HDD) but may be a Solid State Drive (SSD) Provides a persistence layer for data held within or associated with a Kognitio instance. As Kognitio usually provides multiple disks in an appliance, RAID methods can be used to improve resilience Elastic Block Store (EBS) An area of persistent block storage available on AWS infrastructure that can be attached to a server. Typically used in database applications Provides the facility to persist a Kognitio platform thus enabling instances to be stopped and restarted which considerably reduces on-demand infrastructure costs ETL (Extract Transform and Load) A process for taking data from operational systems, transforming it into information by applying pre-defined processes to provide context and loading it into, typically, a data warehouse environment. A class of tools, such as Informatica, has grown up to provide sophisticated capabilities to carry out this functionality. This is a standard process within the data warehousing space, an area that is closely associated with Kognitio’s target markets. Tools such as Informatica (a Kognitio strategic partner) work effectively with Kognitio. 
External Scripting A Kognitio version 8 capability that enables any code capable of running under Linux to be executed in parallel within a SQL framework on the Kognitio Analytical Platform. Examples include R, Python and Perl. Enables very high performance execution of complex analytical processes by removing the bottlenecks traditionally associated with this workload, such as moving the data to a single application server for processing. Note that some processes cannot be parallelized and, as such, will not be accelerated External Tables A Kognitio version 8 capability that enables a table to be mapped onto an external data source before pulling the data into RAM. Each data source requires a connector to be defined, with initial connectors provided for Hadoop, S3 and other Kognitio instances Provides a very flexible and powerful way to access external data sources without the need for ETL tools or scripting Flash Memory/SSD (Solid State Disks) SSDs use flash memory to provide relatively faster access (than Hard Disk Drives) to persistent data without using moving parts (spinning disks/heads). Unlike RAM, data is preserved after power loss. Access is still considerably slower than RAM. As such, SSDs are NOT a direct replacement for RAM Kognitio’s disk based environment can benefit from the provision of SSDs. However, as SSDs are generally considerably more expensive than HDDs, it is recommended that systems employ RAM rather than SSDs as this will provide significantly greater performance benefits. Disk based competitors generally benefit more from the inclusion of SSDs
  • 3. Glossary of Terms Hyperthreading Intel’s technology solution for increasing the parallelization capabilities of CPU cores. Each hyperthread is ‘seen’, by operating systems that support hyperthreading, as a separate core, enabling the workload to be shared between them Kognitio can effectively utilise hyperthreading to increase the parallelization of processing, thus enhancing performance and throughput In-memory database A database specifically designed to operate within RAM rather than one that is designed for disk and utilises RAM to process data retrieved from disk blocks (caching) Kognitio has its roots as an in-memory database and gets its performance by storing data in RAM. This has advantages over caching in the fact that, if specific data values or query results are not available within the cache, there will be a ‘cache miss’ which will result in further (expensive) disk reads to acquire the data JDBC Java DataBase Connectivity is a standard API for accessing relational database management systems (RDBMS) for the Java programming language Kognitio supports the JDBC standard via a JDBC to ODBC bridge provided by Simba Technologies Latency Time delay between initiating a request and any actions associated with the request being completed. Typically this will be the time taken for a query to run. However, it could also be associated with disk access times, load times, network transmission and time to insight Kognitio holds data in RAM to make sure that it is as close to the CPUs as possible, thus reducing the latency associated with moving data and reducing query times. In many use cases, it may not be necessary to write to disk, thus reducing latency associated with data loading. Time to insight is also key to the value proposition associated with the Kognitio analytical platform Linear scalability The capability to improve performance in line with system size. 
For example, doubling the power of a system will result in the same query time on twice the volume of data (NOTE: this is not the same as doubling the power results in half the query time on the same data) As Kognitio has focused on reducing bottlenecks, it provides linear scalability for both query and bulk load performance (insert rather than update – referential integrity has a significant impact) Massively Parallel Processing (MPP) Parallel processing on a large scale, typically achieved through combining the processing capabilities of a number of nodes Kognitio combines the compute power of multiple nodes and CPUs to provide MPP capabilities to analytical workloads MDX (MultiDimensional eXpressions) MDX is a language developed by Microsoft to enable querying of multidimensional data stores (OLAP) in much the same way that Structured Query Language (SQL) is used for relational data stores. MDX is a supported language for querying the Kognitio Analytical Platform. It requires that a model is in place that defines the relationships between dimension and fact tables and a provider that converts the MDX code into SQL. A tool to design and build the model is available to Kognitio Measure In data warehousing, a measure is a property that can be aggregated (sum, count, average, etc.). For example, the number of units for a product in a retail basket is a measure. Data warehousing is a key application in Kognitio’s markets Memory (RAM) Random Access Memory (RAM) is referred to simply as ‘memory’ by Kognitio and is a form of memory that provides random access to data. Data does not persist in RAM when power is lost Kognitio is an in-memory (RAM) analytical platform. As such Kognitio gains its performance advantage over disk based environments when tables or images are stored in RAM as the data is kept close to the CPUs to reduce query and loading latency
  • 4. Glossary of Terms Node A modular unit of an MPP architecture = a server (physical or virtual) Nodes form the basic units for constructing a Kognitio MPP instance NoSQL Databases Originally indicating that SQL was not used to query the environment, this has since been modified to become “Not Only SQL”. NoSQL databases were designed to handle Big Data ‘volumes, velocities and varieties’ and, as such, tend to provide less rigorous integrity and metadata handling than relational database management systems. Built for scale out, they are schemaless and ‘eventually consistent’ (BASE) rather than ACID compliant. Kognitio is NOT a NoSQL database but is incorporating additional scripting languages to provide NoSQL capabilities. For business intelligence and ‘repeatable’ analytics on a defined dataset, a schema is considered to be a positive asset ODBC Open DataBase Connectivity is a standard API for accessing relational database management systems (RDBMS) ODBC is the standard approach for connecting to a Kognitio instance. The majority of BI tools will support generic ODBC connectivity and, hence, will likely be able to connect to a Kognitio instance. The exceptions tend to be OLAP clients, which will typically connect via ODBC or XML/A, or tools that utilise JDBC or REST interfaces ODBO OLE DB for OLAP is a Microsoft published standard mechanism for connecting to OLAP data sources via the MDX language. OLAP sources and clients may only adopt part of the standard which can lead to connectivity and processing issues. ODBO is a two tier architecture (client and server) Kognitio, via its partner Simba, has an MDX provider interface that can support ODBO connectivity. However, note that not all OLAP clients may necessarily be supported owing to the variability with which tools have incorporated the standard. OLAP OnLine Analytical Processing is a representation of a business intelligence model suitable for consumption by non-technical users. 
Typically data would be stored in ‘cubes’ that contain measures and hierarchical dimensions which are logically grouped in the manner that businesses reference them (e.g. a product hierarchy consisting of product group, sub-group, family, sub-family and product). Traditional cubes would be pre-calculated at intervals with aggregated measures stored at the various levels and combinations of the dimensions to facilitate very fast access. The cubes would be accessed by purpose built clients and, typically, by the specially defined MDX language Kognitio provides the facility to view the Analytical Platform via an OLAP model utilising connectivity software provided by Simba Technologies. Rather than pre-calculating OLAP cubes Kognitio utilises the performance characteristics of the platform to provide virtual cubes which eliminates the lengthy build times associated with OLAP OLTP (OnLine Transaction Processing) A class of system designed to manage transaction oriented workloads. An OLTP database will be specifically designed to manage data entered, produced or processed by a transactional system and, hence, is designed for the rapid insertion and updating of records within a table Whilst Kognitio can support OLTP associated workloads, it was designed for analytical workloads and, hence, is suboptimal for OLTP environments
  • 5. Glossary of Terms Parallel Processing The simultaneous use of more than one CPU or core to execute a program. Operations that can be performed in parallel will execute faster within a parallel computing framework (potentially proportionate to the number of cores/CPUs available). The overall effectiveness of the parallelism may be limited by tasks that are executed serially Kognitio has a strong parallel architecture and achieves its performance through parallelism across multiple nodes, multiple CPUs and associated cores. This enables Kognitio to provide linear scalability in line with increasing memory size and core counts Persistence Layer An area provided to ensure that data is maintained when a server/appliance is powered down, typically hard disk based. Data in RAM does not persist when the hardware is powered down so if data is required to persist it should be stored within this layer. For physical devices this will typically be local disk based. However, for AWS based instances this has to be managed in a different way as local storage is ephemeral, meaning that the disk drives are wiped when a server is terminated. EBS or S3 storage are typically used to provide persistence in AWS Private Cloud Provision of non-publicly available infrastructure on-demand – see public cloud. Private clouds typically provide additional certified standards compared with public clouds Kognitio provides its own infrastructure to clients, which is referred to as a ‘private cloud’. Provisioning is done on a term basis rather than on-demand but is maintained by Kognitio or its partners offsite for customer’s use rather than on-premise. Provides a facility to customers to get environments up and running quickly without up-front capital expenditure Public Cloud The publicly available provisioning of shared computing infrastructure. 
Typically this is achieved through virtualization and is generally provided on-demand with no upfront capital costs Enables Kognitio to provide access to a pre-configured appliance on-demand rather than in days (private cloud) or weeks/months (on-premise appliance). Kognitio uses Amazon Web Services (AWS) to provide this facility but, in principle, any provider could be used R language R is software and its associated syntax language for providing statistical computation and graphics. It is open source and has grown to become a standard for statistical processing with particularly high penetration in the academic world and, increasingly, the data science community Kognitio has recently added support for the R language via the external scripting capability in v8 RAID Redundant Array of Inexpensive/Independent Disks is a storage mechanism to combine multiple disks into a single logical unit. Data is distributed across the disks for the purposes of improved performance or resilience. There are several levels of RAID available which provide different performance and resilience characteristics. A Kognitio appliance uses RAID 1 (mirroring) to ensure that the appliance does not lose data should a node become unavailable. Racks Physical frameworks for holding an array of servers or blade enclosures specifically designed to be mounted within the framework Kognitio appliances utilise racks
  • 6. Glossary of Terms Rackmounts Independent, fully self-contained servers. These servers are generally larger than blades and can have more RAM, CPU, Disk, etc. Whilst they are typically housed in a rack, rackmounts provide flexibility over blades in that a limited number (up to three practically) can be stacked independently (with switching) to form an appliance without the need for rack infrastructure Kognitio appliances can be based on rackmounts as well as blade servers. For certain applications, rackmounts can provide cost advantages over blade servers (e.g. small appliances up to 768 GB RAM) RAM Only Temporary Tables (ROTT) A table in Kognitio RAM with no associated storage (protection) in the persistence layer. This means that, whilst the structure is persistent, the data is ephemeral ROTTs are used for non-persistent workloads. For example, they provide the highest potential load speeds for data that needs to be processed before it is persisted. The alternative, tables and table images, would involve writing to disk with the resultant delay. Failure of an appliance will result in loss of data held in ROTTs Referential Integrity This is the process of ensuring that the data entered in a column is valid. For example, in a relational table, a column may be specified as a foreign key (i.e. the data must exist in another table) in which case, at data load time, this constraint will be checked before the data is entered. Failure of the constraint will result in the data not being entered Kognitio is fully ACID compliant and supports referential integrity. However, tables need to be in RAM to perform this task and the process has a severe impact on load performance since it results in a full table scan for each referential integrity check. 
Careful consideration needs to be given to application design implications Scale-up To increase the size of a server through the addition of new resources (CPU/memory) Many databases can only utilise single servers, so the ability to incorporate greater resources is necessary for them to address larger data sets. However, there are cost implications to scaling-up (e.g. larger memory DIMMs tend to be considerably more expensive) and limitations to the data set sizes that can be addressed. Kognitio can fully utilise the resources available in a scaled-up environment Scale-out To increase the size of an appliance through the addition of more nodes Whilst utilising scaling-up to increase data sizes addressable by databases is common, it is less common to be able to do this by scaling-out. Kognitio addresses larger data sizes through scaling-out. It can often involve less capital outlay to have several smaller nodes than one very large server and the size limitations of a single server are removed S3 (Simple Storage Service) A cost effective, secure and highly available file storage area available on AWS cloud infrastructure. Provides the facility to stage data files ready to load into a Kognitio Analytical Platform. Also provides an environment to store readily available backups and associated files. Kognitio, in v8, has a connector that can map external tables onto S3 and load the entire file into RAM SQL (Structured Query Language) SQL is a language designed to manage and query data held in a relational data store SQL is the standard used for querying the Kognitio Analytical Platform.
  • 7. Glossary of Terms Switch A network switch enables the linking of multiple network devices Kognitio appliances require the cooperative processing of multiple nodes. As such, switches are required to facilitate the flow of data/message passing between nodes. Note: for appliances involving two nodes, no switching is required as the nodes are linked peer to peer. Table Image A Kognitio table that is simultaneously available in RAM and on disk. The table may be completely or partially (only selected columns or rows) represented in RAM Table images enable both performant queries and persistence Time to Insight The time taken from the point at which the data of interest is generated in an operational system to the point at which it has been analysed. This involves several aspects: volume of data; velocity of data; network speed; need to move data; load speed; query speed. Kognitio has the ability to ingest and query data very quickly (not just query). As such, Kognitio’s time to insight is considerably lower than many other competitive products such as those which rely on accelerative structures (OLAP, indexes, columnar) to provide acceptable query performance (as this impacts on load speed) Transactional Workloads A transactional, as opposed to analytical, workload is one that involves a large number (compared to analytical workloads) of small processes that may involve locating, inserting, updating or deleting rather than querying data. Transaction speed and referential integrity are critical to this workload Whilst Kognitio can support transactional workloads, it has been designed to manage analytical workloads. As such, for transactional environments, it is highly likely that OLTP databases will more appropriately fulfil the requirement View Image An in-memory instantiation (copy of results) of a view in Kognitio. At the point of instantiation, processing (such as joins, groupbys, etc.) 
associated with the view is undertaken and the results physically stored in RAM View images considerably enhance the performance of queries where the views are used repeatedly as the processing in the view only needs to be carried out once. Allows different representations of common underlying data XML/A XML for Analysis is a published standard mechanism for connecting to analytical data sources such as OLAP (via the MDX language) and data mining. XML/A is a three tier architecture (client, mid-tier and server) enabling the caching of results to be incorporated which can considerably increase the speed of satisfying common user community queries Kognitio, via its Simba Technologies developed MDX provider, can support XML/A connectivity to OLAP objects. However, note that not all OLAP clients may necessarily be supported owing to the variability with which tools have incorporated the standard. Kognitio’s implementation incorporates a caching tier that can enhance query and concurrency performance.
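
The Parallel Processing entry notes that the overall effectiveness of parallelism may be limited by tasks that are executed serially. That limit is commonly expressed as Amdahl's law, which the glossary does not name. The sketch below is a generic illustration only, not a model of Kognitio: the function name `amdahl_speedup` and the 95% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a task can run in parallel.

    parallel_fraction: share of the work (0..1) that can be spread across workers.
    workers: number of cores/CPUs sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of worker count;
    # only the parallel part shrinks as workers are added.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload in which 95% of the work can be parallelised.
    for workers in (1, 2, 8, 64):
        print(workers, round(amdahl_speedup(0.95, workers), 2))
```

With a fully parallel task the bound equals the number of workers. With any serial portion, the speedup levels off no matter how many cores are added, which is why the glossary qualifies the gain as only "potentially proportionate" to the number of cores/CPUs.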